Google has entered into a $60 million per year deal with Reddit to mine Reddit’s user content to enrich Google’s AI engine. Under the terms of the deal, Google will be used as the search engine within Reddit.
This development will further concern people who have been watching how AI developers strategically mine content to train their engines. Developers using GitHub for source control of their projects were suspicious when Microsoft purchased GitHub in 2018. They wondered whether their source code (some open source, yes, but some privately owned for internal projects) would be scanned and used for Microsoft’s enrichment. Turns out that fear was well founded: Microsoft used code hosted on GitHub to train Copilot.
For over 20 years the open source development world has been driven by a “forum” model. Developers ask and answer questions about designs, techniques and specific coding problems by communicating directly with each other, with little or no moderation. Many commercial companies adopted the same approach to support their commercial, closed source products. As first-generation content platforms like Google’s Blogger and Stack Exchange grew somewhat stale, use of Reddit for this purpose has skyrocketed over the last three to five years. That growth has made Reddit one of the best sources of tech support for current generations of software libraries and related technologies.
Now, once again, all of that fresh support content, generated by technical professionals to help their peers, will be mined and transformed into data that can undermine their livelihood.
More interestingly, this deal coincides with Reddit announcing plans to go public after its most recent quarter generated $18 million in net income on $250 million in revenue. A 7.2% net margin on an ephemeral product whose only costs are cloud compute resources and software developer labor? No customer support costs? No marketing to speak of? And the business model is only netting a 7.2% profit margin? That’s a horrible IPO story. And that story gets worse when you’ve just cut a deal allowing another firm to mine your core asset (your USERS’ content), internalize it into its own systems and (eventually) eliminate the need for anyone to surf through a search results page to your site.
WTH