Reddit CEO Says AI Models Depend on Platform Data, Signals More Licensing Deals Ahead

Reddit CEO Steve Huffman has publicly stated that major large language models depend on Reddit's user-generated content to function, and the company is actively seeking additional AI data licensing agreements. Huffman delivered the remarks at Fast Company's Most Innovative Companies Summit.

Reddit's Role in AI Training Data

Huffman said large language models "would not exist as we know them" without Reddit's content, calling the platform's user-generated data "modern oil" for AI. He argued that the appeal of Reddit's data to AI companies lies in its breadth and authenticity. In Huffman's words: 

"There's no artificial intelligence without actual intelligence."

Huffman cited data from Profound, a firm that tracks AI citations, to support his claim that Reddit "continues to be one of the primary sources" of training data and is "the most cited platform across all models." Separately, Profound data cited during Reddit's Q2 2025 earnings call indicates that Reddit is now the most-cited source in AI models, cited three times more often than Wikipedia.

The Licensing Framework: Deals vs. Lawsuits

In 2024, Reddit struck licensing agreements with Google and OpenAI that, though strategically important, represent a small percentage of revenue. The Google deal, announced in February 2024, was valued at approximately $60 million per year and gave Google access to real-time content from Reddit's forums. A few months later, Reddit struck a similar partnership with OpenAI, estimated to be worth around $70 million per year.

At the Summit, Huffman described the deals as a foundation for future negotiations. "Since we did the original two deals with Google and OpenAI, that was over two years ago, so we've learned a lot," he said, adding: "We're open and open for business." He did not announce any new agreements.

Reddit began charging for commercial API access in 2023, a move that preceded the current licensing deals. Huffman said Reddit still provides free data access to researchers and universities and tries to remain flexible for non-commercial use.

For companies that have not entered into licensing terms, Reddit has pursued legal action. In June 2025, Reddit sued Anthropic for scraping its data to train its chatbot, Claude. In October 2025, Reddit filed a federal lawsuit against Perplexity in the Southern District of New York, alleging illegal scraping of user posts. The complaint also named three defendants Reddit says helped Perplexity collect its data: Lithuanian data scraper Oxylabs, a former Russian botnet called AWMProxy, and Texas startup SerpApi.

Reddit's June 2025 suit against Anthropic drew attention for omitting copyright claims altogether. Instead, it framed data scraping as a question of how content is accessed, grounding its argument in platform contracts and system-use theories.

Huffman drew a direct line between the two groups of companies at the Summit. He stated that companies unwilling to negotiate licensing terms leave Reddit with no option but litigation: 

"Not every company is willing to be a collaborative partner, and so unfortunately, we have to go the other way, which is lawsuits."

What Drove Reddit to Restrict Data Access

According to Huffman, Reddit's willingness to share data freely changed when the AI industry moved away from open research. He said the core issue was that Reddit could no longer track how its data was being used:

"People are using our data, and we don't know what it was being used for."

Beyond commercial terms, Huffman said Reddit wants to prevent its data from being used to identify users, target them with ads, or replace or disintermediate the platform.

Reddit's Own AI Products

Huffman acknowledged what he called a "paradox": Reddit's content fuels external AI systems, while the company simultaneously deploys AI internally. The most visible product is Reddit Answers, an LLM-powered search feature that reads posts and comments and organizes them into responses built from verbatim user quotes. Huffman noted it is designed for questions without definitive answers.

Reddit also uses AI for content moderation. Huffman described LLMs as capable of evaluating whether a comment crosses into bullying, a task previously difficult due to the subjectivity involved, and said the technology reduces the need for humans to review the most harmful content on the platform.

AI-Written Posts and Community Enforcement

Huffman addressed the separate issue of users composing posts with AI tools and pasting them into Reddit, drawing a distinction between that behaviour and automated bot activity. He described the practice as a growing problem across the internet, not just Reddit, and indicated the platform will not build dedicated detection tools to address it.

Instead, Huffman said Reddit intends to give communities more control. "We'll empower the users more and the subreddits more to just reject that sort of content altogether," he said. He noted that Reddit communities are already downvoting AI-written posts and flagging them in comment threads.

What This Means for Digital Marketers

Reddit's position as the most-cited platform across AI models, a claim Huffman attributes to Profound, has direct implications for content strategy. Brands and marketers whose products, services, or industries are discussed in Reddit threads are increasingly likely to see that content surface in AI-generated answers across tools like ChatGPT, Google's AI Overviews, and others. In 2024, Reddit disclosed that licensing agreements with Google and OpenAI were worth $203 million, with those contracts allowing the companies to legally access Reddit's forum discussions for training AI models and displaying results in products like Google's AI Overviews and OpenAI's ChatGPT. Marketers who have historically treated Reddit as an organic community channel may need to account for its growing role as a primary input in AI-generated responses when planning content and brand monitoring strategies.

During Reddit's Q2 2025 earnings call, Huffman said: "Every variable has changed since we signed those first deals. Our corpus is bigger, it's more distinct, more essential", positioning the company for stronger terms in upcoming contract renewals with both Google and OpenAI.
