
AI Models are at Risk of Collapse… Unless SEO Acts

Jonas Trinidad

Aug 12, 2025 • 9 min read

Over the last several years, we’ve witnessed the exponential scaling of AI models. OpenAI’s GPT series is proof of this, with GPT-4 reportedly having 1,000 times more parameters than its predecessor. In fact, across all versions, its training data usage has grown over 15,000-fold in its seven-year life.

The numbers are impressive to the untrained eye. But those in the industry see a bleak picture: the amount of data we have right now might not be enough.

Researchers at AI research firm Epoch studied several of the latest large language models (LLMs) and their demand for training data. At the current rate, their median projection is that demand will match the amount of data available by 2028. Other experts give a somewhat later estimate, with the supply lasting until 2032. (1)

The industry is scrambling to ensure a steady supply of training data in the future. At one point, OpenAI CEO Sam Altman said that the chatbots of the future would be advanced enough to rely on synthetic data for training. But as any AI expert will tell you, a model is only as good as the data it works with. (2)

Amid these developments, it’s a bit surprising (though reasonable) that no one’s talking about a potential third solution: good old SEO. Allow me to explain.

The Worst Case Scenario – A Model Collapse

To understand how SEO can become a dark horse in all this, it’s important to know what happens when LLMs inevitably run out of data. Such a situation leads to what AI experts refer to as a “model collapse.”

Think of model collapse as a story that drifts from the original telling over several generations. The first few retellings may still get it right, but storytellers will inevitably get some details wrong. By the time it reaches later generations, the story will be vastly different from how their ancestors told it.

However, the implications of a collapsing model are far worse. Without real-world data to learn from, a model has to be trained on output from its prior versions or from another model. The errors carry over to future models, degrading the entire line and rendering it ineffective for the tasks it’s given.
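The storytelling analogy can be made concrete with a toy simulation. The Python sketch below is an illustration under simplifying assumptions, not how any real LLM is trained: the “model” is just a table of how often each detail of a story appears, and each generation is trained only on a sample produced by the previous one. The function name resample_generations, the 50 hypothetical details, and the sample sizes are all made up for this example.

```python
import random
from collections import Counter


def resample_generations(detail_weights, sample_size=200, generations=30, seed=0):
    """Retrain a toy frequency 'model' on its own output, generation after generation.

    Returns the number of distinct details that survive in each generation.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs match
    details = list(detail_weights)
    weights = [detail_weights[d] for d in details]
    surviving = []
    for _ in range(generations):
        # The current model "tells the story" by sampling details by frequency.
        sample = rng.choices(details, weights=weights, k=sample_size)
        # The next model sees only that sample, never the original story.
        counts = Counter(sample)
        details = list(counts)
        weights = [counts[d] for d in details]
        surviving.append(len(details))
    return surviving


# Hypothetical story: 50 details, a few common ones and a long tail of rare ones.
story = {f"detail_{i}": 1.0 / (i + 1) for i in range(50)}
print(resample_generations(story))
```

Once a rare detail fails to appear in one generation’s sample, no later generation can recover it, so the count of surviving details can only stay flat or fall.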

While no incidents of model collapse have been recorded so far, experts point out that the signs are there. In a study presented at a regional Association for Computational Linguistics conference last April, researchers raised safety concerns regarding the use of LLMs, especially those that use retrieval-augmented generation (RAG). (3)

Source: An et al. (2025)

According to the study’s data, most RAG LLMs return unsafe responses, ranging from illegal activity to misinformation. The factors behind this trend boil down to the design safety of the LLM, the safety of the results it retrieves, and the model’s functionality. (3)

Long story short, you don’t want AI to teach someone how to build a bomb. With a model collapse, however, such a scenario is likely to occur.

Dangerous Implications for Search

AI technology is still far from a point where humans can leave it to its own devices. A quick search will provide countless stories of AI backfiring, from making a list of must-read book titles that don’t exist to generating fictional court cases.

As far as search is concerned, I discussed the implications of AI on modern SEO in a recent blog post. There’s no denying the convenience it brings, providing users with the answers they need without clicking. A recent Pew study confirms this, with fewer than one-tenth of users clicking on a result after checking out the AI-generated summary. (4)

Source: Pew Research Center

Of course, publishers and site owners aren’t happy because it denies them organic traffic. The point of SEO is to let customers know that a business exists through a quick search on Google (or any search engine). Although publishers blame AI Overviews, the same Pew study revealed that Wikipedia and .gov sites make up most of the links in AI summaries. (4)

However, it’s worth noting that not all users will take the time to check if the AI’s answer is correct. Remember the glue-on-pizza example I mentioned in my post about AI SEO? Let’s delve a bit deeper into that.

As it turned out, Google’s AI retrieved that ridiculous statement from a 12-year-old Reddit comment that was meant as a joke. However, lacking a grasp of the concept of humor, the AI passed it off as fact. And this isn’t an isolated case, as the AI also: (5)

Claimed that Barack Obama was a Muslim (which he isn’t)

Claimed that a dog played in an NHL game (which isn’t true)

Advised consuming one small rock a day (which you shouldn’t)

Outlined the health benefits of bathing with a toaster (which you mustn’t)

Described the health benefits of running with scissors (which you mustn’t)

While you won’t see these gaffes in Google’s search results anymore, they pose a threat to users looking for legitimate answers. Imagine if someone really put a helping of glue (even if it’s non-toxic) on pizza or tried eating a rock (even if it’s rich in minerals). If it results in harm, the blame won’t fall on the search engine alone but also on the source of the preposterous info.

Where Does SEO Come In?

Avoiding a disaster requires being aware of the risk. Fortunately, the industry is cognizant enough to start developing solutions. One multi-university study proposes combining real and model-generated data to mitigate the risk of degradation and produce better-quality training data. (6)

Source: Gerstgrasser et al. (2024)
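A rough way to see why keeping real data in the mix helps is to compare two training strategies on a toy problem. The sketch below is a simplified illustration, not the method used in the study: the “model” is a normal distribution fitted to a pool of numbers, and the names fit, simulate, "replace", and "accumulate" are assumptions made for this example. Under "replace", each generation trains only on the previous generation’s synthetic output. Under "accumulate", synthetic output is added to a pool that still contains the original real data.

```python
import math
import random


def fit(samples):
    """Fit a normal distribution: return the mean and standard deviation."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, math.sqrt(var)


def simulate(strategy, generations=500, n=10, seed=0):
    """Return the fitted standard deviation after repeated retraining.

    strategy: "replace" trains on the newest synthetic data only;
              "accumulate" trains on real plus all synthetic data so far.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs match
    # Stand-in for real-world data: n draws from a standard normal.
    pool = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean, std = fit(pool)
    for _ in range(generations):
        # The current model generates synthetic data.
        synthetic = [rng.gauss(mean, std) for _ in range(n)]
        if strategy == "replace":
            pool = synthetic        # older data is thrown away
        else:
            pool.extend(synthetic)  # older data, including real data, is kept
        mean, std = fit(pool)
    return std


print("replace:   ", simulate("replace"))
print("accumulate:", simulate("accumulate"))
```

In this toy setup, the spread of the replace-only model tends to shrink toward zero over many generations, while the accumulating model stays much closer to the spread of the original data.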

Now we’re cooking with gas!

If synthetic data becomes the norm by the next decade, SEO-friendly content creation will need to step up to fill the gap as best it can. It may well be the models’ only source of real-world data for a long time, if not forever.

Content continues to be one of the trickiest aspects of SEO, but not because writing articles and blog posts is hard. The issue lies in determining the kind of content your customer base wants. It won’t matter if your content is worthy of a Pulitzer if it doesn’t answer your customers’ frequently asked questions.

Yet, Google search advocate John Mueller said in a recent Search Off The Record podcast that finding out is as easy as asking them straight up. As long as you don’t appear coercive or daunting, you can find out how a customer found your business or what brought them to your physical or online store. (7)

Once you have your topic, the next step is to write content for your audience. Despite fully embracing AI, Google still maintains human quality raters to determine whether or not the content adheres to its E-E-A-T criteria. Below are some tips according to Google. (8)

Cite high-authority references to support your claims

Refrain from copy-pasting passages from said sources

Offer unique insights, especially if the topic’s been discussed to death

Ensure proper grammar and spelling, and also information accuracy

Provide a brief description of the author’s background and credentials

Link the author bio to the author’s page or social media profile

Avoid writing to appease (or trick) the search engine algorithm

Note that fulfilling E-E-A-T doesn’t directly affect a site’s rankings. However, it helps the content gain favor with the manual reviewers who assess it based on said criteria and improves its chances of getting cited in AI Overviews or AI Mode.

It’s also worth mentioning that Google says there’s no need for a separate AI SEO strategy for now, as conventional SEO still applies. John Mueller confirmed this during Search Central Live in Bangkok, Thailand, last month. (9)

Augment, Not Replace

Proper SEO is more essential than ever, given that it’ll be relied upon as a primary source of real-world training data. Meeting demand (and preventing an AI model collapse) is possible if every publisher and content creator continues to adhere to SEO best practices.

References:

1. Villalobos P, Sevilla J, Heim L, Besiroglu T, Hobbhahn M, Ho A. Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning [Internet]. Available from: https://arxiv.org/pdf/2211.04325

2. AI will run out of data within a decade - then what? [Internet]. Cosmos. 2024. Available from: https://cosmosmagazine.com/technology/ai/ai-will-run-out-of-data/

3. An B, Zhang S, Dredze M. RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models [Internet]. ACL Anthology. [cited 2025 Jul 31]. Available from: https://aclanthology.org/2025.naacl-long.281.pdf

4. Chapekis A, Lieb A. Google users are less likely to click on links when an AI summary appears in the results [Internet]. Pew Research Center. 2025. Available from: https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/

5. Goodwin D. Google AI Overviews under fire for giving dangerous and wrong answers [Internet]. Search Engine Land. 2024. Available from: https://searchengineland.com/google-ai-overview-fails-442575

6. Gerstgrasser M, Schaeffer R, Dey A, Rafailov R, Pai D, Sleight H, et al. Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data [Internet]. 2024 [cited 2025 Jul 31]. Available from: https://arxiv.org/pdf/2404.01413

7. Montti R. Google Explains How To Approach Content For SEO [Internet]. Search Engine Journal. 2025 [cited 2025 Jul 31]. Available from: https://www.searchenginejournal.com/google-explains-how-to-approach-content-for-seo/550989/

8. Google. Creating Helpful, Reliable, People-First Content | Google Search Central | Documentation [Internet]. Google Developers. 2025. Available from: https://developers.google.com/search/docs/fundamentals/creating-helpful-content

9. Schwartz B. Google: Normal SEO Works To Get Into AI Overviews [Internet]. Search Engine Roundtable. 2025 [cited 2025 Aug 1]. Available from: https://www.seroundtable.com/google-ai-overviews-normal-seo-39817.html