By Brett Watkins, CEO, L&E Research
Most of us are mulling the future of the insights industry and the impact of AI and synthetic data on helping brands predict human behavior. AI and synthetic data are promising the next “cheap insights, even faster!” As a qualitative research guy, I feel like I’ve seen this movie before. But this time the pitch is more seductive than ever, given the omnipresence of artificial intelligence in our world today.
And then it hit me: this is analogous to the problem with photocopies.
Oddly enough, what got me thinking wasn’t something from within the industry; it was a question put directly to Sam Altman, CEO of OpenAI. Lenny Murphy, a well-respected industry expert, recently posted in his Insight Innovations Substack about an OpenAI event where the question was posed: what happens to AI models when there is no training data left? The answer wasn’t reassuring. By current estimates, the global supply of ground-truth, human-generated data (the real, verified, first-person human content that trains AI systems) could be exhausted somewhere around 2030.
Lenny framed it well: “Primary research has always sold itself as ‘access to human truth.’ For a generation, that value proposition was under pressure from secondary data, synthetic data, and passive behavioral data. Now those pressures are converging with an active supply deficit that makes the industry’s core product — structured, validated, first-person human data — more strategically valuable than it has been in decades.”
The AI revolution runs on human-generated data — the words, opinions, behaviors, and experiences real people have produced and shared over decades. Every large language model you’ve used — ChatGPT, Gemini, Claude, Perplexity — was trained on an enormous corpus of human-written text. The problem is that corpus is finite, and the models are eating it faster than we can produce it.
Epoch AI estimates the effective stock of quality human-generated public text at roughly 300 trillion tokens, with full utilization projected somewhere between 2026 and 2032. Elon Musk declared at CES that AI has essentially exhausted available real-world training data. Dario Amodei, CEO of Anthropic, has publicly acknowledged data scarcity as a meaningful risk to continued AI scaling.
So what does the AI industry do when it runs out of human truth? It turns to synthetic data. And that is where the photocopy analogy comes into play.
What happens when you photocopy something over and over again? By the fifth or sixth generation, the text is blurry, the details are fading, and what started as a crisp original has become a smeared approximation of itself.
That is exactly what happens when AI trains on AI-generated data.
In July 2024, Oxford University researchers published a peer-reviewed paper in Nature that rattled the AI world. When AI models are trained on synthetic data — content generated by other AI models — subsequent generations degrade. Not subtly. Dramatically. Rare knowledge disappears first. Outputs drift toward bland, averaged generalities. Diversity collapses. The researchers called this model collapse. Some academics have gone further, calling the recursive version — training AI on AI that was trained on AI — “Model Autophagy Disorder.” The models, quite literally, begin eating themselves.
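For readers who want to see the photocopy effect in miniature, here is a toy sketch in Python. It is not the Oxford team's experiment, just an illustration under simple assumptions: a hypothetical "human" corpus of 100 documents with one common topic and 20 rare ones, and a "model" that does nothing more than learn topic frequencies and generate a same-sized synthetic corpus from them. The function names, topic labels, and sizes are all made up for the example.

```python
import random
from collections import Counter


def next_generation(corpus, rng):
    """'Train' on a corpus by counting topic frequencies, then
    generate a same-sized synthetic corpus from those frequencies."""
    counts = Counter(corpus)
    topics = list(counts)
    weights = [counts[t] for t in topics]
    return rng.choices(topics, weights=weights, k=len(corpus))


def simulate(generations=30, seed=0):
    """Return the number of distinct topics surviving at each generation.

    Generation 0 is the hypothetical human-written corpus; every later
    generation is trained only on the previous generation's output.
    """
    rng = random.Random(seed)
    corpus = ["common"] * 80 + [f"rare_{i}" for i in range(20)]
    history = [len(set(corpus))]
    for _ in range(generations):
        corpus = next_generation(corpus, rng)
        history.append(len(set(corpus)))
    return history


if __name__ == "__main__":
    # Distinct topics per generation, starting from the human original.
    print(simulate())
```

Once a rare topic fails to show up in one generation's output, the next model has no way to learn it, so the count of distinct topics can only hold steady or fall. That is the "rare knowledge disappears first" pattern in its simplest form.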
By April 2025, an estimated 74% of newly created webpages already contained AI-generated text. The web that future AI will scrape for training data is increasingly a web that AI already wrote.
The 2025 GRIT Insights Practice Report found that data quality concerns among insights professionals have surged 40% year-over-year, driven specifically by anxiety over synthetic respondents, AI bots masquerading as real people, and the integrity of the data supply chain. As one GRIT respondent put it plainly: “Bad data amplified by AI erodes trust faster than ever.”
The photocopy machine is running on fumes — and the industry knows it
For several years, synthetic data has been promoted as a faster, cheaper alternative to primary research. Why recruit real consumers when AI can simulate what they’d say? Why run a focus group when a synthetic persona can model the response in minutes?
The pitch is seductive. Research teams using synthetic data report high satisfaction — 87% express positive feedback, according to GRIT — and a 2025 Qualtrics study found 62% of market researchers used synthetic data in the previous six months. Adoption is real and accelerating.
But here is the uncomfortable truth the adoption numbers don’t tell you. Researchers studying AI-simulated respondents have identified what they call hyper-accuracy distortion: a tendency for synthetic responses to be too clean, too consistent, and too confident. In other words: platitudes and generalizations. Not insights. Not human.
Carnegie Mellon researchers studying AI-generated interview responses found a consistent pattern: while AI can sound plausible and articulate, it fundamentally lacks real-world context and authentic lived experience. That’s not a software bug to be patched. That’s an ontological reality. AI has never stood in a grocery aisle, held a product, and felt the flicker of hesitation before putting it back on the shelf. It has never sat at a kitchen table and felt the quiet relief of finally finding a solution to a problem carried for months.
Brands make eight-, nine-, and ten-figure decisions based on research. Synthetic data that drifts further from reality as it compounds on itself is not an adequate foundation for those decisions.
Here’s where I want to reframe the conversation entirely.
The same dynamic threatening the long-term quality of AI models is simultaneously making verified, structured, first-person human data the most strategically valuable commodity in the insights industry. Don’t take my word for it — OpenAI and Google have signed licensing deals with major publishers and human-first content creators. The hyperscalers are not doing this out of nostalgia for human authorship. They’re doing it because human-generated ground truth data is becoming scarce, and without it, their models degrade.
Without ongoing human research, synthetic models lose their connection to how real people actually think, feel, and behave. Even Qualtrics — one of the more bullish voices on synthetic data’s potential — acknowledges that human respondents remain the irreplaceable source of truth anchoring all synthetic models. Which raises a question worth asking, brand leaders: when exactly did Qualtrics assume ownership of the data it collected on your behalf to build its models and grow its valuation?
The 2025 GRIT report is clear: the insights industry faces a trust crisis, and the solution isn’t a technology fix. It’s a return to authentic human understanding. The firms pulling ahead are those combining AI-native tools with genuine human expertise — not those replacing one with the other.
Your customers — real, verified, ID-validated people who can tell you what they actually think, feel, want, and do — are the exact resource the entire AI industry is running low on. That is not a crisis for primary research. That is a mandate.
Synthetic data has a role to play. Used appropriately — as a complement to human research — it is a useful tool. I’ve said it before and I mean it. Heck, I’m an investor in it.
But as a replacement for real human voices? The science says no. The engineering says no. The GRIT data says no. And four decades of watching brands make decisions based on shortcuts in the research process says no.
Sam Altman’s team is worried about running out of human truth. Your competitors are quietly hoping you’ll settle for synthetic substitutes while they invest in getting closer to their actual customers.
Don’t let them.
At L&E Research, we have been building toward this moment for a long time, not because we predicted the AI data crisis, but because we never stopped believing that understanding people better starts with a healthy ecosystem where brands and their customers can engage. Our panel of over 1.6 million ID-validated US consumers, patients, and medical professionals isn’t just a recruiting asset. In the world we are now entering, it is a strategic one.
The magic of what we do — the reason brands keep coming back after 40 years — is simple. We bring companies and people together to talk. Real people. Real conversations. Real truth. That’s where smarter decisions get made.
We’re not giving you fifth- and sixth-generation photocopies you can barely read. We’re giving you crystal-clear insight into how to move the needle on your brand.
Brett Watkins is the CEO of L&E Research, a research solutions firm that has connected brands with their customers since 1984. L&E delivers high-quality, US-based insights by bringing together an ID-validated panel, purpose-built technology, research facilities, and talented people, because smarter decisions start with real conversations with real humans. If you’re a brand leader or the agency behind one, we are the people for you.