
We’ve handed the keys of human knowledge to language models that are, at their core, sophisticated pattern-matchers running on dirty data. When that data is bad, the answers are bad. And when the stakes are as high as distinguishing a foreign surveillance drone from something we genuinely can’t explain, bad answers aren’t just embarrassing — they’re dangerous.

Harvard astrophysicist Avi Loeb dropped a sharp piece of thinking this week, arguing in a new essay on Medium that the real bottleneck in UAP research isn’t compute power or model size — it’s the quality of the underlying data being fed into these systems. His thesis cuts through the hype cleanly: a thousand large language models trained on garbage won’t get you closer to the truth than one solid, rigorously collected dataset.

He’s right. And the implications stretch way beyond UFO research.


The Data Problem Nobody Wants to Talk About

The AI industry has spent years treating data quantity as a proxy for data quality. Bigger training sets. More tokens. More parameters. More everything. The assumption baked into this arms race is that scale compensates for noise. It doesn’t.
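A toy numerical sketch (mine, not Loeb's) makes the distinction concrete. Scale does average away random scatter, but it does nothing about systematic bias, and garbage sources are systematic bias. Pile on more contaminated samples and you converge, with ever more confidence, on the wrong answer; a small clean sample lands near the truth.

```python
import random

random.seed(0)

TRUE_VALUE = 10.0  # the quantity we are trying to estimate

def measurements(n, bias, spread):
    """Simulate n readings with a systematic bias plus random scatter."""
    return [TRUE_VALUE + bias + random.gauss(0, spread) for _ in range(n)]

def estimate(samples):
    return sum(samples) / len(samples)

# A huge pile of systematically biased data: more of it never removes the bias.
for n in (100, 10_000, 1_000_000):
    est = estimate(measurements(n, bias=3.0, spread=5.0))
    print(f"{n:>9} biased samples -> estimate {est:.2f}")

# A small, carefully collected dataset: unbiased, so it converges on the truth.
est = estimate(measurements(200, bias=0.0, spread=1.0))
print(f"{200:>9} clean samples  -> estimate {est:.2f}")
```

The biased estimates settle around 13 no matter how many samples you add; the small clean sample sits near 10. That gap is the whole argument.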

Natural language processing models learn from text. Human-generated text. And human-generated text about UAPs is, to put it charitably, a mess. You’ve got declassified government reports sitting next to Reddit conspiracy threads. Peer-reviewed astrophysics papers scraped alongside breathless tabloid stories. A model trained on that mixture doesn’t develop discernment — it develops a talent for sounding confident while averaging across wildly incompatible sources.

That’s not intelligence. That’s noise with good grammar.

Why UAPs Are the Perfect Stress Test

UAPs, unidentified anomalous phenomena in current official U.S. government usage (the older expansion, unidentified aerial phenomena, still circulates), represent one of the most data-starved, politically contaminated, and epistemically chaotic topics you could feed into an NLP system. The signal-to-noise ratio is abysmal. Eyewitness accounts dominate. Sensor data is rare, often classified, or technically ambiguous. Government disclosure is selective and slow.

When you ask an LLM about UAPs, it synthesizes what the internet believes about UAPs. That’s not the same thing as what UAPs actually are. The model has no way to distinguish between a credible infrared sensor reading from a Navy pilot and a blurry phone video from a guy in his backyard. Both exist in the training corpus. Both contribute to the output.
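One way to see what "no way to distinguish" means in practice: nothing in a standard scraped corpus carries the provenance a pipeline would need to weight a Navy sensor log above a backyard video. The sketch below is hypothetical, with made-up source labels and weights, but it shows the kind of metadata and filtering step that would have to exist before training for the distinction to matter at all.

```python
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    source: str  # where the text came from, e.g. "declassified_report", "forum_post"

# Hypothetical credibility weights. The specific numbers aren't the point;
# the point is that provenance has to travel with the text.
SOURCE_WEIGHT = {
    "peer_reviewed_paper": 1.0,
    "declassified_report": 0.8,
    "news_article": 0.4,
    "forum_post": 0.1,
}

def weighted_corpus(docs, min_weight=0.5):
    """Keep documents whose source clears a credibility threshold and return
    (text, weight) pairs a trainer could use for loss weighting."""
    kept = []
    for doc in docs:
        w = SOURCE_WEIGHT.get(doc.source, 0.0)  # unknown provenance counts for nothing
        if w >= min_weight:
            kept.append((doc.text, w))
    return kept

corpus = [
    Document("Infrared track consistent with known aircraft...", "declassified_report"),
    Document("My cousin saw lights over the lake last night!!", "forum_post"),
]
print(weighted_corpus(corpus))  # only the provenance-backed record survives
```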

Loeb’s argument is that real progress requires purpose-built instrumentation producing high-fidelity, peer-reviewable data — not more model layers stacked on top of the same swamp of internet text. His Galileo Project is building exactly that: a network of sensors designed to capture clean, multi-modal data on anomalous aerial objects.

This is the correct instinct. And it applies to almost every domain where AI is being deployed with confidence but without epistemic hygiene.

The Broader NLP Wake-Up Call

Think about where else this pattern plays out. Capital markets are already pricing AI-powered analytics tools into sports technology valuations, assuming these systems produce reliable insights. But if the underlying training data is inconsistent, biased, or incomplete, those insights are statistical mirages.

The same concern shadows AI applications in agriculture. Conversations at the AI in Agriculture Conference keep circling back to the same tension: the models are capable, but the farm-level sensor data feeding them is patchy, inconsistently formatted, and rarely labeled with any precision. You can have the best NLP pipeline in Silicon Valley and still produce crop yield predictions that are functionally useless.

And in geopolitical contexts, the stakes get darker. AI-generated deepfakes are already being weaponized in grey-zone warfare to manipulate public perception and erode Western support for Ukraine. The poison there isn’t the model — it’s the synthetic data deliberately injected to corrupt what people believe is real. That’s data quality as a weapon.

The Hot Take

The AI industry’s obsession with model size is a distraction that serves investors more than it serves truth. OpenAI, Anthropic, Google — they all benefit from a world where the solution to every AI failure is a bigger, more expensive model. But Loeb’s UAP argument exposes that framing as self-serving nonsense. A cleaner dataset collected with scientific rigor will always outperform a bloated model trained on the accumulated biases of human internet behavior. We should be funding observatories, sensor networks, and data standards — not the next parameter count milestone.

What Actually Needs to Change

The NLP research community needs to stop treating data curation as a preprocessing chore and start treating it as the actual scientific work. That means provenance tracking. It means uncertainty quantification built into outputs, not bolted on afterward. It means being honest about the domains where language models simply cannot compensate for thin, contaminated, or politically distorted training data.
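Here is what "built in, not bolted on" could look like at the output layer, sketched loosely with hypothetical field names and an arbitrary abstention threshold: every claim travels with the evidence behind it and a calibrated confidence, and the system says so when either is missing.

```python
from dataclasses import dataclass, field

@dataclass
class SourcedAnswer:
    """An output that carries its evidence and its uncertainty with it."""
    claim: str
    supporting_sources: list = field(default_factory=list)  # provenance: ids of the evidence used
    confidence: float = 0.0  # calibrated probability the claim is correct, 0..1

    def render(self, abstain_below=0.6):
        # Be explicit when the evidence is too thin to stand behind the claim.
        if not self.supporting_sources or self.confidence < abstain_below:
            return f"INSUFFICIENT EVIDENCE (confidence {self.confidence:.2f}): {self.claim}"
        srcs = ", ".join(self.supporting_sources)
        return f"{self.claim} [confidence {self.confidence:.2f}; sources: {srcs}]"

print(SourcedAnswer("Object X was a conventional drone.",
                    ["sensor_log_042", "radar_track_17"], confidence=0.83).render())
print(SourcedAnswer("Object Y defied known physics.", [], confidence=0.35).render())
```

A real system would need confidence calibrated against held-out ground truth rather than a hand-set threshold, but the shape of the interface is the argument: uncertainty as a first-class field, not a disclaimer.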

Loeb is an astrophysicist talking about aliens — or whatever UAPs turn out to be — but his argument is one of the most grounded critiques of applied AI published this year. Quality beats quantity. Evidence beats inference. And no amount of compute fixes a broken foundation. The machines are only as honest as what you teach them with.


