Next-Token Prediction Created AI. That Doesn't Explain What AI Is.

What happened: Scott Alexander, writing on Astral Codex Ten, has published a philosophical response to the "stochastic parrot" debate — directly building on a piece in The Argument by Kelsey Piper. His argument goes beyond the training pipeline: even granting that next-token prediction is the mechanism by which AI learned, this says nothing meaningful about what the resulting system actually does when it thinks. The equivalent claim about humans, he argues, would be that we are "just survival-and-reproduction machines" because evolution optimised us for those ends.

Why it matters: Alexander frames the error as a confusion of levels. On the level where AI is a next-token predictor, humans are, in exactly the same sense, "next-sense-datum predictors" — neuroscience's predictive coding theory holds that the brain organises itself by constantly predicting incoming sensory data and updating to minimise error. Nobody concludes from this that humans don't really think. The structures that next-token prediction builds — world models, reasoning patterns, instruction-following — are what the system is; the optimisation target that created them is not.
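The objective/result distinction can be made concrete with a toy sketch. Below is a hypothetical bigram model (the corpus and all names are illustrative, not anything from Alexander's post or from how real LLMs are built): the training objective is pure next-token cross-entropy, yet the artefact it produces is a table encoding word-order structure, which is a different thing from the objective itself.

```python
import math
from collections import defaultdict, Counter

# Toy next-token predictor: count bigrams in a tiny corpus.
# The *objective* is next-token prediction; the *result* is a
# probability table that implicitly encodes word-order structure.
corpus = "the cat sat on the mat the dog sat on the rug".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def p_next(prev, nxt):
    """Estimated probability of `nxt` following `prev`."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

# The quantity minimised during "training" is just average
# negative log-probability of each next token:
loss = -sum(math.log(p_next(a, b)) for a, b in zip(corpus, corpus[1:]))
loss /= len(corpus) - 1
print(round(loss, 3))  # → 0.504
```

The point of the sketch is only that "was optimised for next-token prediction" describes the loss function, while the learned table (here trivial; in an LLM, the circuits interpretability research studies) is what actually runs at inference time.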

Wider context: To illustrate what is actually happening inside an LLM, Alexander cites Anthropic's mechanistic interpretability research, which found that Claude represents character-count positions as one-dimensional helical manifolds in six-dimensional space — a genuinely novel computational structure, not a lookup table of token probabilities. He notes that human neurons likely employ structures equally strange (grid cells in the entorhinal cortex use toroidal attractor manifolds to track position in 2D space) — we just lack the tools to see them as easily. The analogy holds at every level: training mechanism, internal representation, and deployed behaviour.

Background: Alexander plans to compile his writing on this topic into an "Anti-Stochastic-Parrot FAQ," acknowledging that critics raise further arguments — about hallucinations, the difference between tokens and sense-data, and more — which he intends to address. The post is a response to reader pushback on Piper's piece, which argued that modern AI is more than a next-token predictor because of fine-tuning and RLHF; commenters objected that those were still just layers on top of the same core mechanism.

Singularity Soup Take: Alexander's analogy cuts cleanly through the confusion — if "trained by next-token prediction" means AI doesn't really think, then "trained by evolution to reproduce" means you don't really think either, and that asymmetry only holds if you've decided the answer before making the argument.

Key Takeaways:

  • Levels of Explanation: Next-token prediction describes the optimisation process that shaped the AI — equivalent to evolution for humans — not the moment-to-moment cognition of the system it produced. Conflating these two levels is the core error of the stochastic parrot argument.
  • Humans Are Also Predictors: Neuroscience's predictive coding theory holds that human brains learn through next-sense-datum prediction — a close analogue to next-token prediction. Nobody concludes from this that humans don't think; the same reasoning should apply to AI.
  • Strange Internals, Real Computation: Anthropic's mechanistic interpretability research found Claude represents character-count positions (used to place line breaks) as helical manifolds in 6D space — a non-trivial, novel computational structure, not a simple lookup of likely tokens.
  • The Monk Problem: A monk can choose celibacy using a brain optimised by evolution for reproduction — running it far outside its designed purpose. Likewise, an AI can reason, refuse, and generalise using machinery trained on next-token prediction. The training objective and the deployed capability are not the same thing.

Related News

The 'Stochastic Parrot' Myth Is Misleading the Public About AI — Kelsey Piper's piece in The Argument that directly prompted this response from Scott Alexander.