Large language models are not about language
Cambridge and other top universities issue a joint commentary: LLMs fundamentally do not understand language! A probabilistic game running on 70 MW of power
While the whole world cheers the fluent conversations and astonishing capabilities of large language models (LLMs), linguists from top institutions such as the University of Cambridge and Macquarie University have poured cold water on the excitement.
arXiv URL: http://arxiv.org/abs/2512.13441v1
In a recent commentary, they bluntly stated: for linguistic research, LLMs are almost “useless.”
This may sound harsh, even counterintuitive. After all, ChatGPT seems to have “mastered” human language. But in the article titled Large language models are not about language, the authors draw on hard-nosed evidence from cognitive science, neurobiology, and computational efficiency to reveal a brutal truth: LLMs are merely playing an expensive probabilistic game; they do not truly possess the human language system.
The core divide: flat “strings” vs. three-dimensional “trees”
Why do LLMs not understand language? The key lies in the fundamental difference between the way they process information and the way the human brain does.
The authors point out that LLMs are essentially probabilistic models. Their working principle traces back to Markov’s 1913 statistical analysis of Pushkin’s verse: predict the next word from the statistics of the words that came before it. Although today’s models have reached trillion-parameter scale, their underlying logic is still statistical analysis of externalized strings. What they see is a flat, linear sequence of text.
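To make that concrete, here is a minimal sketch of Markov-style next-word prediction. The tiny corpus and function names are illustrative, not from the paper:

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Pick the most frequent continuation: no syntax, no meaning, just counts."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # 'cat', because 'cat' follows 'the' most often
```

Scaled up by many orders of magnitude, this is the same operation: a statistic computed over a flat string.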
By contrast, human language is not just “speaking.” The foundation of human language is a mind-internal computational system.
According to the Strong Minimalist Thesis in linguistics, the human brain generates hierarchical thought structures through recursive functions. These structures determine semantics. In other words, human language is a three-dimensional “tree” in the mind, while LLMs can only handle flattened “strings.”
As the authors put it: “The probabilistic nature of LLMs is the exact opposite of the recursive function by which the human mind generates hierarchical structures.”
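A toy illustration of that contrast, assuming nothing beyond the paper’s tree-versus-string metaphor: a recursive Merge-style function builds the hierarchy that a flat string hides. The code and names are hypothetical, purely for intuition:

```python
def merge(a, b):
    """Minimalist-style Merge: combine two objects into a single nested unit."""
    return (a, b)

# The flat string "the cat chased the mouse" hides a hierarchy:
np_subject = merge("the", "cat")
np_object = merge("the", "mouse")
vp = merge("chased", np_object)
sentence = merge(np_subject, vp)

print(sentence)
# (('the', 'cat'), ('chased', ('the', 'mouse')))
# The nesting, not the left-to-right order, encodes who chased whom.
```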
70 MW vs. 20 W: not just an energy gap, but a chasm in intelligence
To prove that LLM learning bears no resemblance to human learning, the authors present a staggering data comparison.
LLM “acquisition” depends on massive data feeding and astonishing computational power. The article specifically mentions xAI’s data center in Memphis: running 100,000 GPUs requires 70 MW of power, more than the local grid can supply, so 18 natural-gas generators had to be deployed. Google is even planning to order nuclear reactors for its AI data centers.
By comparison, the human brain operates at only about 20 W, and the energy used for language processing is even less.
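The arithmetic behind the gap:

$$
\frac{70\ \text{MW}}{20\ \text{W}} = \frac{7 \times 10^{7}\ \text{W}}{2 \times 10^{1}\ \text{W}} = 3.5 \times 10^{6}
$$

One such data center draws as much power as roughly 3.5 million human brains.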
More importantly, the learning process of human infants exhibits the phenomenon known as the Poverty of the Stimulus: infants do not need to read the entire internet; even with extremely limited input, they build complex syntactic structures in their minds. This highly efficient “less is more” mechanism is something brute-force LLMs can never match.
The “impossible language” test: AI’s fatal flaw
If energy consumption is merely an engineering issue, then the response to “impossible languages” exposes LLMs’ cognitive shortcomings.
Neuroscience experiments show that the human brain has a strict filtering mechanism for language. When we process “real language” that conforms to hierarchical rules, Broca’s area is activated; when faced with “impossible languages” based on linear rules, such as simply reversing the order of words, the brain shows inhibition. This indicates that the human brain is innately able to distinguish what is language and what is not.
LLMs, however, have no such ability to discriminate.
Studies show that LLMs can learn normal English, and can just as “perfectly” learn “impossible languages” in which words are randomly shuffled or completely reversed. For LLMs, as long as the data is large enough, they can fit any statistical pattern, regardless of whether that pattern conforms to the essence of human language.
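For intuition, here is a minimal sketch of how such “impossible language” stimuli are typically constructed; reversal and shuffling are the manipulations the article mentions, but the code itself is illustrative, not from the paper:

```python
import random

def reverse_words(sentence):
    """An 'impossible' linear rule: fully reverse the word order."""
    return sentence.split()[::-1]

def shuffle_words(sentence, seed=0):
    """Another 'impossible' language: a deterministic random shuffle."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return words

s = "the cat chased the mouse"
print(reverse_words(s))   # ['mouse', 'the', 'chased', 'cat', 'the']
print(shuffle_words(s))   # a fixed pseudo-random permutation of the same words
# A statistical learner will fit either distribution given enough data;
# the human brain treats both as noise rather than language.
```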
The authors rebut the view held by Futrell, Mahowald, and others that “LLMs have linguistic inductive biases,” pointing out that LLMs’ “learning ability” when confronted with structureless random text precisely proves that they lack the cognitive architecture unique to humans.
Conclusion: it sounds like a duck, but it isn’t one
The article’s conclusion is both sharp and sober. The authors argue that since LLMs and the human language faculty differ so fundamentally in basic principles, learning methods, and neurobiological foundations, expecting to understand human language cognition by studying LLMs is barking up the wrong tree.
The article ends with a twist on a classic proverb:
“An LLM may quack like a duck, but isn’t one.”
Under the current probabilistic paradigm, LLMs will never become that “duck.” This is not only a splash of cold water on AI hype, but also a tribute to the exquisite miracle of biological evolution that is the human brain.