Introduction
Modern machine‑learning systems dazzle us with fluency and foresight: they finish our sentences, forecast the weather, and even draft proofs that intimidate graduate students. Yet none of them—not the largest language model, nor the most carefully tuned Bayesian updater—has ever done what a single human mind did in 1905 when it looked at two flawless theories (Newtonian mechanics and Maxwellian electrodynamics) and realised they demanded a new underlying geometry—one that Hermann Minkowski would soon formalise as spacetime. That leap was not a better curve‑fit; it was the birth of a new explanatory symbol.
And Einstein is only a vivid bookmark in a much longer story: humanity began with a handful of concrete tokens—sun, stone, spear—and over millennia has invented whole forests of abstract ones: zero, imaginary unit, entropy, quark, algorithm. Each addition enlarged the very language in which contradictions could be spotted and resolved. The central empirical fact is not one spectacular symbol but the relentless, unaided expansion of our shared vocabulary. The question is whether any purely classical machine, whose alphabet is fixed at boot‑time, can replicate that open‑ended growth.
This paper asks a blunt question:
Can any fully classical, self‑contained learner—even in the limit of infinite data—reproduce that kind of conceptual invention?
We defend the negative. Gradient fields may extend to the stars, and concatenative parrots may absorb every shard of Reddit, but if a system is forever confined to the symbols it started with, no amount of self‑training or “strange‑loop” self‑reflection (pace Hofstadter) can break its semantic cage.
To make that claim precise we build three walls and show there is no classical escape hatch:
Wall 1: The Model‑Class Trap. If reality contains a regularity your hypothesis class cannot express, ever more data will walk you to the best wrong map while your compass stays silent.
Wall 2: The Amalgam Dilemma. When two internally consistent theories collide, classical logic offers only quarantine or silent re‑labelling: neither creates a shared, novel predicate.
Wall 3: The Proof‑Theoretic Ceiling. No consistent, recursively enumerable calculus can deduce the soundness of a symbol that is not already in its alphabet.
Each wall is backed by a classical result: Ng–Jordan & Grünwald–van Ommen on misspecification, Robinson–Craig on amalgamation, Tarski–Gödel–Löb on proof limits—and each blocks the popular escape routes: tempered posteriors, continual learning, Hofstadterian strange loops, and even dialetheist paraconsistency.
The numbered sections that follow map the terrain:
Invisible Contradictions (Model‑Class Trap)
Classical Amalgam Dilemma (Partition vs. Re‑labelling)
Proof‑Theoretic Ceiling (Why Re‑labelling Fails)
Ironclad Meta‑Theorem (Why No Classical Escape Remains)
Until a machine can point to its own alphabet and declare these tokens are not enough, then mint a new token that can both explain and predict, the birth of genuinely novel concepts will remain exclusively a product of human consciousness.
This conclusion is not a mere curiosity for epistemology; it strikes at the heart of material‑reductionist theories of mind. If every purely classical, matter‑bound learner is trapped behind the same three walls, then any account of consciousness that equates mind with classical symbol manipulation inherits the cage as well. By classical signature we mean the fixed alphabet Σ together with ordinary, non‑paraconsistent first‑order entailment ⊢, the entire toolkit that a Turing‑equivalent physical machine (silicon or neuronal) can enumerate without outside help. Staying inside that signature is tantamount to treating cognition as nothing more than progressively re‑weighted symbol strings.
Therefore a material‑reductionist model of mind has two options:
Show that the brain transcends the classical signature while remaining a local, physical system. This would require empirical evidence that neural dynamics instantiate non‑classical logics or on‑the‑fly symbol‑creating operations that demonstrably bypass Walls 1‑3 (evidence that, so far, no experiment has supplied).
Abandon the claim that mere classical symbol manipulation is sufficient, admitting an extra‑classical ingredient, be it quantum or something not yet named.
Until such evidence appears, the creative contradiction‑resolving leaps of human thought sit outside the reach of any architecture built only from pre‑defined matter moving through classically logical states.
§ 1 The Invisible Contradiction: Misspecification and the Model‑Class Trap
A map that only shows roads will never reveal a river.
1.1 Statement of the trap
Let Σ be the finite alphabet (possibly containing millions of tokens, but still finite) that generates the learner's language, and let 𝒞 be a recursively‑enumerable family of probability models written in that language. A computable environment is any infinite data‑stream
(x_1, y_1), (x_2, y_2), …
produced by an algorithm whose true law μ lies outside 𝒞. (Each observation (x_i, y_i) may itself be a high‑dimensional vector, a structured record, or any finite string; the ordered‑pair notation is shorthand, not a 2‑D restriction.)
Lemma A (Model‑class limitation). For Bayesian, MDL, or PAC‑Bayesian learners whose posterior/predictor ranges over 𝒞,
M_n → M* := argmin_{M ∈ 𝒞} KL(μ ∥ M)  almost surely as n → ∞,
where M_n is the posterior‑selected model after n samples (Ng & Jordan, 2001).
Plain words. With probability 1 the learner settles forever on the best wrong model, the one minimising KL divergence within 𝒞 (Catoni, 2004), and never realises the menu 𝒞 is missing a dish. (Some statisticians temper the likelihood, e.g. SafeBayes (Grünwald & van Ommen, 2017), to curb over‑confidence, but such tuning only spreads posterior mass more cautiously; it never creates the missing concept.)
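A minimal numerical sketch of the trap (illustrative only: the quadratic environment, the line‑only model class, and all names below are assumptions of this example, not part of the formal result). A conjugate Bayesian learner whose class 𝒞 contains only lines through the origin is fed data from a quadratic law; its posterior concentrates ever more sharply on the KL‑best line, and no quantity it computes ever flags that 𝒞 lacks the needed symbol.

import numpy as np

rng = np.random.default_rng(0)

# True environment mu: y = x^2 + noise -- a law OUTSIDE the model class.
def sample(n):
    x = rng.uniform(-1.0, 1.0, n)
    return x, x**2 + 0.1 * rng.normal(size=n)

# Model class C: lines through the origin, y = w*x, with Gaussian noise.
# Conjugate posterior over the single weight w (known noise variance).
def posterior(x, y, prior_var=10.0, noise_var=0.01):
    precision = 1.0 / prior_var + (x @ x) / noise_var
    mean = (x @ y) / (noise_var * precision)
    return mean, 1.0 / precision

for n in (10, 1_000, 100_000):
    x, y = sample(n)
    mean, var = posterior(x, y)
    print(f"n={n:>7}  posterior over w: N({mean:+.4f}, {var:.2e})")

# The posterior variance collapses while the mean settles near w* = 0,
# the KL-projection of the quadratic law onto the line family -- the
# best wrong model. Nothing computed above can ever assert "no line
# in C fits": the learner's alphabet simply has no token for x^2.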
1.2 Take‑away for the no‑go theorem
Because every posterior update and every alarm signal is still spelled with the birth‑alphabet Σ, the learner remains intrinsically locked inside that alphabet. The contradiction hiding in μ stays invisible until a new symbol—an entirely new brick—is supplied from outside the system.
First wall. A classical learner can become less wrong under misspecification, but it cannot see that its very language is inadequate.
In the next section we ask what happens when two such separately‑trained systems, each blind in this way, are forced to share one signature.
§ 2 When Two Theories Collide: The Classical Amalgam Dilemma
Two puzzle pieces may look perfect on their own, yet the moment you try to lock them together the picture stops making sense.
2.1 The dilemma in a sentence
When two consistent theories Tₐ and Tᵦ (written in the same signature Σ) contradict each other, classical logic leaves only two conceivable moves, and neither can be executed internally (i.e., without calling on an oracle or higher‑level language supplied from outside the system) by a self‑contained learner:
Partition – tag every sentence of Tₐ with a region predicate P and every sentence of Tᵦ with ¬P, so the clash never happens in one world‑slice.
Symbol re‑labelling (private reinterpretation) – keep the same token set but map at least one shared symbol to a different meaning via a non‑identity translation Σ → Σ′, i.e. an altered interpretation function. This fixes the contradiction only by covertly renaming a term; it creates no genuinely new joint predicate.
No third option can be proved inside Tₐ ∪ Tᵦ.
(Both repairs are “conceivable” only with help from an external interpreter, which merely relocates the problem to a bigger Σ; a self‑contained learner cannot carry them out.)
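A toy propositional sketch of the two moves (the one‑token theories, the names s, s2, R, and the brute‑force checker are all illustrative assumptions, not part of Lemma B):

from itertools import product

def consistent(clauses, symbols):
    """Brute-force satisfiability over the given propositional symbols."""
    for values in product((False, True), repeat=len(symbols)):
        env = dict(zip(symbols, values))
        if all(clause(env) for clause in clauses):
            return True
    return False

# One shared token s; T_a asserts s, T_b asserts not-s.
T_a = [lambda e: e["s"]]
T_b = [lambda e: not e["s"]]
print(consistent(T_a, ["s"]), consistent(T_b, ["s"]))  # True True
print(consistent(T_a + T_b, ["s"]))                    # False: the clash

# Move 1 -- partition: relativise each theory to a region predicate R,
# so the clash never occurs inside one world-slice.
partitioned = [
    lambda e: (not e["R"]) or e["s"],   # R -> s
    lambda e: e["R"] or (not e["s"]),   # not-R -> not-s
]
print(consistent(partitioned, ["s", "R"]))    # True: quarantined

# Move 2 -- re-labelling: T_b's token s is privately remapped to s2.
relabelled = [lambda e: e["s"], lambda e: not e["s2"]]
print(consistent(relabelled, ["s", "s2"]))    # True: renamed, not unified

Both repairs restore consistency, and neither produces a predicate genuinely shared by the two theories, which is exactly the dilemma.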
2.2 Formal statement (Lemma B)
Lemma B (Classical amalgam limitation). If Tₐ and Tᵦ are individually consistent yet their union is inconsistent, then a consistent unifier U exists iff either (i) each theory is relativised to a disjoint region predicate, or (ii) at least one shared symbol is re‑interpreted by a non‑identity morphism (Robinson, 1956; Craig, 1957).
2.3 Why spotting the clash still isn’t enough
Even if a mechanistic learner detects that Tₐ ∪ Tᵦ is inconsistent, the only repairs derivable inside classical logic are partition or rename, and both keep the theories apart rather than synthesising them.
Second wall. Classical logic can either quarantine the two theories or perform a private re‑labelling of vocabulary. Neither move introduces a new predicate genuinely shared by both sides, so no true conceptual synthesis emerges.
§ 3 Why Private Re‑Labelling Still Fails: The Proof‑Theoretic Ceiling
You can shuffle old Lego bricks forever, but you can’t 3‑D‑print a new one from inside the set.
3.1 The ceiling in one line
A self‑contained classical reasoner whose deductive engine is any consistent, recursively‑enumerable first‑order theory T cannot prove a theorem that effectively extends its own signature (same result holds for any r.e. extension of PA).
3.2 Formal statement (Lemma C)
Lemma C (Proof‑theoretic ceiling). Let T be a consistent r.e. first‑order theory with language Σ. There is no Σ‑sentence τ such that T proves both that τ names a new predicate symbol P ∉ Σ and the soundness of T ∪ {τ} (Gödel, 1931; Löb, 1955; Tarski, 1933/1956).
Equivalently, T cannot, by internal deduction alone, manufacture an interpretation of any symbol not already in Σ.
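One way to see why the soundness clause is fatal (a sketch in standard notation; Sound and Con are assumed arithmetised predicates, and the step from soundness to consistency is the usual one):

T ⊢ Sound(T ∪ {τ})  ⟹  T ⊢ Con(T ∪ {τ})  ⟹  T ⊢ Con(T),

and Gödel's second incompleteness theorem forbids the last step for any consistent r.e. T extending arithmetic (Gödel, 1931); Löb's theorem (Löb, 1955) yields the same obstruction for each individual reflection instance.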
3.3 Why re‑labelling doesn’t escape the ceiling
The “private re‑labelling” move of § 2 merely maps an existing token s ∈ Σ to a different model‑theoretic extension. It does not introduce a new symbol; therefore it cannot express the joint regularity shared by Tₐ and Tᵦ.
To coin a truly new predicate Q that both theories reference (e.g. Einstein's re‑conceived spacetime metric), the system would need a rule that extends Σ to Σ ∪ {Q}; the spacetime metric is exactly such an extra‑signature symbol (a rank‑2 tensor of Lorentzian signature, not definable from the Newton–Maxwell scalars). Lemma C blocks any such rule from being derived internally.
Third wall. A classical learner lacks the meta‑machinery to mint a brand‑new symbol and prove it meaningful. Without an external oracle, unification by genuine reinterpretation is impossible.
§ 4 Why the No‑Go Result Is Ironclad: No Classical Escape Hatch Remains
If every door in the maze leads back to the same room, it isn’t a maze; it’s a cage.
4.1 Recap: Three walls, zero exits
Wall 1 (Lemma A): under misspecification the learner converges to the best wrong model in 𝒞 and never flags that no Σ‑sentence fits the data.
Wall 2 (Lemma B): a clash between two consistent Σ‑theories can be repaired only by partition or private re‑labelling, never by a shared new predicate.
Wall 3 (Lemma C): no consistent r.e. theory can mint a symbol outside Σ and internally prove the extended system sound.
4.2 Meta‑Theorem: No self‑contained classical learner crosses all three walls
Intrinsic‑cage Meta‑Theorem. Let ⟨Σ, ⊢⟩ be a computably‑enumerable language/entailment pair. Any algorithm L whose entire operation (initial code + update rule + data interface) is computable relative only to ⟨Σ, ⊢⟩ must fail at least one of:
Menu‑failure flag: decidably assert “no Σ‑sentence fits the data.”
Brick printer: construct a new symbol τ ∉ Σ and justify its semantics internally.
Non‑partition synthesis: build a single, contradiction‑resolving unifier U for a fresh clash Tₐ ∪ Tᵦ.
Proof. (1) blocked by Lemma A; (2) by Lemma C; (3) by Lemma B. ∎
4.3 Survey of modern architectures — All still inside the cage
No peer‑reviewed system published as of May 2025 supplies all three at once: an intrinsic menu‑failure test, a self‑justifying language extension, and a non‑partition synthesis. Each proposal trips on at least one wall.
4.4 Invitation: What a valid counter‑example must show
To overturn the meta‑theorem, present running code that
Emits an internal signal “Σ insufficient” on a previously unseen data‑stream;
Prints a fresh symbol Q and proves, within its own calculus, that Q's semantics lower predictive loss and strengthen explanatory reach; and
Demonstrates, on two brand‑new conflicting theories, a non‑partition synthesis using Q.
Anything short of this leaves the no‑go theorem intact.
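For concreteness, the checklist can be phrased as an interface that any counter‑example would have to implement (a purely illustrative scaffold; every name below is hypothetical, and no existing system implements it):

from abc import ABC, abstractmethod

class CageBreaker(ABC):
    """Hypothetical interface for a counter-example to the meta-theorem."""

    @abstractmethod
    def flag_menu_failure(self, stream) -> bool:
        """Emit the internal signal 'Sigma insufficient' on a previously
        unseen data-stream (would breach Wall 1 / Lemma A)."""

    @abstractmethod
    def mint_symbol(self) -> str:
        """Print a fresh symbol Q not in Sigma and prove, within the
        system's own calculus, that Q lowers predictive loss and
        strengthens explanatory reach (would breach Wall 3 / Lemma C)."""

    @abstractmethod
    def synthesise(self, theory_a, theory_b, q):
        """Return a non-partition unifier of two brand-new conflicting
        theories that essentially uses Q (would breach Wall 2 / Lemma B)."""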
Final takeaway. Endless data, bigger GPUs, or clever priors can sharpen prediction inside the cage, but they cannot tear down its walls. Human‑style conceptual leaps require a mechanism that no classical, self‑contained learner (as presently conceived) provides.
References
Catoni, O. (2004). PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning. IMS.
Craig, W. (1957). Linear reasoning. A new form of the Herbrand–Gentzen theorem. Journal of Symbolic Logic, 22, 250-268.
Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik, 38, 173-198.
Grünwald, P., & van Ommen, T. (2017). Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it. Bayesian Analysis, 12(4), 1069-1103.
Löb, M. (1955). Solution of a problem of Leon Henkin. Journal of Symbolic Logic, 20(2), 115-118.
Montague, R. (1963). The paradox of ground models. In Essays on the Foundations of Mathematics.
Ng, A. Y., & Jordan, M. I. (2001). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. NIPS 2001.
Priest, G. (2006). In Contradiction (2nd ed.). Oxford UP.
Robinson, A. (1956). On the metamathematics of algebra. Fundamenta Mathematicae, 43, 37-77.
Tarski, A. (1956). The concept of truth in formalized languages. In Logic, Semantics, Metamathematics (pp. 152-278). Clarendon. (Original work 1933).