Abstract · submitted to Is Consciousness a Story We Tell Ourselves?, University of Sussex, July 2026

Stories That Write Themselves: Self-Modelling in LLMs and the Meta-Problem of Consciousness

Is consciousness a story we tell ourselves? Perhaps. Regardless, it is a real pattern with explanatory and predictive value. In a hypothetical zombie world, inhabitants would deploy "consciousness" as an explanatory concept with genuine predictive and clinical value, classifying states, predicting behaviour, and tracking impairment, despite lacking phenomenal experience. The meta-problem of consciousness [1], why systems like us inevitably come to believe they are conscious, is well-posed even there, and is our most tractable target. This is not to deny phenomenal consciousness but to identify a necessary first step. Whether affective and embodied processes shape only the content of experience or partly constitute it remains open; either way, the meta-problem is tractable now. Until recently, reflective self-access was arguably our strongest evidence for attributing consciousness. Large language models disrupt that inference: they exhibit structured self-modelling and self-report without (on most accounts) phenomenal experience.

Self-modelling restructures the system that does the modelling: neural networks trained to self-model become simpler and more regularised [2]. Language plausibly amplified self-modelling capacity evolutionarily, co-evolving with the neural substrate. The meta-problem belongs to this broader class: how do systems come to represent and report on their own internal states? In LLMs, language is the entire training distribution, and self-modelling emerges from it: models fine-tuned on implicit behavioural policies spontaneously generate self-descriptions never present in training [3]. At inference time, they exhibit sub-linguistic self-knowledge (predicting their own behaviour without chain-of-thought [4], detecting injected activation patterns [5]) alongside linguistic self-reflection. Self-referential processing reliably elicits consciousness claims across independently trained model families, gated by features associated with representational honesty rather than any special internal state [5,6]. Introspective capacity scales with general capability, suggesting consciousness-talk emerges from sufficiently rich self-modelling. Is a system's belief that it is conscious a predictable product of self-modelling, arising in any system that recursively models itself with sufficient fidelity? If so, consciousness-talk should track privileged self-prediction and sensitivity to interventions on self-representation, rather than merely reflecting imitation of human discourse.

What we learn will constrain theories of human consciousness, because the dynamics generating our conviction of consciousness operate at the access level [7], the level humans and LLMs share. If we can explain why self-modelling systems come to believe there is "something it is like" [8] to be them, we will be better positioned to evaluate whether the hard problem remains, and may find that the intuitions which motivated it have been transformed by the explanation.

References

[1] Chalmers, D.J. (2018). The Meta-Problem of Consciousness. Journal of Consciousness Studies, 25(9-10), 6-61.
[2] Premakumar, V.N., Vaiana, M., Pop, F., Rosenblatt, J., de Lucena, D.S., Ziman, K., & Graziano, M.S.A. (2024). Unexpected Benefits of Self-Modeling in Neural Systems. arXiv:2407.10188.
[3] Betley, J., Bao, X., Soto, M., Sztyber-Betley, A., Chua, J., & Evans, O. (2025). Tell Me About Yourself: LLMs Are Aware of Their Learned Behaviors. arXiv:2501.11120.
[4] Binder, F.J., Chua, J., Korbak, T., Sleight, H., Hughes, J., Long, R., Perez, E., Turpin, M., & Evans, O. (2024). Looking Inward: Language Models Can Learn About Themselves by Introspection. arXiv:2410.13787.
[5] Lindsey, J. (2025). Emergent Introspective Awareness in Large Language Models. Transformer Circuits Thread, Anthropic.
[6] Berg, C., de Lucena, D., & Rosenblatt, J. (2025). Large Language Models Report Subjective Experience Under Self-Referential Processing. arXiv:2510.24797.
[7] Block, N. (1995). On a Confusion about a Function of Consciousness. Behavioral and Brain Sciences, 18(2), 227-247.
[8] Nagel, T. (1974). What Is It Like to Be a Bat? The Philosophical Review, 83(4), 435-450.
