LLMs show a “highly unreliable” capacity to describe their own internal processes

November 5, 2025


(Ars Technica) – If you ask an LLM to explain its own reasoning process, it may simply confabulate a plausible-sounding explanation for its actions based on text found in its training data. To get around this problem, Anthropic is expanding on its previous research into AI interpretability with a new study that aims to measure whether LLMs have genuine "introspective awareness" of their own inference processes. (Read More)