On the Biology of a Large Language Model, Gonçalo Teixeira

On the Biology of a Large Language Model was published by Anthropic in March 2025, alongside its companion paper, Circuit Tracing: Revealing Computational Graphs in Language Models. It applied the attribution-graph methodology to Claude 3.5 Haiku and documented concrete findings about the model's internal reasoning: chained activations of intermediate concepts in multi-hop reasoning ('Dallas → Texas → Austin'), advance planning of rhymes when writing poetry, and the activation of specific features associated with deception and sycophancy. The essay Opening the Black Box treats it as the demonstration that interpretability has moved past describing isolated features and is now entering the phase of describing circuits.

Essays referencing this

Opening the Black Box