The paper asks whether decoder-only transformer language models preserve input information by mapping distinct input prompts to distinct hidden representations, despite containing components that are individually non-injective. It proves mathematically that the prompt-to-hidden-state map is injective at initialization and remains injective under gradient-based training, leveraging real-analytic components, continuous initialization distributions, and preservation of absolute continuity; collisions can occur only on a measure-zero set of parameter settings. It then introduces SipIt, an algorithm that reconstructs the exact input prompt from hidden activations by token-by-token identification, with correctness guarantees in O(T) steps, where T is the sequence length. Empirical validation across six state-of-the-art models shows no collisions in billions of tests and exact token-level recovery, highlighting implications for transparency, interpretability, and safe deployment.
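The token-by-token identification idea can be illustrated with a minimal sketch. The real SipIt queries an actual transformer; here a toy causal map (random embeddings plus a tanh recurrence, both hypothetical stand-ins) plays the role of the prompt-to-hidden-state function, which is injective almost surely under random initialization. Recovery proceeds left to right: at each position, the one vocabulary token whose forward pass reproduces the observed hidden state is the true token.

```python
import numpy as np

# Toy stand-in for a transformer's prompt-to-hidden-state map.
# The real algorithm queries the trained model; a fixed random
# projection makes this toy map injective with probability 1.
rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16
EMB = rng.normal(size=(VOCAB, DIM))

def hidden_states(tokens):
    """Causal toy map: each position's state depends on the whole prefix."""
    states, acc = [], np.zeros(DIM)
    for t in tokens:
        acc = np.tanh(acc + EMB[t])  # non-linear, yet injective almost surely
        states.append(acc.copy())
    return states

def sipit_sketch(target_states):
    """Recover tokens left-to-right by matching each position's state."""
    recovered = []
    for pos, target in enumerate(target_states):
        for cand in range(VOCAB):
            trial = hidden_states(recovered + [cand])[pos]
            if np.allclose(trial, target):
                recovered.append(cand)
                break
        else:
            raise ValueError(f"no token matches position {pos}")
    return recovered

prompt = [3, 41, 7, 19]
assert sipit_sketch(hidden_states(prompt)) == prompt
```

The outer loop makes one pass per position, matching the paper's linear-in-length guarantee; this sketch re-runs the toy forward pass per candidate, whereas an efficient implementation would reuse cached prefix computation.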
Abstract
Transformer components such as non-linear activations and normalization are
inherently non-injective, suggesting that different inputs could map to the
same output and prevent exact recovery of the input from a model's
representations. In this paper, we challenge this view. First, we prove
mathematically that transformer language models mapping discrete input
sequences to their corresponding sequences of continuous representations are
injective and therefore lossless, a property established at initialization and
preserved during training. Second, we confirm this result empirically through
billions of collision tests on six state-of-the-art language models, and
observe no collisions. Third, we operationalize injectivity: we introduce
SipIt, the first algorithm that provably and efficiently reconstructs the exact
input text from hidden activations, establishing linear-time guarantees and
demonstrating exact invertibility in practice. Overall, our work establishes
injectivity as a fundamental and exploitable property of language models, with
direct implications for transparency, interpretability, and safe deployment.
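The collision tests reported in the abstract can be mirrored in miniature. The code below is a hedged sketch, not the paper's actual experiment: it uses the same hypothetical toy map as above in place of a real language model, samples random prompt pairs, and checks that distinct prompts never produce the same final hidden state.

```python
import numpy as np

# Same hypothetical toy map as in the SipIt sketch above: random
# embeddings plus a tanh recurrence stand in for a real transformer.
rng = np.random.default_rng(1)
VOCAB, DIM, LEN, TRIALS = 50, 16, 8, 2000
EMB = rng.normal(size=(VOCAB, DIM))

def last_state(tokens):
    acc = np.zeros(DIM)
    for t in tokens:
        acc = np.tanh(acc + EMB[t])
    return acc

seen = {}       # rounded final state -> prompt that produced it
collisions = 0
for _ in range(TRIALS):
    prompt = tuple(rng.integers(VOCAB, size=LEN))
    key = tuple(np.round(last_state(prompt), 10))
    if key in seen and seen[key] != prompt:
        collisions += 1  # two distinct prompts, same hidden state
    seen[key] = prompt
print(collisions)  # an injective map yields 0
```

Rounding the state before hashing only groups values within numerical noise; any genuine collision between distinct prompts would still be counted, matching the paper's finding of zero collisions at a far larger scale.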