Sparse autoencoders reveal organized biological knowledge but minimal regulatory logic in single-cell foundation models: a comparative atlas of Geneformer and scGPT

Decoding the Biological "Black Box": New Study Maps the Inner Workings of Single-Cell AI Models

A groundbreaking study has applied a powerful interpretability technique to two leading single-cell foundation models, Geneformer and scGPT, revealing that while they encode vast amounts of organized biological knowledge, they capture surprisingly little causal regulatory logic. By training sparse autoencoders (SAEs) on the models' internal activations, researchers created detailed atlases of over 107,000 interpretable features, uncovering massive "superposition" where networks compress more concepts than they have neurons. The findings, published on arXiv, suggest that while these AI tools are rich repositories of biological patterns, their representations remain a bottleneck for predicting precise cause-and-effect relationships in gene regulation.

Mapping the Superposition Within Foundation Models

The research team trained TopK sparse autoencoders on the residual stream activations from every layer of two major models: the 18-layer Geneformer V2-316M and the 12-layer scGPT whole-human model. This process yielded two massive feature atlases: 82,525 features for Geneformer and 24,527 for scGPT. A key discovery was the extent of superposition, with 99.8% of these learned features being invisible to traditional linear analysis methods such as Singular Value Decomposition (SVD). This confirms that the models' internal representations are highly compressed and entangled, storing far more biological concepts than their nominal dimensionality would suggest.
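To make the technique concrete, the forward pass of a TopK sparse autoencoder can be sketched in a few lines. This is a minimal NumPy illustration with toy dimensions, not the authors' training code: only the k largest feature pre-activations are kept, which enforces sparsity directly instead of via an L1 penalty.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One forward pass of a TopK sparse autoencoder.

    x: (d,) residual-stream activation vector
    W_enc: (m, d) and W_dec: (d, m) with m >> d (overcomplete dictionary)
    Only the k largest pre-activations stay on; the rest are zeroed.
    """
    pre = W_enc @ x + b_enc                 # (m,) feature pre-activations
    idx = np.argsort(pre)[-k:]              # indices of the top-k features
    z = np.zeros_like(pre)
    z[idx] = np.maximum(pre[idx], 0.0)      # keep top-k, apply ReLU
    x_hat = W_dec @ z + b_dec               # reconstruction in model space
    return z, x_hat

# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
d, m, k = 16, 128, 8                        # model dim, dictionary size, sparsity
W_enc = rng.normal(size=(m, d)) / np.sqrt(d)
W_dec = rng.normal(size=(d, m)) / np.sqrt(m)
z, x_hat = topk_sae_forward(rng.normal(size=d), W_enc,
                            np.zeros(m), W_dec, np.zeros(d), k)
```

Training then minimizes the reconstruction error between `x` and `x_hat`; each of the m dictionary directions becomes a candidate interpretable feature, which is how an overcomplete atlas far larger than the model's hidden dimension arises.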

Systematic characterization revealed that these features are not random but exhibit rich biological organization. Between 29% and 59% of features could be annotated to major biological databases, including Gene Ontology (GO), KEGG, Reactome, STRING, and TRRUST. The study also observed a U-shaped profile of biological abstraction across network layers, with middle layers showing the highest functional specificity, reflecting hierarchical processing of information. Furthermore, features self-organized into co-activation modules (141 in Geneformer, 76 in scGPT) and formed cross-layer "information highways," with 63% to 99.8% of features remaining active across multiple consecutive layers.
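Annotating features against databases like GO or KEGG is typically done by testing a feature's top-activating genes for enrichment in curated gene sets. The sketch below shows a standard hypergeometric (over-representation) test; the gene sets are hypothetical and the exact annotation pipeline used in the paper may differ.

```python
from math import comb

def enrichment_pvalue(feature_genes, pathway_genes, background_size):
    """P(X >= overlap) under a hypergeometric null: how surprising is the
    overlap between a feature's top genes and a pathway's gene set?"""
    n, K = len(feature_genes), len(pathway_genes)
    overlap = len(feature_genes & pathway_genes)
    total = comb(background_size, n)
    # Sum the upper tail of the hypergeometric distribution.
    return sum(comb(K, x) * comb(background_size - K, n - x)
               for x in range(overlap, min(n, K) + 1)) / total

# Hypothetical example: a feature whose top genes overlap a p53-related set.
feature = {"TP53", "MDM2", "CDKN1A", "BAX", "GADD45A"}
pathway = {"TP53", "MDM2", "CDKN1A", "BAX", "PUMA", "NOXA"}
p = enrichment_pvalue(feature, pathway, background_size=20000)
```

With 4 of 5 top genes landing in a 6-gene set against a ~20,000-gene background, the p-value is vanishingly small, so the feature would be annotated to that pathway (after multiple-testing correction across all feature-pathway pairs).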

Causal Logic: The Missing Piece in AI's Biological Understanding

The most critical test was whether these internal features correspond to causal regulatory relationships. The team rigorously evaluated the models against genome-scale CRISPRi perturbation data, a gold standard for establishing causality. The results were stark: when testing 48 transcription factors (TFs), only 3 (6.2%) elicited feature responses specific to their known regulatory targets. This indicates that the models' representations are primarily built on statistical co-expression and correlation, not mechanistic, causal logic.
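One way to operationalize "feature responses specific to known targets" is a permutation test: does a TF knockdown shift a feature's per-gene activations more on the TF's known CRISPRi targets than on random gene sets of the same size? The sketch below is an assumed, simplified version of such a test, not the paper's exact evaluation protocol.

```python
import numpy as np

def tf_response_specific(delta, target_idx, n_null=1000, alpha=0.05, rng=None):
    """Permutation test for target-specific feature responses.

    delta: (n_genes,) per-gene change in a feature's activation after knockdown
    target_idx: indices of the TF's known regulatory targets
    Returns True if the mean |shift| on targets beats the null at level alpha.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    observed = np.abs(delta[target_idx]).mean()
    null = np.array([
        np.abs(rng.choice(delta, size=len(target_idx), replace=False)).mean()
        for _ in range(n_null)
    ])
    p = (1 + (null >= observed).sum()) / (1 + n_null)
    return p < alpha

# Synthetic example: 500 genes, 20 targets that respond strongly.
rng = np.random.default_rng(1)
delta = rng.normal(0.0, 0.1, size=500)
targets = np.arange(20)
delta[targets] += 2.0                  # targets shift far more than background
```

Under this kind of criterion, the paper's headline result is that only 3 of 48 tested TFs produced a target-specific response, which is what motivates the correlation-versus-causation interpretation.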

In an attempt to improve this, researchers introduced a multi-tissue control, but it yielded only a marginal improvement, raising the correct TF-specific response rate to just 10.4% (5 of 48 TFs). This points to the model representations themselves, rather than the analysis methodology, as the fundamental limitation. The internal knowledge is rich in associative patterns, including pathway membership, protein-protein interactions, and functional modules, but lacks the precise wiring needed to predict how perturbing a specific regulator will change its target genes.

Why This Research Matters for Computational Biology

This study provides an unprecedented look inside the "black box" of biological foundation models, with significant implications for their development and application.

  • Establishes a New Benchmark for Model Interpretability: The release of interactive atlases for over 107,000 features across 30 model layers sets a new standard for transparency, allowing biologists to explore what these models have actually learned.
  • Highlights the Gap Between Correlation and Causation: It clearly demonstrates that even state-of-the-art models internalize associative biological knowledge far more effectively than causal regulatory rules, guiding future efforts to build more causally-aware AI.
  • Identifies Representation as the Key Bottleneck: The marginal gains from multi-tissue controls suggest that refining the analysis alone is insufficient; breakthroughs will likely require novel architectures or training objectives designed to capture causality.
  • Provides a Roadmap for Improvement: By identifying where knowledge is stored (in superpositioned features) and what is missing (causal logic), this work directs the field toward building the next generation of biologically predictive AI tools.

The researchers have made both feature atlases publicly available as interactive web platforms, providing an essential resource for the community to probe the biological knowledge within Geneformer and scGPT and to build upon these foundational insights.
