Which Sparse Autoencoder Features Are Real? Model-X Knockoffs for False Discovery Rate Control
Understanding the internal representations of large language models remains a fundamental challenge in artificial intelligence research [Olah et al., 2020]. Sparse autoencoders (SAEs) now make it possible to decompose neural network activations into interpretable features [Cunningham et al., 2023, Templeton et al., 2024]. By learning overcomplete, sparse representations of model activations, SAEs aim to disentangle polysemantic neurons into monosemantic features, each corresponding to a human-interpretable concept.
Discovering SAE features, however, is not the same as validating them. Most current interpretability work relies on manual inspection, automated explanation scoring, or correlation with downstream tasks. These methods offer no formal statistical guarantees, and they cannot distinguish genuine computational structure from spurious correlations produced by the multiple testing problem: when thousands of candidate features are screened against a target variable, chance alone yields many apparent associations. For example, if 10,000 features are truly unrelated to the target and each is tested at significance level 0.05, about 500 will be declared significant on average.
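The multiple testing problem described above is easy to reproduce, and knockoff-based false discovery rate (FDR) control can be illustrated on a toy problem. The following is a minimal, standard-library-only Python sketch under stated assumptions, not the procedure used in this work. `count_naive_hits` correlates pure-noise stand-ins for feature activations with a random target and counts uncorrected "hits"; `knockoff_select` applies the knockoff+ threshold of Barber and Candès to a synthetic problem in which features are assumed to be independent standard Gaussians, so that a fresh independent draw of each feature is a valid Model-X knockoff. The function names, the Fisher z-transform p-value approximation, the synthetic data, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_sided_p(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def count_naive_hits(n_samples=200, n_features=1000, alpha=0.05, seed=0):
    """Correlate pure-noise stand-in 'feature activations' with a pure-noise
    target and count how many pass an uncorrected test at level alpha.
    Every hit is a false discovery; about alpha * n_features are expected."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if two_sided_p(pearson_r(feature, target), n_samples) < alpha:
            hits += 1
    return hits


def knockoff_plus_threshold(W, q=0.1):
    """Knockoff+ threshold: the smallest t among the nonzero |W_j| with
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns math.inf (select nothing) when no such t exists."""
    for t in sorted({abs(w) for w in W if w != 0}):
        n_neg = sum(1 for w in W if w <= -t)
        n_pos = sum(1 for w in W if w >= t)
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t
    return math.inf


def knockoff_select(n_samples=1000, n_features=200, n_signal=10, q=0.1, seed=0):
    """Toy Model-X knockoff selection.

    ASSUMPTION: features are i.i.d. N(0, 1), so an independent fresh draw of
    each feature (generated without looking at y) is a valid knockoff copy.
    Real SAE activations are sparse and correlated, so this shortcut does not
    carry over to them directly.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
    X_ko = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
    # Hypothetical ground truth: only the first n_signal features affect y.
    y = [sum(X[j][i] for j in range(n_signal)) + rng.gauss(0, 1)
         for i in range(n_samples)]
    # W_j > 0 when the real feature tracks y more closely than its knockoff.
    W = [abs(pearson_r(X[j], y)) - abs(pearson_r(X_ko[j], y))
         for j in range(n_features)]
    T = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= T]


if __name__ == "__main__":
    print("uncorrected hits among pure-noise features:", count_naive_hits())
    print("knockoff+ selections:", knockoff_select())
```

Run as a script, the first line reports the number of spurious hits, which should be near alpha × n_features = 50 although the exact count depends on the seed; comparing the second line with the true signal indices 0 to 9 shows how many selections are correct. The `1 +` in the knockoff+ numerator makes the filter conservative: at q = 0.1 it needs at least 10 selections before it reports any, so with very few true signals it can return an empty list.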