ablation
Identifying interactions at scale for LLMs
Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process more transparent to model builders and impacted humans, a step toward safer and more trustworthy AI. To achieve state-of-the-art performance, models synthesize complex feature relationships, find shared patterns from diverse training examples, and process information through highly interconnected internal components. In this blog post, we describe the fundamental ideas behind SPEX and ProxySPEX, algorithms capable of identifying these critical interactions at scale. We mask or remove specific segments of the input prompt and measure the resulting shift in the predictions.
LASER: Low-Rank Activation SVD for Efficient Recursion
Çakar, Ege, Raghu, Ketan Ali, Zheng, Lia
Recursive architectures such as Tiny Recursive Models (TRMs) perform implicit reasoning through iterative latent computation, yet the geometric structure of these reasoning trajectories remains poorly understood. We investigate the activation manifold of TRMs during recursive unrolling and find that activations occupy an effectively linear, low-dimensional subspace whose principal directions can be tracked dynamically with cheap power iterations. This suggests that weight-sharing concentrates iterative computation along a small number of dominant eigendirections, and we find that this concentration varies sharply across computational sites. We exploit this structure through LASER (Low-Rank Activation SVD for Efficient Recursion), a dynamic compression framework that maintains an evolving low-rank basis via matrix-free subspace tracking with a fidelity-triggered reset mechanism, achieving ${\sim}60\%$ activation memory savings with no statistically significant accuracy degradation. Our analysis raises questions about how recursive architectures allocate representational capacity during implicit reasoning, and whether this concentration can be exploited to improve the efficiency and stability of latent computation.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- (9 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Data Science (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
- Asia > Middle East > Israel (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Data Science (0.68)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Data Science (0.68)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Asia > Middle East > Jordan (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)