Goto

Collaborating Authors

 Chou, Chi-Ning


Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry

arXiv.org Artificial Intelligence

The ability to integrate task-relevant information into neural representations is a fundamental aspect of both biological and artificial intelligence. To enable theoretical analysis, recent work has examined whether a network learns task-relevant features (rich learning) or resembles a random feature model (or a kernel machine, i.e., lazy learning). However, this simple lazy-versus-rich dichotomy overlooks the possibility of various subtypes of feature learning that emerge from different architectures, learning rules, and data properties. Furthermore, most existing approaches emphasize weight matrices or neural tangent kernels, limiting their applicability to neuroscience because they do not explicitly characterize representations. In this work, we introduce an analysis framework based on representational geometry to study feature learning. Instead of analyzing what are the learned features, we focus on characterizing how task-relevant representational manifolds evolve during the learning process. In both theory and experiment, we find that when a network learns features useful for solving a task, the task-relevant manifolds become increasingly untangled. Moreover, by tracking changes in the underlying manifold geometry, we uncover distinct learning stages throughout training, as well as different learning strategies associated with training hyperparameters, uncovering subtypes of feature learning beyond the lazy-versus-rich dichotomy. Applying our method to neuroscience and machine learning, we gain geometric insights into the structural inductive biases of neural circuits solving cognitive tasks and the mechanisms underlying out-of-distribution generalization in image classification. Our framework provides a novel geometric perspective for understanding and quantifying feature learning in both artificial and biological neural networks.


The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models

arXiv.org Artificial Intelligence

Decoder-only language models have the ability to dynamically switch between various computational tasks based on input prompts. Despite many successful applications of prompting, there is very limited understanding of the internal mechanism behind such flexibility. In this work, we investigate how different prompting methods affect the geometry of representations in these models. Employing a framework grounded in statistical physics, we reveal that various prompting techniques, while achieving similar performance, operate through distinct representational mechanisms for task adaptation. Our analysis highlights the critical role of input distribution samples and label semantics in few-shot in-context learning. We also demonstrate evidence of synergistic and interfering interactions between different tasks on the representational level. Our work contributes to the theoretical understanding of large language models and lays the groundwork for developing more effective, representation-aware prompting strategies.


Nonlinear classification of neural manifolds with contextual information

arXiv.org Machine Learning

Understanding how neural systems efficiently process information through distributed representations is a fundamental challenge at the interface of neuroscience and machine learning. Recent approaches analyze the statistical and geometrical attributes of neural representations as population-level mechanistic descriptors of task implementation. In particular, manifold capacity has emerged as a promising framework linking population geometry to the separability of neural manifolds. However, this metric has been limited to linear readouts. Here, we propose a theoretical framework that overcomes this limitation by leveraging contextual input information. We derive an exact formula for the context-dependent capacity that depends on manifold geometry and context correlations, and validate it on synthetic and real data. Our framework's increased expressivity captures representation untanglement in deep networks at early stages of the layer hierarchy, previously inaccessible to analysis. As context-dependent nonlinearity is ubiquitous in neural systems, our data-driven and theoretically grounded approach promises to elucidate context-dependent computation across scales, datasets, and models.


Probing Biological and Artificial Neural Networks with Task-dependent Neural Manifolds

arXiv.org Artificial Intelligence

Recently, growth in our understanding of the computations performed in both biological and artificial neural networks has largely been driven by either low-level mechanistic studies or global normative approaches. However, concrete methodologies for bridging the gap between these levels of abstraction remain elusive. In this work, we investigate the internal mechanisms of neural networks through the lens of neural population geometry, aiming to provide understanding at an intermediate level of abstraction, as a way to bridge that gap. Utilizing manifold capacity theory (MCT) from statistical physics and manifold alignment analysis (MAA) from high-dimensional statistics, we probe the underlying organization of task-dependent manifolds in deep neural networks and macaque neural recordings. Specifically, we quantitatively characterize how different learning objectives lead to differences in the organizational strategies of these models and demonstrate how these geometric analyses are connected to the decodability of task-relevant information. These analyses present a strong direction for bridging mechanistic and normative theories in neural networks through neural population geometry, potentially opening up many future research avenues in both machine learning and neuroscience.