 feature analysis


Priors in Time: Missing Inductive Biases for Language Model Interpretability

Lubana, Ekdeep Singh, Rager, Can, Hindupur, Sai Sumedh R., Costa, Valerie, Tuckute, Greta, Patel, Oam, Murthy, Sonia Krishna, Fel, Thomas, Wurgaft, Daniel, Bigelow, Eric J., Lin, Johnny, Ba, Demba, Wattenberg, Martin, Viegas, Fernanda, Weber, Melanie, Mueller, Aaron

arXiv.org Artificial Intelligence

Recovering meaningful concepts from language model activations is a central aim of interpretability. While existing feature extraction methods aim to identify concepts that are independent directions, it is unclear if this assumption can capture the rich temporal structure of language. Specifically, via a Bayesian lens, we demonstrate that Sparse Autoencoders (SAEs) impose priors that assume independence of concepts across time, implying stationarity. Meanwhile, language model representations exhibit rich temporal dynamics, including systematic growth in conceptual dimensionality, context-dependent correlations, and pronounced non-stationarity, in direct conflict with the priors of SAEs. Taking inspiration from computational neuroscience, we introduce a new interpretability objective -- Temporal Feature Analysis -- which possesses a temporal inductive bias to decompose representations at a given time into two parts: a predictable component, which can be inferred from the context, and a residual component, which captures novel information unexplained by the context. Temporal Feature Analyzers correctly parse garden path sentences, identify event boundaries, and more broadly delineate abstract, slow-moving information from novel, fast-moving information, while existing SAEs show significant pitfalls in all the above tasks. Overall, our results underscore the need for inductive biases that match the data in designing robust interpretability tools.
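The split into a predictable and a residual component can be illustrated with a deliberately simple toy. This sketch is not the Temporal Feature Analysis objective from the paper: it assumes, purely for illustration, that the "predictable" part of an activation is the running mean of all earlier activations in the context, and that the residual is whatever remains. The function name `split_predictable_residual` is hypothetical.

```python
def split_predictable_residual(activations):
    """Split each activation vector into (predictable, residual).

    Assumption for illustration only: the predictable part at step t is the
    mean of the activations at steps 0..t-1 (zero at t = 0). The paper's
    learned predictor is not reproduced here.
    """
    if not activations:
        return []
    dim = len(activations[0])
    running_sum = [0.0] * dim
    parts = []
    for t, x in enumerate(activations):
        if t == 0:
            predictable = [0.0] * dim  # no context yet
        else:
            predictable = [s / t for s in running_sum]
        # Residual: what the context summary does not explain.
        residual = [xi - pi for xi, pi in zip(x, predictable)]
        parts.append((predictable, residual))
        running_sum = [s + xi for s, xi in zip(running_sum, x)]
    return parts


# A constant first coordinate becomes fully predictable; a sudden jump in
# the second coordinate shows up in the residual.
acts = [[1.0, 0.0], [1.0, 0.0], [1.0, 4.0]]
for predictable, residual in split_predictable_residual(acts):
    print(predictable, residual)
```

By construction the two parts always sum back to the original activation, so the decomposition loses no information; only the choice of predictor decides what counts as slow and what counts as novel.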


How Deep is the Feature Analysis underlying Rapid Visual Categorization?

Neural Information Processing Systems

Rapid categorization paradigms have a long history in experimental psychology: Characterized by short presentation times and speeded behavioral responses, these tasks highlight the efficiency with which our visual system processes natural object categories. Previous studies have shown that feed-forward hierarchical models of the visual cortex provide a good fit to human visual decisions. At the same time, recent work in computer vision has demonstrated significant gains in object recognition accuracy with increasingly deep hierarchical architectures. But it is unclear how well these models account for human visual decisions and what they may reveal about the underlying brain processes. We have conducted a large-scale psychophysics study to assess the correlation between computational models and human behavioral responses on a rapid animal vs. non-animal categorization task. We considered visual representations of varying complexity by analyzing the output of different stages of processing in three state-of-the-art deep networks. We found that recognition accuracy increases with higher stages of visual processing (higher level stages indeed outperforming human participants on the same task) but that human decisions agree best with predictions from intermediate stages. Overall, these results suggest that human participants may rely on visual features of intermediate complexity and that the complexity of visual representations afforded by modern deep network models may exceed the complexity of those used by human participants during rapid categorization.


Attractor Network Dynamics Enable Preplay and Rapid Path Planning in Maze–like Environments

Dane S. Corneil, Wulfram Gerstner

Neural Information Processing Systems

Rodents navigating in a well-known environment can rapidly learn and revisit observed reward locations, often after a single trial. While the mechanism for rapid path planning is unknown, the CA3 region in the hippocampus plays an important role, and emerging evidence suggests that place cell activity during hippocampal "preplay" periods may trace out future goal-directed trajectories. Here, we show how a particular mapping of space allows for the immediate generation of trajectories between arbitrary start and goal locations in an environment, based only on the mapped representation of the goal. We show that this representation can be implemented in a neural attractor network model, resulting in bump-like activity profiles resembling those of the CA3 region of hippocampus. Neurons tend to locally excite neurons with similar place field centers, while inhibiting other neurons with distant place field centers, such that stable bumps of activity can form at arbitrary locations in the environment. The network is initialized to represent a point in the environment, then weakly stimulated with an input corresponding to an arbitrary goal location. We show that the resulting activity can be interpreted as a gradient ascent on the value function induced by a reward at the goal location. Indeed, in networks with large place fields, we show that the network properties cause the bump to move smoothly from its initial location to the goal, around obstacles or walls. Our results illustrate that an attractor network with hippocampal-like attributes may be important for rapid path planning.


Recursive State Inference for Linear PASFA

Rishi, Vishal

arXiv.org Artificial Intelligence

Recent probabilistic extensions to Slow Feature Analysis (SFA) learn effective representations for classification tasks. Notably, Probabilistic Adaptive Slow Feature Analysis (PASFA) models the slow features as states in an ARMA process and estimates the model from the observations. However, efficient methods are still needed to infer the states (slow features) from the observations and the model. This paper proposes a recursive extension to linear PASFA. The proposed algorithm performs MMSE estimation of states evolving according to an ARMA process, given the observations and the model. Although current methods tackle this problem using Kalman filters after transforming the ARMA process into a state space model, the original states (or slow features) that form useful representations cannot be easily recovered. The proposed technique is evaluated on a synthetic dataset to demonstrate its correctness.
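For orientation, the Kalman-filter baseline mentioned in the abstract can be sketched in its simplest form. The example below is not the paper's recursive PASFA algorithm: it assumes a scalar AR(1) state observed directly with additive Gaussian noise, which is the easiest case in which the Kalman recursion yields the MMSE state estimate. The function name and the parameters `a`, `q`, `r`, `mean0` and `var0` are illustrative choices.

```python
def kalman_filter_ar1(observations, a, q, r, mean0=0.0, var0=1.0):
    """Scalar Kalman filter for an assumed AR(1) model.

    State:       x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = x_t + v_t,          v_t ~ N(0, r)

    Returns a list of (posterior mean, posterior variance) per observation.
    """
    mean, var = mean0, var0
    estimates = []
    for y in observations:
        # Predict step: propagate the previous posterior through the AR(1) model.
        mean_pred = a * mean
        var_pred = a * a * var + q
        # Update step: blend prediction and observation using the Kalman gain.
        gain = var_pred / (var_pred + r)
        mean = mean_pred + gain * (y - mean_pred)
        var = (1.0 - gain) * var_pred
        estimates.append((mean, var))
    return estimates


print(kalman_filter_ar1([2.0], a=1.0, q=0.0, r=1.0))
```

With equal prior and observation variance, the single update above weights the prior mean and the observation equally. A general ARMA process needs a vector state to be written in this form, which is the transformation the abstract describes as making the original slow features hard to recover.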



UKTA: Unified Korean Text Analyzer

Ahn, Seokho, Park, Junhyung, Go, Ganghee, Kim, Chulhui, Jung, Jiho, Shin, Myung Sun, Kim, Do-Guk, Seo, Young-Duk

arXiv.org Artificial Intelligence

Evaluating writing quality is complex and time-consuming, often delaying feedback to learners. While automated writing evaluation tools are effective for English, Korean automated writing evaluation tools face challenges due to their inability to address multi-view analysis, error propagation, and evaluation explainability. To overcome these challenges, we introduce UKTA (Unified Korean Text Analyzer), a comprehensive Korean text analysis and writing evaluation system. UKTA provides accurate low-level morpheme analysis, key lexical features for mid-level explainability, and transparent high-level rubric-based writing scores. Our approach enhances accuracy and quadratic weighted kappa over existing baselines, positioning UKTA as a leading multi-perspective tool for Korean text analysis and writing evaluation.

High-level, abstract evaluation results should be interpretable by humans, who need to understand the reason behind the scores and the features that influenced the results. Providing this explainability to users is crucial for ensuring reliability, as these tools have the potential to make mistakes. Unfortunately, existing Korean text analyzers [16, 18, 20] and automated writing evaluation tools [21, 37] do not fully meet all these requirements, limiting their practical use. To address the research gap, we introduce UKTA (Unified Korean Text Analyzer), a comprehensive Korean text analysis system for evaluating Korean writing. First, we provide accurate low-level analysis based on state-of-the-art Korean morpheme analyzer, which minimizes error propagation. In addition to morpheme analysis, we categorize and provide key features, such as lexical richness and
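Quadratic weighted kappa, the agreement metric named in the abstract, can be computed as in the sketch below. It uses the standard definition and assumes scores are integers from 0 to `num_labels - 1`; it says nothing about UKTA's actual rubric scales or evaluation code, and the function name is illustrative.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_labels):
    """Quadratic weighted kappa between two lists of integer scores.

    Assumes scores lie in 0..num_labels-1 and that the two raters do not
    both give one identical score to every item (the denominator would be 0).
    """
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * num_labels for _ in range(num_labels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_labels))
              for j in range(num_labels)]
    numerator = 0.0
    denominator = 0.0
    for i in range(num_labels):
        for j in range(num_labels):
            # Disagreement penalty grows with the squared score distance.
            weight = (i - j) ** 2 / (num_labels - 1) ** 2
            numerator += weight * observed[i][j]
            # Expected count under independent raters with the same marginals.
            denominator += weight * hist_a[i] * hist_b[j] / n
    return 1.0 - numerator / denominator


print(quadratic_weighted_kappa([0, 1, 2], [0, 1, 2], num_labels=3))
```

Because of the squared weights, a prediction that is off by two rubric levels is penalized four times as heavily as one that is off by a single level, which is why the metric suits ordinal writing scores.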


Integrated feature analysis for deep learning interpretation and class activation maps

Li, Yanli, Hassanzadeh, Tahereh, Shamonin, Denis P., Reijnierse, Monique, Mil, Annette H. M. van der Helm-van, Stoel, Berend C.

arXiv.org Artificial Intelligence

Understanding the decisions of deep learning (DL) models is essential for the acceptance of DL in risk-sensitive applications. Although methods like class activation maps (CAMs) give a glimpse into the black box, they miss some crucial information, which limits their interpretability, and merely provide the considered locations of objects. To provide more insight into the models and the influence of datasets, we propose an integrated feature analysis method, which consists of feature distribution analysis and feature decomposition, to look closer into the intermediate features extracted by DL models. This integrated feature analysis could provide information on overfitting, confounders, outliers in datasets, model redundancies and principal features extracted by the models, and provide distribution information to form a common intensity scale, all of which are missing in current CAM algorithms. The integrated feature analysis was applied to eight different datasets for general validation: photographs of handwritten digits, two datasets of natural images and five medical datasets, including skin photography, ultrasound, CT, X-rays and MRIs. The method was evaluated by calculating the consistency between the CAMs' average class activation levels and the logits of the model. Based on the eight datasets, the correlation coefficients obtained with our method were all very close to 100%, and based on the feature decomposition, 5%-25% of features could generate equally informative saliency maps and obtain the same model performance as using all features. This demonstrates the reliability of the integrated feature analysis. As the proposed methods rely on very few assumptions, this is a step towards better model interpretation and a useful extension to existing CAM algorithms. Code: https://github.com/YanliLi27/IFA
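The consistency check described above compares two per-image quantities: the average activation level of a CAM and the model's logit. The abstract does not say which correlation coefficient was used, so the sketch below assumes a Pearson correlation; the function name and the sample numbers are made up for illustration and are not taken from the linked repository.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical per-image values: mean CAM activation and the matching logit.
cam_means = [0.1, 0.4, 0.7]
logits = [1.2, 1.8, 2.4]
print(pearson_correlation(cam_means, logits))
```

A value near 1 would mean the CAM intensity tracks the logit almost linearly, which is the sense in which a coefficient "close to 100%" indicates a shared intensity scale.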


Exploring Gender-Specific Speech Patterns in Automatic Suicide Risk Assessment

Gerczuk, Maurice, Amiriparian, Shahin, Lutz, Justina, Strube, Wolfgang, Papazova, Irina, Hasan, Alkomiet, Schuller, Björn W.

arXiv.org Artificial Intelligence

In emergency medicine, timely intervention for patients at risk of suicide is often hindered by delayed access to specialised psychiatric care. To bridge this gap, we introduce a speech-based approach for automatic suicide risk assessment. Our study involves a novel dataset comprising speech recordings of 20 patients who read neutral texts. We extract four speech representations encompassing interpretable and deep features. Further, we explore the impact of gender-based modelling and phrase-level normalisation. By applying gender-exclusive modelling, features extracted from an emotion fine-tuned wav2vec2.0 model can be utilised to discriminate high from low suicide risk with a balanced accuracy of 81%. Finally, our analysis reveals a discrepancy in the relationship between speech characteristics and suicide risk for female and male subjects. For men in our dataset, suicide risk increases together with agitation, while voice characteristics of female subjects point the other way.
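Balanced accuracy, the metric reported above, is the mean of the per-class recalls, which keeps a majority-class guesser from looking good on an imbalanced dataset. The sketch below shows the standard definition; the labels and predictions are invented for illustration and are unrelated to the study's data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        correct = sum(1 for i in indices if y_pred[i] == label)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical example: always predicting "low" is right 80% of the time
# here, but its balanced accuracy is only chance level.
y_true = ["high"] * 2 + ["low"] * 8
y_pred = ["low"] * 10
print(balanced_accuracy(y_true, y_pred))
```

For a two-class problem, chance level is 0.5 regardless of class proportions, which is the baseline against which a figure such as 81% should be read.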

