Goto

Collaborating Authors

 feature 0


A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

Neural Information Processing Systems

As we increase the number of features in the SAE, hierarchical features tend to split into finer features ("math" may split into "algebra", "geometry", etc.), a phenomenon referred to as feature splitting. However, we show that sparse decomposition and splitting of hierarchical features is not robust. Specifically, we show that seemingly monosemantic features fail to fire where they should, and instead get "absorbed" into their children features. We coin this phenomenon feature absorption, and show that it is caused by optimizing for sparsity in SAEs whenever the underlying features form a hierarchy. We introduce a metric to detect absorption in SAEs, and validate our findings empirically on hundreds of LLM SAEs. Our investigation suggests that varying SAE sizes or sparsity is insufficient to solve this issue. We discuss the implications of feature absorption in SAEs and some potential approaches to solve the fundamental theoretical issues before SAEs can be used for interpreting LLMs robustly and at scale.



SHAP-Based Supervised Clustering for Sample Classification and the Generalized Waterfall Plot

arXiv.org Machine Learning

In this growing age of data and technology, large black-box models are becoming the norm due to their ability to handle vast amounts of data and learn incredibly complex input-output relationships. The deficiency of these methods, however, is their inability to explain the prediction process, making them untrustworthy and their use precarious in high-stakes situations. SHapley Additive exPlanations (SHAP) analysis is an explainable AI method growing in popularity for its ability to explain model predictions in terms of the original features. For each sample and feature in the data set, we associate a SHAP value that quantifies the contribution of that feature to the prediction of that sample. Clustering these SHAP values can provide insight into the data by grouping samples that not only received the same prediction, but received the same prediction for similar reasons. In doing so, we map the various pathways through which distinct samples arrive at the same prediction. To showcase this methodology, we present a simulated experiment in addition to a case study in Alzheimer's disease using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We also present a novel generalization of the waterfall plot for multi-classification.


A The Estimator null A X W)

Neural Information Processing Systems

A.2 Proof of Theorem 1 To prove Theorem 1, we assume that G Proof of Lemma 1. Let's first rewrite Equation (4) as null null By Lemma 1, linearity of expectation and knowing that each RWT is independent from the other tours by the Strong Markov Property, Theorem 1 holds. MHM-GNN can recover edge-based models where representations don't use graph-wide However, on Rent the Runway we see the raw features achieving the highest performance. That is, structural information does not seem to be relevant to this specific task. All hyperparameters were chosen to minimize training loss. For k = 5, we used a minibatch of size 5 in all datasets.


Application and Evaluation of Large Language Models for Forecasting the Impact of Traffic Incidents

arXiv.org Artificial Intelligence

This study examines the feasibility of applying large language models (LLMs) for forecasting the impact of traffic incident s on the traffic flow. The use of LLMs for this task has several advantages over existing machine learning - based solutions such as not requiring a large training dataset and the ability to utilize free - text incident logs . We propose a fully LLM - based solution that predicts the incident impact using a combination of traffic features and LLM - extracted incident features. A key ingredient of this solution is an effective method of select ing examples for the LLM's in - context learning. We evaluate the performance of three advanced LLMs and two state - of - the - art machine learning models on a real traffic incident dataset . The results show that the best - performing LLM matches the accuracy of the most accurate machine learning model, despite the former not having been trained on this prediction task. The findings indicate that LLMs are a practically viable option for traffic incident impact prediction.


Applying Large Language Models to Issue Classification: Revisiting with Extended Data and New Models

arXiv.org Artificial Intelligence

Effective prioritization of issue reports in software engineering helps to optimize resource allocation and information recovery. However, manual issue classification is laborious and lacks scalability. As an alternative, many open source software (OSS) projects employ automated processes for this task, yet this method often relies on large datasets for adequate training. Traditionally, machine learning techniques have been used for issue classification. More recently, large language models (LLMs) have emerged as powerful tools for addressing a range of software engineering challenges, including code and test generation, mapping new requirements to legacy software endpoints, and conducting code reviews. The following research investigates an automated approach to issue classification based on LLMs. By leveraging the capabilities of such models, we aim to develop a robust system for prioritizing issue reports, mitigating the necessity for extensive training data while also maintaining reliability in classification. In our research, we developed an LLM-based approach for accurately labeling issues by selecting two of the most prominent large language models. We then compared their performance across multiple datasets. Our findings show that GPT-4o achieved the best results in classifying issues from the NLBSE 2024 competition. Moreover, GPT-4o outperformed DeepSeek R1, achieving an F1 score 20% higher when both models were trained on the same dataset from the NLBSE 2023 competition, which was ten times larger than the NLBSE 2024 dataset. The fine-tuned GPT-4o model attained an average F1 score of 80.7%, while the fine-tuned DeepSeek R1 model achieved 59.33%. Increasing the dataset size did not improve the F1 score, reducing the dependence on massive datasets for building an efficient solution to issue classification.


Sparse Autoencoder Features for Classifications and Transferability

arXiv.org Artificial Intelligence

Sparse Autoencoders (SAEs) provide potentials for uncovering structured, human-interpretable representations in Large Language Models (LLMs), making them a crucial tool for transparent and controllable AI systems. We systematically analyze SAE for interpretable feature extraction from LLMs in safety-critical classification tasks. Our framework evaluates (1) model-layer selection and scaling properties, (2) SAE architectural configurations, including width and pooling strategies, and (3) the effect of binarizing continuous SAE activations. SAE-derived features achieve macro F1 > 0.8, outperforming hidden-state and BoW baselines while demonstrating cross-model transfer from Gemma 2 2B to 9B-IT models. These features generalize in a zero-shot manner to cross-lingual toxicity detection and visual classification tasks. Our analysis highlights the significant impact of pooling strategies and binarization thresholds, showing that binarization offers an efficient alternative to traditional feature selection while maintaining or improving performance. These findings establish new best practices for SAE-based interpretability and enable scalable, transparent deployment of LLMs in real-world applications. Full repo: https://github.com/shan23chen/MOSAIC.


SPOCK 2.0: Update to the FeatureClassifier in the Stability of Planetary Orbital Configurations Klassifier

arXiv.org Artificial Intelligence

ABSTRACT The Stability of Planetary Orbital Configurations Klassifier (SPOCK) package collects machine learning models for predicting the stability and collisional evolution of compact planetary systems. In this paper we explore improvements to SPOCK's binary stability classifier (FeatureClassifier), which predicts orbital stability by collecting data over a short N-body integration of a system. We additionally discovered that 10% of N-body integrations in SPOCK's original training dataset were duplicated by accident, and that < 1% were misclassified as stable when they in fact led to ejections. We provide a cleaned dataset of 100,000+ unique integrations, release a newly trained stability classification model, and make minor updates to the API. INTRODUCTION clude systems that go unstable during the short integration phase; which slightly reduces the model AUC Determining orbital stability over planetary systems' from 0.9527 to 0.9490 (an AUC of 1 would be a perfect typical lifetimes of several Gyr through direct numerical model).


Case Study: Leveraging GenAI to Build AI-based Surrogates and Regressors for Modeling Radio Frequency Heating in Fusion Energy Science

arXiv.org Artificial Intelligence

This work presents a detailed case study on using Generative AI (GenAI) to develop AI surrogates for simulation models in fusion energy research. The scope includes the methodology, implementation, and results of using GenAI to assist in model development and optimization, comparing these results with previous manually developed models.


From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

arXiv.org Artificial Intelligence

We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3, etc) can do linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several large language models (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling (or even outperforming) that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM, Random Forest, KNN, or Gradient Boosting. We then investigate how well the performance of large language models scales with the number of in-context exemplars. We borrow from the notion of regret from online learning and empirically show that LLMs are capable of obtaining a sub-linear regret.