Goto

Collaborating Authors

 embedding 0


Pinpointing Attention-Causal Communication in Language Models

Neural Information Processing Systems

The attention mechanism plays a central role in the computations performed by transformer-based models, and understanding the reasons why heads attend to specific tokens can aid in interpretability of language models. Although considerable work has shown that models construct low-dimensional feature representations, little work has explicitly tied low-dimensional features to the attention mechanism itself. In this paper we work to bridge this gap by presenting methods for identifying attention-causal communication, meaning low-dimensional features that are written into and read from tokens, and that have a provable causal relationship to attention patterns. The starting point for our method is prior work [1-3] showing that model components make use of low dimensional communication channels that can be exposed by the singular vectors of QK matrices. Our contribution is to provide a rigorous and principled approach to finding those channels and isolating the attention-causal signals they contain. We show that by identifying those signals, we can perform prompt-specific circuit discovery in a single forward pass. Further, we show that signals can uncover unexplored mechanisms at work in the model, including a surprising degree of global coordination across attention heads.


Analyzing Cost-Sensitive Surrogate Losses via $\mathcal{H}$-calibration

arXiv.org Artificial Intelligence

This paper aims to understand whether machine learning models should be trained using cost-sensitive surrogates or cost-agnostic ones (e.g., cross-entropy). Analyzing this question through the lens of $\mathcal{H}$-calibration, we find that cost-sensitive surrogates can strictly outperform their cost-agnostic counterparts when learning small models under common distributional assumptions. Since these distributional assumptions are hard to verify in practice, we also show that cost-sensitive surrogates consistently outperform cost-agnostic surrogates on classification datasets from the UCI repository. Together, these make a strong case for using cost-sensitive surrogates in practice.


Generating Comprehensive Lithium Battery Charging Data with Generative AI

arXiv.org Artificial Intelligence

In optimizing performance and extending the lifespan of lithium batteries, accurate state prediction is pivotal. Traditional regression and classification methods have achieved some success in battery state prediction. However, the efficacy of these data-driven approaches heavily relies on the availability and quality of public datasets. Additionally, generating electrochemical data predominantly through battery experiments is a lengthy and costly process, making it challenging to acquire high-quality electrochemical data. This difficulty, coupled with data incompleteness, significantly impacts prediction accuracy. Addressing these challenges, this study introduces the End of Life (EOL) and Equivalent Cycle Life (ECL) as conditions for generative AI models. By integrating an embedding layer into the CVAE model, we developed the Refined Conditional Variational Autoencoder (RCVAE). Through preprocessing data into a quasi-video format, our study achieves an integrated synthesis of electrochemical data, including voltage, current, temperature, and charging capacity, which is then processed by the RCVAE model. Coupled with customized training and inference algorithms, this model can generate specific electrochemical data for EOL and ECL under supervised conditions. This method provides users with a comprehensive electrochemical dataset, pioneering a new research domain for the artificial synthesis of lithium battery data. Furthermore, based on the detailed synthetic data, various battery state indicators can be calculated, offering new perspectives and possibilities for lithium battery performance prediction.


Latent Multi-Criteria Ratings for Recommendations

arXiv.org Machine Learning

Multi-criteria recommender systems have been increasingly valuable for helping consumers identify the most relevant items based on different dimensions of user experiences. However, previously proposed multi-criteria models did not take into account latent embeddings generated from user reviews, which capture latent semantic relations between users and items. To address these concerns, we utilize variational autoencoders to map user reviews into latent embeddings, which are subsequently compressed into low-dimensional discrete vectors. The resulting compressed vectors constitute latent multi-criteria ratings that we use for the recommendation purposes via standard multi-criteria recommendation methods. We show that the proposed latent multi-criteria rating approach outperforms several baselines significantly and consistently across different datasets and performance evaluation measures.