Scaling Laws for State Dynamics in Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly used in tasks requiring internal state tracking, yet their ability to model state transition dynamics remains poorly understood. We evaluate how well LLMs capture deterministic state dynamics across 3 domains: Box Tracking, Abstract DFA Sequences, and Complex Text Games, each formalizable as a finite-state system. Across tasks, we find that next-state prediction accuracy degrades with increasing state-space size and sparse transitions. GPT-2 XL reaches about 70% accuracy in low-complexity settings but drops below 30% when the number of boxes or states exceeds 5 or 10, respectively. In DFA tasks, Pythia-1B fails to exceed 50% accuracy when the number of states is > 10 and transitions are < 30. Through activation patching, we identify attention heads responsible for propagating state information: GPT-2 XL Layer 22 Head 20, and Pythia-1B Heads at Layers 10, 11, 12, and 14. While these heads successfully move relevant state features, action information is not reliably routed to the final token, indicating weak joint state-action reasoning. Our results suggest that state tracking in LLMs emerges from distributed interactions of next-token heads rather than explicit symbolic computation.
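To make the evaluation setting concrete, the following is a minimal sketch of next-state prediction on a finite-state system. It is not the authors' benchmark code: the helper names (`make_dfa`, `run`, `next_state_accuracy`), the random construction of transitions, and the rule that an undefined transition leaves the state unchanged are all assumptions made for illustration. A real experiment would replace `predictor` with a call to a language model.

```python
import random


def make_dfa(num_states, alphabet, num_transitions, seed=0):
    """Build a random partial DFA as a dict {(state, symbol): next_state}.

    Fewer transitions than num_states * len(alphabet) gives a sparse system.
    """
    rng = random.Random(seed)
    pairs = [(s, a) for s in range(num_states) for a in alphabet]
    chosen = rng.sample(pairs, min(num_transitions, len(pairs)))
    return {pair: rng.randrange(num_states) for pair in chosen}


def run(dfa, start, actions):
    """Return the state reached from `start` after applying `actions`."""
    state = start
    for action in actions:
        # Assumption: an undefined transition leaves the state unchanged.
        state = dfa.get((state, action), state)
    return state


def next_state_accuracy(dfa, predictor, episodes):
    """Fraction of episodes where the predictor names the true final state."""
    hits = sum(
        predictor(start, actions) == run(dfa, start, actions)
        for start, actions in episodes
    )
    return hits / len(episodes)
```

Sweeping `num_states` and `num_transitions` while holding the predictor fixed would give the kind of accuracy-versus-complexity curve the abstract describes.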


Rhythmic sharing: A bio-inspired paradigm for zero-shot adaptation and learning in neural networks

arXiv.org Artificial Intelligence

The brain can rapidly adapt to new contexts and learn from limited data, a coveted characteristic that artificial intelligence algorithms have struggled to mimic. Inspired by oscillatory rhythms of the mechanical structures of neural cells, we developed a learning paradigm that is based on oscillations in link strengths and associates learning with the coordination of these oscillations. We find that this paradigm yields rapid adaptation and learning in artificial neural networks. Link oscillations can rapidly change coordination, endowing the network with the ability to sense subtle context changes in an unsupervised manner. In other words, the network generates the missing contextual tokens required to perform as a generalist AI architecture capable of predicting dynamics in multiple contexts. Oscillations also allow the network to extrapolate dynamics to never-seen-before contexts. These capabilities make our learning paradigm a powerful starting point for novel models of learning and cognition. Furthermore, learning through link coordination is agnostic to the specifics of the neural network architecture, hence our study opens the door for introducing rapid adaptation and learning capabilities into leading AI models.


Bug Analysis Towards Bug Resolution Time Prediction

arXiv.org Artificial Intelligence

Bugs are inevitable in software development, and their reporting in open repositories can enhance software transparency and reliability assessment. This study aims to extract information from the issue tracking system Jira and proposes a methodology to estimate resolution time for new bugs. The methodology is applied to the network project ONAP, addressing concerns of network operators and manufacturers. This research provides insights into bug resolution times and related aspects in network softwarization projects.


Codebook Features: Sparse and Discrete Interpretability for Neural Networks

arXiv.org Artificial Intelligence

Understanding neural networks is challenging in part because of the dense, continuous nature of their hidden states. We explore whether we can train neural networks to have hidden states that are sparse, discrete, and more interpretable by quantizing their continuous features into what we call codebook features. Codebook features are produced by finetuning neural networks with vector quantization bottlenecks at each layer, producing a network whose hidden features are the sum of a small number of discrete vector codes chosen from a larger codebook. Surprisingly, we find that neural networks can operate under this extreme bottleneck with only modest degradation in performance. This sparse, discrete bottleneck also provides an intuitive way of controlling neural network behavior: first, find codes that activate when the desired behavior is present, then activate those same codes during generation to elicit that behavior. We validate our approach by training codebook Transformers on several different datasets. First, we explore a finite state machine dataset with far more hidden states than neurons. In this setting, our approach overcomes the superposition problem by assigning states to distinct codes, and we find that we can make the neural network behave as if it is in a different state by activating the code for that state. Second, we train Transformer language models with up to 410M parameters on two natural language datasets. We identify codes in these models representing diverse, disentangled concepts (ranging from negative emotions to months of the year) and find that we can guide the model to generate different topics by activating the appropriate codes during inference. Overall, codebook features appear to be a promising unit of analysis and control for neural networks and interpretability. Our codebase and models are open-sourced at https://github.com/taufeeque9/codebook-features.
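The bottleneck described above, where a hidden state is replaced by the sum of a few codes from a larger codebook, can be sketched in a few lines. This is an illustrative sketch, not the API of the linked repository: the function names are hypothetical, and selecting codes by cosine similarity is an assumption about the selection rule.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def codebook_quantize(hidden, codebook, k):
    """Replace `hidden` with the sum of its k most similar codebook vectors.

    Returns the quantized vector and the indices of the chosen codes.
    """
    # Rank all codes by similarity to the hidden state (assumed metric).
    ranked = sorted(
        range(len(codebook)),
        key=lambda i: cosine(hidden, codebook[i]),
        reverse=True,
    )
    chosen = ranked[:k]
    # The layer output is the sum of the selected discrete codes.
    quantized = [sum(codebook[i][j] for i in chosen) for j in range(len(hidden))]
    return quantized, chosen
```

Under this sketch, steering the network toward a behavior amounts to forcing a particular index into `chosen` at generation time, since the downstream layers only ever see the summed codes.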


On the Detection and Quantification of Nonlinearity via Statistics of the Gradients of a Black-Box Model

arXiv.org Artificial Intelligence

Detection and identification of nonlinearity is a task of high importance for structural dynamics. Detecting nonlinearity in a structure, which has been designed to operate in its linear region, might indicate the existence of damage. Therefore, it is important, even for safety reasons, to detect when a structure exhibits nonlinear behaviour. In the current work, a method to detect nonlinearity is proposed, based on the distribution of the gradients of a data-driven model, which is fitted on data acquired from the structure of interest. The data-driven model herein is a neural network. The selection of such a type of model was done in order to not allow the user to decide how linear or nonlinear the model shall be, but to let the training algorithm of the neural network shape the level of nonlinearity according to the training data. The neural network is trained to predict the accelerations of the structure for a time-instant using as inputs accelerations of previous time-instants, i.e. one-step-ahead predictions. Afterwards, the gradients of the output of the neural network with respect to its inputs are calculated. Given that the structure is linear, the distribution of the aforementioned gradients should be quite peaked, while in the case of a structure with nonlinearities, the distribution of the gradients shall be more spread and, potentially, multimodal. To test the above assumption, data from an experimental structure are considered. The structure is tested under different scenarios, some of which are linear and some nonlinear. The statistics of the distributions of the gradients for the different scenarios can be used to identify cases where nonlinearity is present. Moreover, via the proposed method one is able to quantify the nonlinearity by observing higher values of standard deviation of the distribution of the gradients for "more nonlinear" scenarios.
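The idea of using gradient statistics as a nonlinearity indicator can be sketched as follows. Everything here is an assumption made for illustration: the two hand-written one-step-ahead predictors stand in for a trained neural network, finite differences stand in for the network's exact gradients, and the sinusoidal inputs stand in for measured accelerations.

```python
import math
import statistics


def gradient(model, x, eps=1e-5):
    """Central finite-difference gradient of a scalar model at input x."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((model(up) - model(down)) / (2 * eps))
    return grads


def gradient_spread(model, inputs):
    """Standard deviation of each input's gradient across all samples."""
    all_grads = [gradient(model, x) for x in inputs]
    return [statistics.pstdev(column) for column in zip(*all_grads)]


def linear_model(x):
    # Hypothetical linear predictor of the next acceleration sample.
    return 1.6 * x[0] - 0.8 * x[1]


def cubic_model(x):
    # Same predictor with an added cubic (hardening-type) term.
    return 1.6 * x[0] - 0.8 * x[1] - 0.5 * x[0] ** 3


# Synthetic lagged inputs: [a(t-1), a(t-2)] sampled from a sinusoid.
inputs = [[math.sin(0.3 * t), math.sin(0.3 * (t - 1))] for t in range(50)]

print("linear spread:", gradient_spread(linear_model, inputs))
print("cubic spread: ", gradient_spread(cubic_model, inputs))
```

For the linear predictor the gradients are the same at every sample, so the spread is essentially zero (a peaked distribution). The cubic term makes the gradient depend on the input, which widens the distribution and raises the standard deviation.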


Fine-grained visual recognition for mobile AR technical support

#artificialintelligence

When a hardware-related system disruption like an outage due to hard drive failure happens, the path to recovery includes checking hardware support information, describing the problem to a support representative, waiting for a field technician to arrive, and hoping the technician can resolve the issue in a timely manner. Our team of researchers recently published the paper "Fine-Grained Visual Recognition in Mobile Augmented Reality for Technical Support" in IEEE ISMAR 2020 [1, 2], which outlines an augmented reality (AR) solution that our colleagues in IBM Technology Support Services (TSS) use to increase the rate of first-time fixes and reduce the mean time to recovery from a hardware disruption. "The most recent industry surveys have shown that the average enterprise estimates that there is an impact of approximately $8,851 for every minute of unplanned downtime in their primary computing environment." By overlaying guidance on the physical environment, augmented reality support drastically reduces the effort needed to relay instructions, the number of errors, and even the time required to look up service information. Technical support service providers typically maintain tens of thousands of products in order to meet the needs of their clients.


Predicting COVID-19 With Machine Learning

#artificialintelligence

Great Learning brings you this live session on "Predicting COVID-19 in India using Machine Learning". In this session, we will take a COVID-19 dataset and understand how the disease has spread across different states in India. We will perform some data manipulation and data visualization operations on top of the dataset.


Arena Model: Inference About Competitions

arXiv.org Machine Learning

The authors propose a parametric model called the arena model for prediction in paired competitions, i.e. paired comparisons with eliminations and bifurcations. The arena model has a number of appealing advantages. First, it predicts the results of competitions without rating many individuals. Second, it takes full advantage of the structure of competitions. Third, the model provides an easy method to quantify the uncertainty in competitions. Fourth, some of our methods can be directly generalized for comparisons among three or more individuals. Furthermore, the authors identify an invariant Bayes estimator with regard to the prior distribution and prove the consistency of the estimations of uncertainty. Currently, the arena model is not effective in tracking the change of strengths of individuals, but its basic framework provides a solid foundation for future study of such cases.


Reinforcement Learning from scratch – Insight Data

#artificialintelligence

Recently, I gave a talk at the O'Reilly AI conference in Beijing about some of the interesting lessons we've learned in the world of NLP. While there, I was lucky enough to attend a tutorial on Deep Reinforcement Learning (Deep RL) from scratch by Unity Technologies. I thought that the session, led by Arthur Juliani, was extremely informative and wanted to share some big takeaways below. In our conversations with companies, we've seen a rise of interesting Deep RL applications, tools and results. In parallel, the inner workings and applications of Deep RL, such as AlphaGo, can often seem esoteric and hard to understand.