AITopics | transition matrix

Collaborating Authors

transition matrix

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models

Neural Information Processing SystemsJun-30-2026, 08:10:46 GMT

Modern state-space models (SSMs) often utilize structured transition matrices which enable efficient computation but pose restrictions on the model's expressivity, as measured in terms of the ability to emulate finite-state automata (FSA). While unstructured transition matrices are optimal in terms of expressivity, they come at a prohibitively high compute and memory cost, even for moderate state sizes. We propose a structured sparse parametrization of transition matrices in SSMs that enables FSA state tracking with provably optimal state size and depth, while keeping the computational cost of the recurrence comparable to that of diagonal SSMs.

artificial intelligence, proceedings, transition matrix, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.60)

Add feedback

On Group Sufficiency Under Label Bias

Neural Information Processing SystemsJun-22-2026, 06:07:44 GMT

Real-world classification datasets often contain label bias, where observed labels differ systematically from the true labels at different rates for different demographic groups. Machine learning models trained on such datasets may then exhibit disparities in predictive performance across these groups. In this work, we characterize the problem of learning fair classification models with respect to the underlying ground truth labels when given only label biased data. We focus on the particular fairness definition of group sufficiency, i.e. equal calibration of risk scores across protected groups. We theoretically show that enforcing fairness with respect to label biased data necessarily results in group miscalibration with respect to the true labels. We then propose a regularizer which minimizes an upper bound on the sufficiency gap by penalizing a conditional mutual information term. Across experiments on eight tabular, image, and text datasets with both synthetic and real label noise, we find that our method reduces the sufficiency gap by up to 7.2% with no significant decrease in overall accuracy.

artificial intelligence, dataset, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.27)

Genre: Research Report > Experimental Study > Negative Result (0.34)

Industry:

Health & Medicine (1.00)
Education > Educational Setting (1.00)
Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Universal Sequence Preconditioning

Neural Information Processing SystemsJun-20-2026, 12:32:09 GMT

We study the problem of preconditioning in sequential prediction. From the theoretical lens of linear dynamical systems, we show that convolving the target sequence corresponds to applying a polynomial to the hidden transition matrix. Building on this insight, we propose a universal preconditioning method that convolves the target with coefficients from orthogonal polynomials such as Chebyshev or Legendre. We prove that this approach reduces regret for two distinct prediction algorithms and yields the first ever sublinear and hidden-dimension-independent regret bounds (up to logarithmic factors) that hold for systems with marginally stable and asymmetric transition matrices. Finally, extensive synthetic and realworld experiments show that this simple preconditioning strategy improves the performance of a diverse range of algorithms, including recurrent neural networks, and generalizes to signals beyond linear dynamical systems.

artificial intelligence, coefficient, machine learning, (15 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models

Neural Information Processing SystemsJun-18-2026, 16:27:27 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Bi-linearFactored/Block Diag.Bi-linearComplex DiagonalReal DiagonalPositive DiagonalParityArbitraryState MachinesModular AdditionAbelian Groups(e.g., Mamba)

Neural Information Processing SystemsJun-17-2026, 21:53:00 GMT

The role of hidden units in recurrent neural networks is typically seen as modeling memory, with research focusing on enhancing information retention through gating mechanisms. A less explored perspective views hidden units as active participants in the computation performed by the network, rather than passive memory stores. In this work, we revisit bilinear operations, which involve multiplicative interactions between hidden units and input embeddings. We demonstrate theoretically and empirically that they constitute a natural inductive bias for representing the evolution of hidden states in state tracking tasks. These are the simplest type of tasks that require hidden units to actively contribute to the behavior of the network. We also show that bilinear state updates form a natural hierarchy corresponding to state tracking tasks of increasing complexity, with popular linear recurrent networks such as Mamba residing at the lowest-complexity center of that hierarchy.

bilinear model, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Universal Sequence Preconditioning

Neural Information Processing SystemsJun-13-2026, 11:56:38 GMT

We study the problem of preconditioning in the setting of sequential prediction. From the theoretical lens of linear dynamical systems, we show that applying a convolution to the input sequence translates to applying a polynomial to the unknown transition matrix in the hidden space. With this insight, we develop a novel preconditioning method that convolves the input sequence with the coefficients of the Chebyshev or Legendre polynomials. We formally prove that this improves the regret of a wide family of prediction methods. We proceed to apply this preconditioning technique to the method of spectral filtering. This gives the first sublinear regret bound that is also hidden-dimension free (up to logarithmic factors) even when the hidden transition matrix is asymmetric. From rigorous experiments on synthetic data we show that our simple preconditioning method generalizes to both 1) settings where the data is \emph{not} from a linear dynamical system and 2) a broad range of learning algorithms, including recurrent neural networks.

artificial intelligence, machine learning, universal sequence preconditioning annie marsden, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Add feedback

Mamba Modulation: On the Length Generalization of Mamba Models

Neural Information Processing SystemsJun-10-2026, 21:58:12 GMT

The quadratic complexity of the attention mechanism in Transformer models has motivated the development of alternative architectures with sub-quadratic scaling, such as state-space models. Among these, Mamba has emerged as a leading architecture, achieving state-of-the-art results across a range of language modeling tasks. However, Mamba's performance significantly deteriorates when applied to contexts longer than those seen during pre-training, revealing a sharp sensitivity to context length extension. Through detailed analysis, we attribute this limitation to the out-of-distribution behavior of its state-space dynamics, particularly within the parameterization of the state transition matrix $A$. Unlike recent works which attribute this sensitivity to the vanished accumulation of discretization time steps, $\exp(-\sum_{t=1}^N{\Delta}_t)$, we establish a connection between state convergence behavior as the input length approaches infinity and the spectrum of the transition matrix $A$, offering a well-founded explanation of its role in length extension. Next, to overcome this challenge, we propose an approach that applies spectrum scaling to pre-trained Mamba models to enable robust long-context generalization by selectively modulating the spectrum of $A$ matrices in each layer. We show that this can significantly improve performance in settings where simply modulating ${\Delta}_t$ fails, validating our insights and providing avenues for better length generalization of state-space models with structured transition matrices.

artificial intelligence, machine learning, proceedings, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

Empirical Transfer Operators and Finite-Sample Change Detection for Noisy Expanding Interval Maps

Rajput, Aparna

arXiv.org Machine LearningJun-8-2026

We study a finite-sample change-detection problem for one-dimensional noisy dynamical systems using partition-based empirical approximations of stationary behaviour. Given observations from an interval-valued process, we partition the state space into finitely many intervals and estimate a transition matrix from observed transitions between partition elements. After a small Doeblin-type regularisation, the resulting matrix has a unique stationary distribution. This stationary distribution is used as a finite-dimensional approximation of the invariant density, or stationary law, of the observed regime. Using an initial reference segment, we compute a baseline empirical stationary distribution bπ0,ρ. For each subsequent sliding window, we compute a window-based empirical stationary distribution bπt,ρ and define the score St = bπt,ρ bπ0,ρ 1. Large values of St indicate that the stationary behaviour of the observed regime has changed relative to the baseline. The statistic is therefore a detector of changes in stationary behaviour. It is not, by itself, a detector of all possible changes in transition dynamics that preserve the invariant density.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2606.06785

Country: North America (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.68)

Add feedback

Finite Time Adaptive Stabilization of LQ Systems

Faradonbeh, Mohamad Kazem Shirani, Tewari, Ambuj, Michailidis, George

arXiv.org Machine LearningJun-4-2026

Stabilization of linear systems with unknown dynamics is a canonical problem in adaptive control. Since the lack of knowledge of system parameters can cause it to become destabilized, an adaptive stabilization procedure is needed prior to regulation. Therefore, the adaptive stabilization needs to be completed in finite time. In order to achieve this goal, asymptotic approaches are not very helpful. There are only a few existing non-asymptotic results and a full treatment of the problem is not currently available. In this work, leveraging the novel method of random linear feedbacks, we establish high probability guarantees for finite time stabilization. Our results hold for remarkably general settings because we carefully choose a minimal set of assumptions. These include stabilizability of the underlying system and restricting the degree of heaviness of the noise distribution. To derive our results, we also introduce a number of new concepts and technical tools to address regularity and instability of the closed-loop matrix.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1807.0912

Genre: