AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

A Bayesian model for identifying hierarchically organised states in neural population activity

Patrick Putzky, Florian Franzen, Giacomo Bassetto, Jakob H. Macke

Neural Information Processing SystemsFeb-8-2025, 19:49:50 GMT

Neural population activity in cortical circuits is not solely driven by external inputs, but is also modulated by endogenous states which vary on multiple time-scales. To understand information processing in cortical circuits, we need to understand the statistical structure of internal states and their interaction with sensory inputs. Here, we present a statistical model for extracting hierarchically organised neural population states from multi-channel recordings of neural spiking activity. Population states are modelled using a hidden Markov decision tree with state-dependent tuning parameters and a generalised linear observation model. We present a variational Bayesian inference algorithm for estimating the posterior distribution over parameters from neural population recordings. On simulated data, we show that we can identify the underlying sequence of population states and reconstruct the ground truth parameters. Using population recordings from visual cortex, we find that a model with two levels of population states outperforms both a one-state and a two-state generalised linear model. Finally, we find that modelling of state-dependence also improves the accuracy with which sensory stimuli can be decoded from the population response.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.05)
(2 more...)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)

Add feedback

Multilabel Structured Output Learning with Random Spanning Trees of Max-Margin Markov Networks

Mario Marchand, Hongyu Su, Emilie Morvant, Juho Rousu, John S. Shawe-Taylor

Neural Information Processing SystemsFeb-8-2025, 18:58:19 GMT

We show that the usual score function for conditional Markov networks can be written as the expectation over the scores of their spanning trees. We also show that a small random sample of these output trees can attain a significant fraction of the margin obtained by the complete graph and we provide conditions under which we can perform tractable inference. The experimental results confirm that practical learning is scalable to realistic datasets using this approach.

artificial intelligence, inference, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Finland > Uusimaa > Helsinki (0.04)
North America > United States > New York (0.04)
(6 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Parallel Sampling of HDPs using Sub-Cluster Splits

Neural Information Processing SystemsFeb-8-2025, 18:31:38 GMT

We develop a sampling technique for Hierarchical Dirichlet process models. The parallel algorithm builds upon [1] by proposing large split and merge moves based on learned sub-clusters. The additional global split and merge moves drastically improve convergence in the experimental results. Furthermore, we discover that cross-validation techniques do not adequately determine convergence, and that previous sampling methods converge slower than were previously expected.

algorithm, convergence, equation, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Model-based Reinforcement Learning and the Eluder Dimension

Ian Osband, Benjamin Van Roy

Neural Information Processing SystemsFeb-8-2025, 18:06:04 GMT

We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather than cardinality, of the system.

dimension, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

Near-optimal Reinforcement Learning in Factored MDPs

Ian Osband, Benjamin Van Roy

Neural Information Processing SystemsFeb-8-2025, 17:11:03 GMT

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will su er (Ô SAT) regret on some MDP, where T is the elapsed time and S and A are the cardinalities of the state and action spaces. This implies T = (SA) time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, S and A can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a factored MDP, it is possible to achieve regret that scales polynomially in the number of parameters encoding the factored MDP, which may be exponentially smaller than S or A. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Recurrent Models of Visual Attention

Volodymyr Mnih, Nicolas Heess, Alex Graves, koray kavukcuoglu

Neural Information Processing SystemsFeb-8-2025, 16:24:23 GMT

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.

agent, convolutional network, information, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Fast Bidirectional Probability Estimation in Markov Models

Neural Information Processing SystemsFeb-8-2025, 09:59:20 GMT

We develop a new bidirectional algorithm for estimating Markov chain multi-step transition probabilities: given a Markov chain, we want to estimate the probability of hitting a given target state in \ell steps after starting from a given source distribution. Given the target state t, we use a (reverse) local power iteration to construct an expanded target distribution', which has the same mean as the quantity we want to estimate, but a smaller variance -- this can then be sampled efficiently by a Monte Carlo algorithm. Our method extends to any Markov chain on a discrete (finite or countable) state-space, and can be extended to compute functions of multi-step transition probabilities such as PageRank, graph diffusions, hitting/return times, etc. Our main result is that in sparse' Markov Chains -- wherein the number of transitions between states is comparable to the number of states -- the running time of our algorithm for a uniform-random target node is order-wise smaller than Monte Carlo and power iteration based algorithms; in particular, our method can estimate a probability p using only O(1/\sqrt{p}) running time.

algorithm, fast bidirectional probability estimation, markov model, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Managing Geological Uncertainty in Critical Mineral Supply Chains: A POMDP Approach with Application to U.S. Lithium Resources

Arief, Mansur, Alonso, Yasmine, Oshiro, CJ, Xu, William, Corso, Anthony, Yin, David Zhen, Caers, Jef K., Kochenderfer, Mykel J.

arXiv.org Artificial IntelligenceFeb-8-2025

The world is entering an unprecedented period of critical mineral demand, driven by the global transition to renewable energy technologies and electric vehicles. This transition presents unique challenges in mineral resource development, particularly due to geological uncertainty-a key characteristic that traditional supply chain optimization approaches do not adequately address. To tackle this challenge, we propose a novel application of Partially Observable Markov Decision Processes (POMDPs) that optimizes critical mineral sourcing decisions while explicitly accounting for the dynamic nature of geological uncertainty. Through a case study of the U.S. lithium supply chain, we demonstrate that POMDP-based policies achieve superior outcomes compared to traditional approaches, especially when initial reserve estimates are imperfect. Our framework provides quantitative insights for balancing domestic resource development with international supply diversification, offering policymakers a systematic approach to strategic decision-making in critical mineral supply chains.

artificial intelligence, geological uncertainty, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2502.0569

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (1.00)
Materials > Metals & Mining > Lithium (1.00)
Energy > Renewable (1.00)
Energy > Oil & Gas > Upstream (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning

Feng, Xinsong, Yu, Zihan, Xiong, Yanhai, Chen, Haipeng

arXiv.org Artificial IntelligenceFeb-8-2025

Reinforcement learning (RL) has emerged as a promising tool for combinatorial optimization (CO) problems due to its ability to learn fast, effective, and generalizable solutions. Nonetheless, existing works mostly focus on one-shot deterministic CO, while sequential stochastic CO (SSCO) has rarely been studied despite its broad applications such as adaptive influence maximization (IM) and infectious disease intervention. In this paper, we study the SSCO problem where we first decide the budget (e.g., number of seed nodes in adaptive IM) allocation for all time steps, and then select a set of nodes for each time step. The few existing studies on SSCO simplify the problems by assuming a uniformly distributed budget allocation over the time horizon, yielding suboptimal solutions. We propose a generic hierarchical RL (HRL) framework called wake-sleep option (WS-option), a two-layer option-based framework that simultaneously decides adaptive budget allocation on the higher layer and node selection on the lower layer. WS-option starts with a coherent formulation of the two-layer Markov decision processes (MDPs), capturing the interdependencies between the two layers of decisions. Building on this, WS-option employs several innovative designs to balance the model's training stability and computational efficiency, preventing the vicious cyclic interference issue between the two layers. Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods. Moreover, the learned model can be generalized to larger graphs, which significantly reduces the overhead of computational resources.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2502.05537

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

On the Convergence and Stability of Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning, and Online Decision Transformers

Štrupl, Miroslav, Szehr, Oleg, Faccio, Francesco, Ashley, Dylan R., Srivastava, Rupesh Kumar, Schmidhuber, Jürgen

arXiv.org Machine LearningFeb-8-2025

This article provides a rigorous analysis of convergence and stability of Episodic Upside-Down Reinforcement Learning, Goal-Conditioned Supervised Learning and Online Decision Transformers. These algorithms performed competitively across various benchmarks, from games to robotic tasks, but their theoretical understanding is limited to specific environmental conditions. This work initiates a theoretical foundation for algorithms that build on the broad paradigm of approaching reinforcement learning through supervised learning or sequence modeling. At the core of this investigation lies the analysis of conditions on the underlying environment, under which the algorithms can identify optimal solutions. We also assess whether emerging solutions remain stable in situations where the environment is subject to tiny levels of noise. Specifically, we study the continuity and asymptotic convergence of command-conditioned policies, values and the goal-reaching objective depending on the transition kernel of the underlying Markov Decision Process. We demonstrate that near-optimal behavior is achieved if the transition kernel is located in a sufficiently small neighborhood of a deterministic kernel. The mentioned quantities are continuous (with respect to a specific topology) at deterministic kernels, both asymptotically and after a finite number of learning cycles. The developed methods allow us to present the first explicit estimates on the convergence and stability of policies and values in terms of the underlying transition kernels. On the theoretical side we introduce a number of new concepts to reinforcement learning, like working in segment spaces, studying continuity in quotient topologies and the application of the fixed-point theory of dynamical systems. The theoretical study is accompanied by a detailed investigation of example environments and numerical experiments.

continuity, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

2502.05672

Country:

Europe > Switzerland (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
North America > United States > Oregon > Benton County > Corvallis (0.04)
(5 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback