AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Reviews: Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Neural Information Processing SystemsJun-1-2025, 23:53:49 GMT

The paper introduces a new complexity measure for MDPs, the expected hitting costs. In contrast to former complexity measures, the hitting costs also depend on the reward of the MDP and can provide a tighter bound for UCRL2. The theory also provides an intersting connection between reward shapeing and the complexity of a MDP. All reviewers appreciated the strong theoretical contribution of the paper which improves our theoretical understanding of the complexity of MDPs. The reviewers also liked that the paper is well written and establishes connections to reward shaping, a method that has also a highly practical value. All reviewers recommend acceptance and I agree with their assessment.

complexity measure, hitting cost, markov decision process and informativeness, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Synthesis of MCMC and Belief Propagation

Sung-Soo Ahn, Michael Chertkov, Jinwoo Shin

Neural Information Processing SystemsJun-1-2025, 23:27:25 GMT

Markov Chain Monte Carlo (MCMC) and Belief Propagation (BP) are the most popular algorithms for computational inference in Graphical Models (GM). In principle, MCMC is an exact probabilistic method which, however, often suffers from exponentially slow mixing. In contrast, BP is a deterministic method, which is typically fast, empirically very successful, however in general lacking control of accuracy over loopy graphs. In this paper, we introduce MCMC algorithms correcting the approximation error of BP, i.e., we provide a way to compensate for BP errors via a consecutive BP-aware MCMC. Our framework is based on the Loop Calculus approach which allows to express the BP error as a sum of weighted generalized loops. Although the full series is computationally intractable, it is known that a truncated series, summing up all 2-regular loops, is computable in polynomial-time for planar pair-wise binary GMs and it also provides a highly accurate approximation empirically. Motivated by this, we first propose a polynomial-time approximation MCMC scheme for the truncated series of general (non-planar) pair-wise binary models. Our main idea here is to use the Worm algorithm, known to provide fast mixing in other (related) problems, and then design an appropriate rejection scheme to sample 2-regular loops.

artificial intelligence, generalized loop, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe (0.46)
Asia (0.28)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Review for NeurIPS paper: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Neural Information Processing SystemsMay-31-2025, 17:08:22 GMT

Weaknesses: Soundness of the claims: While the general ideas seem somewhat clear, most of the proofs read more like sketches and some of the claims need to be verified by hand. For example in the proof of Lemma 9 the authors do not provide a derivation for the each of the elements of the Hessian but merely state that the result follows from a direct computation (which seems to be left as an exercise to the reader). Other examples where more details would not hurt are the proof of Lemma 13, where the authors bound \ q_t - \tilde \q_t\ in a self-bounding way, to achieve the upper bound but no mention of this is given other than the final result and the proof of Lemma 27, where derivations on lines 720-722 were not entirely straightforward. Overall the Appendix can benefit from more details in the proofs. Significance and novelty of contribution: The core ideas of this work are not novel.

algorithm, learning stochastic, stochastic and adversarial episodic mdp, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Review for NeurIPS paper: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Neural Information Processing SystemsMay-31-2025, 17:08:13 GMT

This paper was well-received by the reviewers and the author response was effective in addressing the concerns raised in the initial reviews. As a result, several reviewers updated their scores. The paper is clearly suitable for publication without significant changes.

learning stochastic, neurips paper, stochastic and adversarial episodic mdp, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games

Ocello, Antonio, Tiapkin, Daniil, Mancini, Lorenzo, Laurière, Mathieu, Moulines, Eric

arXiv.org Machine LearningMay-30-2025

We introduce Mean-Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.

data mining, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

2505.22781

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy (0.04)
(4 more...)

Genre: Research Report > New Finding (0.87)

Industry: Energy (1.00)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

Add feedback

A False Discovery Rate Control Method Using a Fully Connected Hidden Markov Random Field for Neuroimaging Data

Kim, Taehyo, Jia, Qiran, de Leon, Mony J., Shu, Hai

arXiv.org Machine LearningMay-30-2025

False discovery rate (FDR) control methods are essential for voxel-wise multiple testing in neuroimaging data analysis, where hundreds of thousands or even millions of tests are conducted to detect brain regions associated with disease-related changes. Classical FDR control methods (e.g., BH, q-value, and LocalFDR) assume independence among tests and often lead to high false non-discovery rates (FNR). Although various spatial FDR control methods have been developed to improve power, they still fall short of jointly addressing three major challenges in neuroimaging applications: capturing complex spatial dependencies, maintaining low variability in both false discovery proportion (FDP) and false non-discovery proportion (FNP) across replications, and achieving computational scalability for high-resolution data. To address these challenges, we propose fcHMRF-LIS, a powerful, stable, and scalable spatial FDR control method for voxel-wise multiple testing. It integrates the local index of significance (LIS)-based testing procedure with a novel fully connected hidden Markov random field (fcHMRF) designed to model complex spatial structures using a parsimonious parameterization. We develop an efficient expectation-maximization algorithm incorporating mean-field approximation, the Conditional Random Fields as Recurrent Neural Networks (CRF-RNN) technique, and permutohedral lattice filtering, reducing the time complexity from quadratic to linear in the number of tests. Extensive simulations demonstrate that fcHMRF-LIS achieves accurate FDR control, lower FNR, reduced variability in FDP and FNP, and a higher number of true positives compared to existing methods. Applied to an FDG-PET dataset from the Alzheimer's Disease Neuroimaging Initiative, fcHMRF-LIS identifies neurobiologically relevant brain regions and offers notable advantages in computational efficiency.

artificial intelligence, fdr control method, machine learning, (18 more...)

arXiv.org Machine Learning

2505.20688

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Signal attenuation enables scalable decentralized multi-agent reinforcement learning over networks

Suttle, Wesley A, Sharma, Vipul K, Sadler, Brian M

arXiv.org Artificial IntelligenceMay-30-2025

Multi-agent reinforcement learning (MARL) methods typically require that agents enjoy global state observability, preventing development of decentralized algorithms and limiting scalability. Recent work has shown that, under assumptions on decaying inter-agent influence, global observability can be replaced by local neighborhood observability at each agent, enabling decentralization and scalability. Real-world applications enjoying such decay properties remain underexplored, however, despite the fact that signal power decay, or signal attenuation, due to path loss is an intrinsic feature of many problems in wireless communications and radar networks. In this paper, we show that signal attenuation enables decentralization in MARL by considering the illustrative special case of performing power allocation for target detection in a radar network. To achieve this, we propose two new constrained multi-agent Markov decision process formulations of this power allocation problem, derive local neighborhood approximations for global value function and policy gradient estimates and establish corresponding error bounds, and develop decentralized saddle point policy gradient algorithms for solving the proposed problems. Our approach, though oriented towards the specific radar network problem we consider, provides a useful model for extensions to additional problems in wireless communications and radar networks.

machine learning, radar network, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2505.11461

Country: North America > United States (0.93)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Self-orthogonalizing attractor neural networks emerging from the free energy principle

Spisak, Tamas, Friston, Karl

arXiv.org Artificial IntelligenceMay-30-2025

Attractor dynamics are a hallmark of many complex systems, including the brain. Understanding how such self-organizing dynamics emerge from first principles is crucial for advancing our understanding of neuronal computations and the design of artificial intelligence systems. Here we formalize how attractor networks emerge from the free energy principle applied to a universal partitioning of random dynamical systems. Our approach obviates the need for explicitly imposed learning and inference rules and identifies emergent, but efficient and biologically plausible inference and learning dynamics for such self-organizing systems. These result in a collective, multi-level Bayesian active inference process. Attractors on the free energy landscape encode prior beliefs; inference integrates sensory data into posterior beliefs; and learning fine-tunes couplings to minimize long-term surprise. Analytically and via simulations, we establish that the proposed networks favor approximately orthogonalized attractor representations, a consequence of simultaneously optimizing predictive accuracy and model complexity. These attractors efficiently span the input subspace, enhancing generalization and the mutual information between hidden causes and observable effects. Furthermore, while random data presentation leads to symmetric and sparse couplings, sequential data fosters asymmetric couplings and non-equilibrium steady-state dynamics, offering a natural extension to conventional Boltzmann Machines. Our findings offer a unifying theory of self-organizing attractor networks, providing novel insights for AI and neuroscience.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2505.22749

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL

Hung, Yu-Heng, Lin, Kai-Jie, Lin, Yu-Heng, Wang, Chien-Yi, Sun, Cheng, Hsieh, Ping-Chun

arXiv.org Artificial IntelligenceMay-30-2025

Bayesian optimization (BO) offers an efficient pipeline for optimizing black-box functions with the help of a Gaussian process prior and an acquisition function (AF). Recently, in the context of single-objective BO, learning-based AFs witnessed promising empirical results given its favorable non-myopic nature. Despite this, the direct extension of these approaches to multi-objective Bayesian optimization (MOBO) suffer from the \textit{hypervolume identifiability issue}, which results from the non-Markovian nature of MOBO problems. To tackle this, inspired by the non-Markovian RL literature and the success of Transformers in language modeling, we present a generalized deep Q-learning framework and propose \textit{BOFormer}, which substantiates this framework for MOBO via sequence modeling. Through extensive evaluation, we demonstrate that BOFormer constantly outperforms the benchmark rule-based and learning-based algorithms in various synthetic MOBO and real-world multi-objective hyperparameter optimization problems. We have made the source code publicly available to encourage further research in this direction.

machine learning, natural language, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2505.21974

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Add feedback

Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion

Huang, Xunpeng, Lin, Yingyu, Kuang, Nikki Lijing, Dong, Hanze, Zou, Difan, Ma, Yian, Zhang, Tong

arXiv.org Machine LearningMay-29-2025

Continuous diffusion models have demonstrated remarkable performance in data generation across various domains, yet their efficiency remains constrained by two critical limitations: (1) the local adjacency structure of the forward Markov process, which restricts long-range transitions in the data space, and (2) inherent biases introduced during the simulation of time-inhomogeneous reverse denoising processes. To address these challenges, we propose Quantized Transition Diffusion (QTD), a novel approach that integrates data quantization with discrete diffusion dynamics. Our method first transforms the continuous data distribution $p_*$ into a discrete one $q_*$ via histogram approximation and binary encoding, enabling efficient representation in a structured discrete latent space. We then design a continuous-time Markov chain (CTMC) with Hamming distance-based transitions as the forward process, which inherently supports long-range movements in the original data space. For reverse-time sampling, we introduce a \textit{truncated uniformization} technique to simulate the reverse CTMC, which can provably provide unbiased generation from $q_*$ under minimal score assumptions. Through a novel KL dynamic analysis of the reverse CTMC, we prove that QTD can generate samples with $O(d\ln^2(d/ε))$ score evaluations in expectation to approximate the $d$--dimensional target distribution $p_*$ within an $ε$ error tolerance. Our method not only establishes state-of-the-art inference efficiency but also advances the theoretical foundations of diffusion-based generative modeling by unifying discrete and continuous diffusion paradigms.

artificial intelligence, log 2, machine learning, (19 more...)

arXiv.org Machine Learning

2505.21892

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback