AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

On the hardness of RL with Lookahead

Pla, Corentin, Richard, Hugo, Abeille, Marc, Merlis, Nadav, Perchet, Vianney

arXiv.org Machine LearningOct-23-2025

We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell=1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.

machine learning, reinforcement learning, transition look-ahead, (18 more...)

arXiv.org Machine Learning

2510.19372

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
(2 more...)

Add feedback

QPPG: Quantum-Preconditioned Policy Gradient for Link Adaptation in Rayleigh Fading Channels

Giwa, Oluwaseyi, Mohsin, Muhammad Ahmed, Adesola, Folarin Jubril, Jamshed, Muhammad Ali

arXiv.org Artificial IntelligenceOct-23-2025

IRELESS communication over fading channels remains one of the fundamental challenges in modern networks. In particular, Rayleigh fading channels, which model rich-scattering non-line-of-sight environments, cause rapid and unpredictable fluctuations in signal strength that can significantly degrade throughput and reliability. To mitigate these effects, link adaptation techniques such as adaptive modulation and coding (AMC) and power control have been extensively studied as key enablers of efficient spectrum use [1], [2]. Early works on link adaptation for Rayleigh fading channels demonstrated how explicit channel estimation and threshold-based switching could improve throughput and maintain robustness under fading conditions [3]-[6]. Despite their success, these classical approaches rely on accurate channel estimation, fixed rules, and often compromise between average throughput and outage probability in a suboptimal manner [4]-[6]. Furthermore, as networks evolve toward 6G with denser topologies and stringent reliability demands, such schemes struggle to scale or adapt to system-level complexities [7], [8]. Recent works have explored deep reinforcement learning (DRL) and meta reinforcement learning (RL) for link adaptation and resource allocation, showing promising adaptability but still facing high sample complexity and training instability [9]-[12]. In this letter, we propose quantum-preconditioned policy gradient (QPPG), a natural actor-critic method for link adap-Oluwaseyi Giwa is with the African Institute for Mathematical Sciences, South Africa (e-mail: {oluwaseyi}@aims.ac.za). Muhammad Ahmed Mohsin is with Stanford University, Stanford, California, 94305, United States (e-mail: {muahmed}@stanford.edu).

link adaptation, machine learning, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

2506.15753

Country:

North America > United States > California > Santa Clara County > Stanford (0.24)
North America > United States > California > Santa Clara County > Palo Alto (0.24)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning

Castanyer, Roger Creus, Mohamed, Faisal, Castro, Pablo Samuel, Neary, Cyrus, Berseth, Glen

arXiv.org Artificial IntelligenceOct-23-2025

Reinforcement learning (RL) algorithms are highly sensitive to reward function specification, which remains a central challenge limiting their broad applicability. We present ARM-FM: Automated Reward Machines via Foundation Models, a framework for automated, compositional reward design in RL that leverages the high-level reasoning capabilities of foundation models (FMs). Reward machines (RMs) -- an automata-based formalism for reward specification -- are used as the mechanism for RL objective specification, and are automatically constructed via the use of FMs. The structured formalism of RMs yields effective task decompositions, while the use of FMs enables objective specifications in natural language. Concretely, we (i) use FMs to automatically generate RMs from natural language specifications; (ii) associate language embeddings with each RM automata-state to enable generalization across tasks; and (iii) provide empirical evidence of ARM-FM's effectiveness in a diverse suite of challenging environments, including evidence of zero-shot generalization.

large language model, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2510.14176

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Test-time Prompt Intervention

Yang, Chenxu, Si, Qingyi, Dai, Mz, Yao, Dingyu, Zheng, Mingyu, Chen, Minghui, Lin, Zheng, Wang, Weiping

arXiv.org Artificial IntelligenceOct-23-2025

Test-time compute has led to remarkable success in the large language model (LLM) community, particularly for complex tasks, where longer chains of thought (CoTs) are generated to enhance reasoning capabilities. However, growing evidence reveals that such reasoning models often produce CoTs plagued by excessive redundancy, including unnecessary verification steps and repetitive reasoning shifts. The root cause lies in post-training of them that overly rely on outcome reward paradigms, as the data of process reward paradigms, which regulate intermediate reasoning steps, is difficult to construct at scale. To address this, we propose PI, a novel framework for Test-time Prompt Intervention. PI provides an interface to dynamically guide and regulate reasoning paths during inference through timely (When module) and proper (How module) interventions and post-intervention sampling (Which module). This allows human problem-solving expertise and cognitive science principles to be seamlessly integrated into LLMs' reasoning processes, enhancing controllability and interpretability. Extensive experiments across multiple models and datasets demonstrate that PI significantly shortens CoTs while reducing hallucination, yielding more concise and reliable reasoning.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.02511

Country: North America > United States > California (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Add feedback

A Markov Decision Process for Variable Selection in Branch & Bound

Strang, Paul, Alès, Zacharie, Bissuel, Côme, Juan, Olivier, Kedad-Sidhoum, Safia, Rachelson, Emmanuel

arXiv.org Artificial IntelligenceOct-23-2025

Mixed-Integer Linear Programming (MILP) is a powerful framework used to address a wide range of NP-hard combinatorial optimization problems, often solved by Branch and Bound (B&B). A key factor influencing the performance of B&B solvers is the variable selection heuristic governing branching decisions. Recent contributions have sought to adapt reinforcement learning (RL) algorithms to the B&B setting to learn optimal branching policies, through Markov Decision Processes (MDP) inspired formulations, and ad hoc convergence theorems and algorithms. In this work, we introduce BBMDP, a principled vanilla MDP formulation for variable selection in B&B, allowing to leverage a broad range of RL algorithms for the purpose of learning optimal B\&B heuristics. Computational experiments validate our model empirically, as our branching agent outperforms prior state-of-the-art RL agents on four standard MILP benchmarks.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2510.19348

Country: Europe > France (0.14)

Genre: Research Report (0.51)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

SPOT: Scalable Policy Optimization with Trees for Markov Decision Processes

Xiong, Xuyuan, Chumpitaz-Flores, Pedro, Hua, Kaixun, Hua, Cheng

arXiv.org Artificial IntelligenceOct-23-2025

Interpretable reinforcement learning policies are essential for high-stakes decision-making, yet optimizing decision tree policies in Markov Decision Processes (MDPs) remains challenging. We propose SPOT, a novel method for computing decision tree policies, which formulates the optimization problem as a mixed-integer linear program (MILP). To enhance efficiency, we employ a reduced-space branch-and-bound approach that decouples the MDP dynamics from tree-structure constraints, enabling efficient parallel search. This significantly improves runtime and scalability compared to previous methods. Our approach ensures that each iteration yields the optimal decision tree. Experimental results on standard benchmarks demonstrate that SPOT achieves substantial speedup and scales to larger MDPs with a significantly higher number of states. The resulting decision tree policies are interpretable and compact, maintaining transparency without compromising performance. These results demonstrate that our approach simultaneously achieves interpretability and scalability, delivering high-quality policies an order of magnitude faster than existing approaches.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2510.19241

Country: North America > United States (0.68)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

Add feedback

DiSRouter: Distributed Self-Routing for LLM Selections

Zheng, Hang, Xu, Hongshen, Lin, Yongkai, Fan, Shuai, Chen, Lu, Yu, Kai

arXiv.org Artificial IntelligenceOct-23-2025

The proliferation of Large Language Models (LLMs) has created a diverse ecosystem of models with highly varying performance and costs, necessitating effective query routing to balance performance and expense. Current routing systems often rely on a centralized external router trained on a fixed set of LLMs, making them inflexible and prone to poor performance since the small router can not fully understand the knowledge boundaries of different LLMs. We introduce DiSRouter (Distributed Self-Router), a novel paradigm that shifts from centralized control to distributed routing. In DiSRouter, a query traverses a network of LLM agents, each independently deciding whether to answer or route to other agents based on its own self-awareness, its ability to judge its competence. This distributed design offers superior flexibility, scalability, and generalizability. To enable this, we propose a two-stage Self-Awareness Training pipeline that enhances each LLM's self-awareness. Extensive experiments demonstrate that DiSRouter significantly outperforms existing routing methods in utility across various scenarios, effectively distinguishes between easy and hard queries, and shows strong generalization to out-of-domain tasks. Our work validates that leveraging an LLM's intrinsic self-awareness is more effective than external assessment, paving the way for more modular and efficient multi-agent systems.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.19208

Country:

Asia > China (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Error Analysis of Triangular Optimal Transport Maps for Filtering

Al-Jarrah, Mohammad, Hosseini, Bamdad, Jin, Niyizhen, Martino, Michele, Taghvaei, Amirhossein

arXiv.org Machine LearningOct-23-2025

We present a systematic analysis of estimation errors for a class of optimal transport based algorithms for filtering and data assimilation. Along the way, we extend previous error analyses of Brenier maps to the case of conditional Brenier maps that arise in the context of simulation based inference. We then apply these results in a filtering scenario to analyze the optimal transport filtering algorithm of Al-Jarrah et al. (2024, ICML). An extension of that algorithm along with numerical benchmarks on various non-Gaussian and high-dimensional examples are provided to demonstrate its effectiveness and practical potential.

artificial intelligence, inequality, machine learning, (16 more...)

arXiv.org Machine Learning

2510.19283

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.45)

Industry: Government (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Mathematics of Computing (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
(3 more...)

Add feedback

Joint Optimization of Cooperation Efficiency and Communication Covertness for Target Detection with AUVs

Zhang, Xueyao, Yang, Bo, Yu, Zhiwen, Cao, Xuelin, Xiang, Wei, Guo, Bin, Wang, Liang, Lau, Billy Pik Lik, Alexandropoulos, George C., Luo, Jun, Debbah, Mérouane, Han, Zhu, Yuen, Chau

arXiv.org Artificial IntelligenceOct-22-2025

This paper investigates underwater cooperative target detection using autonomous underwater vehicles (AUVs), with a focus on the critical trade-off between cooperation efficiency and communication covertness. To tackle this challenge, we first formulate a joint trajectory and power control optimization problem, and then present an innovative hierarchical action management framework to solve it. According to the hierarchical formulation, at the macro level, the master AUV models the agent selection process as a Markov decision process and deploys the proximal policy optimization algorithm for strategic task allocation. At the micro level, each selected agent's decentralized decision-making is modeled as a partially observable Markov decision process, and a multi-agent proximal policy optimization algorithm is used to dynamically adjust its trajectory and transmission power based on its local observations. Under the centralized training and decentralized execution paradigm, our target detection framework enables adaptive covert cooperation while satisfying both energy and mobility constraints. By comprehensively modeling the considered system, the involved signals and tasks, as well as energy consumption, theoretical insights and practical solutions for the efficient and secure operation of multiple AUVs are provided, offering significant implications for the execution of underwater covert communication tasks.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2510.18225

Country:

North America > Canada (0.46)
Asia > China (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Add feedback

Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration

Pan, Yiyuan, Liu, Zhe, Wang, Hesheng

arXiv.org Artificial IntelligenceOct-22-2025

Autonomous exploration in complex multi-agent reinforcement learning (MARL) with sparse rewards critically depends on providing agents with effective intrinsic motivation. While artificial curiosity offers a powerful self-supervised signal, it often confuses environmental stochasticity with meaningful novelty. Moreover, existing curiosity mechanisms exhibit a uniform novelty bias, treating all unexpected observations equally. However, peer behavior novelty, which encode latent task dynamics, are often overlooked, resulting in suboptimal exploration in decentralized, communication-free MARL settings. To this end, inspired by how human children adaptively calibrate their own exploratory behaviors via observing peers, we propose a novel approach to enhance multi-agent exploration. We introduce CERMIC, a principled framework that empowers agents to robustly filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. Additionally, CERMIC generates theoretically-grounded intrinsic rewards, encouraging agents to explore state transitions with high information gain. We evaluate CERMIC on benchmark suites including VMAS, Meltingpot, and SMACv2. Empirical results demonstrate that exploration with CERMIC significantly outperforms SoTA algorithms in sparse-reward environments.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.20648

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine (0.48)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback