AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Safe and Efficient Robot Action Planning in the Presence of Unconcerned Humans

arXiv.org Artificial IntelligenceJan-22-2025

This paper proposes a robot action planning scheme that provides an efficient and probabilistically safe plan for a robot interacting with an unconcerned human -- someone who is either unaware of the robot's presence or unwilling to engage in ensuring safety. The proposed scheme is predictive, meaning that the robot is required to predict human actions over a finite future horizon; such predictions are often inaccurate in real-world scenarios. One possible approach to reduce the uncertainties is to provide the robot with the capability of reasoning about the human's awareness of potential dangers. This paper discusses that by using a binary variable, so-called danger awareness coefficient, it is possible to differentiate between concerned and unconcerned humans, and provides a learning algorithm to determine this coefficient by observing human actions. Moreover, this paper argues how humans rely on predictions of other agents' future actions (including those of robots in human-robot interaction) in their decision-making. It also shows that ignoring this aspect in predicting human's future actions can significantly degrade the efficiency of the interaction, causing agents to deviate from their optimal paths. The proposed robot action planning scheme is verified and validated via extensive simulation and experimental studies on a LoCoBot WidowX-250.

artificial intelligence, machine learning, robot, (18 more...)

arXiv.org Artificial Intelligence

2501.13203

Country:

Europe > Italy (0.04)
Asia > China > Hong Kong (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(16 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Energy (0.46)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.93)

Add feedback

An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management

Eldeeb, Eslam, Alves, Hirley

arXiv.org Artificial IntelligenceJan-22-2025

Offline multi-agent reinforcement learning (MARL) addresses key limitations of online MARL, such as safety concerns, expensive data collection, extended training intervals, and high signaling overhead caused by online interactions with the environment. In this work, we propose an offline MARL algorithm for radio resource management (RRM), focusing on optimizing scheduling policies for multiple access points (APs) to jointly maximize the sum and tail rates of user equipment (UEs). We evaluate three training paradigms: centralized, independent, and centralized training with decentralized execution (CTDE). Our simulation results demonstrate that the proposed offline MARL framework outperforms conventional baseline approaches, achieving over a 15\% improvement in a weighted combination of sum and tail rates. Additionally, the CTDE framework strikes an effective balance, reducing the computational complexity of centralized methods while addressing the inefficiencies of independent training. These results underscore the potential of offline MARL to deliver scalable, robust, and efficient solutions for resource management in dynamic wireless networks.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2501.12991

Country:

Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Optimizing Return Distributions with Distributional Dynamic Programming

Pires, Bernardo Ávila, Rowland, Mark, Borsa, Diana, Guo, Zhaohan Daniel, Khetarpal, Khimya, Barreto, André, Abel, David, Munos, Rémi, Dabney, Will

arXiv.org Artificial IntelligenceJan-22-2025

We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step). We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk, and homeostatic regulation. To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we combine the core ideas of distributional value iteration with the deep RL agent DQN, and empirically evaluate it for solving instances of the applications discussed.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2501.13028

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

The regret lower bound for communicating Markov Decision Processes

Boone, Victor, Maillard, Odalric-Ambrym

arXiv.org Machine LearningJan-22-2025

This paper is devoted to the extension of the regret lower bound beyond ergodic Markov decision processes (MDPs) in the problem dependent setting. While the regret lower bound for ergodic MDPs is well-known and reached by tractable algorithms, we prove that the regret lower bound becomes significatively more complex in communicating MDPs. Our lower bound revisits the necessary explorative behavior of consistent learning agents and further explains that all optimal regions of the environment must be overvisited compared to sub-optimal ones, a phenomenon that we refer to as co-exploration. In tandem, we show that these two explorative and co-explorative behaviors are intertwined with navigation constraints obtained by scrutinizing the navigation structure at logarithmic scale. The resulting lower bound is expressed as the solution of an optimization problem that, in many standard classes of MDPs, can be specialized to recover existing results. From a computational perspective, it is provably $\Sigma_2^\textrm{P}$-hard in general and as a matter of fact, even testing the membership to the feasible region is coNP-hard. We further provide an algorithm to approximate the lower bound in a constructive way.

artificial intelligence, machine learning, markov decision process, (17 more...)

arXiv.org Machine Learning

2501.13013

Country: Europe > France (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)

Add feedback

Reviews: Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

Neural Information Processing SystemsJan-21-2025, 17:38:30 GMT

The paper contributes useful structural results in regret minimization in the Markov Decision Process setting of RL, specifically for the class of tabular (i.e., unstructured) finite-horizon episodic MDPs. The paper is likely to stimulate the finite-sample analysis of online learning in MDPs via its new theoretical techniques.

non-asymptotic gap-dependent regret bound, tabular mdp

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Add feedback

Review for NeurIPS paper: Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

Neural Information Processing SystemsJan-21-2025, 13:04:00 GMT

Additional Feedback: Post-rebuttal The authors addressed some of my concerns. As the authors would redesign some of the experiments in the revision, I'd raise my score to 6. Comments and questions: 1. Are there any lower bound results on the sample complexity of planning? Are there any particular reasons, and what is the high-level idea of this algorithm? If I understand correctly this rule is to get the gap-dependent sample complexity. What if we use the simple greedy policy for the first action, and what will go wrong in the proof?

algorithm, gap-dependent sample complexity, sample complexity, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Review for NeurIPS paper: Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

Neural Information Processing SystemsJan-21-2025, 13:03:54 GMT

The reviewers unanimously agree that the paper makes a nice contribution to the MCTS literature and offers novel algorithmic and analytical ideas.

gap-dependent sample complexity, markov decision process, neurips paper

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Reviews: Differentially Private Markov Chain Monte Carlo

Neural Information Processing SystemsJan-21-2025, 09:12:20 GMT

This work provides a detailed Renyi DP analysis of a modified MCMC acceptance test, and empirically demonstrates its efficacy. Originality: the RDP analysis and modified acceptance test is a novel contribution. Quality: the work is a complete piece on exploring this MCMC method, with a detailed analysis and experiments. Clarity: the work is fairly clearly written, but it can be easy to lose track of exactly what parameters remain as choices to be tuned in a list of various corrective factors and approximations. Significance: the work gives an MCMC method with privacy without convergence, which permits privacy guarantees to be given over a multitude of problems without doubts or guess work about when to stop the chain.

acceptance test, mcmc method, private markov chain monte carlo

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Reviews: Differentially Private Markov Chain Monte Carlo

Neural Information Processing SystemsJan-21-2025, 09:12:10 GMT

Although the analysis follows some known ideas from the literature on private SGD, there are a number of new tricks which make the current approach interesting, most notably the observation that one can use randomized acceptance tests to preserve privacy in an MCMC algorithm. When preparing the final version of this manuscript the authors should carefully consider the points raised in the reviews regarding: clarifying where the contributions lie with respect to previous work; provide high-level intuitions of the proofs to help a reader navigate the derivations; discuss the role of approximations used in the paper, where they affect the privacy or utility of the method, and where there is some room left for improvement.

private markov chain monte carlo

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Add feedback

Reviews: State Aggregation Learning from Markov Transition Data

Neural Information Processing SystemsJan-21-2025, 08:49:42 GMT

This paper studies the problem of learning soft state aggregation of a Markov model, where there are r hidden meta states, each corresponds to a distribution over the observed state of the Markov model. Under the anchor state assumption, the authors propose an algorithm that provably learns the state aggregation model from the Markov chain's trajectory. They evaluated their algorithm on a Manhattan taxi-trip dataset which yields interesting discoveries. There has been lots of work on estimating the low rank transition matrix itself and on matrix factorization in the topic modelling setting, and this work seems to be connecting the two problems. The paper is presented well and easy to follow. I have the following questions regarding the novelty and impact of this paper.

algorithm, markov transition data, state aggregation learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Add feedback