AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Multiple Ships Cooperative Navigation and Collision Avoidance using Multi-agent Reinforcement Learning with Communication

arXiv.org Artificial IntelligenceOct-12-2024

In the real world, unmanned surface vehicles (USV) often need to coordinate with each other to accomplish specific tasks. However, achieving cooperative control in multi-agent systems is challenging due to issues such as non-stationarity and partial observability. Recent advancements in Multi-Agent Reinforcement Learning (MARL) provide new perspectives to address these challenges. Therefore, we propose using the multi-agent deep deterministic policy gradient (MADDPG) algorithm with communication to address multiple ships' cooperation problems under partial observability. We developed two tasks based on OpenAI's gym environment: cooperative navigation and cooperative collision avoidance. In these tasks, ships must not only learn effective control strategies but also establish communication protocols with other agents. We analyze the impact of external noise on communication, the effect of inter-agent communication on performance, and the communication patterns learned by the agents. The results demonstrate that our proposed framework effectively addresses cooperative navigation and collision avoidance among multiple vessels, significantly outperforming traditional single-agent algorithms. Agents establish a consistent communication protocol, enabling them to compensate for missing information through shared observations and achieve better coordination.

machine learning, reinforcement learning, ship, (18 more...)

arXiv.org Artificial Intelligence

2410.2129

Country:

Pacific Ocean > North Pacific Ocean > East China Sea > Yellow Sea (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report > New Finding (0.87)

Industry: Transportation > Marine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

Solving the Challenge Set without Solving the Task: On Winograd Schemas as a Test of Pronominal Coreference Resolution

Porada, Ian, Cheung, Jackie Chi Kit

arXiv.org Artificial IntelligenceOct-12-2024

Challenge sets such as the Winograd Schema Challenge (WSC) are used to benchmark systems' ability to resolve ambiguities in natural language. If one assumes as in existing work that solving a given challenge set is at least as difficult as solving some more general task, then high performance on the challenge set should indicate high performance on the general task overall. However, we show empirically that this assumption of difficulty does not always hold. In particular, we demonstrate that despite the strong performance of prompted language models (LMs) on the WSC and its variants, these same modeling techniques perform relatively poorly at resolving certain pronominal ambiguities attested in OntoNotes and related datasets that are perceived to be easier. Motivated by these findings, we propose a method for ensembling a prompted LM with a supervised, task-specific system that is overall more accurate at resolving pronominal coreference across datasets. Finally, we emphasize that datasets involving the same linguistic phenomenon draw on distinct, but overlapping, capabilities, and evaluating on any one dataset alone does not provide a complete picture of a system's overall capability.

computational linguistic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2410.09448

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(29 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Education (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Conformal Prediction: A Data Perspective

Zhou, Xiaofan, Chen, Baiting, Gui, Yu, Cheng, Lu

arXiv.org Artificial IntelligenceOct-12-2024

The recent rapid development of well-designed and powerful machine learning (ML) models has significantly transformed our lives. However, the success of these models is often evaluated based on the accuracy of their predictions, which, while important, is not sufficient in many real-world scenarios. In high-stakes applications, it is equally critical to assess the uncertainty of model outputs. Uncertainty quantification (UQ) has long been a central problem in fields like statistics and ML. Several well-established methods, such as Bayesian inference and resampling techniques, have been widely adopted to address UQ. However, Bayesian posterior intervals are only valid if the parametric assumptions of the model are correctly specified, which may not always be the case in practical applications.

conformal prediction, prediction, prediction interval, (12 more...)

arXiv.org Artificial Intelligence

2410.06494

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (1.00)
Education (0.67)
Information Technology (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(6 more...)

Add feedback

Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics

Chehab, Omar, Korba, Anna, Stromme, Austin, Vacher, Adrien

arXiv.org Machine LearningOct-12-2024

Sampling from a target distribution π whose density is known up to a normalizing constant is a challenging problem in statistics and machine learning, and is currently the subject of intense interest due to applications in Bayesian statistics [19] and energy-based models in deep learning [68], among other areas. In these settings, the normalizing constant of the target distribution π is typically intractable, and Markov Chain Monte Carlo (MCMC) algorithms [58, 57] are commonly used to generate Markov chains in the ambient space, whose law eventually approximates the target distribution. Among MCMC algorithms, the Unadjusted Langevin Algorithm (ULA), which corresponds to a time discretization of a Langevin diffusion process, has attracted considerable attention due to its simplicity, theoretical grounding, and utility in practice [59, 76, 23, 66]. For example, ULA can be proven to converge quickly when the target distribution π is smooth and strongly log-concave [23]. However, many cases in practice require to sample from distributions which are not log-concave, and indeed potentially even multi-modal [54, 82]. In such settings, the convergence of ULA is governed by functional inequalities which effectively quantify the convexity, or lack thereof, of the target distribution [75]. Nonetheless, truly multi-modal target distributions generally have poor functional inequalities, thus leading to weak convergence guarantees for ULA. This phenomenon is not merely a theoretical artifact, and it is wellknown amongst practitioners that when sampling from multi-modal distributions, algorithms based on ULA can get stuck in local modes and suffer from slow convergence [20]. Tempering or annealing is a popular technique [51, 27, 69] to overcome the deficiencies of ULA and other MCMC methods in the multi-modal setting.

convergence, inequality, langevin dynamic, (13 more...)

arXiv.org Machine Learning

2410.09697

Country:

North America > United States > New York (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

Neural Information Processing SystemsOct-11-2024, 15:46:25 GMT

We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first one achieves minimax optimal regret guarantees for a rich class of factored structures, while the second one enjoys better computational complexity with a slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms by presenting several structure dependent lower bounds on regret for FMDPs that reveal the difficulty hiding in the intricacy of the structures.

algorithm, factored markov decision process, minimax optimal reinforcement learning, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs

Neural Information Processing SystemsOct-11-2024, 15:44:31 GMT

We present two elegant solutions for modeling continuous-time dynamics, in a novel model-based reinforcement learning (RL) framework for semi-Markov decision processes (SMDPs), using neural ordinary differential equations (ODEs). Our models accurately characterize continuous-time dynamics and enable us to develop high-performing policies using a small amount of data. We also develop a model-based approach for optimizing time schedules to reduce interaction rates with the environment while maintaining the near-optimal performance, which is not possible for model-free methods. We experimentally demonstrate the efficacy of our methods across various continuous-time domains.

model-based reinforcement learning, neural ode, semi-markov decision process

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Add feedback

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

Neural Information Processing SystemsOct-11-2024, 15:22:34 GMT

This paper considers the challenging tasks of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent only sees her own individual observations and actions that reveal incomplete information about the underlying state of system. This paper studies these tasks under the general model of multiplayer general-sum Partially Observable Markov Games (POMGs), which is significantly larger than the standard model of Imperfect Information Extensive-Form Games (IIEFGs). We identify a rich subclass of POMGs---weakly revealing POMGs---in which sample-efficient learning is tractable. In the self-play setting, we prove that a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to find approximate Nash equilibria, correlated equilibria, as well as coarse correlated equilibria of weakly revealing POMGs, in a polynomial number of samples when the number of agents is small. In the setting of playing against adversarial opponents, we show that a variant of our optimistic MLE algorithm is capable of achieving sublinear regret when being compared against the optimal maximin policies.

observable markov game, pomg, sample-efficient reinforcement learning, (2 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Charge (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)
(2 more...)

Add feedback

Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes

Neural Information Processing SystemsOct-11-2024, 14:01:40 GMT

Many real-world problems that require making optimal sequences of decisions under uncertainty involve costs when the agent wishes to obtain information about its environment. We design and analyze algorithms for reinforcement learning (RL) in Action-Contingent Noiselessly Observable MDPs (ACNO-MDPs), a special class of POMDPs in which the agent can choose to either (1) fully observe the state at a cost and then act; or (2) act without any immediate observation information, relying on past observations to infer the underlying state. ACNO-MDPs arise frequently in important real-world application domains like healthcare, in which clinicians must balance the value of information gleaned from medical tests (e.g., blood-based biomarkers) with the costs of gathering that information (e.g., the costs of labor and materials required to administer such tests). We develop a PAC RL algorithm for tabular ACNO-MDPs that provides substantially tighter bounds, compared to generic POMDP-RL algorithms, on the total number of episodes exhibiting worse than near-optimal performance. For continuous-state ACNO-MDPs, we propose a novel method of incorporating observation information that, when coupled with modern RL algorithms, yields significantly faster learning compared to other POMDP-RL algorithms in several simulated environments.

algorithm, information, noiselessly observable markov decision process, (6 more...)

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Particle Gibbs for Infinite Hidden Markov Models

Neural Information Processing SystemsOct-11-2024, 13:06:25 GMT

Infinite Hidden Markov Models (iHMM's) are an attractive, nonparametric generalization of the classical Hidden Markov Model which can automatically infer the number of hidden states in the system. However, due to the infinite-dimensional nature of the transition dynamics, performing inference in the iHMM is difficult. In this paper, we present an infinite-state Particle Gibbs (PG) algorithm to resample state trajectories for the iHMM. The proposed algorithm uses an efficient proposal optimized for iHMMs, and leverages ancestor sampling to improve the mixing of the standard PG algorithm. Our algorithm demonstrates significant convergence improvements on synthetic and real world data sets.

ihmm, infinite hidden markov model, particle gibbs, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Bayesian Risk Markov Decision Processes

Neural Information Processing SystemsOct-11-2024, 13:06:15 GMT

We consider finite-horizon Markov Decision Processes where parameters, such as transition probabilities, are unknown and estimated from data. The popular distributionally robust approach to addressing the parameter uncertainty can sometimes be overly conservative. In this paper, we propose a new formulation, Bayesian risk Markov decision process (BR-MDP), to address parameter uncertainty in MDPs, where a risk functional is applied in nested form to the expected total cost with respect to the Bayesian posterior distributions of the unknown parameters. The proposed formulation provides more flexible risk attitudes towards parameter uncertainty and takes into account the availability of data in future time stages. To solve the proposed formulation with the conditional value-at-risk (CVaR) risk functional, we propose an efficient approximation algorithm by deriving an analytical approximation of the value function and utilizing the convexity of CVaR.

bayesian risk markov decision process, markov decision process, parameter uncertainty, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)

Add feedback