AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

MARL-LNS: Cooperative Multi-agent Reinforcement Learning via Large Neighborhoods Search

Chen, Weizhe, Koenig, Sven, Dilkina, Bistra

arXiv.org Artificial IntelligenceApr-3-2024

Cooperative multi-agent reinforcement learning (MARL) has been an increasingly important research topic in the last half-decade because of its great potential for real-world applications. Because of the curse of dimensionality, the popular "centralized training decentralized execution" framework requires a long time in training, yet still cannot converge efficiently. In this paper, we propose a general training framework, MARL-LNS, to algorithmically address these issues by training on alternating subsets of agents using existing deep MARL algorithms as low-level trainers, while not involving any additional parameters to be trained. Based on this framework, we provide three algorithm variants based on the framework: random large neighborhood search (RLNS), batch large neighborhood search (BLNS), and adaptive large neighborhood search (ALNS), which alternate the subsets of agents differently. We test our algorithms on both the StarCraft Multi-Agent Challenge and Google Research Football, showing that our algorithms can automatically reduce at least 10% of training time while reaching the same final skill level as the original algorithm.

agent, algorithm, neighborhood, (15 more...)

arXiv.org Artificial Intelligence

2404.03101

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > California (0.04)
North America > Mexico (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Transportation (0.93)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Reinforcement Learning in Categorical Cybernetics

Hedges, Jules, Sakamoto, Riu Rodríguez

arXiv.org Artificial IntelligenceApr-3-2024

We show that several major algorithms of reinforcement learning (RL) fit into the framework of categorical cybernetics, that is to say, parametrised bidirectional processes. We build on our previous work in which we show that value iteration can be represented by precomposition with a certain optic. The outline of the main construction in this paper is: (1) We extend the Bellman operators to parametrised optics that apply to action-value functions and depend on a sample. (2) We apply a representable contravariant functor, obtaining a parametrised function that applies the Bellman iteration. (3) This parametrised function becomes the backward pass of another parametrised optic that represents the model, which interacts with an environment via an agent. Thus, parametrised optics appear in two different ways in our construction, with one becoming part of the other. As we show, many of the major classes of algorithms in RL can be seen as different extremal cases of this general setup: dynamic programming, Monte Carlo methods, temporal difference learning, and deep RL. We see this as strong evidence that this approach is a natural one and believe that it will be a fruitful way to think about RL in the future.

category, morphism, value function, (14 more...)

arXiv.org Artificial Intelligence

2404.02688

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

Ganesh, Swetha, Mondal, Washim Uddin, Aggarwal, Vaneet

arXiv.org Artificial IntelligenceApr-2-2024

We present two Policy Gradient-based methods with general parameterization in the context of infinite horizon average reward Markov Decision Processes. The first approach employs Implicit Gradient Transport for variance reduction, ensuring an expected regret of the order $\tilde{\mathcal{O}}(T^{3/5})$. The second approach, rooted in Hessian-based techniques, ensures an expected regret of the order $\tilde{\mathcal{O}}(\sqrt{T})$. These results significantly improve the state of the art of the problem, which achieves a regret of $\tilde{\mathcal{O}}(T^{3/4})$.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2404.02108

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

AD4RL: Autonomous Driving Benchmarks for Offline Reinforcement Learning with Value-based Dataset

Lee, Dongsu, Eom, Chanin, Kwon, Minhae

arXiv.org Artificial IntelligenceApr-2-2024

Offline reinforcement learning has emerged as a promising technology by enhancing its practicality through the use of pre-collected large datasets. Despite its practical benefits, most algorithm development research in offline reinforcement learning still relies on game tasks with synthetic datasets. To address such limitations, this paper provides autonomous driving datasets and benchmarks for offline reinforcement learning research. We provide 19 datasets, including real-world human driver's datasets, and seven popular offline reinforcement learning algorithms in three realistic driving scenarios. We also provide a unified decision-making process model that can operate effectively across different scenarios, serving as a reference framework in algorithm design. Our research lays the groundwork for further collaborations in the community to explore practical aspects of existing reinforcement learning methods. Dataset and codes can be found in https://sites.google.com/view/ad4rl.

dataset, reinforcement, vehicle, (14 more...)

arXiv.org Artificial Intelligence

2404.02429

Country:

North America > United States (0.28)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems

Zhi, Dapeng, Wang, Peixin, Liu, Si, Ong, Luke, Zhang, Min

arXiv.org Artificial IntelligenceApr-2-2024

The rapid advance of deep reinforcement learning techniques enables the oversight of safety-critical systems through the utilization of Deep Neural Networks (DNNs). This underscores the pressing need to promptly establish certified safety guarantees for such DNN-controlled systems. Most of the existing verification approaches rely on qualitative approaches, predominantly employing reachability analysis. However, qualitative verification proves inadequate for DNN-controlled systems as their behaviors exhibit stochastic tendencies when operating in open and adversarial environments. In this paper, we propose a novel framework for unifying both qualitative and quantitative safety verification problems of DNN-controlled systems. This is achieved by formulating the verification tasks as the synthesis of valid neural barrier certificates (NBCs). Initially, the framework seeks to establish almost-sure safety guarantees through qualitative verification. In cases where qualitative verification fails, our quantitative verification method is invoked, yielding precise lower and upper bounds on probabilistic safety across both infinite and finite time horizons. To facilitate the synthesis of NBCs, we introduce their $k$-inductive variants. We also devise a simulation-guided approach for training NBCs, aiming to achieve tightness in computing precise certified lower and upper bounds. We prototype our approach into a tool called $\textsf{UniQQ}$ and showcase its efficacy on four classic DNN-controlled systems.

barrier certificate, nbc, time horizon, (9 more...)

arXiv.org Artificial Intelligence

2404.01769

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Distributed Autonomous Swarm Formation for Dynamic Network Bridging

Galliera, Raffaele, Möhlenhof, Thies, Amato, Alessandro, Duran, Daniel, Venable, Kristen Brent, Suri, Niranjan

arXiv.org Artificial IntelligenceApr-1-2024

Effective operation and seamless cooperation of robotic systems are a fundamental component of next-generation technologies and applications. In contexts such as disaster response, swarm operations require coordinated behavior and mobility control to be handled in a distributed manner, with the quality of the agents' actions heavily relying on the communication between them and the underlying network. In this paper, we formulate the problem of dynamic network bridging in a novel Decentralized Partially Observable Markov Decision Process (Dec-POMDP), where a swarm of agents cooperates to form a link between two distant moving targets. Furthermore, we propose a Multi-Agent Reinforcement Learning (MARL) approach for the problem based on Graph Convolutional Reinforcement Learning (DGN) which naturally applies to the networked, distributed nature of the task. The proposed method is evaluated in a simulated environment and compared to a centralized heuristic baseline showing promising results. Moreover, a further step in the direction of sim-to-real transfer is presented, by additionally evaluating the proposed approach in a near Live Virtual Constructive (LVC) UAV framework.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2404.01557

Country: North America > United States (0.94)

Genre: Research Report > New Finding (0.46)

Industry:

Telecommunications (0.47)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Versatile Navigation under Partial Observability via Value-guided Diffusion Policy

Zhang, Gengyu, Tang, Hao, Yan, Yan

arXiv.org Artificial IntelligenceApr-1-2024

Route planning for navigation under partial observability plays a crucial role in modern robotics and autonomous driving. Existing route planning approaches can be categorized into two main classes: traditional autoregressive and diffusion-based methods. The former often fails due to its myopic nature, while the latter either assumes full observability or struggles to adapt to unfamiliar scenarios, due to strong couplings with behavior cloning from experts. To address these deficiencies, we propose a versatile diffusion-based approach for both 2D and 3D route planning under partial observability. Specifically, our value-guided diffusion policy first generates plans to predict actions across various timesteps, providing ample foresight to the planning. It then employs a differentiable planner with state estimations to derive a value function, directing the agent's exploration and goal-seeking behaviors without seeking experts while explicitly addressing partial observability. During inference, our policy is further enhanced by a best-plan-selection strategy, substantially boosting the planning success rate. Moreover, we propose projecting point clouds, derived from RGB-D inputs, onto 2D grid-based bird-eye-view maps via semantic segmentation, generalizing to 3D environments. This simple yet effective adaption enables zero-shot transfer from 2D-trained policy to 3D, cutting across the laborious training for 3D policy, and thus certifying our versatility. Experimental results demonstrate our superior performance, particularly in navigating situations beyond expert demonstrations, surpassing state-of-the-art autoregressive and diffusion-based baselines for both 2D and 3D scenarios.

diffusion model, navigation, trajectory, (14 more...)

arXiv.org Artificial Intelligence

2404.02176

Country: North America > United States > Illinois (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

AIOps Solutions for Incident Management: Technical Guidelines and A Comprehensive Literature Review

Remil, Youcef, Bendimerad, Anes, Mathonat, Romain, Kaytoue, Mehdi

arXiv.org Artificial IntelligenceApr-1-2024

The management of modern IT systems poses unique challenges, necessitating scalability, reliability, and efficiency in handling extensive data streams. Traditional methods, reliant on manual tasks and rule-based approaches, prove inefficient for the substantial data volumes and alerts generated by IT systems. Artificial Intelligence for Operating Systems (AIOps) has emerged as a solution, leveraging advanced analytics like machine learning and big data to enhance incident management. AIOps detects and predicts incidents, identifies root causes, and automates healing actions, improving quality and reducing operational costs. However, despite its potential, the AIOps domain is still in its early stages, decentralized across multiple sectors, and lacking standardized conventions. Research and industrial contributions are distributed without consistent frameworks for data management, target problems, implementation details, requirements, and capabilities. This study proposes an AIOps terminology and taxonomy, establishing a structured incident management procedure and providing guidelines for constructing an AIOps framework. The research also categorizes contributions based on criteria such as incident management tasks, application areas, data sources, and technical approaches. The goal is to provide a comprehensive review of technical and research aspects in AIOps for incident management, aiming to structure knowledge, identify gaps, and establish a foundation for future developments in the field.

acm sigkdd international conference, european software engineering conference, software fault localization, (14 more...)

arXiv.org Artificial Intelligence

2404.01363

Country:

North America > United States (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
(13 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Information Technology > Services (0.92)
(3 more...)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Quality (1.00)
(16 more...)

Add feedback

Data-Driven Knowledge Transfer in Batch $Q^*$ Learning

Chen, Elynn, Chen, Xi, Jing, Wenbo

arXiv.org Machine LearningMar-31-2024

In data-driven decision-making in marketing, healthcare, and education, it is desirable to utilize a large amount of data from existing ventures to navigate high-dimensional feature spaces and address data scarcity in new ventures. We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments and formally defining task discrepancies through the lens of Markov decision processes (MDPs). We propose a framework of Transferred Fitted $Q$-Iteration algorithm with general function approximation, enabling the direct estimation of the optimal action-state function $Q^*$ using both target and source data. We establish the relationship between statistical performance and MDP task discrepancy under sieve approximation, shedding light on the impact of source and target sample sizes and task discrepancy on the effectiveness of knowledge transfer. We show that the final learning error of the $Q^*$ function is significantly improved from the single task rate both theoretically and empirically.

algorithm, source task, target task, (15 more...)

arXiv.org Machine Learning

2404.15209

Country:

North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

Embodied Active Defense: Leveraging Recurrent Feedback to Counter Adversarial Patches

Wu, Lingxuan, Yang, Xiao, Dong, Yinpeng, Xie, Liuwei, Su, Hang, Zhu, Jun

arXiv.org Artificial IntelligenceMar-30-2024

The vulnerability of deep neural networks to adversarial patches has motivated numerous defense strategies for boosting model robustness. However, the prevailing defenses depend on single observation or pre-established adversary information to counter adversarial patches, often failing to be confronted with unseen or adaptive adversarial attacks and easily exhibiting unsatisfying performance in dynamic 3D environments. Inspired by active human perception and recurrent feedback mechanisms, we develop Embodied Active Defense (EAD), a proactive defensive strategy that actively contextualizes environmental information to address misaligned adversarial patches in 3D real-world settings. To achieve this, EAD develops two central recurrent sub-modules, i.e., a perception module and a policy module, to implement two critical functions of active vision. These models recurrently process a series of beliefs and observations, facilitating progressive refinement of their comprehension of the target object and enabling the development of strategic actions to counter adversarial patches in 3D environments. To optimize learning efficiency, we incorporate a differentiable approximation of environmental dynamics and deploy patches that are agnostic to the adversary strategies. Extensive experiments demonstrate that EAD substantially enhances robustness against a variety of patches within just a few steps through its action policy in safety-critical tasks (e.g., face recognition and object detection), without compromising standard accuracy. Furthermore, due to the attack-agnostic characteristic, EAD facilitates excellent generalization to unseen attacks, diminishing the averaged attack success rate by 95 percent across a range of unseen adversarial attacks.

adversarial patch, conference paper, ead, (14 more...)

arXiv.org Artificial Intelligence

2404.0054

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(4 more...)

Genre: Research Report (0.81)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.89)
(2 more...)

Add feedback