AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Agent-Based Markov Modeling for Improved COVID-19 Mitigation Policies

Capobianco, Roberto (Sony AI & Sapienza University of Rome) | Kompella, Varun (Sony AI) | Ault, James (Texas A&M University) | Sharon, Guni (Texas A&M University) | Jong, Stacy (The University of Texas at Austin) | Fox, Spencer (The University of Texas at Austin) | Meyers, Lauren (The University of Texas at Austin) | Wurman, Peter R. (Sony AI) | Stone, Peter (Sony AI & The University of Texas at Austin)

Journal of Artificial Intelligence ResearchAug-18-2021

The year 2020 saw the covid-19 virus lead to one of the worst global pandemics in history. As a result, governments around the world have been faced with the challenge of protecting public health while keeping the economy running to the greatest extent possible. Epidemiological models provide insight into the spread of these types of diseases and predict the effects of possible intervention policies. However, to date, even the most data-driven intervention policies rely on heuristics. In this paper, we study how reinforcement learning (RL) and Bayesian inference can be used to optimize mitigation policies that minimize economic impact without overwhelming hospital capacity. Our main contributions are (1) a novel agent-based pandemic simulator which, unlike traditional models, is able to model fine-grained interactions among people at specific locations in a community; (2) an RLbased methodology for optimizing fine-grained mitigation policies within this simulator; and (3) a Hidden Markov Model for predicting infected individuals based on partial observations regarding test results, presence of symptoms, and past physical contacts. This article is part of the special track on AI and COVID-19.

agent-based markov modeling, infection probability, probability, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.12632

AI Access Foundation

12632

Journal of Artificial Intelligence Research

Country:

Europe > Sweden (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Texas > Brazos County > College Station (0.14)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Smoother Entropy for Active State Trajectory Estimation and Obfuscation in POMDPs

Molloy, Timothy L., Nair, Girish N.

arXiv.org Artificial IntelligenceAug-18-2021

We study the problem of controlling a partially observed Markov decision process (POMDP) to either aid or hinder the estimation of its state trajectory by optimising the conditional entropy of the state trajectory given measurements and controls, a quantity we dub the smoother entropy. Our consideration of the smoother entropy contrasts with previous active state estimation and obfuscation approaches that instead resort to measures of marginal (or instantaneous) state uncertainty due to tractability concerns. By establishing novel expressions of the smoother entropy in terms of the usual POMDP belief state, we show that our active estimation and obfuscation problems can be reformulated as Markov decision processes (MDPs) that are fully observed in the belief state. Surprisingly, we identify belief-state MDP reformulations of both active estimation and obfuscation with concave cost and cost-to-go functions, which enables the use of standard POMDP techniques to construct tractable bounded-error (approximate) solutions. We show in simulations that optimisation of the smoother entropy leads to superior trajectory estimation and obfuscation compared to alternative approaches. Index Terms Partially observed Markov decision process (POMDP), entropy, estimation, directed information. The problem of controlling a stochastic dynamical system to either aid or hinder the estimation of its time-varying state arises across numerous applications in automatic control, signal processing, and robotics.

cost-to-go function, obfuscation problem, smoother entropy, (10 more...)

arXiv.org Artificial Intelligence

2108.10227

Country:

Oceania > Australia (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(4 more...)

Genre: Research Report (0.82)

Industry: Energy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

Nath, Somjit, Baranwal, Mayank, Khadilkar, Harshad

arXiv.org Artificial IntelligenceAug-17-2021

Several real-world scenarios, such as remote control and sensing, are comprised of action and observation delays. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to such an extent that algorithms fail to learn anything substantial. This paper formally describes the notion of Markov Decision Processes (MDPs) with stochastic delays and shows that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with significantly simplified cost structure. We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with stochastic delays in actions and observations. The delay-resolved deep Q-network (DRDQN) algorithm is bench-marked on a variety of environments comprising of multi-step and stochastic delays and results in better performance, both in terms of achieving near-optimal rewards and minimizing the computational overhead thereof, with respect to the currently established algorithms.

agent, algorithm, mdp, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3459637.3482386

2108.07555

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
Asia > India > Maharashtra > Mumbai (0.04)
Oceania > Australia (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

The Emergence of Wireless MAC Protocols with Multi-Agent Reinforcement Learning

Mota, Mateus P., Valcarce, Alvaro, Gorce, Jean-Marie, Hoydis, Jakob

arXiv.org Artificial IntelligenceAug-17-2021

In this paper, we propose a new framework, exploiting the multi-agent deep deterministic policy gradient (MADDPG) algorithm, to enable a base station (BS) and user equipment (UE) to come up with a medium access control (MAC) protocol in a multiple access scenario. In this framework, the BS and UEs are reinforcement learning (RL) agents that need to learn to cooperate in order to deliver data. The network nodes can exchange control messages to collaborate and deliver data across the network, but without any prior agreement on the meaning of the control messages. In such a framework, the agents have to learn not only the channel access policy, but also the signaling policy. The collaboration between agents is shown to be important, by comparing the proposed algorithm to ablated versions where either the communication between agents or the central critic is removed. The comparison with a contention-free baseline shows that our framework achieves a superior performance in terms of goodput and can effectively be used to learn a new protocol.

agent, communication, protocol, (14 more...)

arXiv.org Artificial Intelligence

2108.07144

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report (0.64)

Industry:

Telecommunications (0.66)
Commercial Services & Supplies > Security & Alarm Services (0.55)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Reinforcement Learning for Robot Navigation with Adaptive ExecutionDuration (AED) in a Semi-Markov Model

Chen, Yu'an, Ye, Ruosong, Tao, Ziyang, Liu, Hongjian, Chen, Guangda, Peng, Jie, Ma, Jun, Zhang, Yu, Zhang, Yanyong, Ji, Jianmin

arXiv.org Artificial IntelligenceAug-13-2021

Deep reinforcement learning (DRL) algorithms have proven effective in robot navigation, especially in unknown environments, through directly mapping perception inputs into robot control commands. Most existing methods adopt uniform execution duration with robots taking commands at fixed intervals. As such, the length of execution duration becomes a crucial parameter to the navigation algorithm. In particular, if the duration is too short, then the navigation policy would be executed at a high frequency, with increased training difficulty and high computational cost. Meanwhile, if the duration is too long, then the policy becomes unable to handle complex situations, like those with crowded obstacles. It is thus tricky to find the "sweet" duration range; some duration values may render a DRL model to fail to find a navigation path. In this paper, we propose to employ adaptive execution duration to overcome this problem. Specifically, we formulate the navigation task as a Semi-Markov Decision Process (SMDP) problem to handle adaptive execution duration. We also improve the distributed proximal policy optimization (DPPO) algorithm and provide its theoretical guarantee for the specified SMDP problem. We evaluate our approach both in the simulator and on an actual robot. The results show that our approach outperforms the other DRL-based method (with fixed execution duration) by 10.3% in terms of the navigation success rate.

adaptive executionduration, reinforcement learning, semi-markov model, (2 more...)

arXiv.org Artificial Intelligence

2108.06161

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments

Davydov, Vasilii, Skrynnik, Alexey, Yakovlev, Konstantin, Panov, Aleksandr I.

arXiv.org Artificial IntelligenceAug-13-2021

In this paper, we consider the problem of multi-agent navigation in partially observable grid environments. This problem is challenging for centralized planning approaches as they, typically, rely on the full knowledge of the environment. We suggest utilizing the reinforcement learning approach when the agents, first, learn the policies that map observations to actions and then follow these policies to reach their goals. To tackle the challenge associated with learning cooperative behavior, i.e. in many cases agents need to yield to each other to accomplish a mission, we use a mixing Q-network that complements learning individual policies. In the experimental evaluation, we show that such approach leads to plausible results and scales well to large number of agents.

agent, algorithm, reinforcement, (15 more...)

arXiv.org Artificial Intelligence

2108.06148

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
Asia > Russia (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Pathfinder: Parallel quasi-Newton variational inference

Zhang, Lu, Carpenter, Bob, Gelman, Andrew, Vehtari, Aki

arXiv.org Machine LearningAug-11-2021

We introduce Pathfinder, a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior. We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, saddle points, or in minor modes. The Monte Carlo KL-divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores.

approximation, optimization path, pathfinder, (15 more...)

arXiv.org Machine Learning

2108.03782

Country:

North America > United States (0.28)
Europe > Austria > Vienna (0.14)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Imitation Learning by Reinforcement Learning

Ciosek, Kamil

arXiv.org Machine LearningAug-10-2021

Typically, Reinforcement Learning (RL) assumes access to a pre-specified reward and then learns a policy maximizing the expected average of this reward along a trajectory. However, specifying rewards is difficult for many practical tasks (Atkeson & Schaal, 1997; Zhang et al., 2018; Ibarz et al., 2018). In such cases, it is convenient to instead perform Imitation Learning (IL), learning a policy from expert demonstrations. There are two major categories of Imitation Learning algorithms: Behavioral Cloning and Inverse Reinforcement Learning. Behavioral Cloning learns the policy by supervised learning on expert data, but is not robust to training errors, failing in settings where expert data is limited (Ross & Bagnell, 2010). Inverse Reinforcement Learning (IRL) achieves improved performance on limited data by constructing reward signals and calling an RL oracle to maximize these rewards (Ng et al., 2000).

algorithm, imitation learner, learning, (13 more...)

arXiv.org Machine Learning

2108.04763

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

A Novel Markovian Framework for Integrating Absolute and Relative Ordinal Emotion Information

Wu, Jingyao, Dang, Ting, Sethu, Vidhyasaharan, Ambikairajah, Eliathamby

arXiv.org Artificial IntelligenceAug-10-2021

There is growing interest in affective computing for the representation and prediction of emotions along ordinal scales. However, the term ordinal emotion label has been used to refer to both absolute notions such as low or high arousal, as well as relation notions such as arousal is higher at one instance compared to another. In this paper, we introduce the terminology absolute and relative ordinal labels to make this distinction clear and investigate both with a view to integrate them and exploit their complementary nature. We propose a Markovian framework referred to as Dynamic Ordinal Markov Model (DOMM) that makes use of both absolute and relative ordinal information, to improve speech based ordinal emotion prediction. Finally, the proposed framework is validated on two speech corpora commonly used in affective computing, the RECOLA and the IEMOCAP databases, across a range of system configurations. The results consistently indicate that integrating relative ordinal information improves absolute ordinal emotion prediction.

arousal, probability, threshold, (15 more...)

arXiv.org Artificial Intelligence

2108.04605

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Oklahoma > Tulsa County > Tulsa (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Add feedback

Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes

Infante, Guillermo, Jonsson, Anders, Gómez, Vicenç

arXiv.org Artificial IntelligenceAug-10-2021

In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. If the set of boundary states is smaller than the entire state space, our approach can have significantly smaller sample complexity than that of a flat learner, and we validate this empirically in several experiments.

lmdp, subtask, terminal state, (15 more...)

arXiv.org Artificial Intelligence

2106.1538

Country:

North America > United States (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

Add feedback