AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Regret Bounds for Learning State Representations in Reinforcement Learning

Ortner, Ronald, Pirotta, Matteo, Lazaric, Alessandro, Fruit, Ronan, Maillard, Odalric-Ambrym

Neural Information Processing SystemsMar-19-2020, 01:48:12 GMT

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation. We propose an algorithm (UCB-MS) with O(sqrt(T)) regret in any communicating Markov decision process. The regret bound shows that UCB-MS automatically adapts to the Markov model. This improves over the currently known best results in the literature that gave regret bounds of order O(T (2/3)).

learning state representation, regret bound, reinforcement learning, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Add feedback

Planning in entropy-regularized Markov decision processes and games

Grill, Jean-Bastien, Domingues, Omar Darwiche, Menard, Pierre, Munos, Remi, Valko, Michal

Neural Information Processing SystemsMar-19-2020, 01:46:13 GMT

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the SmoothCruiser. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon 4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case. Papers published at the Neural Information Processing Systems Conference.

entropy-regularized markov decision process, markov decision process and game, sample complexity, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

Efroni, Yonathan, Merlis, Nadav, Ghavamzadeh, Mohammad, Mannor, Shie

Neural Information Processing SystemsMar-19-2020, 01:33:09 GMT

State-of-the-art efficient model-based Reinforcement Learning (RL) algorithms typically act by iteratively solving empirical models, i.e., by performing full-planning on Markov Decision Processes (MDPs) built by the gathered experience. In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies -- act by 1-step planning -- can achieve tight minimax performance in terms of regret, O(\sqrt{HSAT}). Thus, full-planning in model-based RL can be avoided altogether without any performance degradation, and, by doing so, the computational complexity decreases by a factor of S. The results are based on a novel analysis of real-time dynamic programming, then extended to model-based RL. Specifically, we generalize existing algorithms that perform full-planning to such that act by 1-step planning. For these generalizations, we prove regret bounds with the same rate as their full-planning counterparts.

model-based reinforcement learning, reinforcement learning, tight regret bound, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)

Add feedback

Statistical Model Aggregation via Parameter Matching

Yurochkin, Mikhail, Agarwal, Mayank, Ghosh, Soumya, Greenewald, Kristjan, Hoang, Nghia

Neural Information Processing SystemsMar-19-2020, 01:03:03 GMT

We consider the problem of aggregating models learned from sequestered, possibly heterogeneous datasets. Exploiting tools from Bayesian nonparametrics, we develop a general meta-modeling framework that learns shared global latent structures by identifying correspondences among local model parameterizations. Our proposed framework is model-independent and is applicable to a wide range of model types. After verifying our approach on simulated data, we demonstrate its utility in aggregating Gaussian topic models, hierarchical Dirichlet process based hidden Markov models, and sparse Gaussian processes with applications spanning text summarization, motion capture analysis, and temperature forecasting. Papers published at the Neural Information Processing Systems Conference.

parameter matching, statistical model aggregation

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Sampling Networks and Aggregate Simulation for Online POMDP Planning

Cui, Hao(Jackson), Khardon, Roni

Neural Information Processing SystemsMar-19-2020, 00:17:35 GMT

The paper introduces a new algorithm for planning in partially observable Markov decision processes (POMDP) based on the idea of aggregate simulation. The algorithm uses product distributions to approximate the belief state and shows how to build a representation graph of an approximate action-value function over belief space. The algorithm supports large observation spaces using sampling networks, a representation of the process of sampling values of observations, which is integrated into the graph representation. Following previous work in MDPs this approach enables action selection in POMDPs through gradient optimization over the graph representation. This approach complements recent algorithms for POMDPs which are based on particle representations of belief states and an explicit search for action selection.

online pomdp planning, representation, sampling network and aggregate simulation, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Parameter elimination in particle Gibbs sampling

Wigren, Anna, Risuleo, Riccardo Sven, Murray, Lawrence, Lindsten, Fredrik

Neural Information Processing SystemsMar-19-2020, 00:16:00 GMT

Bayesian inference in state-space models is challenging due to high-dimensional state trajectories. A viable approach is particle Markov chain Monte Carlo (PMCMC), combining MCMC and sequential Monte Carlo to form exact approximations'' to otherwise-intractable MCMC methods. The performance of the approximation is limited to that of the exact method. We focus on particle Gibbs (PG) and particle Gibbs with ancestor sampling (PGAS), improving their performance beyond that of the ideal Gibbs sampler (which they approximate) by marginalizing out one or more parameters. This is possible when the parameter(s) has a conjugate prior relationship with the complete data likelihood.

Add feedback

Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations

Zhang, Huan, Chen, Hongge, Xiao, Chaowei, Li, Bo, Boning, Duane, Hsieh, Cho-Jui

arXiv.org Machine LearningMar-19-2020

Deep Reinforcement Learning (DRL) is vulnerable to small adversarial perturbations on state observations. These perturbations do not alter the environment directly but can mislead the agent into making suboptimal decisions. We analyze the Markov Decision Process (MDP) under this threat model and utilize tools from the neural net-work verification literature to enable robust train-ing for DRL under observational perturbations. Our techniques are general and can be applied to both Deep Q Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) algorithms for discrete and continuous action control problems. We demonstrate that our proposed training procedure significantly improves the robustness of DQN and DDPG agents under a suite of strong white-box attacks on observations, including a few novel attacks we specifically craft. Additionally, our training procedure can produce provable certificates for the robustness of a Deep RL agent.

adversarial perturbation, adversary, robust deep reinforcement learning, (8 more...)

arXiv.org Machine Learning

2003.08938

Country:

North America > United States > Michigan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Rashid, Tabish, Samvelyan, Mikayel, de Witt, Christian Schroeder, Farquhar, Gregory, Foerster, Jakob, Whiteson, Shimon

arXiv.org Machine LearningMar-19-2020

In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a mixing network that estimates joint action-values as a monotonic combination of per-agent values. We structurally enforce that the joint-action value is monotonic in the per-agent values, through the use of non-negative weights in the mixing network, which guarantees consistency between the centralised and decentralised policies. To evaluate the performance of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a challenging set of SMAC scenarios and show that it significantly outperforms existing multi-agent reinforcement learning methods.

agent, qmix, scenario, (15 more...)

arXiv.org Machine Learning

2003.08839

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Denmark (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Probabilistic Logic Neural Networks for Reasoning

Qu, Meng, Tang, Jian

Neural Information Processing SystemsMar-18-2020, 23:46:09 GMT

Knowledge graph reasoning, which aims at predicting missing facts through reasoning with observed facts, is critical for many applications. Such a problem has been widely explored by traditional logic rule-based approaches and recent knowledge graph embedding methods. A principled logic rule-based approach is the Markov Logic Network (MLN), which is able to leverage domain knowledge with first-order logic and meanwhile handle uncertainty. However, the inference in MLNs is usually very difficult due to the complicated graph structures. TransE, DistMult) learn effective entity and relation embeddings for reasoning, which are much more effective and efficient.

knowledge graph, probabilistic logic neural network, reasoning, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards

Dai, Falcon, Walter, Matthew

Neural Information Processing SystemsMar-18-2020, 23:45:59 GMT

We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. By analyzing the change in the maximum expected hitting cost, this work presents a formal understanding of the effect of potential-based reward shaping on regret (and sample complexity) in the undiscounted average reward setting.

hitting cost, markov decision process and informativeness, potential-based reward, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Decision Support Systems (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback