AITopics | Markov Models

Collaborating Authors

Markov Models

News Overviews Instructional Materials AI-Alerts Classics

Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path

Hsu, Daniel, Kontorovich, Aryeh, Szepesvári, Csaba

arXiv.org Machine LearningNov-2-2015

This article provides the first procedure for computing a fully data-dependent interval that traps the mixing time $t_{\text{mix}}$ of a finite reversible ergodic Markov chain at a prescribed confidence level. The interval is computed from a single finite-length sample path from the Markov chain, and does not require the knowledge of any parameters of the chain. This stands in contrast to previous approaches, which either only provide point estimates, or require a reset mechanism, or additional prior knowledge. The interval is constructed around the relaxation time $t_{\text{relax}}$, which is strongly related to the mixing time, and the width of the interval converges to zero roughly at a $\sqrt{n}$ rate, where $n$ is the length of the sample path. Upper and lower bounds are given on the number of samples required to achieve constant-factor multiplicative accuracy. The lower bounds indicate that, unless further restrictions are placed on the chain, no procedure can achieve this accuracy level before seeing each state at least $\Omega(t_{\text{relax}})$ times on the average. Finally, future directions of research are identified.

confidence interval, diag, markov chain, (17 more...)

arXiv.org Machine Learning

1506.02903

Country:

North America > Canada > Alberta (0.14)
Europe > Montenegro (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

How Is Cooperation/Collusion Sustained in Repeated Multimarket Contact with Observation Errors?

Iwasaki, Atsushi (University of Electro-Communications) | Sekiguchi, Tadashi (Kyoto University) | Yamamoto, Shun (Kyushu University) | Yokoo, Makoto (Kyushu University)

AAAI ConferencesNov-1-2015

This paper analyzes repeated multimarket contact with observation errors where two players operate in multiple markets simultaneously. Multimarket contact has received much attention from the literature of economics,management, and information systems. Despite vast empirical studies that examine whether multimarket contact fosters cooperation/collusion, little is theoretically known as to how players behave in an equilibrium when each player receives a noisy observation of other firms’ actions. This paper tackles an essentially realistic situation where the players do not share common information; each player may observe a different signal (private monitoring). Thus, players have difficulty in having a common understanding about which market their opponent should be punished in and when punishment should be started and ended. We first theoretically show that an extension of 1-period mutual punishment (1MP) for an arbitrary number of markets can be an equilibrium. Second, by applying a verification method, we identify a simple equilibrium strategy called "locally cautioning (LC)" that restores collusion after observation error or deviation. We then numerically reveal that LC significantly outperforms 1MP and achieves the highest degree of collusion.

equilibrium, monitoring, payoff, (16 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country:

Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Hierarchical Factored POMDP for Joint Tasks: Application to Escort Tasks

Ferrari, Fabio-Valerio (University of Caen Basse-Normandie) | Mouaddib, Abdel-Illah (University of Caen Basse-Normandie)

AAAI ConferencesNov-1-2015

The number of applications of service robotics in public spaces such as hospitals, museums and malls is a growing trend. Public spaces, however, provide several challenges to the robot, and specifically with its planning capabilities: they need to cope with a dynamic and uncertain environment and are subject to particular human-robot interaction constraints. A major challenge is the Joint Intention problem. When cooperating with humans, a persistent commitment to achieve a shared goal cannot be always assumed, since they have an unpredictable behavior and may be distracted in environments as dynamic and uncertain as public spaces, and even more so if the human agents are customers,visitors or bystanders. In order to address such issues in a decision-making context, we present a framework based on Hierarchical Factored POMDPs. We describe the general method for ensuring the Joint Intention between human and robot , the hierarchical structure and the Value Decomposition method adopted to build it.We also provide an example application scenario: an Escort Task in a shopping mall for guiding a customer towards a desired point of interest.

agent, probability, robot, (15 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country:

Europe > France (0.04)
North America > United States > New York > New York County > New York City (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Online Transfer Learning in Reinforcement Learning Domains

Zhan, Yusen (Washington State University) | Taylor, Mattew E. (Washington State University)

AAAI ConferencesNov-1-2015

This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyze one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation with a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, the we show the asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country:

North America > United States > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

A Parallel Point-Based POMDP Algorithm Leveraging GPUs

Wray, Kyle Hollins (University of Massachusetts Amherst) | Zilberstein, Shlomo (University of Massachusetts Amherst)

AAAI ConferencesNov-1-2015

We parallelize the Point-Based Value Iteration (PBVI) algorithm, which approximates the solution to Partially Observable Markov Decision Processes (POMDPs), using a Graphics Processing Unit (GPU). We detail additional optimizations, such as leveraging the bounded size of non-zero values over all belief point vectors, usable by serial and parallel algorithms. We compare serial (CPU) and parallel (GPU) implementations on 10 distinct problem domains, and demonstrate that our approach provides an order of magnitude improvement.

algorithm, artificial intelligence, machine learning, (12 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.15)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs

Robbel, Philipp (Massachusetts Institute of Technology) | Oliehoek, Frans A. (University of Amsterdam) | Kochenderfer, Mykel J. (Stanford University)

AAAI ConferencesNov-1-2015

The Markov Decision Process (MDP) framework is a versatile method for addressing single and multiagent sequential decision making problems. Many exact and approximate solution methods attempt to exploit structure in the problem and are based on value factorization. Especially multiagent settings (MAS), however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are overly restricted in the problem sizes and types they can handle. We present an approach to mitigate this limitation for certain types of MASs, exploiting a property that can be thought of as "anonymous influence" in the factored MDP. In particular, we show how anonymity can lead to representational and computational efficiencies, both for general variable elimination in a factor graph but also for the approximate linear programming solution to factored MDPs. The latter allows to scale linear programming to factored MDPs that were previously unsolvable. Our results are shown for a disease control domain over a graph with 50 nodes that are each connected with up to 15 neighbors.

artificial intelligence, machine learning, optimization problem, (18 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Public Health (0.35)
Health & Medicine > Epidemiology (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Open Questions for Building Optimal Operation Policies for Dam Management Using Factored Markov Decision Processes

Reyes, Alberto (Instituto de Investigaciones Electricas) | Ibarguengoytia, Pablo H. (Instituto de Investigaciones Electricas) | Romero, Inés (Instituto de Investigaciones Electricas) | Pech, David (Instituto de Investigaciones Electricas) | Borunda, Mónica (Instituto de Investigaciones Electricas)

AAAI ConferencesNov-1-2015

In this paper, we present the conceptual model of a realworld application of Markov Decision Processes to dam management. The idea is to demonstrate that it is possible to efficiently automate the construction of operation policies by modelling the problem as a sequential decision problem that can be easily solved using stochastic dynamic programming. We will explain the problem domain and provide an analysis of the resulting value and policy functions. We will also present a useful discussion about the issues that will appear when the conceptual model to be extended into a real-world application.

artificial intelligence, decision support system, machine learning, (16 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country:

North America > United States > California > San Mateo County > Menlo Park (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Ecuador (0.05)
(4 more...)

Industry: Energy (0.94)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)

Add feedback

The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems

Oliehoek, Frans A. (University of Liverpool, University of Amsterdam) | Spaan, Matthijs T. J. (Delft University of Technology) | Robbel, Philipp (Massachusetts Institute of Technology) | Messias, Joao (University of Amsterdam)

AAAI ConferencesNov-1-2015

This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Some of its key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent systems; provides a large number of models for decision-theoretic decision making, including one-shot decision making (e.g., Bayesian games) and sequential decision making under various assumptions of observability and cooperation, such as Dec-POMDPs and POSGs; provides tools and parsers to quickly prototype new problems; provides an extensive range of planning and learning algorithms for single-and multiagent systems; and is written in C++ and designed to be extensible via the object-oriented paradigm.

agent, artificial intelligence, machine learning, (13 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Europe > Netherlands > South Holland > Delft (0.05)
Europe > Germany > Berlin (0.04)

Genre: Overview (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.80)

Add feedback

MDPVIS: An Interactive Visualization for Testing Markov Decision Processes

McGregor, Sean (Oregon State University) | Buckingham, Hailey (Oregon State University) | Houtman, Rachel (Oregon State University) | Montgomery, Claire (Oregon State University) | Metoyer, Ronald (Oregon State University ) | Dietterich, Thomas G. (Oregon State University)

AAAI ConferencesNov-1-2015

Whereas computational steering traditionally A common approach for solving Markov Decision Processes refers to modifying a computer process during its execution is to implement a simulator of the stochastic dynamics of (Mulder, van Wijk, and van Liere 1999), we treat optimization the MDP and a Monte Carlo optimization algorithm that invokes as an open-ended process whose parameters are repeatedly this simulator. The resulting software system is often changed for testing and debugging.

artificial intelligence, human computer interaction, machine learning, (15 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country: North America > United States > Oregon > Benton County > Corvallis (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Human Computer Interaction > Interfaces (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.63)

Add feedback

Deep Recurrent Q-Learning for Partially Observable MDPs

Hausknecht, Matthew (University of Texas at Austin) | Stone, Peter (University of Texas at Austin)

AAAI ConferencesNov-1-2015

Deep Reinforcement Learning has yielded proficient controllers for complex tasks. However, these controllers have limited memory and rely on being able to perceive the complete game screen at each decision point. To address these shortcomings, this article investigates the effects of adding recurrency to a Deep Q-Network (DQN) by replacing the first post-convolutional fully-connected layer with a recurrent LSTM. The resulting Deep Recurrent Q-Network (DRQN), although capable of seeing only a single frame at each timestep, successfully integrates information through time and replicates DQN's performance on standard Atari games and partially observed equivalents featuring flickering game screens. Additionally, when trained with partial observations and evaluated with incrementally more complete observations, DRQN's performance scales as a function of observability. Conversely, when trained with full observations and evaluated with partial observations, DRQN's performance degrades less than DQN's. Thus, given the same length of history, recurrency is a viable alternative to stacking a history of frames in the DQN's input layer and while recurrency confers no systematic advantage when learning to play the game, the recurrent net can better adapt at evaluation time if the quality of observations changes.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.86)

Industry:

Leisure & Entertainment > Sports (1.00)
Leisure & Entertainment > Games > Computer Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

Add feedback