AITopics

Applications of service robotics in public spaces such as hospitals, museums and malls are increasingly common. Public spaces, however, pose several challenges to robots, and specifically to their planning capabilities: robots need to cope with a dynamic and uncertain environment and are subject to particular human-robot interaction constraints. A major challenge is the Joint Intention problem. When cooperating with humans, a persistent commitment to achieving a shared goal cannot always be assumed, since humans behave unpredictably and may be distracted in environments as dynamic and uncertain as public spaces, even more so if the human agents are customers, visitors or bystanders. To address these issues in a decision-making context, we present a framework based on Hierarchical Factored POMDPs. We describe the general method for ensuring the Joint Intention between human and robot, the hierarchical structure, and the Value Decomposition method adopted to build it. We also provide an example application scenario: an Escort Task in a shopping mall for guiding a customer towards a desired point of interest.

agent, probability, robot, (15 more...)

Country:

Europe > France (0.04)
North America > United States > New York > New York County > New York City (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Zagorecki, Adam (Cranfield University and Defence Academy of the United Kingdom) | Kozniewski, Marcin (University of Pittsburgh) | Druzdzel, Marek (University of Pittsburgh)

An Approximation of Surprise Index as a Measure of Confidence

Probabilistic graphical models, such as Bayesian networks, are intuitive and theoretically sound tools for modeling uncertainty. A major problem with applying Bayesian networks in practice is that it is hard to judge whether a model fits a case that it is supposed to solve. One way of expressing a possible dissonance between a model and a case is the surprise index, proposed by Habbema, which expresses the degree of surprise at the evidence given the model. While this measure reflects the intuition that the probability of a case should be judged in the context of a model, it is computationally intractable. In this paper, we propose an efficient way of approximating the surprise index.
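To make the intractability concrete, here is a minimal brute-force sketch in Python. It assumes the surprise index of an observed case is the total probability of all cases that are no more probable than it; counting ties is a convention chosen here, not something the abstract specifies. The toy model of three independent binary variables and the names `surprise_index`, `case_probs` and `marginals` are hypothetical, and this is exact enumeration, not the approximation the paper proposes.

```python
from itertools import product


def surprise_index(case_probs, observed):
    """Sum the probabilities of all cases no more probable than `observed`.

    Assumed definition (ties included). `case_probs` maps every complete
    case to its probability under the model.
    """
    p_obs = case_probs[observed]
    return sum(p for p in case_probs.values() if p <= p_obs)


# Hypothetical toy model: three independent binary variables with
# P(X_i = 1) given below. A real Bayesian network would supply these
# joint probabilities through inference instead.
marginals = [0.9, 0.8, 0.7]

# Enumerating every case costs 2**n entries for n binary variables,
# which is why the exact computation does not scale.
case_probs = {}
for case in product([0, 1], repeat=len(marginals)):
    p = 1.0
    for value, m in zip(case, marginals):
        p *= m if value == 1 else 1.0 - m
    case_probs[case] = p

least_likely = surprise_index(case_probs, (0, 0, 0))
most_likely = surprise_index(case_probs, (1, 1, 1))
```

Under this convention a low value flags a case the model considers unusual, while the most probable case scores close to 1.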

probability, scenario, surprise index, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom (0.14)
Europe > Poland > Podlaskie Province > Bialystok (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.58)

Using Lanchester Attrition Laws for Combat Prediction in StarCraft

Stanescu, Marius Adrian (University of Alberta) | Barriga, Nicolas (University of Alberta) | Buro, Michael (University of Alberta)

Smart decision making at the tactical level is important for Artificial Intelligence (AI) agents to perform well in the domain of real-time strategy (RTS) games. Winning battles is crucial in RTS games, and while humans can decide when and how to attack based on their experience, it is challenging for AI agents to estimate combat outcomes accurately. A few existing models address this problem in the game of StarCraft but have many restrictions, such as not modeling injured units, supporting only a small number of unit types, or being able to predict the winner of a fight but not the remaining army. Prediction using simulations is a popular method, but it is generally slow and requires extensive coding to model the game engine accurately. This paper introduces a model based on Lanchester's attrition laws which addresses the mentioned limitations while being faster than running simulations. Unit strength values are learned using maximum likelihood estimation from past recorded battles. We present experiments that use a StarCraft simulator for generating battles for both training and testing, and show that the model is capable of making accurate predictions. Furthermore, we implemented our method in a StarCraft bot that uses either this or traditional simulations to decide when to attack or to retreat. We present tournament results (against top bots from the 2014 AIIDE competition) comparing the performances of the two versions, and show increased winning percentages for our method.
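As a rough illustration of how an attrition law predicts both the winner and the remaining army, the sketch below uses the classic Lanchester square law, dA/dt = -beta * B and dB/dt = -alpha * A, under which alpha * A^2 - beta * B^2 stays constant throughout the fight. This is the textbook form, not necessarily the exact variant used in the paper; the function name and the per-unit strength values `alpha` and `beta` are hypothetical, whereas the paper learns unit strengths from recorded battles.

```python
import math


def lanchester_square(a0, b0, alpha, beta):
    """Predict winner and survivors under Lanchester's square law.

    a0, b0: initial army sizes of sides A and B.
    alpha:  damage rate of one unit of A (assumed, not learned here).
    beta:   damage rate of one unit of B (assumed, not learned here).
    Returns (winner, surviving units of the winner).
    """
    # alpha*A^2 - beta*B^2 is conserved, so its sign decides the winner
    # and its magnitude gives the size of the surviving army.
    invariant = alpha * a0 ** 2 - beta * b0 ** 2
    if invariant > 0:
        return "A", math.sqrt(invariant / alpha)
    if invariant < 0:
        return "B", math.sqrt(-invariant / beta)
    return "draw", 0.0


winner, survivors = lanchester_square(10, 8, alpha=1.0, beta=1.0)
```

With equal unit strengths, 10 units against 8 gives side A the win with 6 units left. The prediction is a closed-form expression, which is why such a model can be much cheaper than running a combat simulation.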

army, lanchester, unit type, (16 more...)

Eleventh Artificial Intelligence and Interactive Digital Entertainment Conference

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)

Zhan, Yusen (Washington State University) | Taylor, Matthew E. (Washington State University)

Online Transfer Learning in Reinforcement Learning Domains

This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with tabular representation with a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that the asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.
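For readers unfamiliar with the two algorithms analyzed, the following sketch shows the standard tabular Q-learning and Sarsa updates, plus a deliberately simplified picture of teaching with a finite budget. The `choose_action` helper, the `teacher_policy` callable and the rule of advising whenever budget remains are hypothetical simplifications for illustration, not the teaching method studied in the paper.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best next action."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def choose_action(q, state, actions, teacher_policy, budget):
    """Hypothetical teaching rule: follow the teacher while budget remains,
    then act greedily on the student's own Q-table."""
    if budget > 0:
        return teacher_policy(state), budget - 1
    return max(actions, key=lambda a: q[(state, a)]), budget


q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
actions = ["left", "right"]
first = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
```

Because the teacher only influences which actions are tried and the budget eventually runs out, the student keeps applying its ordinary update rule afterwards.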

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

North America > United States > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Wray, Kyle Hollins (University of Massachusetts Amherst) | Zilberstein, Shlomo (University of Massachusetts Amherst)

A Parallel Point-Based POMDP Algorithm Leveraging GPUs

We parallelize the Point-Based Value Iteration (PBVI) algorithm, which approximates the solution to Partially Observable Markov Decision Processes (POMDPs), using a Graphics Processing Unit (GPU). We detail additional optimizations, such as leveraging the bounded size of non-zero values over all belief point vectors, usable by serial and parallel algorithms. We compare serial (CPU) and parallel (GPU) implementations on 10 distinct problem domains, and demonstrate that our approach provides an order of magnitude improvement.

algorithm, artificial intelligence, machine learning, (12 more...)

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.15)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs

Robbel, Philipp (Massachusetts Institute of Technology) | Oliehoek, Frans A. (University of Amsterdam) | Kochenderfer, Mykel J. (Stanford University)

The Markov Decision Process (MDP) framework is a versatile method for addressing single and multiagent sequential decision making problems. Many exact and approximate solution methods attempt to exploit structure in the problem and are based on value factorization. Multiagent settings (MAS) in particular, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are overly restricted in the problem sizes and types they can handle. We present an approach to mitigate this limitation for certain types of MASs, exploiting a property that can be thought of as "anonymous influence" in the factored MDP. In particular, we show how anonymity can lead to representational and computational efficiencies, both for general variable elimination in a factor graph and for the approximate linear programming solution to factored MDPs. The latter allows linear programming to scale to factored MDPs that were previously unsolvable. Our results are shown for a disease control domain over a graph with 50 nodes that are each connected with up to 15 neighbors.

artificial intelligence, machine learning, optimization problem, (18 more...)

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Public Health (0.35)
Health & Medicine > Epidemiology (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Reyes, Alberto (Instituto de Investigaciones Electricas) | Ibarguengoytia, Pablo H. (Instituto de Investigaciones Electricas) | Romero, Inés (Instituto de Investigaciones Electricas) | Pech, David (Instituto de Investigaciones Electricas) | Borunda, Mónica (Instituto de Investigaciones Electricas)

Open Questions for Building Optimal Operation Policies for Dam Management Using Factored Markov Decision Processes

In this paper, we present the conceptual model of a real-world application of Markov Decision Processes to dam management. The idea is to demonstrate that it is possible to efficiently automate the construction of operation policies by modelling the problem as a sequential decision problem that can be easily solved using stochastic dynamic programming. We will explain the problem domain and provide an analysis of the resulting value and policy functions. We will also discuss the issues that will appear when the conceptual model is extended into a real-world application.

artificial intelligence, decision support system, machine learning, (16 more...)

Country:

North America > United States > California > San Mateo County > Menlo Park (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Ecuador (0.05)

Industry: Energy (0.94)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)

Oliehoek, Frans A. (University of Liverpool, University of Amsterdam) | Spaan, Matthijs T. J. (Delft University of Technology) | Robbel, Philipp (Massachusetts Institute of Technology) | Messias, Joao (University of Amsterdam)

The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems

This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Some of its key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent systems; provides a large number of models for decision-theoretic decision making, including one-shot decision making (e.g., Bayesian games) and sequential decision making under various assumptions of observability and cooperation, such as Dec-POMDPs and POSGs; provides tools and parsers to quickly prototype new problems; provides an extensive range of planning and learning algorithms for single- and multiagent systems; and is written in C++ and designed to be extensible via the object-oriented paradigm.

agent, artificial intelligence, machine learning, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Europe > Netherlands > South Holland > Delft (0.05)
Europe > Germany > Berlin (0.04)

Genre: Overview (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.80)

MDPVIS: An Interactive Visualization for Testing Markov Decision Processes

McGregor, Sean (Oregon State University) | Buckingham, Hailey (Oregon State University) | Houtman, Rachel (Oregon State University) | Montgomery, Claire (Oregon State University) | Metoyer, Ronald (Oregon State University) | Dietterich, Thomas G. (Oregon State University)

A common approach for solving Markov Decision Processes is to implement a simulator of the stochastic dynamics of the MDP and a Monte Carlo optimization algorithm that invokes this simulator. The resulting software system is often [...] Whereas computational steering traditionally refers to modifying a computer process during its execution (Mulder, van Wijk, and van Liere 1999), we treat optimization as an open-ended process whose parameters are repeatedly changed for testing and debugging.

artificial intelligence, human computer interaction, machine learning, (15 more...)

Country: North America > United States > Oregon > Benton County > Corvallis (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Human Computer Interaction > Interfaces (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.63)

Hausknecht, Matthew (University of Texas at Austin) | Stone, Peter (University of Texas at Austin)

Deep Recurrent Q-Learning for Partially Observable MDPs

Deep Reinforcement Learning has yielded proficient controllers for complex tasks. However, these controllers have limited memory and rely on being able to perceive the complete game screen at each decision point. To address these shortcomings, this article investigates the effects of adding recurrency to a Deep Q-Network (DQN) by replacing the first post-convolutional fully-connected layer with a recurrent LSTM. The resulting Deep Recurrent Q-Network (DRQN), although capable of seeing only a single frame at each timestep, successfully integrates information through time and replicates DQN's performance on standard Atari games and partially observed equivalents featuring flickering game screens. Additionally, when trained with partial observations and evaluated with incrementally more complete observations, DRQN's performance scales as a function of observability. Conversely, when trained with full observations and evaluated with partial observations, DRQN's performance degrades less than DQN's. Thus, given the same length of history, recurrency is a viable alternative to stacking a history of frames in the DQN's input layer. While recurrency confers no systematic advantage when learning to play the game, the recurrent net can better adapt at evaluation time if the quality of observations changes.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.86)

Industry:

Leisure & Entertainment > Sports (1.00)
Leisure & Entertainment > Games > Computer Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)