AITopics

This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Some of its key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent systems; provides a large number of models for decision-theoretic decision making, including one-shot decision making (e.g., Bayesian games) and sequential decision making under various assumptions of observability and cooperation, such as Dec-POMDPs and POSGs; provides tools and parsers to quickly prototype new problems; provides an extensive range of planning and learning algorithms for single-and multiagent systems; and is written in C++ and designed to be extensible via the object-oriented paradigm.

agent, artificial intelligence, machine learning, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Europe > Netherlands > South Holland > Delft (0.05)
Europe > Germany > Berlin (0.04)

Genre: Overview (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.80)

MDPVIS: An Interactive Visualization for Testing Markov Decision Processes

McGregor, Sean (Oregon State University) | Buckingham, Hailey (Oregon State University) | Houtman, Rachel (Oregon State University) | Montgomery, Claire (Oregon State University) | Metoyer, Ronald (Oregon State University ) | Dietterich, Thomas G. (Oregon State University)

Whereas computational steering traditionally A common approach for solving Markov Decision Processes refers to modifying a computer process during its execution is to implement a simulator of the stochastic dynamics of (Mulder, van Wijk, and van Liere 1999), we treat optimization the MDP and a Monte Carlo optimization algorithm that invokes as an open-ended process whose parameters are repeatedly this simulator. The resulting software system is often changed for testing and debugging.

artificial intelligence, human computer interaction, machine learning, (15 more...)

Country: North America > United States > Oregon > Benton County > Corvallis (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Human Computer Interaction > Interfaces (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.63)

Hausknecht, Matthew (University of Texas at Austin) | Stone, Peter (University of Texas at Austin)

Deep Recurrent Q-Learning for Partially Observable MDPs

Deep Reinforcement Learning has yielded proficient controllers for complex tasks. However, these controllers have limited memory and rely on being able to perceive the complete game screen at each decision point. To address these shortcomings, this article investigates the effects of adding recurrency to a Deep Q-Network (DQN) by replacing the first post-convolutional fully-connected layer with a recurrent LSTM. The resulting Deep Recurrent Q-Network (DRQN), although capable of seeing only a single frame at each timestep, successfully integrates information through time and replicates DQN's performance on standard Atari games and partially observed equivalents featuring flickering game screens. Additionally, when trained with partial observations and evaluated with incrementally more complete observations, DRQN's performance scales as a function of observability. Conversely, when trained with full observations and evaluated with partial observations, DRQN's performance degrades less than DQN's. Thus, given the same length of history, recurrency is a viable alternative to stacking a history of frames in the DQN's input layer and while recurrency confers no systematic advantage when learning to play the game, the recurrent net can better adapt at evaluation time if the quality of observations changes.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.86)

Industry:

Leisure & Entertainment > Sports (1.00)
Leisure & Entertainment > Games > Computer Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

Commitment Semantics for Sequential Decision Making Under Reward Uncertainty

Durfee, Edmund H. (University of Michigan) | Singh, Satinder (University of Michigan)

A commitment represents an agent's intention to attempt to bring about some state of the world that is desired by some agent (possibly itself) in the future. Thus, by making a commitment, an agent is agreeing to make sequential decisions that it believes can cause the desired state to arise. In general, though, an agent's actions will have uncertain outcomes, and thus reaching the desired state cannot be guaranteed. For such sequential decision settings with uncertainty, therefore, commitments can only be probabilistic. We argue that standard notions of commitment are insufficient for probabilistic commitments, and propose a new semantics that judges commitment fulfillment not in terms of whether the agent achieved the desired state, but rather in terms of whether the agent made sequential decisions that in expectation would have achieved the desired state with (at least) the promised probability. We have devised various algorithms that operationalize our semantics, to capture problem contexts with probabilistic commitments arising because action outcomes are uncertain, as well as arising because an agent might realize over time that it does not want to fulfill the commitment.

agent, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > Slovenia > Central Slovenia > Municipality of Komenda > Komenda (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Probabilistic Planning for Decentralized Multi-Robot Systems

Amato, Christopher (University of New Hampshire) | Konidaris, George (Duke University) | Omidshafiei, Shayegan (Massachusetts Institute of Technology) | Agha-mohammadi, Ali-akbar (Qualcomm Research) | How, Jonathan P. (Massachusetts Institute of Technology) | Kaelbling, Leslie P. (Massachusetts Institute of Technology)

Multi-robot systems are an exciting application domain for AI research and Dec-POMDPs, specifically. MacDec-POMDP methods can produce high-quality general solutions for realistic heterogeneous multi-robot coordination problems by automatically generating control and communication policies, given a model. In contrast to most existing multi-robot methods that are specialized to a particular problem class, our approach can synthesize policies that exploit any opportunities for coordination that are present in the problem, while balancing uncertainty, sensor information, and information about other agents.

artificial intelligence, machine learning, robot, (18 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.06)
North America > United States > North Carolina > Durham County > Durham (0.05)
North America > United States > New Hampshire (0.05)
North America > United States > California > San Diego County > San Diego (0.05)

Industry: Transportation (0.31)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.98)

Allen, Marty (University of Wisconsin-La Crosse)

Complexity of Self-Preserving, Team-Based Competition in Partially Observable Stochastic Games

Partially observable stochastic games (POSGs) are a robust and precise model for decentralized decision making under conditions of imperfect information, and extend popular Markov decision problem models. Complexity results for a wide range of such problems are known when agents work cooperatively to pursue common interests. When agents compete, things are less well understood. We show that under one understanding of rational competition, such problems are complete for the class NEXP^NP. This result holds for any such problem comprised of two competing teams of agents, where teams may be of any size whatsoever.

agent, artificial intelligence, machine learning, (18 more...)

Country:

North America > United States > Wisconsin > La Crosse County > La Crosse (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Walraven, Erwin (Delft University of Technology) | Spaan, Matthijs T. J. (Delft University of Technology)

Planning Under Uncertainty with Weighted State Scenarios

External factors are hard to model using a Markovian state in several real-world planning domains. Although planning can be difficult in such domains, it may be possible to exploit long-term dependencies between states of the environment during planning. We introduce weighted state scenarios to model long-term sequences of states, and we use a model based on a Partially Observable Markov Decision Process to reason about scenarios during planning. Experiments show that our model outperforms other methods for decision making in two real-world domains.

artificial intelligence, machine learning, scenario, (14 more...)

Country:

Europe > Netherlands > South Holland > Delft (0.07)
Europe > Spain (0.05)

Industry: Energy > Power Industry (0.73)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Robotic Social Feedback for Object Specification

Issuing and following instructions is a common task in many forms of both human-human and human-robot collaboration. With two human participants, the accuracy of instruction following increases if the collaborators can monitor the state of their partners and respond to them through conversation (Clark and Krych 2004), a process we call social feedback. Despite this benefit in human-human interaction, current human-robot collaboration systems process instructions in non-incremental batches, which can achieve good accuracy but does not allow for reactive feedback (Tellex et al. 2011; Matuszek et al. 2012; Tellex et al. 2012; Misra et al.2014). In this paper, we show that giving a robot the ability to ask the user questions results in responsive conversations and allows the robot to quickly determine the object that the user desires. This social feedback loop between person and robot allows a person to create an internal model for the robot’s mental state and adapt their own behavior to better inform the robot. To close the human-robot feedback loop, we employ a Partially Observable Markov Decision Process (POMDP) to produce a policy which will lead to the determination of the object in the shortest amount of time. To test our approach, we perform user studies to measure our robot’s ability to deliver common household items requested by the participant. We compare delivery speed and accuracy both with and without social feedback.

artificial intelligence, machine learning, robot, (18 more...)

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)
(4 more...)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Freedman, Richard G. (University of Massachusetts Amherst) | Jung, Hee-Tae (University of Massachusetts Amherst) | Zilberstein, Shlomo (University of Massachusetts Amherst)

Temporal and Object Relations in Unsupervised Plan and Activity Recognition

We consider ways to improve the performance of unsupervised plan and activity recognition techniques by considering temporal and object relations in addition to postural data. Temporal relationships can help recognize activities with cyclic structure and are often implicit because plans have degrees of ordering actions. Relations with objects can help disambiguate observed activities that otherwise share a user's posture and position. We develop and investigate graphical models that extend the popular latent Dirichlet allocation approach with temporal and object relations, examine the relative performance and runtime trade-offs using a standard dataset, and consider the cost/benefit trade-offs these extensions offer in the context of human-robot and humancomputer interaction.

lda, machine learning, natural language, (20 more...)

Country:

Asia > Middle East > Jordan (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Minecraft as an Experimental World for AI in Robotics

Aluru, Krishna Chaitanya (Brown University) | Tellex, Stefanie (Brown University) | Oberlin, John (Brown University) | MacGlashan, James (Brown University)

Performing experimental research on robotic platforms involves numerous practical complications, while studying collaborative interactions and efficiently collecting data from humans benefit from real time response. Roboticists can circumvent some complications by using simulators like Gazebo to test algorithms and building games like the Mars Escape game to collect data. Making use of existing resources for simulation and game creation requires the development of assets and algorithms along with the recruitment and training of users. We have created a Minecraft mod called BurlapCraft which enables the use of the reinforcement learning and planning library BURLAP to model and solve different tasks within Minecraft. BurlapCraft makes AI-HRI development easier in three core ways: the underlying Minecraft environment makes the construction of experiments simple for the developer and so allows the rapid prototyping of experimental setup; BURLAP contributes a wide variety of extensible algorithms for learning and planning, allowing easy iteration and development of task models and algorithms; and the familiarity and ubiquity of Minecraft trivializes the recruitment and training of users. To validate BurlapCraft as a platform for AI development, we demonstrate the execution of A*, BFS, RMax, language understanding, and learning language groundings from user demonstrations in five Minecraft "dungeons."

artificial intelligence, machine learning, natural language, (18 more...)

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Games > Computer Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)