AITopics

We consider the problem of how to plan efficiently in low-level, continuous state spaces with temporally abstract actions (or skills), by constructing abstract representations of the problem suitable for task-level planning.The central question this effort poses is which abstract representations are required to express and evaluate plans composed of sequences of skills. We show that classifiers can be used as a symbolic representation system, and that the ability to represent the preconditions and effects of an agent's skills is both necessary and sufficient for task-level planning.The resulting representations allow a reinforcement learning agent to acquire a symbolic representation appropriate for planning from experience.

machine learning, reinforcement learning, task-level planning, (2 more...)

Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

GSMDPs for Multi-Robot Sequential Decision-Making

Messias, João Vicente (Instituto Superior Técnico) | Spaan, Matthijs (Delft University of Technology) | Lima, Pedro (Instituto Superior Técnico)

Markov Decision Processes (MDPs) provide an extensive theoretical background for problems of decision-making under uncertainty. In order to maintain computational tractability, however, real-world problems are typically discretized in states and actions as well as in time. Assuming synchronous state transitions and actions at fixed rates may result in models which are not strictly Markovian, or where agents are forced to idle between actions, losing their ability to react to sudden changes in the environment. In this work, we explore the application of Generalized Semi-Markov Decision Processes (GSMDPs) to a realistic multi-robot scenario. A case study will be presented in the domain of cooperative robotics, where real-time reactivity must be preserved, and synchronous discrete-time approaches are therefore sub-optimal. This case study is tested on a team of real robots, and also in realistic simulation. By allowing asynchronous events to be modeled over continuous time, the GSMDP approach is shown to provide greater solution quality than its discrete-time counterparts, while still being approximately solvable by existing methods.

machine learning, multi-robot sequential decision-making, reinforcement learning, (2 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Bruno, Danilo (Istituto Italiano di Tecnologia (IIT)) | Calinon, Sylvain (Istituto Italiano di Tecnologia (IIT)) | Caldwell, Darwin G. (Istituto Italiano di Tecnologia (IIT))

Bayesian Nonparametric Multi-Optima Policy Search in Reinforcement Learning

bayesian nonparametric multi-optima policy search, machine learning, reinforcement learning, (1 more...)

Skills can often be performed in many different ways. In order to provide robots with human-like adaptation capabilities, it is of great interest to learn several ways of achieving the same skills in parallel, since eventual changes in the environment or in the robot can make some solutions unfeasible. In this case, the knowledge of multiple solutions can avoid relearning the task. This problem is addressed in this paper within the framework of Reinforcement Learning, as the automatic determination of multiple optimal parameterized policies. For this purpose, a model handling a variable number of policies is built using a Bayesian non-parametric approach. The algorithm is first compared to single policy algorithms on known benchmarks. It is then applied to a typical robotic problem presenting multiple solutions.

Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Robots (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

PAC Optimal Planning for Invasive Species Management: Improved Exploration for Reinforcement Learning from Simulator-Defined MDPs

Dietterich, Thomas G. (Oregon State University) | Taleghan, Majid Alkaee (Oregon State University) | Crowley, Mark (Oregon State University)

Often the most practical way to define a Markov Decision Process (MDP) is as a simulator that, given a state and an action, produces a resulting state and immediate reward sampled from the corresponding distributions. Simulators in natural resource management can be very expensive to execute, so that the time required to solve such MDPs is dominated by the number of calls to the simulator. This paper presents an algorithm, DDV, that combines improved confidence intervals on the Q values (as in interval estimation) with a novel upper bound on the discounted state occupancy probabilities to intelligently choose state-action pairs to explore. We prove that this algorithm terminates with a policy whose value is within epsilon of the optimal policy (with probability 1-delta) after making only polynomially-many calls to the simulator. Experiments on one benchmark MDP and on an MDP for invasive species management show very large reductions in the number of simulator calls required.

artificial intelligence, machine learning, reinforcement learning, (4 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Gold, Kevin (MIT Lincoln Laboratory)

Learning to Efficiently Pursue Communication Goals on the Web with the GOSMR Architecture

We present GOSMR ("goal oriented scenario modeling robots"), a cognitive architecture designed to show coordinated, goal-directed behavior over the Internet, focusing on the web browser as a case study. The architecture combines a variety of artificial intelligence techniques, including planning, temporal difference learning, elementary reasoning over uncertainty, and natural language parsing, but is designed to be computationally lightweight. Its intended use is to be deployed on virtual machines in large-scale network experiments in which simulated users' adaptation in the face of resource denial should be intelligent but varied. The planning system performs temporal difference learning of action times, discounts goals according to hyperbolic discounting of time-to-completion and chance of success, takes into account the assertions of other agents, and separates abstract action from site-specific affordances. Our experiment, in which agents learn to prefer a social networking style site for sending and receiving messages, shows that utility-proportional goal selection is a reasonable alternative to Boltzmann goal selection for producing a rational mix of behavior.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.73)

Mahadevan, Sridhar (University of Massachusetts, Amherst) | Giguere, Stephen (University of Massachusetts, Amherst) | Jacek, Nicholas (University of Massachusetts, Amherst)

Basis Adaptation for Sparse Nonlinear Reinforcement Learning

This paper presents a new approach to representation discovery in reinforcement learning (RL) using basis adaptation. We introduce a general framework for basis adaptation as {\em nonlinear separable least-squares value function approximation} based on finding Frechet gradients of an error function using variable projection functionals. We then present a scalable proximal gradient-based approach for basis adaptation using the recently proposed mirror-descent framework for RL. Unlike traditional temporal-difference (TD) methods for RL, mirror descent based RL methods undertake proximal gradient updates of weights in a dual space, which is linked together with the primal space using a Legendre transform involving the gradient of a strongly convex function. Mirror descent RL can be viewed as a proximal TD algorithm using Bregman divergence as the distance generating function. We present a new class of regularized proximal-gradient based TD methods, which combine feature selection through sparse L1 regularization and basis adaptation. Experimental results are provided to illustrate and validate the approach.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Structured Kernel-Based Reinforcement Learning

Kveton, Branislav (Technicolor Labs) | Theocharous, Georgios (Adobe)

Kernel-based reinforcement learning (KBRL) is a popular approach to learning non-parametric value function approximations. In this paper, we present structured KBRL, a paradigm for kernel-based RL that allows for modeling independencies in the transition and reward models of problems. Real-world problems often exhibit this structure and can be solved more efficiently when it is modeled. We make three contributions. First, we motivate our work, define a structured backup operator, and prove that it is a contraction. Second, we show how to evaluate our operator efficiently. Our analysis reveals that the fixed point of the operator is the optimal value function in a special factored MDP. Finally, we evaluate our method on a synthetic problem and compare it to two KBRL baselines. In most experiments, we learn better policies than the baselines from an order of magnitude less training data.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Banerjee, Bikramjit (The University of Southern Mississippi)

Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. Recently a Monte Carlo based distributed reinforcement learning approach was proposed, where agents take turns to learn best responses to each other’s policies. This promotes decentralization of the policy computation problem, and relaxes reliance on the full knowledge of the problem parameters. However, this Monte Carlo approach has a large sample complexity, which we address in this paper. In particular, we propose and analyze a modified version of the previous algorithm that adaptively eliminates parts of the experience tree from further exploration, thus requiring fewer samples while ensuring unchanged confidence in the learned value function. Experiments demonstrate significant reduction in sample complexity – the maximum reductions ranging from 61% to 91% over different benchmark Dec-POMDP problems – with the final policies being often better due to more focused exploration.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country: North America > United States > Mississippi > Forrest County > Hattiesburg (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

AAAI ConferencesJul-5-2013

Efficient Abstraction Selection in Reinforcement Learning (Extended Abstract)

Seijen, Harm van (University of Alberta) | Whiteson, Shimon (University of Amsterdam) | Kester, Leon (TNO)

This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of candidate abstractions, each build up from a different combination of state components.

efficient abstraction selection, extended abstract, reinforcement learning

Tenth Symposium of Abstraction, Reformulation, and Approximation

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Dimitrakakis, Christos, Tziortziotis, Nikolaos

ABC Reinforcement Learning

arXiv.org Machine LearningJun-28-2013

This paper introduces a simple, general framework for likelihood-free Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC). The main advantage is that we only require a prior distribution on a class of simulators (generative models). This is useful in domains where an analytical probabilistic model of the underlying process is too complex to formulate, but where detailed simulation models are available. ABC-RL allows the use of any Bayesian reinforcement learning technique, even in this case. In addition, it can be seen as an extension of rollout algorithms to the case where we do not know what the correct model to draw rollouts from is. We experimentally demonstrate the potential of this approach in a comparison with LSPI. Finally, we introduce a theorem showing that ABC is a sound methodology in principle, even when non-sufficient statistics are used.

machine learning, reinforcement, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1303.6977

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)