Collaborating Authors

Barto, Andrew G.


Reinforcement Learning: Connections, Surprises, and Challenge

AI Magazine

The idea of implementing reinforcement learning in a computer was one of the earliest ideas about the possibility of AI, but reinforcement learning remained on the margin of AI until relatively recently. Today we see reinforcement learning playing essential roles in some of the most impressive AI applications. This article presents observations from the author's personal experience with reinforcement learning over the most recent 40 years of its history in AI, focusing on striking connections that emerged between largely separate disciplines and on some of the findings that surprised him along the way. These connections and surprises place reinforcement learning in a historical context, and they help explain the success it is finding in modern AI. The article concludes by discussing some of the challenges that need to be faced as reinforcement learning moves out into the real world.


TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration

AAAI Conferences

We study the problem of finding efficient exploration policies for the case in which an agent is momentarily not concerned with exploiting, and instead tries to compute a policy for later use. We first formally define the Optimal Exploration Problem as one of sequential sampling and show that its solutions correspond to paths of minimum expected length in the space of policies. We derive a model-free, local linear approximation to such solutions and use it to construct efficient exploration policies. We compare our model-free approach to other exploration techniques, including one with the best known PAC bounds, and show that ours is both based on a well-defined optimization problem and empirically efficient.
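
The paper's TD-DeltaPi approximation itself is not reproduced in this listing. As a rough, hypothetical stand-in for the kind of pure exploration policy the abstract describes (the agent ignores exploitation and tries only to gather data efficiently), the sketch below greedily drives a tabular agent toward its least-visited actions; the environment interface, function name, and constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def explore_episode(env, n_states, n_actions, steps=1000, seed=0):
    """Count-based pure exploration: always take the least-tried action in
    the current state (ties broken at random). A crude stand-in for an
    optimal exploration policy, not the paper's TD-DeltaPi method.
    Assumes env.reset() -> int state and env.step(a) -> (state, reward, done)."""
    rng = np.random.default_rng(seed)
    visits = np.zeros((n_states, n_actions))       # state-action visit counts
    data = []                                      # experience kept for later planning
    s = env.reset()
    for _ in range(steps):
        least_tried = np.flatnonzero(visits[s] == visits[s].min())
        a = int(rng.choice(least_tried))           # least-visited action in s
        s_next, r, done = env.step(a)
        visits[s, a] += 1
        data.append((s, a, r, s_next))
        s = env.reset() if done else s_next
    return data, visits
```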


Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery

Neural Information Processing Systems

Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-specific subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but are all used individually as skill goals. Furthermore, skills created in this manner are often only transferable to tasks that share identical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal. We show that these problems can be overcome by clustering subgoal data defined in an agent-space and using the resulting clusters as templates for skill termination conditions. Clustering via a Dirichlet process mixture model is used to discover a minimal, sufficient collection of portable skills.
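
As a minimal sketch of the clustering step described above (not the paper's full pipeline), the snippet below fits a truncated Dirichlet process mixture to synthetic stand-in subgoal data using scikit-learn's BayesianGaussianMixture; the data and all parameter choices are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Stand-in subgoal data: in practice each row would be an agent-space
# observation recorded at a discovered subgoal; here we draw synthetic
# points from three overlapping sources purely for illustration.
rng = np.random.default_rng(0)
subgoal_obs = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(40, 2))
    for c in ([0.0, 0.0], [3.0, 0.5], [0.5, 3.0])
])

# Truncated variational approximation to a Dirichlet process mixture:
# n_components is only an upper bound; superfluous components receive
# near-zero weight, so the number of skills is inferred from the data.
dpmm = BayesianGaussianMixture(
    n_components=15,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(subgoal_obs)

labels = dpmm.predict(subgoal_obs)
print(f"{len(np.unique(labels))} clusters in use out of {dpmm.n_components}")
# Each surviving cluster is a candidate template for a skill's termination
# condition (e.g., terminate when the observation is well explained by it).
```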


Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories

Neural Information Processing Systems

We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction.
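
CST's statistical changepoint detector is not reproduced here; the toy sketch below illustrates only the general segmentation idea on a one-dimensional signal, splitting a trajectory wherever a single linear model stops fitting. The function name, thresholds, and test signal are made-up assumptions.

```python
import numpy as np

def segment_trajectory(y, max_rmse=0.2, min_len=5):
    """Greedy piecewise-linear segmentation of a 1-D trajectory signal.
    A new segment starts whenever the current one can no longer be fit by
    a single line within max_rmse. A toy stand-in for CST's changepoint
    detection, not the algorithm itself."""
    boundaries, start = [], 0
    for end in range(len(y) + 1):
        if end - start < min_len:
            continue
        t = np.arange(start, end)
        coeffs = np.polyfit(t, y[start:end], deg=1)                 # fit a line
        rmse = np.sqrt(np.mean((np.polyval(coeffs, t) - y[start:end]) ** 2))
        if rmse > max_rmse:
            boundaries.append(end - 1)                              # close the segment
            start = end - 1
    return boundaries

# Example: a signal whose slope changes abruptly at t = 50; expect a
# single boundary to be reported shortly after the slope change.
t = np.arange(100)
y = np.where(t < 50, 0.1 * t, 5.0 - 0.2 * (t - 50))
y = y + np.random.default_rng(0).normal(0.0, 0.05, size=100)
print(segment_trajectory(y))
```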


Skill Characterization Based on Betweenness

Neural Information Processing Systems

We present a characterization of a useful class of skills based on a graphical representation of an agent's interaction with its environment. Our characterization uses betweenness, a measure of centrality on graphs. It captures and generalizes (at least intuitively) the bottleneck concept, which has inspired many of the existing skill-discovery algorithms. Our characterization may be used directly to form a set of skills suitable for a given task. More importantly, it serves as a useful guide for developing incremental skill-discovery algorithms that do not rely on knowing or representing the interaction graph in its entirety.
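
A minimal sketch of this idea on a toy "two rooms" interaction graph, assuming NetworkX and a hand-built bottleneck layout that is not taken from the paper: nodes whose betweenness is a local maximum over their neighborhood are flagged as subgoal candidates.

```python
import networkx as nx

# Toy interaction graph: two 4x4 grid "rooms" joined by a single doorway
# edge, a classic bottleneck layout (chosen here purely for illustration).
room_a = nx.grid_2d_graph(4, 4)
room_b = nx.relabel_nodes(nx.grid_2d_graph(4, 4), lambda n: (n[0] + 5, n[1]))
G = nx.compose(room_a, room_b)
G.add_edge((3, 1), (5, 1))     # the doorway connecting the two rooms

# Betweenness centrality: the fraction of shortest paths through each node.
bc = nx.betweenness_centrality(G)

# Nodes whose betweenness is a local maximum over their neighborhood are
# natural subgoal candidates; the two doorway cells should top this list.
candidates = [n for n in G if all(bc[n] >= bc[m] for m in G.neighbors(n))]
print(sorted(candidates, key=lambda n: -bc[n])[:3])
```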


Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining

Neural Information Processing Systems

We introduce skill chaining, a skill discovery method for reinforcement learning agents in continuous domains that builds chains of skills leading to an end-of-task reward. We demonstrate experimentally that it creates skills that result in performance benefits in a challenging continuous domain.
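
A purely structural sketch of the chaining idea (not the paper's algorithm, which learns options with function approximation): each new skill terminates where the previously created skill can be initiated, so the chain grows backward from the task goal. Every name and threshold below is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Skill:
    name: str
    initiation: Callable[[float], bool]   # where the skill can start
    termination: Callable[[float], bool]  # where it hands off or finishes

def build_chain(goal_test, n_skills, radius=1.0):
    """Grow a chain backward from the task goal on a toy 1-D state space.
    Each new skill terminates where the previously created skill can be
    initiated (the thresholds here are made up for illustration)."""
    chain: List[Skill] = []
    terminate = goal_test
    for i in range(n_skills):
        lo = (i + 1) * radius
        initiate = lambda s, lo=lo: s <= lo   # toy initiation set
        chain.append(Skill(f"skill_{i}", initiate, terminate))
        terminate = initiate                  # the next skill targets this set
    return list(reversed(chain))              # execution order: far from goal -> goal

chain = build_chain(goal_test=lambda s: s <= 0.0, n_skills=3)
print([sk.name for sk in chain])
```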


Intrinsically Motivated Reinforcement Learning

Neural Information Processing Systems

Psychologists call behavior intrinsically motivated when it is engaged in for its own sake rather than as a step toward solving a specific problem of clear practical value. But what we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. In this paper we present initial results from a computational study of intrinsically motivated reinforcement learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy.
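
As one highly simplified illustration of the idea (not the salient-event model studied in the paper), the sketch below adds a decaying novelty bonus to the extrinsic reward inside tabular Q-learning; the environment interface and constants are assumptions.

```python
import collections
import random

def q_learning_with_intrinsic_reward(env, episodes=200, alpha=0.1,
                                     gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with a toy intrinsic reward: the bonus for
    reaching a state decays as that state becomes familiar. Assumes
    env.reset() -> state, env.step(a) -> (state, reward, done), and
    env.n_actions."""
    Q = collections.defaultdict(float)
    visits = collections.Counter()
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda b: Q[(s, b)])
            s_next, r_ext, done = env.step(a)
            visits[s_next] += 1
            r_int = 1.0 / visits[s_next]      # novelty bonus, decays with familiarity
            best_next = 0.0 if done else gamma * max(
                Q[(s_next, b)] for b in range(env.n_actions))
            Q[(s, a)] += alpha * (r_ext + r_int + best_next - Q[(s, a)])
            s = s_next
    return Q
```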


The Emergence of Multiple Movement Units in the Presence of Noise and Feedback Delay

Neural Information Processing Systems

Tangential hand velocity profiles of rapid human arm movements often appear as sequences of several bell-shaped acceleration-deceleration phases called submovements or movement units. This suggests how the nervous system might efficiently control a motor plant in the presence of noise and feedback delay. Another critical observation is that stochasticity in a motor control problem makes the optimal control policy essentially different from the optimal control policy for the deterministic case. We use a simplified dynamic model of an arm and address rapid aimed arm movements. We use reinforcement learning as a tool to approximate the optimal policy in the presence of noise and feedback delay. Using a simplified model we show that multiple submovements emerge as an optimal policy in the presence of noise and feedback delay. The optimal policy in this situation is to drive the arm's end point close to the target by one fast submovement and then apply a few slow submovements to accurately drive the arm's end point into the target region. In our simulations, the controller sometimes generates corrective submovements before the initial fast submovement is completed, much like the predictive corrections observed in a number of psychophysical experiments.
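
The arm model and learner used in the paper are not reproduced here; the sketch below only illustrates the two ingredients the abstract emphasizes, signal-dependent noise and delayed feedback, as a toy one-dimensional reaching environment with made-up constants.

```python
import numpy as np
from collections import deque

class DelayedNoisyReach:
    """Minimal 1-D point-mass reaching task with control-dependent noise and
    delayed sensory feedback. All constants are illustrative assumptions; an
    RL agent trained on this kind of plant tends to favor one fast initial
    movement followed by small corrective ones."""

    def __init__(self, dt=0.01, delay_steps=10, noise_scale=0.2, target=0.3):
        self.dt, self.delay, self.noise_scale, self.target = dt, delay_steps, noise_scale, target
        self.reset()

    def reset(self):
        self.pos, self.vel, self.t = 0.0, 0.0, 0
        # feedback buffer: the agent only ever sees observations delay steps old
        self.buffer = deque([(0.0, 0.0)] * self.delay, maxlen=self.delay)
        return self.buffer[0]

    def step(self, u):
        noisy_u = u * (1.0 + self.noise_scale * np.random.randn())  # stronger commands are noisier
        self.vel += noisy_u * self.dt
        self.pos += self.vel * self.dt
        self.t += 1
        self.buffer.append((self.pos, self.vel))
        obs = self.buffer[0]                                        # delayed observation
        on_target = abs(self.pos - self.target) < 0.01 and abs(self.vel) < 0.05
        reward = -1.0                                               # time cost per step
        done = on_target or self.t > 500
        return obs, reward, done
```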

