 Peters, Jan


Multi-Task Policy Search

arXiv.org Artificial Intelligence

Learning policies that generalize across multiple tasks is an important and challenging research topic in reinforcement learning and robotics. Training individual policies for every single potential task is often impractical, especially for continuous task variations, so more principled approaches are needed to share and transfer knowledge among similar tasks. We present a novel approach for learning a nonlinear feedback policy that generalizes across multiple tasks. The key idea is to define a parametrized policy as a function of both the state and the task, which allows learning a single policy that generalizes across multiple known and unknown tasks. We demonstrate applications of our approach to reinforcement and imitation learning in real-robot experiments.
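
A minimal sketch of the key idea, assuming a random-feature policy representation (the class name, feature map, and dimensions below are illustrative, not the paper's model): a single nonlinear feedback policy takes both the state and a task descriptor as input, so one shared parameter set covers a continuous range of tasks.

```python
import numpy as np

class TaskConditionedPolicy:
    def __init__(self, state_dim, task_dim, action_dim, n_features=50, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + task_dim
        # Random RBF-style features over the joint (state, task) input
        # (an assumption; any nonlinear feature map or network would do).
        self.centers = rng.uniform(-1.0, 1.0, size=(n_features, in_dim))
        self.bandwidth = 0.5
        self.theta = np.zeros((n_features, action_dim))  # learned weights

    def features(self, state, task):
        x = np.concatenate([state, task])
        d = np.sum((self.centers - x) ** 2, axis=1)
        return np.exp(-d / (2.0 * self.bandwidth ** 2))

    def action(self, state, task):
        # Feedback action u = theta^T phi(state, task); the same theta is
        # shared across all tasks, which is what lets the policy generalize.
        return self.features(state, task) @ self.theta

policy = TaskConditionedPolicy(state_dim=4, task_dim=2, action_dim=1)
u = policy.action(np.zeros(4), np.array([0.3, -0.1]))
```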


Data-Efficient Generalization of Robot Skills with Contextual Policy Search

AAAI Conferences

In robotics, controllers make the robot solve a task within a specific context. The context can describe the objectives of the robot or physical properties of the environment and is always specified before task execution. To generalize the controller to multiple contexts, we follow a hierarchical approach for policy learning: a lower-level policy controls the robot for a given context and an upper-level policy generalizes among contexts. Current approaches for learning such upper-level policies are based on model-free policy search, which requires an excessive number of interactions between the robot and its environment. More data-efficient policy search approaches are model-based but, so far, lack the ability to learn hierarchical policies. We propose a new model-based policy search approach that can also learn contextual upper-level policies. Our approach is based on learning probabilistic forward models for long-term predictions. Based on these predictions, we apply information-theoretic insights to improve the upper-level policy. Our method achieves a substantial improvement in learning speed compared to existing methods on simulated and real robotic tasks.
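
A rough sketch of the hierarchical structure described above, under the common contextual-policy-search assumption of a linear-Gaussian upper-level policy (the names and the toy lower-level controller are illustrative, not the paper's implementation): the upper-level policy maps a context to the parameters of a lower-level controller.

```python
import numpy as np

class UpperLevelPolicy:
    def __init__(self, context_dim, param_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = np.zeros((param_dim, context_dim))  # mean is linear in context
        self.b = np.zeros(param_dim)
        self.cov = np.eye(param_dim)                 # exploration covariance

    def sample_params(self, context):
        mean = self.W @ context + self.b
        return self.rng.multivariate_normal(mean, self.cov)

def lower_level_controller(params, state):
    # Placeholder lower-level feedback law u = -K x with gains taken from w;
    # in a real system this would be a DMP or another parametrized controller.
    K = params.reshape(1, -1)
    return -(K @ state)

context = np.array([0.8, 1.2])          # e.g. target distance and ball speed
upper = UpperLevelPolicy(context_dim=2, param_dim=4)
w = upper.sample_params(context)        # pick controller parameters for this context
u = lower_level_controller(w, np.zeros(4))
```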


Information-Theoretic Motor Skill Learning

AAAI Conferences

While there have been recent successes in learning single control policies, several open challenges remain in robot motor skill learning. Firstly, many motor tasks can be solved in multiple ways, and hence we need to be able to learn each of these solutions as a separate option from which the agent can choose. Furthermore, we need to learn how to adapt an option to the current situation. Finally, we need to be able to combine several options sequentially in order to solve an overall task. As we want to use our method on real robots, high data efficiency is a natural additional requirement for motor skill learning. In this paper, we summarize our work on information-theoretic motor skill learning. We show how to adapt the relative entropy policy search (REPS) algorithm for learning parametrized options and extend the algorithm in a mathematically sound way such that it meets all of these requirements. Finally, we summarize our experiments conducted on real robots.
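
As a schematic of the parametrized-options setting summarized above (a sketch of the structure, not the paper's full formulation): a gating policy selects an option o for the current situation s, an option-specific policy then selects the movement parameters ω, and REPS-style relative-entropy bounds limit how much these distributions may change per update.

```latex
% Schematic mixture-of-options decomposition (illustrative notation):
% \pi(o \mid s) is the gating policy, \pi(\omega \mid s, o) the option policy.
\pi(\omega \mid s) \;=\; \sum_{o} \pi(o \mid s)\, \pi(\omega \mid s, o)
```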


Learning to Select and Generalize Striking Movements in Robot Table Tennis

AAAI Conferences

Learning new motor tasks autonomously from interaction with a human being is an important goal for both robotics and machine learning. However, when moving beyond basic skills, most monolithic machine learning approaches fail to scale. In this paper, we take the task of learning table tennis as an example and present a new framework which allows a robot to learn cooperative table tennis from interaction with a human. To this end, the robot first learns a set of elementary table tennis hitting movements from a human teacher by kinesthetic teach-in, which are compiled into a set of dynamical system motor primitives (DMPs). Subsequently, the system generalizes these movements to a wider range of situations using our mixture of motor primitives (MoMP) approach. The resulting policy enables the robot to select appropriate motor primitives as well as to generalize between them. Finally, the robot plays with a human table tennis partner and learns online to improve its behavior.
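
An illustrative sketch of the mixture-of-motor-primitives selection step, with made-up gating features (in the full system each primitive is a DMP and the situation would be, e.g., the predicted ball state): each stored primitive gets a responsibility based on how similar the current situation is to the situation it was demonstrated in, and the executed command is the responsibility-weighted combination.

```python
import numpy as np

def gating_weights(situation, primitive_contexts, bandwidth=0.3):
    # Responsibility of each primitive: similarity between the current
    # situation and the situation in which the primitive was demonstrated.
    d = np.sum((primitive_contexts - situation) ** 2, axis=1)
    w = np.exp(-d / (2.0 * bandwidth ** 2))
    return w / (w.sum() + 1e-12)

def mixed_primitive_output(situation, primitive_contexts, primitive_outputs):
    # primitive_outputs[i] is the command primitive i would produce right now
    # (in a full system, each would be one step of a DMP rollout).
    w = gating_weights(situation, primitive_contexts)
    return w @ primitive_outputs

contexts = np.array([[0.2, 0.9], [0.7, 0.4], [0.5, 0.5]])   # stored situations
outputs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])    # current commands
u = mixed_primitive_output(np.array([0.6, 0.45]), contexts, outputs)
```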


Balancing Safety and Exploitability in Opponent Modeling

AAAI Conferences

Opponent modeling is a critical mechanism in repeated games. It allows a player to adapt its strategy in order to better respond to the presumed preferences of its opponents. We introduce a new modeling technique that adaptively balances exploitability and risk reduction. An opponent's strategy is modeled with a set of possible strategies that contains the actual strategy with a high probability. The algorithm is safe as the expected payoff is above the minimax payoff with a high probability, and can exploit the opponents' preferences when sufficient observations have been obtained. We apply the approach to normal-form games and stochastic games with a finite number of stages. The performance of the proposed approach is first demonstrated on repeated rock-paper-scissors games. Subsequently, the approach is evaluated in a human-robot table-tennis setting where the robot player learns to prepare to return a served ball. By modeling the human players, the robot chooses a forehand, backhand or middle preparation pose before the opponent serves. The learned strategies can exploit the opponent's preferences, leading to a higher rate of successful returns.
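
A toy sketch of the safe-exploitation idea on rock-paper-scissors; the confidence radius, threshold, and helper names below are illustrative assumptions rather than the paper's exact algorithm. The opponent is modeled by an empirical mixed strategy plus an L1 confidence radius, and the player only deviates from the maximin strategy when its best response is safe against every strategy in that set.

```python
import numpy as np

# Payoff matrix for the row player: U[i, j] = payoff of our action i
# against opponent action j (rock, paper, scissors).
U = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

MAXIMIN_VALUE = 0.0                    # value of the game under uniform play
MAXIMIN_STRATEGY = np.full(3, 1.0 / 3.0)

def worst_case_payoff(action, p_hat, eps):
    """Lowest expected payoff of `action` over all opponent strategies q
    with ||q - p_hat||_1 <= eps (simple greedy mass shift)."""
    u = U[action]
    q = p_hat.copy()
    budget = eps / 2.0
    worst = np.argmin(u)               # opponent move that hurts us most
    for j in np.argsort(u)[::-1]:      # take mass from our best outcomes
        if j == worst or budget <= 0:
            continue
        shift = min(q[j], budget)
        q[j] -= shift
        q[worst] += shift
        budget -= shift
    return float(u @ q)

def choose_action(opponent_counts, delta=0.05):
    n = opponent_counts.sum()
    if n == 0:
        return np.random.choice(3, p=MAXIMIN_STRATEGY)
    p_hat = opponent_counts / n
    # Confidence radius from a generic concentration bound (an assumption).
    eps = np.sqrt(2.0 * np.log(2.0 / delta) / n)
    best_response = int(np.argmax(U @ p_hat))
    if worst_case_payoff(best_response, p_hat, eps) >= MAXIMIN_VALUE:
        return best_response           # exploit: safe against the whole set
    return np.random.choice(3, p=MAXIMIN_STRATEGY)   # fall back to safe play

# Opponent favours rock, many observations -> exploit with paper (action 1).
print(choose_action(np.array([80.0, 10.0, 10.0])))
```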


Modeling Opponent Actions for Table-Tennis Playing Robot

AAAI Conferences

Opponent modeling is a critical mechanism in repeated games. It allows a player to adapt its strategy in order to better respond to the presumed preferences of its opponents. We introduce a modeling technique that adaptively balances safety and exploitability. The opponent's strategy is modeled with a set of possible strategies that contains the actual one with high probability. The algorithm is safe as the expected payoff is above the minimax payoff with high probability, and can exploit the opponent's preferences when sufficient observations are obtained. We apply the algorithm to a robot table-tennis setting where the robot player learns to prepare to return a served ball. By modeling the human players, the robot chooses a forehand, backhand or middle preparation pose before the opponent serves. The learned strategies can exploit the opponent's preferences, leading to a higher rate of successful returns.


PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

arXiv.org Machine Learning

We develop a coherent framework for the integrative, simultaneous analysis of the exploration-exploitation and model-order-selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales.


PAC-Bayesian Analysis of Martingales and Multiarmed Bandits

arXiv.org Machine Learning

We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to the Hoeffding-Azuma inequality for bounding the concentration of martingale values. Our second approach is based on integrating the Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many other fields where martingales and limited feedback are encountered.
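
A schematic of the second construction (the Hoeffding-Azuma route), with constants and exact conditions treated as placeholders rather than the paper's precise statements: for martingales M_n(h) with increments bounded by c, a prior π over h fixed before observing the data, and a fixed λ > 0, with probability at least 1 - δ the following holds simultaneously for all posteriors ρ.

```latex
% General shape of a PAC-Bayes-Hoeffding-Azuma-type bound (schematic only):
\mathbb{E}_{h \sim \rho}\!\left[ M_n(h) \right]
  \;\le\;
  \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln \frac{1}{\delta}}{\lambda}
  \;+\;
  \frac{\lambda\, n\, c^{2}}{2}
```

Optimizing over λ (chosen from a grid fixed in advance) yields the familiar square-root form of order \sqrt{n c^2 (\mathrm{KL}(\rho\|\pi) + \ln(1/\delta))}.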


Relative Entropy Policy Search

AAAI Conferences

Policy search is a successful approach to reinforcement learning. However, policy improvement steps often result in a loss of information, and policy search has therefore been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this line of reasoning and propose the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It can be shown to work well on typical reinforcement learning benchmark problems.
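
A simplified episodic form of the REPS optimization problem (the stationarity constraints on the state distribution are omitted here, so this is a sketch of the idea rather than the full formulation): maximize expected reward while bounding the relative entropy, i.e. the information loss, with respect to the old state-action distribution q.

```latex
% Simplified REPS problem (stationarity constraints omitted):
\max_{p} \;\sum_{s,a} p(s,a)\, \mathcal{R}(s,a)
\quad \text{s.t.} \quad
\sum_{s,a} p(s,a)\, \log\frac{p(s,a)}{q(s,a)} \;\le\; \epsilon,
\qquad
\sum_{s,a} p(s,a) \;=\; 1
```

The resulting update is an exponential reweighting, p(s,a) ∝ q(s,a) exp(δ(s,a)/η), where η is the Lagrange multiplier of the information-loss bound and δ(s,a) reduces to the reward in this simplified problem (it becomes a Bellman-error-like term once the omitted constraints are included).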