AITopics

2410.02891

Genre: Research Report (0.64)

Industry: Transportation > Air (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

arXiv.org Artificial IntelligenceMay-22-2023

End-to-End Stable Imitation Learning via Autonomous Neural Dynamic Policies

Totsila, Dionis, Chatzilygeroudis, Konstantinos, Hadjivelichkov, Denis, Modugno, Valerio, Hatzilygeroudis, Ioannis, Kanoulas, Dimitrios

State-of-the-art sensorimotor learning algorithms offer policies that can often produce unstable behaviors, damaging the robot and/or the environment. Traditional robot learning, on the contrary, relies on dynamical system-based policies that can be analyzed for stability/safety. Such policies, however, are neither flexible nor generic and usually work only with proprioceptive sensor states. In this work, we bridge the gap between generic neural network policies and dynamical system-based policies, and we introduce Autonomous Neural Dynamic Policies (ANDPs) that: (a) are based on autonomous dynamical systems, (b) always produce asymptotically stable behaviors, and (c) are more flexible than traditional stable dynamical system-based policies. ANDPs are fully differentiable, flexible generic-policies that can be used in imitation learning setups while ensuring asymptotic stability. In this paper, we explore the flexibility and capacity of ANDPs in several imitation learning tasks including experiments with image observations. The results show that ANDPs combine the benefits of both neural network-based and dynamical system-based methods.

andp, artificial intelligence, machine learning, (18 more...)

2305.12886

Country:

Europe > Greece (0.14)
Europe > United Kingdom (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

arXiv.org Artificial IntelligenceDec-7-2022

Combining Planning, Reasoning and Reinforcement Learning to solve Industrial Robot Tasks

Mayr, Matthias, Ahmad, Faseeh, Chatzilygeroudis, Konstantinos, Nardi, Luigi, Krueger, Volker

One of today's goals for industrial robot systems is to allow fast and easy provisioning for new tasks. Skill-based systems that use planning and knowledge representation have long been one possible answer to this. However, especially with contact-rich robot tasks that need careful parameter settings, such reasoning techniques can fall short if the required knowledge not adequately modeled. We show an approach that provides a combination of task-level planning and reasoning with targeted learning of skill parameters for a task at hand. Starting from a task goal formulated in PDDL, the learnable parameters in the plan are identified and an operator can choose reward functions and parameters for the learning process. A tight integration with a knowledge framework allows to form a prior for learning and the usage of multi-objective Bayesian optimization eases to balance aspects such as safety and task performance that can often affect each other. We demonstrate the efficacy and versatility of our approach by learning skill parameters for two different contact-rich tasks and show their successful execution on a real 7-DOF KUKA-iiwa.

artificial intelligence, machine learning, planning & scheduling, (17 more...)

2212.0357

Country: Europe (0.46)

Genre: Research Report (0.64)

Industry: Education > Focused Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

arXiv.org Machine LearningDec-16-2020

Quality-Diversity Optimization: a novel branch of stochastic optimization

Chatzilygeroudis, Konstantinos, Cully, Antoine, Vassiliades, Vassilis, Mouret, Jean-Baptiste

Traditional optimization algorithms search for a single global optimum that maximizes (or minimizes) the objective function. Multimodal optimization algorithms search for the highest peaks in the search space that can be more than one. Quality-Diversity algorithms are a recent addition to the evolutionary computation toolbox that do not only search for a single set of local optima, but instead try to illuminate the search space. In effect, they provide a holistic view of how high-performing solutions are distributed throughout a search space. The main differences with multimodal optimization algorithms are that (1) Quality-Diversity typically works in the behavioral space (or feature space), and not in the genotypic (or parameter) space, and (2) Quality-Diversity attempts to fill the whole behavior space, even if the niche is not a peak in the fitness landscape. In this chapter, we provide a gentle introduction to Quality-Diversity optimization, discuss the main representative algorithms, and the main current topics under consideration in the community. Throughout the chapter, we also discuss several successful applications of Quality-Diversity algorithms, including deep learning, robotics, and reinforcement learning.

algorithm, deep learning, neural network, (22 more...)

2012.04322

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Machine LearningJul-11-2018

A survey on policy search algorithms for learning robot controllers in a handful of trials

Chatzilygeroudis, Konstantinos, Vassiliades, Vassilis, Stulp, Freek, Calinon, Sylvain, Mouret, Jean-Baptiste

Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.

air transportation, deep learning, policy search, (21 more...)

1807.02303

Country:

Europe (0.92)
Asia > Japan (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Energy > Oil & Gas (0.68)
Transportation > Air (0.46)
Leisure & Entertainment > Sports (0.46)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)

arXiv.org Artificial IntelligenceJun-25-2018

Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

Kaushik, Rituraj, Chatzilygeroudis, Konstantinos, Mouret, Jean-Baptiste

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, it is very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm. The experiments show that Multi-DEX is able to solve sparse reward scenarios (with a simulated robotic arm) in much lower interaction time than VIME, TRPO, GEP-PG, CMA-ES and Black-DROPS.

optimization problem, trajectory, upstream oil & gas, (19 more...)

1806.09351

Country: North America > United States (0.46)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment (0.67)
Energy > Oil & Gas > Upstream (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.89)

arXiv.org Machine LearningMar-13-2018

Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search

Pautrat, Rémi, Chatzilygeroudis, Konstantinos, Mouret, Jean-Baptiste

One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot. In this paper, we are interested in situations for which several priors exist but we do not know in advance which one fits best the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damages. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situations and the priors.

artificial intelligence, reinforcement learning, robot, (18 more...)

1709.06919

Country: Europe (0.28)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

arXiv.org Machine LearningMar-13-2018

Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics

Chatzilygeroudis, Konstantinos, Mouret, Jean-Baptiste

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the "pendubot" swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it can allow a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.

air transportation, black-drop, optimization problem, (16 more...)

1709.06917

Country: Europe (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Transportation > Air (0.91)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)