
Collaborating Authors: Colas, Cédric


Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey

arXiv.org Artificial Intelligence

Building autonomous machines that can explore open-ended environments, discover possible interactions and autonomously build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autonomous and intrinsically motivated learning agents that can generate, select and learn to solve their own problems. In recent years, we have seen a convergence of developmental approaches, and developmental robotics in particular, with deep reinforcement learning (RL) methods, forming the new domain of developmental machine learning. Within this new domain, we review here a set of methods where deep RL algorithms are trained to tackle the developmental robotics problem of the autonomous acquisition of open-ended repertoires of skills. Intrinsically motivated goal-conditioned RL algorithms train agents to learn to represent, generate and pursue their own goals. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions, which results in new challenges compared to traditional RL algorithms designed to tackle pre-defined sets of goals using external reward signals. This paper proposes a typology of these methods at the intersection of deep RL and developmental approaches, surveys recent contributions and discusses future avenues.
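
To make the generic structure of these methods concrete, the following is a minimal sketch of an intrinsically motivated goal-conditioned training loop, assuming hypothetical placeholder components (a GoalSampler, a goal-conditioned policy, and a goal_achievement function); it illustrates the idea of self-generated goals and self-computed rewards, not any specific algorithm from the survey.

```python
import numpy as np

# Minimal sketch of an intrinsically motivated goal-conditioned RL loop.
# GoalSampler, policy and env are hypothetical placeholders standing in for
# the learned modules discussed in the survey, not an actual implementation.

class GoalSampler:
    """Maintains a set of discovered goal encodings and samples goals to practice."""
    def __init__(self, goal_dim):
        self.goal_dim = goal_dim
        self.discovered = []                     # goal encodings of outcomes seen so far

    def sample(self):
        if self.discovered and np.random.rand() < 0.5:
            return self.discovered[np.random.randint(len(self.discovered))]
        return np.random.uniform(-1, 1, self.goal_dim)   # otherwise, explore new goals

    def update(self, outcome):
        self.discovered.append(outcome)


def goal_achievement(outcome, goal, tol=0.1):
    """Self-computed reward: did the episode outcome achieve the self-generated goal?"""
    return float(np.linalg.norm(outcome - goal) < tol)


def autotelic_loop(env, policy, n_episodes=1000, goal_dim=2):
    sampler = GoalSampler(goal_dim)
    for _ in range(n_episodes):
        goal = sampler.sample()                  # the agent sets its own goal
        obs, done, trajectory = env.reset(), False, []
        while not done:
            action = policy.act(obs, goal)       # goal-conditioned policy
            obs, _, done, _ = env.step(action)
            trajectory.append((obs, action))
        outcome = policy.encode_outcome(trajectory)      # compact goal encoding
        reward = goal_achievement(outcome, goal)         # internal reward signal
        policy.update(trajectory, goal, reward)
        sampler.update(outcome)
```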


EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models

arXiv.org Artificial Intelligence

Epidemiologists model the dynamics of epidemics in order to propose control strategies based on pharmaceutical and non-pharmaceutical interventions (contact limitation, lockdown, vaccination, etc.). Hand-designing such strategies is not trivial because of the number of possible interventions and the difficulty of predicting their long-term effects. This task can be cast as an optimization problem where state-of-the-art machine learning algorithms, such as deep reinforcement learning, might bring significant value. However, the specificity of each domain -- epidemic modelling or solving optimization problems -- requires strong collaborations between researchers from different fields of expertise. This is why we introduce EpidemiOptim, a Python toolbox that facilitates collaborations between researchers in epidemiology and optimization. EpidemiOptim turns epidemiological models and cost functions into optimization problems via a standard interface commonly used by optimization practitioners (OpenAI Gym). Reinforcement learning algorithms based on Q-Learning with deep neural networks (DQN) and evolutionary algorithms (NSGA-II) are already implemented. We illustrate the use of EpidemiOptim to find optimal policies for dynamic on-off lockdown control that jointly minimize the death toll and the economic cost, using a Susceptible-Exposed-Infectious-Removed (SEIR) model for COVID-19. Using EpidemiOptim and its interactive visualization platform in Jupyter notebooks, epidemiologists, optimization practitioners and others (e.g. economists) can easily compare epidemiological models, cost functions and optimization algorithms to address important choices to be made by health decision-makers.
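
As a simplified illustration of the kind of optimization problem the toolbox exposes (not the actual EpidemiOptim API), the sketch below wraps a toy discrete-time SEIR model into a Gym-style reset/step interface with an on/off lockdown action and two costs; all parameter values are illustrative, not calibrated.

```python
import numpy as np

# Simplified sketch of an epidemic-control environment in the spirit of
# EpidemiOptim (not its actual API): a discrete-time SEIR model with an
# on/off lockdown action and two costs (deaths, economic).

class SEIRLockdownEnv:
    def __init__(self, population=67e6, beta=0.5, sigma=1/5.1, gamma=1/10,
                 fatality=0.005, lockdown_factor=0.3, step_days=7, horizon=52):
        self.p = dict(N=population, beta=beta, sigma=sigma, gamma=gamma,
                      fatality=fatality, lockdown_factor=lockdown_factor)
        self.step_days, self.horizon = step_days, horizon

    def reset(self):
        self.state = np.array([self.p['N'] - 1000, 1000, 0.0, 0.0])   # S, E, I, R
        self.week = 0
        return self._obs()

    def _obs(self):
        return np.append(self.state / self.p['N'], self.week / self.horizon)

    def step(self, action):                       # action: 1 = lockdown on
        N, beta = self.p['N'], self.p['beta']
        if action == 1:
            beta *= self.p['lockdown_factor']     # lockdown reduces transmission
        deaths = 0.0
        for _ in range(self.step_days):           # daily Euler integration
            S, E, I, R = self.state
            new_exposed = beta * S * I / N
            new_infectious = self.p['sigma'] * E
            new_removed = self.p['gamma'] * I
            self.state = np.array([S - new_exposed,
                                   E + new_exposed - new_infectious,
                                   I + new_infectious - new_removed,
                                   R + new_removed])
            deaths += self.p['fatality'] * new_removed
        economic_cost = float(action)             # one cost unit per locked-down week
        self.week += 1
        done = self.week >= self.horizon
        costs = {'deaths': deaths, 'economic': economic_cost}
        reward = -(deaths / 1e4 + economic_cost)  # scalarized multi-objective cost
        return self._obs(), reward, done, costs
```

An optimization algorithm (e.g. DQN or NSGA-II) can then interact with such an environment exactly as with any other Gym-like task, which is the point of the shared interface.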


DECSTR: Learning Goal-Directed Abstract Behaviors using Pre-Verbal Spatial Predicates in Intrinsically Motivated Agents

arXiv.org Artificial Intelligence

Intrinsically motivated agents freely explore their environment and set their own goals. Such goals are traditionally represented as specific states, but recent works introduced the use of language to facilitate abstraction. Language can, for example, represent goals as sets of general properties that surrounding objects should verify. However, language-conditioned agents are trained simultaneously to understand language and to act, which seems to contrast with how children learn: infants demonstrate goal-oriented behaviors and abstract spatial concepts very early in their development, before language mastery. Guided by these findings from developmental psychology, we introduce a high-level state representation based on natural semantic predicates that describe spatial relations between objects and that are known to be present early in infancy. In a robotic manipulation environment, our DECSTR system explores this representation space by manipulating objects, and efficiently learns to achieve any reachable configuration within it. It does so by leveraging an object-centered modular architecture, a symmetry inductive bias, and a new form of automatic curriculum learning for goal selection and policy learning. As with children, language acquisition takes place in a second phase, independently from goal-oriented sensorimotor learning. This is done via a new goal generation module, conditioned on instructions describing expected transformations in object relations. We present ablation studies for each component and highlight several advantages of targeting abstract goals over specific ones. We further show that using this intermediate representation enables efficient language grounding by evaluating agents on sequences of language instructions and their logical combinations.
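
The sketch below illustrates what a predicate-based state representation can look like: a binary semantic configuration computed from object positions with "close" and "above" predicates. The thresholds and exact predicate definitions here are illustrative assumptions, not DECSTR's.

```python
import numpy as np
from itertools import combinations, permutations

# Sketch of a semantic configuration built from spatial predicates over
# object positions, in the spirit of DECSTR (illustrative thresholds).

def close(p_i, p_j, eps=0.07):
    """Symmetric predicate: objects i and j are close to each other."""
    return float(np.linalg.norm(p_i - p_j) < eps)

def above(p_i, p_j, xy_eps=0.04, z_eps=0.01):
    """Asymmetric predicate: object i rests above object j."""
    return float(np.linalg.norm(p_i[:2] - p_j[:2]) < xy_eps
                 and p_i[2] > p_j[2] + z_eps)

def semantic_configuration(positions):
    """Map object positions (n, 3) to a binary predicate vector.

    With 3 objects this gives 3 'close' bits + 6 'above' bits = 9 bits,
    so goals live in a small discrete space instead of raw coordinates.
    """
    n = len(positions)
    close_bits = [close(positions[i], positions[j])
                  for i, j in combinations(range(n), 2)]
    above_bits = [above(positions[i], positions[j])
                  for i, j in permutations(range(n), 2)]
    return np.array(close_bits + above_bits)

# Example: a goal is then a target binary configuration (e.g. "object 0
# above object 1") rather than a set of exact target coordinates.
positions = np.array([[0.0, 0.0, 0.05], [0.0, 0.0, 0.0], [0.3, 0.2, 0.0]])
print(semantic_configuration(positions))
```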


Language-Conditioned Goal Generation: a New Approach to Language Grounding for RL

arXiv.org Artificial Intelligence

In the real world, linguistic agents are also embodied agents: they perceive and act in the physical world. The notion of Language Grounding questions the interactions between language and embodiment: how do learning agents connect or ground linguistic representations to the physical world? This question has recently been approached by the Reinforcement Learning community under the framework of instruction-following agents. In these agents, behavioral policies or reward functions are conditioned on the embedding of an instruction expressed in natural language. This paper proposes another approach: using language to condition goal generators. Given any goal-conditioned policy, one could train a language-conditioned goal generator to generate language-agnostic goals for the agent. This method decouples sensorimotor learning from language acquisition and enables agents to demonstrate a diversity of behaviors for any given instruction. We propose a particular instantiation of this approach and demonstrate its benefits.
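
The sketch below illustrates the idea with a small PyTorch module (an illustrative architecture, not the paper's exact instantiation): given an instruction embedding and a noise sample, it outputs a goal in the goal space of a pre-trained goal-conditioned policy, so that one instruction can map to many compatible goals.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the paper's exact architecture) of a
# language-conditioned goal generator: instruction embedding + noise vector
# -> goal in the policy's goal space, so one instruction can yield a
# diversity of compatible goals.

class LanguageGoalGenerator(nn.Module):
    def __init__(self, instr_dim=64, noise_dim=8, goal_dim=9, hidden=128):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(instr_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, goal_dim),
        )

    def forward(self, instr_embedding, noise=None):
        if noise is None:
            noise = torch.randn(instr_embedding.shape[0], self.noise_dim)
        return torch.sigmoid(self.net(torch.cat([instr_embedding, noise], dim=-1)))

# Usage sketch: sample several goals for the same instruction, then hand each
# goal to a pre-trained goal-conditioned policy, e.g. policy.act(obs, goal).
generator = LanguageGoalGenerator()
instr = torch.randn(1, 64)                 # stand-in for an instruction embedding
goals = torch.cat([generator(instr) for _ in range(5)])
print(goals.shape)                         # 5 candidate goals for one instruction
```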


Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments

arXiv.org Machine Learning

We consider the problem of how a teacher algorithm can enable an unknown Deep Reinforcement Learning (DRL) student to become good at a skill over a wide range of diverse environments. To do so, we study how a teacher algorithm can learn to generate a learning curriculum, whereby it sequentially samples parameters controlling a stochastic procedural generation of environments. Because it does not initially know the capacities of its student, a key challenge for the teacher is to discover which environments are easy, difficult or unlearnable, and in what order to propose them to maximize the efficiency of learning over the learnable ones. To achieve this, we transform the problem into a surrogate continuous bandit problem where the teacher samples environments in order to maximize the absolute learning progress of its student. We present a new algorithm modeling absolute learning progress with Gaussian mixture models (ALP-GMM). We also adapt existing algorithms and provide a complete study in the context of DRL. Using parameterized variants of the BipedalWalker environment, we study how efficiently they personalize a learning curriculum for different learners (embodiments), their robustness to the ratio of learnable/unlearnable environments, and their scalability to non-linear and high-dimensional parameter spaces. Videos and code are available at https://github.com/flowersteam/teachDeepRL.
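
A condensed sketch of the ALP-GMM idea is given below, using scikit-learn; window sizes, the number of components and the exploration rate are illustrative assumptions, and the authors' repository linked above contains the full implementation (including GMM model selection), which is omitted here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Condensed sketch of ALP-GMM: fit a GMM on (task parameters, absolute
# learning progress) pairs and sample new task parameters from components
# with high mean ALP. Hyperparameters are illustrative.

class ALPGMMTeacher:
    def __init__(self, param_bounds, fit_every=50, n_components=3, p_random=0.2):
        self.low, self.high = np.array(param_bounds).T   # bounds per parameter dimension
        self.fit_every, self.k, self.p_random = fit_every, n_components, p_random
        self.history, self.gmm = [], None                # rows: [params..., alp]

    def sample_task(self):
        if self.gmm is None or np.random.rand() < self.p_random:
            return np.random.uniform(self.low, self.high)       # random exploration
        # Pick a component proportionally to its mean ALP (last dimension).
        alp_means = np.maximum(self.gmm.means_[:, -1], 1e-6)
        k = np.random.choice(self.k, p=alp_means / alp_means.sum())
        params = np.random.multivariate_normal(self.gmm.means_[k][:-1],
                                               self.gmm.covariances_[k][:-1, :-1])
        return np.clip(params, self.low, self.high)

    def update(self, params, alp):
        """alp: absolute difference between the student's current competence and
        its previous competence on the closest previously sampled parameters."""
        self.history.append(np.append(params, alp))
        if len(self.history) % self.fit_every == 0:
            data = np.array(self.history[-250:])                 # recent window
            self.gmm = GaussianMixture(n_components=self.k).fit(data)
```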


CLIC: Curriculum Learning and Imitation for feature Control in non-rewarding environments

arXiv.org Machine Learning

In this paper, we propose an unsupervised reinforcement learning agent called CLIC for Curriculum Learning and Imitation for Control. This agent learns to control features in its environment without external rewards, and observes the actions of a third-party agent, Bob, who does not necessarily provide explicit guidance. CLIC selects which feature to train on and what to imitate from Bob's behavior by maximizing its learning progress. We show that CLIC can effectively identify helpful behaviors in Bob's actions, and imitate them to control the environment faster. CLIC can also follow Bob when he acts as a mentor and provides ordered demonstrations. Finally, when Bob controls features that the agent cannot, or in the presence of a hierarchy between aspects of the environment, we show that CLIC ignores non-reproducible and already mastered behaviors, resulting in a greater benefit from imitation.
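
The sketch below illustrates the learning-progress-based selection at the core of this approach with a toy feature selector: competence on each feature is tracked over a sliding window, and features whose competence changes the most are practiced (or imitated) first. It is a simplified illustration, not the CLIC implementation.

```python
import numpy as np

# Toy sketch of learning-progress-based feature selection (simplified): the
# agent preferentially practices features whose competence is changing the
# most, ignoring mastered or non-reproducible ones (flat progress).

class LearningProgressSelector:
    def __init__(self, n_features, window=20, epsilon=0.1):
        self.results = [[] for _ in range(n_features)]   # per-feature success history
        self.window, self.epsilon = window, epsilon

    def learning_progress(self, f):
        history = self.results[f]
        if len(history) < 2 * self.window:
            return 1.0                                   # optimistic for unexplored features
        recent = np.mean(history[-self.window:])
        earlier = np.mean(history[-2 * self.window:-self.window])
        return abs(recent - earlier)                     # absolute learning progress

    def select_feature(self):
        lp = np.array([self.learning_progress(f) for f in range(len(self.results))])
        if np.random.rand() < self.epsilon or lp.sum() == 0:
            return np.random.randint(len(self.results))  # keep exploring
        return int(np.random.choice(len(self.results), p=lp / lp.sum()))

    def record(self, feature, success):
        self.results[feature].append(float(success))
```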


CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning

arXiv.org Artificial Intelligence

In open-ended and changing environments, agents face a wide range of potential tasks that may or may not come with associated reward functions. Such autonomous learning agents must be able to generate their own tasks, some of which might prove easy and others impossible, through a process of intrinsically motivated exploration. For this reason, they should be able to actively select which task to practice at any given moment, to maximize their overall mastery over the set of learnable tasks. This paper proposes CURIOUS, an extension of Universal Value Function Approximators that enables intrinsically motivated agents to learn to achieve both multiple tasks and multiple goals within a single policy, leveraging hindsight learning. Agents focus on achievable tasks first, using an automated curriculum learning mechanism that biases their attention towards tasks maximizing the absolute learning progress. This mechanism provides robustness to catastrophic forgetting (by refocusing on tasks where performance decreases) and distracting tasks (by avoiding tasks with no absolute learning progress). Furthermore, we show that having two levels of parameterization (tasks and goals within tasks) enables more efficient learning of skills in an environment with a modular physical structure (e.g. multiple objects) as compared to flat, goal-parameterized RL with hindsight experience replay.
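
The sketch below illustrates two of the ingredients mentioned above in simplified form: a single policy fed with an (observation, one-hot task, goal) input, and hindsight relabeling of failed episodes within a task. It is an illustrative sketch, not the full CURIOUS implementation with its learning-progress-based task selection.

```python
import numpy as np

# Sketch of the two-level (task, goal-within-task) parameterization and of
# hindsight relabeling, in the spirit of CURIOUS (simplified).

def policy_input(obs, task_id, goal, n_tasks):
    """Single policy over all tasks: concatenate the observation, a one-hot
    task descriptor and the goal for that task."""
    task_one_hot = np.eye(n_tasks)[task_id]
    return np.concatenate([obs, task_one_hot, goal])

def hindsight_relabel(episode, achieved_goal_fn, task_id, tol=0.05):
    """Replay a failed episode as if the achieved outcome had been the goal.

    episode: list of (obs, action, next_obs) transitions for one task.
    achieved_goal_fn: maps an observation to the goal it achieves in that task
    (a hypothetical helper standing in for the environment's goal extractor).
    """
    final_obs = episode[-1][2]
    new_goal = achieved_goal_fn(final_obs, task_id)      # pretend this was the goal
    relabeled = []
    for obs, action, next_obs in episode:
        achieved = achieved_goal_fn(next_obs, task_id)
        reward = float(np.linalg.norm(achieved - new_goal) < tol)
        relabeled.append((obs, action, next_obs, new_goal, reward))
    return relabeled
```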


How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments

arXiv.org Machine Learning

Consistently checking the statistical significance of experimental results is one of the mandatory methodological steps to address the so-called "reproducibility crisis" in deep reinforcement learning. In this tutorial paper, we explain how the number of random seeds relates to the probabilities of statistical errors. For both the t-test and the bootstrap confidence interval test, we recall theoretical guidelines to determine the number of random seeds one should use to provide a statistically significant comparison of the performance of two algorithms. Finally, we discuss the influence of deviations from the assumptions usually made by statistical tests. We show that they can lead to inaccurate evaluations of statistical errors and provide guidelines to counter these negative effects. We make our code available to perform the tests.
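
As a concrete illustration of the two tests discussed in the paper, the sketch below compares the final performances of two algorithms over 20 random seeds with Welch's t-test and a bootstrap confidence interval on the difference of means; the performance values are synthetic placeholders.

```python
import numpy as np
from scipy import stats

# Sketch of the two tests discussed in the paper, applied to final
# performances of two algorithms over N random seeds (synthetic data here).

rng = np.random.default_rng(0)
perf_a = rng.normal(100, 10, size=20)        # algorithm A, 20 seeds
perf_b = rng.normal(92, 10, size=20)         # algorithm B, 20 seeds

# Welch's t-test (does not assume equal variances between the two samples).
t_stat, p_value = stats.ttest_ind(perf_a, perf_b, equal_var=False)
print(f"Welch t-test: t={t_stat:.2f}, p={p_value:.4f}")

# Bootstrap confidence interval on the difference of mean performances.
n_boot = 10_000
diffs = np.array([
    rng.choice(perf_a, size=perf_a.size, replace=True).mean()
    - rng.choice(perf_b, size=perf_b.size, replace=True).mean()
    for _ in range(n_boot)
])
low, high = np.percentile(diffs, [2.5, 97.5])
print(f"95% bootstrap CI for the mean difference: [{low:.2f}, {high:.2f}]")
# If the CI excludes 0 (and p < 0.05), the difference is significant at the
# chosen level; with too few seeds, both tests lose statistical power.
```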