Collaborating Authors

 Lambrechts, Gaspard


A Theoretical Justification for Asymmetric Actor-Critic Algorithms

arXiv.org Machine Learning

In reinforcement learning for partially observable environments, many successful algorithms were developed within the asymmetric learning paradigm. This paradigm leverages additional state information available at training time for faster learning. Although the proposed learning objectives are usually theoretically sound, these methods still lack a theoretical justification for their potential benefits. We propose such a justification for asymmetric actor-critic algorithms with linear function approximators by adapting a finite-time convergence analysis to this setting. The resulting finite-time bound reveals that the asymmetric critic eliminates an error term arising from aliasing in the agent state.
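
For intuition, the sketch below shows what an asymmetric update could look like with linear function approximators: the actor is conditioned on agent-state (history) features only, while the critic is conditioned on privileged true-state features available at training time. The environment interface, feature maps, and dimensions are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of an asymmetric actor-critic update with linear function
# approximators. All names (feature maps, dimensions, step sizes) are
# illustrative assumptions, not the paper's implementation.
import numpy as np

n_actions = 3
d_agent = 8    # dimension of the agent-state (history) features, used by the actor
d_state = 5    # dimension of the true-state features, used by the critic

theta = np.zeros((n_actions, d_agent))  # actor parameters (linear softmax policy)
w = np.zeros(d_state)                   # critic parameters (linear value function)

def policy(phi_agent):
    """Softmax policy over linear action preferences of the agent-state features."""
    logits = theta @ phi_agent
    p = np.exp(logits - logits.max())
    return p / p.sum()

def update(phi_agent, phi_state, a, r, phi_state_next, done,
           gamma=0.99, lr_actor=1e-2, lr_critic=1e-1):
    """One asymmetric TD(0) actor-critic step.

    The critic V(s) = w . phi_state uses privileged state features available
    only at training time; the actor only ever sees agent-state features.
    """
    global theta, w
    v = w @ phi_state
    v_next = 0.0 if done else w @ phi_state_next
    td_error = r + gamma * v_next - v

    # Critic update on the true state (no aliasing from partial observability).
    w += lr_critic * td_error * phi_state

    # Actor update: score function of the softmax policy times the TD error.
    p = policy(phi_agent)
    grad_log = -np.outer(p, phi_agent)
    grad_log[a] += phi_agent
    theta += lr_actor * td_error * grad_log
```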


Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures

arXiv.org Machine Learning

We introduce a new maximum entropy reinforcement learning framework based on the distribution of states and actions visited by a policy. More precisely, an intrinsic reward function is added to the reward function of the Markov decision process to be controlled. For each state and action, this intrinsic reward is the relative entropy of the discounted distribution of states and actions (or features of these states and actions) visited during the next time steps. We first prove that an optimal exploration policy, which maximizes the expected discounted sum of intrinsic rewards, also maximizes a lower bound on the state-action value function of the decision process under some assumptions. We also prove that the visitation distribution used in the intrinsic reward definition is the fixed point of a contraction operator. We then describe how to adapt existing algorithms to learn this fixed point and compute the intrinsic rewards, thereby enhancing exploration. Finally, a new practical off-policy maximum entropy reinforcement learning algorithm is introduced. Empirically, the resulting exploration policies achieve good state-action space coverage, and high-performing control policies are computed efficiently.
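
As a rough illustration of the intrinsic-reward idea, the sketch below estimates, for each step of a tabular trajectory, the discounted distribution of future state-action pairs by Monte Carlo and uses its entropy as an intrinsic bonus. The tabular setting and the plain-entropy (uniform-reference) form are simplifying assumptions rather than the paper's estimator.

```python
# Rough sketch (not the paper's algorithm): Monte Carlo estimate of the
# discounted distribution of future (state, action) pairs along a trajectory,
# with its entropy used as an intrinsic reward.
import numpy as np
from collections import Counter

def discounted_future_distribution(traj, t, gamma=0.9):
    """Discounted visitation distribution over (state, action) pairs after time t."""
    weights = Counter()
    norm = 0.0
    for k, (s, a) in enumerate(traj[t + 1:]):
        w = gamma ** k
        weights[(s, a)] += w
        norm += w
    return {sa: w / norm for sa, w in weights.items()} if norm > 0 else {}

def intrinsic_rewards(traj, gamma=0.9):
    """Entropy of the future visitation distribution at each step of the trajectory."""
    rewards = []
    for t in range(len(traj)):
        dist = discounted_future_distribution(traj, t, gamma)
        h = -sum(p * np.log(p) for p in dist.values())
        rewards.append(h)
    return rewards

# Example: a short trajectory of (state, action) pairs.
traj = [(0, 1), (1, 0), (1, 1), (2, 0), (0, 0)]
print(intrinsic_rewards(traj))
```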


Parallelizing Autoregressive Generation with Variational State Space Models

arXiv.org Machine Learning

Attention-based models such as Transformers and recurrent models such as state space models (SSMs) have emerged as successful methods for autoregressive sequence modeling. Although both enable parallel training, neither enables parallel generation, due to their autoregressive nature. We propose the variational SSM (VSSM), a variational autoencoder (VAE) in which both the encoder and the decoder are SSMs. Since sampling the latent variables and decoding them with the SSM can be parallelized, both training and generation can be conducted in parallel. Moreover, the decoder recurrence allows generation to be resumed without reprocessing the whole sequence. Finally, we propose the autoregressive VSSM, which can be conditioned on a partial realization of the sequence, as is common in language generation tasks. Interestingly, the autoregressive VSSM still enables parallel generation. On toy problems (MNIST, CIFAR), we highlight the empirical speed-ups and show that the VSSM competes with traditional models (Transformer, Mamba SSM) in terms of generation quality.
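
The toy sketch below illustrates why a linear SSM decoder permits parallel generation: once the latent variables are sampled independently from the prior, every hidden state can be computed without waiting for previously generated outputs. The diagonal transition matrix and the sizes are illustrative assumptions; a parallel scan would replace the per-step closed-form sum in practice.

```python
# Minimal sketch of parallel decoding with a linear SSM: h_t = A h_{t-1} + B z_t
# can be evaluated for every t independently once all latents z_1..z_T are known.
# The diagonal A and the dimensions are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
T, d_latent, d_hidden = 16, 4, 8

A = np.diag(rng.uniform(0.5, 0.9, d_hidden))   # stable diagonal transition
B = rng.normal(size=(d_hidden, d_latent))
C = rng.normal(size=(d_latent, d_hidden))      # readout to observation space

z = rng.normal(size=(T, d_latent))             # latents sampled in parallel from the prior

# Sequential decoding (what a plain recurrent decoder would do).
h_seq = np.zeros((T, d_hidden))
h = np.zeros(d_hidden)
for t in range(T):
    h = A @ h + B @ z[t]
    h_seq[t] = h

# "Parallel" decoding: h_t = sum_{k<=t} A^(t-k) B z_k, computed for each t
# without any dependency on previously decoded steps (a parallel scan would
# be used in practice instead of this explicit sum).
powers = np.stack([np.linalg.matrix_power(A, n) for n in range(T)])
h_par = np.array([sum(powers[t - k] @ (B @ z[k]) for k in range(t + 1)) for t in range(T)])

assert np.allclose(h_seq, h_par)
x = h_par @ C.T                                # decoded sequence, all steps at once
```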


Reinforcement Learning to improve delta robot throws for sorting scrap metal

arXiv.org Artificial Intelligence

This study proposes a novel approach based on reinforcement learning (RL) to enhance the efficiency of scrap metal sorting with delta robots and the Pick-and-Place (PaP) process widely used in industry. We use three classical model-free RL algorithms (TD3, SAC and PPO) to reduce the time needed to sort metal scraps. Instead of moving to the exact bin location, as in the classical PaP technique, we learn the release position and speed needed to throw an object into a bin. Our contribution is threefold. First, we provide a new simulation environment for learning RL-based Pick-and-Throw (PaT) strategies for parallel grippers. Second, we use RL algorithms to learn this task in this environment, reaching 89% accuracy while increasing throughput by 51% in simulation. Third, we evaluate the performance of the RL algorithms and compare them to a PaP and a state-of-the-art PaT method, both in simulation and in reality; the policies are learned solely in simulation with domain randomisation and transferred to the real robot without fine-tuning. This work shows the benefits of RL-based PaT compared to PaP or the classical optimization-based PaT techniques used in industry.
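
A heavily simplified, hypothetical sketch of the pick-and-throw idea is given below: the action selects a release height and horizontal speed, and a ballistic model decides whether the object lands in the target bin. This is not the paper's delta-robot simulator; the geometry, names, and reward are illustrative assumptions.

```python
# Hypothetical, heavily simplified pick-and-throw reward: a flat ballistic
# throw from a chosen release height and horizontal speed either lands inside
# the bin (reward 1) or misses it (reward 0).
import numpy as np

G = 9.81  # gravity (m/s^2)

def throw_reward(release_height, release_speed, bin_distance, bin_half_width=0.10):
    """Return 1.0 if the ballistic landing point falls inside the bin, else 0.0."""
    t_fall = np.sqrt(2.0 * release_height / G)       # time to reach the ground
    landing_x = release_speed * t_fall               # horizontal travel of a flat throw
    return float(abs(landing_x - bin_distance) <= bin_half_width)

# Example: an RL policy would output (height, speed); here we just query one pair.
print(throw_reward(release_height=0.5, release_speed=1.9, bin_distance=0.6))
```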


Behind the Myth of Exploration in Policy Gradients

arXiv.org Artificial Intelligence

Policy-gradient algorithms are effective reinforcement learning methods for solving control problems with continuous state and action spaces. To compute near-optimal policies, it is essential in practice to include exploration terms in the learning objective. Although the effectiveness of these terms is usually justified by an intrinsic need to explore environments, we propose a novel analysis that distinguishes two different implications of these techniques. First, they make it possible to smooth the learning objective and to eliminate local optima while preserving the global maximum. Second, they modify the gradient estimates, increasing the probability that the stochastic parameter update eventually provides an optimal policy. In light of these effects, we discuss and empirically illustrate exploration strategies based on entropy bonuses, highlighting their limitations and opening avenues for future work in the design and analysis of such strategies.
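
The sketch below shows the kind of entropy bonus discussed here, added to a plain policy-gradient update on a toy bandit: the bonus term modifies the gradient estimate in favour of more stochastic policies. The bandit, step sizes, and bonus weight are illustrative assumptions.

```python
# Minimal sketch of an entropy-regularised policy-gradient update on a toy
# three-armed bandit with a softmax policy.
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.2, 0.0])    # expected reward of each arm
theta = np.zeros(3)                    # softmax policy parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

alpha, beta = 0.1, 0.01                # learning rate and entropy-bonus weight
for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(3, p=p)
    r = rewards[a] + rng.normal(scale=0.1)

    grad_log = -p.copy()
    grad_log[a] += 1.0                 # gradient of log pi(a) for a softmax policy

    # Exact gradient of the entropy H(pi) with respect to theta for a softmax policy.
    grad_entropy = -p * (np.log(p) + 1.0) + p * np.sum(p * (np.log(p) + 1.0))

    theta += alpha * (r * grad_log + beta * grad_entropy)

print(softmax(theta))                  # should concentrate on the best arm
```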


Warming up recurrent neural networks to maximise reachable multistability greatly improves learning

arXiv.org Artificial Intelligence

Training recurrent neural networks is known to be difficult when time dependencies become long. In this work, we show that most standard cells only have one stable equilibrium at initialisation, and that learning on tasks with long time dependencies generally occurs once the number of stable network equilibria increases, a property known as multistability. Multistability is often not easily attained by initially monostable networks, making it difficult to learn long time dependencies between inputs and outputs. This insight leads to a novel way of initialising the connectivity of any recurrent cell through a procedure called "warmup", which improves its capability to learn arbitrarily long time dependencies. This initialisation procedure is designed to maximise, in a few gradient steps, the network's reachable multistability, i.e., the number of equilibria within the network that can be reached through relevant input trajectories. We show on several information restitution, sequence classification, and reinforcement learning benchmarks that warming up greatly improves learning speed and performance for multiple recurrent cells, but sometimes impedes precision. We therefore introduce a double-layer architecture initialised with a partial warmup, which is shown to greatly improve learning of long time dependencies while maintaining high levels of precision. This approach provides a general framework for improving the learning abilities of any recurrent cell when long time dependencies are present. We also show empirically that other initialisation and pretraining procedures from the literature implicitly foster reachable multistability of recurrent cells.
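
As a rough diagnostic in the spirit of the paper (not its warmup procedure), the sketch below probes the multistability of a simple tanh cell by iterating it with zero input from many random hidden states and counting the distinct fixed points reached. The weight scale and tolerances are illustrative assumptions.

```python
# Rough multistability probe for a simple tanh recurrent cell: iterate the
# autonomous dynamics h <- tanh(W h + b) from many random initial states and
# count how many distinct fixed points are reached.
import numpy as np

rng = np.random.default_rng(0)
d = 16
W = rng.normal(scale=1.5 / np.sqrt(d), size=(d, d))   # recurrent weights (illustrative scale)
b = np.zeros(d)

def settle(h, steps=500, tol=1e-6):
    """Iterate the cell with zero input; return the final state if it converges."""
    for _ in range(steps):
        h_new = np.tanh(W @ h + b)
        if np.linalg.norm(h_new - h) < tol:
            return h_new
        h = h_new
    return None  # did not converge to a fixed point (e.g. a cycle)

def count_equilibria(n_init=200, tol=1e-2):
    """Count distinct fixed points reached from random initial hidden states."""
    fixed_points = []
    for _ in range(n_init):
        h = settle(rng.normal(size=d))
        if h is None:
            continue  # trajectory did not settle on a fixed point
        if not any(np.linalg.norm(h - f) < tol for f in fixed_points):
            fixed_points.append(h)
    return len(fixed_points)

print("estimated number of reachable equilibria:", count_equilibria())
```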


Informed POMDP: Leveraging Additional Information in Model-Based RL

arXiv.org Artificial Intelligence

In this work, we generalize the problem of learning through interaction in a POMDP by accounting for additional information that may be available at training time. First, we introduce the informed POMDP, a new learning paradigm offering a clear distinction between the training information and the execution observation. Next, we propose an objective, leveraging this information, for learning from the history a statistic that is sufficient for optimal control. We then show that optimizing this informed objective amounts to learning an environment model from which latent trajectories can be sampled. Finally, we show for the Dreamer algorithm that using this informed environment model sometimes greatly improves the convergence speed of the policies on several environments. These results and the simplicity of the proposed adaptation advocate for systematically considering any additional information available at training time when learning in a POMDP with model-based RL.
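
A minimal sketch of an informed recurrent model is given below, with assumptions throughout: the recurrent state is computed from observations and actions only, but the training loss asks it to reconstruct the additional information available at training time (here with a simple squared error). This stands in for, and greatly simplifies, the paper's Dreamer-based implementation.

```python
# Minimal sketch (illustrative assumptions, not the paper's architecture) of an
# "informed" recurrent model: the statistic of the history is computed from
# observations and actions, while the loss asks it to predict the additional
# training-time information and the reward.
import torch
import torch.nn as nn

class InformedRecurrentModel(nn.Module):
    def __init__(self, obs_dim, act_dim, info_dim, hidden_dim=64):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim + act_dim, hidden_dim)   # statistic of the history
        self.info_head = nn.Linear(hidden_dim, info_dim)        # predicts training-time info
        self.reward_head = nn.Linear(hidden_dim, 1)             # predicts the reward

    def forward(self, obs, act, h):
        h = self.cell(torch.cat([obs, act], dim=-1), h)
        return h, self.info_head(h), self.reward_head(h)

def informed_loss(model, obs_seq, act_seq, info_seq, rew_seq):
    """Sum of information- and reward-prediction errors along a trajectory batch."""
    h = torch.zeros(obs_seq.shape[1], model.cell.hidden_size)
    loss = 0.0
    for o, a, i, r in zip(obs_seq, act_seq, info_seq, rew_seq):
        h, info_pred, rew_pred = model(o, a, h)
        loss = loss + ((info_pred - i) ** 2).mean() + ((rew_pred - r) ** 2).mean()
    return loss
```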