Lamprier, Sylvain
A Transformer Model for Predicting Chemical Reaction Products from Generic Templates
Ozer, Derin, Lamprier, Sylvain, Cauchy, Thomas, Gutowski, Nicolas, Da Mota, Benoit
The accurate prediction of chemical reaction outcomes is a major challenge in computational chemistry. Current models rely heavily on either highly specific reaction templates or template-free methods, both of which present limitations. To address these limitations, this work proposes the Broad Reaction Set (BRS), a dataset featuring 20 generic reaction templates that allow for the efficient exploration of the chemical space. Additionally, ProPreT5 is introduced, a T5 model tailored to chemistry that achieves a balance between rigid templates and template-free methods. ProPreT5 demonstrates its capability to generate accurate, valid, and realistic reaction products, making it a promising solution that goes beyond the current state-of-the-art on the complex reaction product prediction task.
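The abstract does not specify ProPreT5's exact input format, but the idea of pairing a generic template with reactant SMILES in a text-to-text setup can be sketched as follows; the template name, prompt layout, and field separators below are illustrative assumptions, not the paper's actual serialization.

```python
# Hypothetical sketch: serializing a generic reaction template and reactant
# SMILES into text-to-text pairs for a T5-style product-prediction model.
# The template name and the exact input format are assumptions.

def make_example(template: str, reactants: list[str], product: str) -> dict:
    """Build one seq2seq training pair: the source mentions the generic
    template, the target is the product SMILES."""
    source = f"predict product | template: {template} | reactants: {'.'.join(reactants)}"
    return {"source": source, "target": product}

# Example with a (hypothetical) generic amide-coupling template.
pair = make_example(
    template="amide_coupling",
    reactants=["CC(=O)O", "CCN"],   # acetic acid + ethylamine
    product="CC(=O)NCC",            # N-ethylacetamide
)
print(pair["source"])
print(pair["target"])
```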
Structural Deep Encoding for Table Question Answering
Mouravieff, Raphaël, Piwowarski, Benjamin, Lamprier, Sylvain
Although Transformer-based architectures excel at processing textual information, their naive adaptation for tabular data often involves flattening the table structure. This simplification can lead to the loss of essential inter-dependencies between rows, columns, and cells, while also posing scalability challenges for large tables. To address these issues, prior works have explored special tokens, structured embeddings, and sparse attention patterns. In this paper, we conduct a comprehensive analysis of tabular encoding techniques, which highlights the crucial role of attention sparsity in preserving the structural information of tables. We also introduce a set of novel sparse attention mask designs for tabular data that not only enhance computational efficiency but also preserve structural integrity, leading to better overall performance.
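As a minimal sketch of what a structure-aware sparse attention mask can look like, the snippet below lets a cell token attend only to tokens sharing its row or column; the paper's actual mask designs are richer (multi-token cells, header and question tokens), so this is an illustration, not the proposed method.

```python
import torch

def table_sparse_mask(rows, cols):
    """Boolean attention mask for flattened table tokens: token i may attend
    to token j iff they share a row or a column. One token per cell here."""
    rows = torch.tensor(rows)
    cols = torch.tensor(cols)
    same_row = rows.unsqueeze(0) == rows.unsqueeze(1)
    same_col = cols.unsqueeze(0) == cols.unsqueeze(1)
    return same_row | same_col

# 2x3 table flattened row by row -> 6 cell tokens
rows = [0, 0, 0, 1, 1, 1]
cols = [0, 1, 2, 0, 1, 2]
mask = table_sparse_mask(rows, cols)
print(mask.int())
```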
MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces
Gaven, Loris, Carta, Thomas, Romac, Clément, Colas, Cédric, Lamprier, Sylvain, Sigaud, Olivier, Oudeyer, Pierre-Yves
Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM agents trained with online RL in high-dimensional and evolving goal spaces, a key challenge for LP prediction is modeling one's own competence, a form of metacognitive monitoring. Traditional approaches either require extensive sampling or rely on brittle expert-defined goal groupings. We introduce MAGELLAN, a metacognitive framework that lets LLM agents learn to predict their competence and LP online. By capturing semantic relationships between goals, MAGELLAN enables sample-efficient LP estimation and dynamic adaptation to evolving goal spaces through generalization. In an interactive learning environment, we show that MAGELLAN improves LP prediction efficiency and goal prioritization, being the only method allowing the agent to fully master a large and evolving goal space. These results demonstrate how augmenting LLM agents with a metacognitive ability for LP predictions can effectively scale curriculum learning to open-ended goal spaces.
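To make the notion of online LP estimation concrete, here is a toy sketch in which a learned competence predictor over goal embeddings yields an LP proxy as the change in predicted competence since an earlier snapshot; the embedding size, network shape, and snapshot trick are assumptions for illustration, not the actual MAGELLAN architecture.

```python
import copy
import torch
import torch.nn as nn

class CompetencePredictor(nn.Module):
    """Predicts the probability of succeeding at a goal from its embedding."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, goal_emb):                      # goal_emb: (batch, dim)
        return torch.sigmoid(self.net(goal_emb)).squeeze(-1)

predictor = CompetencePredictor()
old_predictor = copy.deepcopy(predictor)              # snapshot refreshed every K steps

def estimated_lp(goal_emb):
    """LP proxy: absolute change in predicted competence since the snapshot."""
    with torch.no_grad():
        return (predictor(goal_emb) - old_predictor(goal_emb)).abs()

goals = torch.randn(5, 32)                             # embeddings of candidate goals
weights = estimated_lp(goals)                          # sample goals proportionally to LP
print(torch.multinomial(weights + 1e-6, num_samples=1))
```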
Navigation with QPHIL: Quantizing Planner for Hierarchical Implicit Q-Learning
Canesse, Alexi, Petitbois, Mathieu, Denoyer, Ludovic, Lamprier, Sylvain, Portelas, Rémy
Offline Reinforcement Learning (RL) has emerged as a powerful alternative to imitation learning for behavior modeling in various domains, particularly in complex navigation tasks. An existing challenge with Offline RL is the signal-to-noise ratio, i.e. how to mitigate incorrect policy updates due to errors in value estimates. Towards this, multiple works have demonstrated the advantage of hierarchical offline RL methods, which decouple high-level path planning from low-level path following. In this work, we present a novel hierarchical transformer-based approach leveraging a learned quantizer of the space. This quantization enables the training of a simpler zone-conditioned low-level policy and simplifies planning, which is reduced to discrete autoregressive prediction. Among other benefits, zone-level reasoning in planning enables explicit trajectory stitching rather than implicit stitching based on noisy value function estimates. By combining this transformer-based planner with recent advancements in offline RL, our proposed approach achieves state-of-the-art results in complex long-distance navigation environments.
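The zone abstraction can be illustrated with a small sketch: 2D positions are quantized into discrete zone tokens, and a trajectory becomes the zone sequence on which an autoregressive planner would be trained. A uniform grid stands in for QPHIL's learned quantizer here; it is an assumption for illustration only.

```python
import numpy as np

def grid_quantize(xy, n_bins=8, low=0.0, high=10.0):
    """Map (x, y) positions to discrete zone ids in [0, n_bins**2)."""
    bins = np.clip(((xy - low) / (high - low) * n_bins).astype(int), 0, n_bins - 1)
    return bins[:, 0] * n_bins + bins[:, 1]

trajectory = np.array([[0.5, 0.4], [1.7, 0.6], [3.2, 1.1], [3.4, 2.9]])
zones = grid_quantize(trajectory)
# Drop consecutive duplicates: the planner predicts the next *zone*, while the
# low-level policy handles motion within a zone.
plan_tokens = [z for i, z in enumerate(zones) if i == 0 or z != zones[i - 1]]
print(plan_tokens)
```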
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting
Aissi, Mohamed Salim, Romac, Clément, Carta, Thomas, Lamprier, Sylvain, Oudeyer, Pierre-Yves, Sigaud, Olivier, Soulier, Laure, Thome, Nicolas
Reinforcement learning (RL) is a promising approach for aligning the knowledge of large language models (LLMs) with sequential decision-making tasks. However, few studies have thoroughly investigated how fine-tuning LLM agents with RL in a specific environment affects their capabilities. In this paper, we propose a novel framework to analyze the sensitivity of LLMs to prompt formulations following RL training in a textual environment. Our findings reveal that the performance of LLMs degrades when faced with prompt formulations different from those used during the RL training phase. Besides, we analyze the source of this sensitivity by examining the model's internal representations and salient tokens. Finally, we propose to use a contrastive loss to mitigate this sensitivity and improve the robustness and generalization capabilities of LLMs.
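The general idea of a contrastive objective over prompt formulations can be sketched as follows: representations of the same environment state written with different prompts are pulled together, while different states are pushed apart. The InfoNCE form and temperature below are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def prompt_contrastive_loss(h_a, h_b, temperature=0.1):
    """h_a, h_b: (batch, dim) LLM representations of the same states under two
    prompt formulations; row i of h_a matches row i of h_b."""
    h_a = F.normalize(h_a, dim=-1)
    h_b = F.normalize(h_b, dim=-1)
    logits = h_a @ h_b.t() / temperature       # (batch, batch) similarities
    targets = torch.arange(h_a.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = prompt_contrastive_loss(torch.randn(4, 16), torch.randn(4, 16))
print(loss.item())
```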
SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling
Gaven, Loris, Romac, Clément, Carta, Thomas, Lamprier, Sylvain, Sigaud, Olivier, Oudeyer, Pierre-Yves
The past years have seen Large Language Models (LLMs) thrive not only as generative models but also as agents solving textual sequential decision-making tasks. When facing complex environments where their zero-shot abilities are insufficient, recent work showed online Reinforcement Learning (RL) could be used for the LLM agent to discover and learn efficient strategies interactively. However, most prior work sticks to on-policy algorithms, which greatly reduces the scope of methods such agents could use for both exploration and exploitation, such as experience replay and hindsight relabeling. Yet, such methods may be key for LLM learning agents, and in particular when designing autonomous intrinsically motivated agents sampling and pursuing their own goals (i.e. autotelic agents). This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents. Our method not only paves the path towards autotelic LLM agents that learn online but can also outperform on-policy methods in more classic multi-goal RL environments.
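Hindsight relabeling in a goal-conditioned textual setting can be sketched as re-storing a failed trajectory as if an outcome the agent actually achieved had been the goal all along. The transition format and reward convention below are assumptions for illustration, not the exact SAC-GLAM implementation.

```python
import random

def hindsight_relabel(trajectory, achieved_goals):
    """trajectory: list of steps {'obs', 'action', 'goal', 'achieved', 'reward'}.
    Returns an extra relabeled copy in which an achieved outcome is the goal."""
    new_goal = random.choice(achieved_goals)
    relabeled = []
    for step in trajectory:
        reward = 1.0 if step["achieved"] == new_goal else 0.0   # reward w.r.t. the new goal
        relabeled.append({**step, "goal": new_goal, "reward": reward})
    return relabeled

# Example: the agent was asked to "open the chest" but only "picked up the key".
traj = [
    {"obs": "you see a key", "action": "pick up key", "goal": "open the chest",
     "achieved": "picked up the key", "reward": 0.0},
]
replay_extra = hindsight_relabel(traj, achieved_goals=["picked up the key"])
print(replay_extra[0]["goal"], replay_extra[0]["reward"])
```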
Robust Deep Reinforcement Learning Through Adversarial Attacks and Training : A Survey
Schott, Lucas, Delas, Josephine, Hajri, Hatem, Gherbi, Elies, Yaich, Reda, Boulahia-Cuppens, Nora, Cuppens, Frederic, Lamprier, Sylvain
The advent of Deep Reinforcement Learning (DRL) has marked a significant shift in various fields, including games [1-3], autonomous robotics [4], autonomous driving [5], and energy management [6]. By integrating Reinforcement Learning (RL) with Deep Neural Networks (DNN), DRL can leverage high-dimensional continuous observations and rewards to train neural policies, without the need for supervised example trajectories. While DRL achieves remarkable performance in well-known controlled environments, it also encounters challenges in ensuring robust performance amid diverse condition changes and real-world perturbations. It particularly struggles to bridge the reality gap [7, 8]: DRL agents are often trained in simulations that remain imitations of the real world, resulting in a gap between the performance of a trained agent in simulation and its performance once transferred to the real-world application. Even without trying to bridge the reality gap, agents may be trained under one set of conditions and deployed later, once those conditions have changed.
Training Table Question Answering via SQL Query Decomposition
Mouravieff, Raphaël, Piwowarski, Benjamin, Lamprier, Sylvain
Table Question-Answering involves both understanding the natural language query and grounding it in the context of the input table to extract the relevant information. In this context, many methods have highlighted the benefits of intermediate pre-training from SQL queries. However, while most approaches aim at generating final answers from inputs directly, we claim that more can be done with SQL queries during training. By learning to imitate a restricted portion of SQL-like algebraic operations, we show that their execution flow provides intermediate supervision steps that allow increased generalization and structural reasoning compared with classical approaches in the field. Our study bridges the gap between semantic parsing and direct answering methods and provides useful insights regarding what types of operations should be predicted by a generative architecture or be preferably executed by an external algorithm.
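The kind of restricted SQL-like execution flow described above can be illustrated with a tiny interpreter: a question is decomposed into a sequence of algebraic steps whose intermediate results could each serve as a supervision target. The operation names and table format below are illustrative assumptions, not the paper's operator set.

```python
TABLE = [
    {"city": "Paris", "country": "France", "population": 2_100_000},
    {"city": "Lyon", "country": "France", "population": 520_000},
    {"city": "Berlin", "country": "Germany", "population": 3_600_000},
]

def run_program(table, program):
    """program: list of (op, arg) steps; returns all intermediate states."""
    states, current = [], table
    for op, arg in program:
        if op == "filter_eq":            # keep rows where column == value
            col, val = arg
            current = [r for r in current if r[col] == val]
        elif op == "argmax":             # keep the row maximizing a column
            current = max(current, key=lambda r: r[arg])
        elif op == "project":            # extract a single column / field
            current = current[arg] if isinstance(current, dict) else [r[arg] for r in current]
        states.append(current)
    return states

# "Which French city is the most populated?" decomposed into three steps.
steps = run_program(TABLE, [("filter_eq", ("country", "France")),
                            ("argmax", "population"),
                            ("project", "city")])
print(steps)   # each intermediate state is a possible supervision target
```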
Deinterleaving of Discrete Renewal Process Mixtures with Application to Electronic Support Measures
Pinsolle, Jean, Goudet, Olivier, Enderli, Cyrille, Lamprier, Sylvain, Hao, Jin-Kao
In this paper, we propose a new deinterleaving method for mixtures of discrete renewal Markov chains. This method relies on the maximization of a penalized likelihood score. It exploits all available information about both the sequence of the different symbols and their arrival times. A theoretical analysis is carried out to prove that minimizing this score makes it possible to recover the true partition of symbols in the large sample limit, under mild conditions on the component processes. This theoretical analysis is then validated by experiments on synthetic data. Finally, the method is applied to deinterleave pulse trains received from different emitters in a RESM (Radar Electronic Support Measures) context, and we show that the proposed method competes favorably with state-of-the-art methods on simulated warfare datasets.
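A toy sketch of scoring a candidate partition of symbols with a penalized likelihood is given below; the exponential inter-arrival model and the BIC-style penalty are simplifying assumptions made for illustration, not the paper's exact score.

```python
import math
from collections import defaultdict

def penalized_score(events, partition, penalty=1.0):
    """events: list of (time, symbol); partition: dict symbol -> cluster id.
    Each cluster is modeled as a renewal process with exponential gaps."""
    arrivals = defaultdict(list)
    for t, s in sorted(events):
        arrivals[partition[s]].append(t)
    loglik = 0.0
    for times in arrivals.values():
        gaps = [b - a for a, b in zip(times, times[1:]) if b > a]
        if not gaps:
            continue
        rate = 1.0 / (sum(gaps) / len(gaps))            # MLE of the rate
        loglik += sum(math.log(rate) - rate * g for g in gaps)
    n_clusters = len(set(partition.values()))
    return loglik - penalty * n_clusters * math.log(len(events))

events = [(0.0, "a"), (0.9, "b"), (1.0, "a"), (2.0, "a"), (2.1, "b")]
print(penalized_score(events, {"a": 0, "b": 1}))   # hypothesis: two emitters
print(penalized_score(events, {"a": 0, "b": 0}))   # hypothesis: one emitter
```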
On the Fairness ROAD: Robust Optimization for Adversarial Debiasing
Grari, Vincent, Laugel, Thibault, Hashimoto, Tatsunori, Lamprier, Sylvain, Detyniecki, Marcin
In the field of algorithmic fairness, significant attention has been devoted to group fairness criteria, such as Demographic Parity and Equalized Odds. Nevertheless, these objectives, measured as global averages, have raised concerns about persistent local disparities between sensitive groups. In this work, we address the problem of local fairness, which ensures that the predictor is unbiased not only in terms of expectations over the whole population, but also within any subregion of the feature space, unknown at training time. To enforce this objective, we introduce ROAD, a novel approach that leverages the Distributionally Robust Optimization (DRO) framework within a fair adversarial learning objective, where an adversary tries to infer the sensitive attribute from the predictions. Using an instance-level re-weighting strategy, ROAD is designed to prioritize inputs that are likely to be locally unfair, i.e. where the adversary faces the least difficulty in reconstructing the sensitive attribute. Numerical experiments demonstrate the effectiveness of our method: it achieves Pareto dominance with respect to local fairness and accuracy for a given global fairness level across three standard datasets, and also enhances fairness generalization under distribution shift.
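The instance-level re-weighting idea can be sketched as follows: examples where the adversary easily reconstructs the sensitive attribute (low adversarial loss, i.e. locally unfair regions) receive a larger weight in the fairness term. The softmax temperature and this exact weighting rule are assumptions for illustration, not the ROAD objective itself.

```python
import torch
import torch.nn.functional as F

def weighted_adversary_loss(adv_logits, sensitive, temperature=1.0):
    """Per-example adversary loss, re-weighted so that examples where the
    adversary succeeds (low loss) dominate. In a fair adversarial setup the
    predictor would subtract lambda * this term from its task objective,
    while the adversary itself minimizes the unweighted loss."""
    per_example = F.binary_cross_entropy_with_logits(
        adv_logits, sensitive.float(), reduction="none")
    weights = torch.softmax(-per_example / temperature, dim=0)  # low loss -> high weight
    return (weights * per_example).sum()

adv_logits = torch.randn(8)                 # adversary's guesses of the sensitive attribute
sensitive = torch.randint(0, 2, (8,))       # ground-truth binary sensitive attribute
print(weighted_adversary_loss(adv_logits, sensitive))
```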