Preux, Philippe
Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs
Kohler, Hector, Delfosse, Quentin, Radji, Waris, Akrour, Riad, Preux, Philippe
There exist applications of reinforcement learning, such as medicine, where policies need to be ''interpretable'' by humans. User studies have shown that some policy classes might be more interpretable than others. However, conducting human studies of policy interpretability is costly. Furthermore, there is no clear definition of policy interpretability, i.e., no agreed-upon metrics, so claims depend on the chosen definition. We tackle the problem of empirically evaluating policy interpretability without humans. Despite this lack of a clear definition, researchers agree on the notion of ''simulatability'': policy interpretability should relate to how well humans understand policy actions given states. To advance research in interpretable reinforcement learning, we contribute a new methodology to evaluate policy interpretability. This methodology relies on proxies for simulatability, which we use to conduct a large-scale empirical evaluation of policy interpretability. We use imitation learning to compute baseline policies by distilling expert neural networks into small programs. We then show that using our methodology to evaluate the baselines' interpretability leads to conclusions similar to those of user studies. We show that increasing interpretability does not necessarily reduce performance and can sometimes increase it. We also show that no policy class better trades off interpretability and performance across all tasks, making it necessary for researchers to have methodologies for comparing policy interpretability.
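As an illustration of the distillation step described above, here is a minimal sketch, assuming a Gymnasium-style environment and a generic `expert` callable (both hypothetical; the paper's exact procedure and program class are not reproduced here): an expert neural policy is rolled out to collect state-action pairs, and a small decision tree is fit on them by behavioral cloning. Capping the tree depth is one simple proxy for program size, and hence interpretability.

```python
# Illustrative sketch only: `expert` and `env` are hypothetical placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def distill_expert_to_tree(expert, env, n_episodes=100, max_depth=4):
    """Collect (state, action) pairs from the expert, then fit a small tree."""
    states, actions = [], []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            a = expert(obs)                  # expert: neural network policy
            states.append(obs)
            actions.append(a)
            obs, _, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
    tree = DecisionTreeClassifier(max_depth=max_depth)  # depth caps program size
    tree.fit(np.array(states), np.array(actions))
    print(export_text(tree))                 # the distilled policy as readable rules
    return tree
```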
IDEQ: an improved diffusion model for the TSP
Basson, Mickael, Preux, Philippe
Recent years have seen a surge in machine learning models for solving combinatorial optimization (CO) problems. Combinatorial optimization is a long-standing field of research and application in computer science, and after decades of progress, very efficient algorithms exist that provide exact or approximate solutions to many CO problems. The Traveling Salesman Problem (TSP) stands as a prominent example: the TSP is very appealing because it is simple to understand, it has a wide range of applications, rather large instances can be solved exactly on a mere laptop (an instance defined over a few thousand cities can be solved within one hour), and approximate algorithms find tours that are very close to optimality (LKH3 [4] can solve instances of 40,000 cities in about one hour on a laptop, although without any guarantee that the result is optimal). These facts cannot be ignored when proposing alternative approaches to solve the TSP. Because of its appeal, the TSP has also drawn the attention of deep neural network researchers in recent years. While the first attempts struggled to solve even small TSP instances of a dozen cities, progress has been made. In this paper, we build on these previous works and go a step further.
Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning
Kohler, Hector, Delfosse, Quentin, Akrour, Riad, Kersting, Kristian, Preux, Philippe
Deep reinforcement learning agents are prone to goal misalignments. The black-box nature of their policies hinders the detection and correction of such misalignments, as well as the trust necessary for real-world deployment. So far, methods for learning interpretable policies are inefficient or require many human priors. We propose INTERPRETER, a fast distillation method producing INTerpretable Editable tRee Programs for ReinforcEmenT lEaRning. We empirically demonstrate that INTERPRETER's compact tree programs match oracles across a diverse set of sequential decision tasks, and we evaluate the impact of our design choices on interpretability and performance. We show that our policies can be interpreted and edited to correct misalignments on Atari games and to explain real farming strategies.
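One way to obtain an editable program from a distilled tree is to print it as plain Python source. The sketch below, assuming a fitted scikit-learn classifier (illustrative only, not INTERPRETER's actual code generator), renders the tree as an if/else policy that a user can read and hand-edit to correct misalignments:

```python
# Hedged sketch: render a fitted decision tree as an editable if/else program,
# in the spirit of (but not identical to) INTERPRETER's tree programs.
from sklearn.tree import _tree

def tree_to_program(tree, feature_names, action_names):
    """Emit the tree as plain Python source a user can read and edit."""
    t = tree.tree_
    lines = ["def policy(obs):"]
    def recurse(node, depth):
        pad = "    " * depth
        if t.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_names[t.feature[node]]
            thr = t.threshold[node]
            lines.append(f"{pad}if obs['{name}'] <= {thr:.3f}:")
            recurse(t.children_left[node], depth + 1)
            lines.append(f"{pad}else:")
            recurse(t.children_right[node], depth + 1)
        else:
            action = action_names[t.value[node].argmax()]
            lines.append(f"{pad}return '{action}'")
    recurse(0, 1)
    return "\n".join(lines)
```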
Towards a Research Community in Interpretable Reinforcement Learning: the InterpPol Workshop
Kohler, Hector, Delfosse, Quentin, Festor, Paul, Preux, Philippe
Embracing the pursuit of intrinsically explainable reinforcement learning raises crucial questions: what distinguishes explainability from interpretability? Should explainable and interpretable agents be developed outside of domains where transparency is imperative? What advantages do interpretable policies offer over neural networks? How can we rigorously define and measure interpretability in policies without user studies? Which reinforcement learning paradigms are best suited to developing interpretable agents? Can Markov Decision Processes integrate interpretable state representations? In addition to motivating an interpretable RL community centered around these questions, we propose the first venue dedicated to interpretable RL: the InterpPol Workshop.
PAQA: Toward ProActive Open-Retrieval Question Answering
Erbacher, Pierre, Nie, Jian-Yun, Preux, Philippe, Soulier, Laure
Conversational systems have made significant progress in generating natural language responses. However, their potential as conversational search systems is currently limited due to their passive role in the information-seeking process. One major limitation is the scarcity of datasets that provide labelled ambiguous questions along with a supporting corpus of documents and relevant clarifying questions. This work aims to tackle the challenge of generating relevant clarifying questions by taking into account the inherent ambiguities present in both user queries and documents. To achieve this, we propose PAQA, an extension to the existing AmbiNQ dataset, incorporating clarifying questions. We then evaluate various models and assess how passage retrieval impacts ambiguity detection and the generation of clarifying questions. By addressing this gap in conversational search systems, we aim to provide additional supervision to enhance their active participation in the information-seeking process and offer users more accurate results.
Decision Tree Search as a Markov Decision Problem
Kohler, Hector, Akrour, Riad, Preux, Philippe
Finding an optimal decision tree for a supervised learning task is a challenging combinatorial problem to solve at scale. It was recently proposed to frame the problem as a Markov Decision Problem (MDP) and use deep reinforcement learning to tackle scaling. Unfortunately, these methods are not competitive with the current branch-and-bound state of the art. We propose instead to scale the resolution of such MDPs using an information-theoretic test-generating function that heuristically, and dynamically for every state, limits the set of admissible test actions to a few good candidates. As a solver, we show empirically that our algorithm is at the very least competitive with branch-and-bound alternatives. As a machine learning tool, a key advantage of our approach is that it solves for multiple complexity-performance trade-offs at virtually no additional cost. With such a set of solutions, a user can select the tree that generalizes best and whose interpretability level best suits their needs, which no current branch-and-bound method allows.
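To convey the idea of the test-generating function, here is a simplified sketch, assuming entropy-based information gain as the scoring heuristic (the paper's actual function may differ): in each MDP state, every (feature, threshold) split is scored on the data reaching that state, and only the top-k become admissible actions.

```python
# Minimal sketch of restricting MDP actions to a few high-information-gain
# tests per state; not the paper's exact heuristic.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def top_k_tests(X, y, k=5):
    """Score every (feature, threshold) split by information gain, keep the best k."""
    candidates = []
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:   # thresholds keep both sides non-empty
            left = y[X[:, j] <= thr]
            right = y[X[:, j] > thr]
            gain = entropy(y) - (len(left) * entropy(left)
                                 + len(right) * entropy(right)) / len(y)
            candidates.append((gain, j, thr))
    candidates.sort(reverse=True)
    return candidates[:k]   # the only admissible test actions in this state
```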
Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
Kohler, Hector, Akrour, Riad, Preux, Philippe
Interpretability of AI models allows for user safety checks to build trust in such AIs. In particular, Decision Trees (DTs) provide a global view of the learned model and transparently reveal which features of the input are critical for making a decision. However, interpretability is hindered if the DT is too large. To learn compact trees, a recent Reinforcement Learning (RL) framework has been proposed to explore the space of DTs using deep RL. This framework augments a decision problem (e.g., a supervised classification task) with additional actions that gather information about the features of an otherwise hidden input. By appropriately penalizing these actions, the agent learns to optimally trade off the size and performance of DTs. In practice, however, a reactive policy for a partially observable Markov decision process (MDP) needs to be learned, which remains an open problem. We show in this paper that deep RL can fail even on simple toy tasks of this class. However, when the underlying decision problem is a supervised classification task, we show that finding the optimal tree can be cast as a fully observable Markov decision problem and solved efficiently, giving rise to a new family of algorithms for learning DTs that go beyond the classical greedy maximization ones.
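A toy rendition of the augmented decision problem may help, under the assumption of a simple classification task (class and variable names are hypothetical, and the actual framework tracks feature bounds rather than raw values): each episode hides one input; the agent either pays a penalty zeta to reveal a feature or commits to a label.

```python
# Toy sketch of an information-gathering augmentation of a classification
# task; hypothetical and simplified relative to the actual framework.
import numpy as np

class InfoGatherEnv:
    def __init__(self, X, y, zeta=0.01):
        self.X, self.y, self.zeta = X, y, zeta

    def reset(self):
        self.i = np.random.randint(len(self.X))
        self.revealed = np.full(self.X.shape[1], np.nan)  # all features hidden
        return self.revealed.copy()

    def step(self, action):
        n_features = self.X.shape[1]
        if action < n_features:                    # info-gathering action
            self.revealed[action] = self.X[self.i, action]
            return self.revealed.copy(), -self.zeta, False
        label = action - n_features                # classification action
        reward = 1.0 if label == self.y[self.i] else 0.0
        return self.revealed.copy(), reward, True
```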
Reinforcement-learning robotic sailboats: simulator and preliminary results
Vasconcellos, Eduardo Charles, Sampaio, Ronald M, Araújo, André P D, Clua, Esteban Walter Gonzales, Preux, Philippe, Guerra, Raphael, Gonçalves, Luiz M G, Martí, Luis, Lira, Hernan, Sanchez-Pi, Nayat
This work focuses on the main challenges and problems in developing a virtual oceanic environment that reproduces real experiments using Unmanned Surface Vehicle (USV) digital twins. We introduce the key features for building virtual worlds, considering the use of Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the simulation equations (physics and mathematics), their effective implementation, and how to include strategies for simulated control and perception (sensors) to be used with RL. We present the modeling and implementation steps, and the challenges involved in creating a functional digital twin based on a real robotic sailing vessel. The application is immediate for developing RL-based navigation algorithms to be applied on real boats.
Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study
Saux, Patrick, Bauvin, Pierre, Raverdy, Violeta, Teigny, Julien, Verkindt, Hélène, Soumphonphakdy, Tomy, Debert, Maxence, Jacobs, Anne, Jacobs, Daan, Monpellier, Valerie, Lee, Phong Ching, Lim, Chin Hong, Andersson-Assarsson, Johanna C, Carlsson, Lena, Svensson, Per-Arne, Galtier, Florence, Dezfoulian, Guelareh, Moldovanu, Mihaela, Andrieux, Severine, Couster, Julien, Lepage, Marie, Lembo, Erminia, Verrastro, Ornella, Robert, Maud, Salminen, Paulina, Mingrone, Geltrude, Peterli, Ralph, Cohen, Ricardo V, Zerrweck, Carlos, Nocca, David, Roux, Carel W Le, Caiazzo, Robert, Preux, Philippe, Pattou, François
Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participants (aged ≥18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5-year follow-up after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using least absolute shrinkage and selection operator to select variables and the classification and regression trees algorithm to build interpretable regression trees. The performances of the model were assessed through the median absolute deviation (MAD) and root mean squared error (RMSE) of BMI. Findings 10 231 patients from 12 centres in ten countries were included in the analysis, corresponding to 30 602 patient-years. Among participants in all 12 cohorts, 7701 (75·3%) were female and 2530 (24·7%) were male. Among 434 baseline attributes available in the training cohort, seven variables were selected: height, weight, intervention type, age, diabetes status, diabetes duration, and smoking status. At 5 years, across external testing cohorts the overall mean MAD BMI was 2·8 kg/m² (95% CI 2·6-3·0) and mean RMSE BMI was 4·7 kg/m² (4·4-5·0), and the mean difference between predicted and observed BMI was −0·3 kg/m² (SD 4·7). This model is incorporated in an easy-to-use and interpretable web-based prediction tool to help inform clinical decisions before surgery. Interpretation We developed a machine learning-based model, which is internationally validated, for predicting individual 5-year weight loss trajectories after three common bariatric interventions.
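For readers who want the modelling pipeline in code, here is a hedged sketch with scikit-learn, assuming tabular arrays in place of the actual cohort data (hypothetical variables; not the study's code): LASSO selects a sparse set of variables, a CART regression tree is fit on them, and MAD and RMSE are computed on an external test set.

```python
# Hedged sketch of the LASSO-then-CART pipeline described above;
# X_train, y_train, X_test, y_test are hypothetical numpy arrays.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.tree import DecisionTreeRegressor

def fit_weight_trajectory_model(X_train, y_train, max_depth=4):
    lasso = LassoCV(cv=5).fit(X_train, y_train)        # sparse variable selection
    selected = np.flatnonzero(lasso.coef_ != 0)
    tree = DecisionTreeRegressor(max_depth=max_depth)  # interpretable CART model
    tree.fit(X_train[:, selected], y_train)
    return tree, selected

def evaluate(tree, selected, X_test, y_test):
    pred = tree.predict(X_test[:, selected])
    mad = np.median(np.abs(pred - y_test))             # median absolute deviation
    rmse = np.sqrt(np.mean((pred - y_test) ** 2))      # root mean squared error
    return mad, rmse
```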
AdaStop: sequential testing for efficient and reliable comparisons of Deep RL Agents
Mathieu, Timothée, Della Vecchia, Riccardo, Shilova, Alena, de Medeiros, Matheus Centa, Kohler, Hector, Maillard, Odalric-Ambrym, Preux, Philippe
The reproducibility of many experimental results in Deep Reinforcement Learning (RL) is under question. To address this reproducibility crisis, we propose a theoretically sound methodology for comparing multiple Deep RL algorithms. The performance of one execution of a Deep RL algorithm is random, so independent executions are needed to assess it precisely. When comparing several RL algorithms, a major question is how many executions must be made and how we can ensure that the results of such a comparison are theoretically sound. Researchers in Deep RL often use fewer than 5 independent executions to compare algorithms: we claim that this is not enough in general. Moreover, when comparing several algorithms at once, the error of each comparison accumulates and must be taken into account with a multiple-tests procedure to preserve low error guarantees. To address this problem in a statistically sound way, we introduce AdaStop, a new statistical test based on multiple group sequential tests. When comparing algorithms, AdaStop adapts the number of executions to stop as early as possible while ensuring that we have enough information to distinguish algorithms that perform better than the others in a statistically significant way. We prove both theoretically and empirically that AdaStop has a low probability of making an error (family-wise error). Finally, we illustrate the effectiveness of AdaStop in multiple use cases, including toy examples and difficult cases such as Mujoco environments.
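The flavour of the procedure can be conveyed with a simplified group sequential comparison of two agents (a naive alpha-spending rule stands in for AdaStop's actual test; `run_a` and `run_b` are hypothetical functions returning the score of one independent execution): scores arrive in batches, and at each interim look a permutation test decides whether more executions are needed.

```python
# Simplified illustration of group sequential comparison, NOT the exact
# AdaStop test: new executions are requested only while the agents remain
# statistically indistinguishable.
import numpy as np

rng = np.random.default_rng(0)

def perm_pvalue(a, b, n_perm=10_000):
    """Two-sided permutation test on the difference of mean scores."""
    obs = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        count += abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= obs
    return count / n_perm

def sequential_compare(run_a, run_b, batch=5, max_batches=10, alpha=0.05):
    a, b = np.array([]), np.array([])
    for k in range(1, max_batches + 1):
        a = np.append(a, [run_a() for _ in range(batch)])  # new executions only
        b = np.append(b, [run_b() for _ in range(batch)])  # while undecided
        # naive alpha-spending across interim looks (AdaStop does this rigorously)
        if perm_pvalue(a, b) < alpha / max_batches:
            return "different", k * batch
    return "indistinguishable", max_batches * batch
```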