Virgolin, Marco
Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search
Kamienny, Pierre-Alexandre, Lample, Guillaume, Lamprier, Sylvain, Virgolin, Marco
Symbolic regression (SR) is the problem of learning a symbolic expression from numerical data. Recently, deep neural models trained on procedurally-generated synthetic datasets showed competitive performance compared to more classical Genetic Programming (GP) algorithms. Unlike their GP counterparts, these neural approaches are trained to generate expressions from datasets given as context. This allows them to produce accurate expressions in a single forward pass at test time. However, they usually do not benefit from search abilities, which result in low performance compared to GP on out-of-distribution datasets. In this paper, we propose a novel method which provides the best of both worlds, based on a Monte-Carlo Tree Search procedure using a context-aware neural mutation model, which is initially pre-trained to learn promising mutations, and further refined from successful experiences in an online fashion. The approach demonstrates state-of-the-art performance on the well-known \texttt{SRBench} benchmark.
Conversational Agents: Theory and Applications
Wahde, Mattias, Virgolin, Marco
In this chapter, we provide a review of conversational agents (CAs), discussing chatbots, intended for casual conversation with a user, as well as task-oriented agents that generally engage in discussions intended to reach one or several specific goals, often (but not always) within a specific domain. We also consider the concept of embodied conversational agents, briefly reviewing aspects such as character animation and speech processing. The many different approaches for representing dialogue in CAs are discussed in some detail, along with methods for evaluating such agents, emphasizing the important topics of accountability and interpretability. A brief historical overview is given, followed by an extensive overview of various applications, especially in the fields of health and education. We end the chapter by discussing benefits and potential risks regarding the societal impact of current and future CA technology.
The five Is: Key principles for interpretable and safe conversational AI
Wahde, Mattias, Virgolin, Marco
In this position paper, we present five key principles, namely interpretability, inherent capability to explain, independent data, interactive learning, and inquisitiveness, for the development of conversational AI that, unlike the currently popular black box approaches, is transparent and accountable. At present, there is a growing concern with the use of black box statistical language models: While displaying impressive average performance, such systems are also prone to occasional spectacular failures, for which there is no clear remedy. In an effort to initiate a discussion on possible alternatives, we outline and exemplify how our five principles enable the development of conversational AI systems that are transparent and thus safer for use. We also present some of the challenges inherent in the implementation of those principles.
Learning a Formula of Interpretability to Learn Interpretable Formulas
Virgolin, Marco, De Lorenzo, Andrea, Medvet, Eric, Randone, Francesca
Many risk-sensitive applications require Machine Learning (ML) models to be interpretable. Attempts to obtain interpretable models typically rely on tuning, by trial-and-error, hyper-parameters of model complexity that are only loosely related to interpretability. We show that it is instead possible to take a meta-learning approach: an ML model of non-trivial Proxies of Human Interpretability (PHIs) can be learned from human feedback, then this model can be incorporated within an ML training process to directly optimize for interpretability. We show this for evolutionary symbolic regression. We first design and distribute a survey finalized at finding a link between features of mathematical formulas and two established PHIs, simulatability and decomposability. Next, we use the resulting dataset to learn an ML model of interpretability. Lastly, we query this model to estimate the interpretability of evolving solutions within bi-objective genetic programming. We perform experiments on five synthetic and eight real-world symbolic regression problems, comparing to the traditional use of solution size minimization. The results show that the use of our model leads to formulas that are, for a same level of accuracy-interpretability trade-off, either significantly more or equally accurate. Moreover, the formulas are also arguably more interpretable. Given the very positive results, we believe that our approach represents an important stepping stone for the design of next-generation interpretable (evolutionary) ML algorithms.
Machine learning for automatic construction of pseudo-realistic pediatric abdominal phantoms
Virgolin, Marco, Wang, Ziyuan, Alderliesten, Tanja, Bosman, Peter A. N.
Machine Learning (ML) is proving extremely beneficial in many healthcare applications. In pediatric oncology, retrospective studies that investigate the relationship between treatment and late adverse effects still rely on simple heuristics. To assess the effects of radiation therapy, treatment plans are typically simulated on phantoms, i.e., virtual surrogates of patient anatomy. Currently, phantoms are built according to reasonable, yet simple, human-designed criteria. This often results in a lack of individualization. We present a novel approach that combines imaging and ML to build individualized phantoms automatically. Given the features of a patient treated historically (only 2D radiographs available), and a database of 3D Computed Tomography (CT) imaging with organ segmentations and relative patient features, our approach uses ML to predict how to assemble a patient-specific phantom automatically. Experiments on 60 abdominal CTs of pediatric patients show that our approach constructs significantly more representative phantoms than using current phantom building criteria, in terms of location and shape of the abdomen and of two considered organs, the liver and the spleen. Among several ML algorithms considered, the Gene-pool Optimal Mixing Evolutionary Algorithm for Genetic Programming (GP-GOMEA) is found to deliver the best performing models, which are, moreover, transparent and interpretable mathematical expressions.