
Collaborating Authors

Leite, Alessandro


Evolutionary Pre-Prompt Optimization for Mathematical Reasoning

arXiv.org Artificial Intelligence

However, despite their size and complexity, large language models (LLMs) still face challenges in multi-step reasoning, particularly in tasks that require arithmetic, logic, and/or mathematical reasoning [Cobbe et al. 2021; Rae et al. 2021]. To address this limitation, recent works have focused on enhancing the reasoning abilities of LLMs. A significant advancement in this direction is the chain-of-thought (CoT) prompting method [Wei et al. 2022b]. This approach guides LLMs to articulate intermediate reasoning steps in a manner akin to human thought processes, leading to more accurate and interpretable solutions. It has shown substantial improvements on complex tasks, including mathematical and commonsense reasoning [Lu et al. 2022b; Suzgun et al. 2022; Wei et al. 2022b]. The advancement of CoT prompting has opened new pathways in the design of effective CoT prompts [Fu et al. 2022; Jiang et al. 2023; Kojima et al. 2022; Zhou et al. 2022].
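For intuition, here is a minimal sketch of few-shot CoT prompting in Python. The worked example is the classic one from Wei et al. [2022b]; `query_llm` is a hypothetical stand-in for any LLM completion API, and this is not the evolved pre-prompt studied in the paper.

```python
# Minimal sketch of few-shot chain-of-thought prompting (Wei et al. 2022b).
# `query_llm` is a hypothetical stand-in for any LLM completion API.

FEW_SHOT_COT = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def cot_prompt(question: str) -> str:
    # The worked example spells out intermediate steps, nudging the model
    # to articulate its own reasoning chain before the final answer.
    return f"{FEW_SHOT_COT}\nQ: {question}\nA:"

# answer = query_llm(cot_prompt("If there are 3 cars and each car holds 4 "
#                               "people, how many people fit in total?"))
```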


Mixture of Experts in Image Classification: What's the Sweet Spot?

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) models have shown promising potential for parameter-efficient scaling across various domains. However, their adoption in computer vision remains limited and often requires large-scale datasets comprising billions of samples. In this study, we investigate the integration of MoE within computer vision models and explore various MoE configurations on open datasets. When introducing MoE layers into image classification, the best results are obtained for models with a moderate number of activated parameters per sample; these improvements gradually vanish as the number of activated parameters per sample increases.
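As a point of reference, the sketch below shows a minimal top-k routed MoE layer in PyTorch: each token activates only its top-k experts, so the number of activated parameters per sample is governed by `top_k` rather than by the total expert count. The configuration values are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Top-k routed MoE layer; expert and width counts are illustrative."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)  # router scoring experts per token
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # mixing weights over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the chosen experts run for a given token, which is exactly the "activated parameters per sample" knob whose sweet spot the paper probes.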


Evolutionary Retrofitting

arXiv.org Artificial Intelligence

AfterLearnER (After Learning Evolutionary Retrofitting) consists of applying non-differentiable optimization, including evolutionary methods, to refine fully trained machine learning models: a set of carefully chosen parameters or hyperparameters of the model is optimized with respect to some actual, exact, and hence possibly non-differentiable error signal, evaluated on a subset of the standard validation set. The efficiency of AfterLearnER is demonstrated on non-differentiable signals such as threshold-based criteria in depth sensing, the word error rate in speech re-synthesis, image quality in 3D generative adversarial networks (GANs), image generation via latent diffusion models (LDMs), the number of kills per life in Doom, computational accuracy or BLEU in code translation, and human appreciation in image synthesis. In some cases, this retrofitting is performed dynamically at inference time by taking user inputs into account. The advantages of AfterLearnER are its versatility (no gradient is needed), the possibility of using non-differentiable feedback, including human evaluations, its limited overfitting (supported by a theoretical study), and its anytime behavior. Last but not least, AfterLearnER requires only a minimal amount of feedback, i.e., a few dozen to a few hundred scalars, rather than the tens of thousands needed in most related published works. Compared to fine-tuning, which typically reuses the same loss and gradient-based optimization on a smaller but still large dataset at a fine grain, AfterLearnER uses a minimal amount of data on the real objective function and does not require differentiability.
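To illustrate the idea (not the paper's exact optimizer), the following sketch retrofits a small vector of post-training parameters with a simple (1+1) evolution strategy against an arbitrary, possibly non-differentiable score; `score_fn` is a hypothetical stand-in for signals such as word error rate, BLEU, or human ratings.

```python
import numpy as np

def retrofit(score_fn, theta0, iters=200, sigma=0.1, seed=0):
    """Minimal (1+1) evolution strategy: mutate a small vector of
    post-training (hyper)parameters and keep the mutation whenever the
    non-differentiable validation signal improves (higher = better)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    best = score_fn(theta)
    for _ in range(iters):
        candidate = theta + sigma * rng.standard_normal(theta.shape)
        score = score_fn(candidate)  # exact signal; no gradient needed
        if score > best:
            theta, best = candidate, score
    return theta, best

# Hypothetical usage: tune 3 post-processing knobs of a frozen model.
# theta, score = retrofit(lambda t: -np.abs(t).sum(), [1.0, -2.0, 0.5])
```

Because only a handful of scalars are optimized and each iteration needs just one scalar evaluation, a few dozen to a few hundred feedback values can suffice, matching the data regime described above.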


Learning Structural Causal Models through Deep Generative Models: Methods, Guarantees, and Challenges

arXiv.org Machine Learning

This paper provides a comprehensive review of deep structural causal models (DSCMs), focusing in particular on their ability to answer counterfactual queries using observational data within known causal structures. It examines the characteristics of DSCMs by analyzing the hypotheses, guarantees, and applications inherent to the underlying deep learning components and structural causal models, fostering a finer understanding of their capabilities and limitations in addressing different counterfactual queries. Furthermore, it highlights the challenges and open questions in the field of deep structural causal modeling. It sets the stage for researchers to identify future research directions and for practitioners to get an overview and find the most appropriate methods for their needs.
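For a concrete sense of the counterfactual queries DSCMs target, here is a toy hand-written SCM processed with Pearl's abduction-action-prediction steps; a DSCM replaces such hand-written mechanisms with learned deep generative components. The mechanisms and numbers below are invented for illustration.

```python
# Toy SCM: X := U_x, Y := 2*X + U_y (mechanisms invented for illustration).

def counterfactual_y(x_obs: float, y_obs: float, x_cf: float) -> float:
    # Abduction: invert the mechanism to recover the exogenous noise.
    u_y = y_obs - 2 * x_obs
    # Action: intervene do(X = x_cf); Prediction: push the noise back through.
    return 2 * x_cf + u_y

# "Had X been 3 instead of the observed 1 (with Y observed at 2.5),
#  Y would have been": counterfactual_y(1.0, 2.5, 3.0) -> 6.5
```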


Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees

arXiv.org Artificial Intelligence

Gaussian processes (GPs) are a Bayesian machine learning approach widely used to construct surrogate models for the uncertainty quantification of computer simulation codes in industrial applications. They provide both a mean predictor and an estimate of the posterior prediction variance, the latter being used to produce Bayesian credibility intervals. Interpreting these intervals relies on the Gaussianity of the simulation model as well as on the well-specification of the priors, assumptions that are not always appropriate. We propose to address this issue with the help of conformal prediction. In the present work, a method for building adaptive cross-conformal prediction intervals is proposed by weighting the non-conformity score with the posterior standard deviation of the GP. The resulting conformal prediction intervals exhibit a level of adaptivity akin to Bayesian credibility sets and display a significant correlation with the surrogate model's local approximation error, while being free from the underlying model assumptions and enjoying frequentist coverage guarantees. These estimators can thus be used to evaluate the quality of a GP surrogate model and can assist a decision-maker in choosing the best prior for the specific application of the GP. The performance of the method is illustrated on a panel of numerical examples based on various reference databases. Moreover, the potential applicability of the method is demonstrated in the context of surrogate modeling of an expensive-to-evaluate simulator of the clogging phenomenon in steam generators of nuclear reactors.
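A simplified sketch of the construction (using scikit-learn, with deliberately naive fold handling that differs from the paper's exact cross-conformal scheme) may help: residuals on held-out folds are normalized by the GP posterior standard deviation, and the calibrated score quantile then rescales the predictive standard deviation at new points.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import KFold

def gp_conformal_intervals(X, y, X_new, alpha=0.1, n_folds=5):
    """Cross-conformal-style intervals with a GP-std-weighted score
    (a simplified sketch; the paper's construction differs in details)."""
    scores = []
    for fit_idx, cal_idx in KFold(n_folds, shuffle=True, random_state=0).split(X):
        gp = GaussianProcessRegressor().fit(X[fit_idx], y[fit_idx])
        mu, sd = gp.predict(X[cal_idx], return_std=True)
        # Non-conformity: absolute residual normalized by the posterior std.
        scores.extend(np.abs(y[cal_idx] - mu) / np.maximum(sd, 1e-12))
    q = np.quantile(scores, 1 - alpha)  # calibrated score quantile
    gp_full = GaussianProcessRegressor().fit(X, y)
    mu, sd = gp_full.predict(X_new, return_std=True)
    # Intervals inflate with the GP's own uncertainty: adaptive like
    # credibility sets, but calibrated on held-out residuals.
    return mu - q * sd, mu + q * sd
```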


A Guide for Practical Use of ADMG Causal Data Augmentation

arXiv.org Artificial Intelligence

Data augmentation is essential when applying machine learning in small-data regimes. It generates new samples that follow the observed data distribution while increasing their diversity and variability, helping researchers and practitioners improve their models' robustness and, thus, deploy them in the real world. Nevertheless, its use with tabular data remains limited, as prior knowledge about the underlying data mechanism is seldom considered, which limits the fidelity and diversity of the generated data. Causal data augmentation strategies have been proposed to handle these challenges by relying on the conditional independences encoded in a causal graph. In this context, this paper experimentally analyzes the ADMG causal augmentation method under different settings, to support researchers and practitioners in understanding under which conditions prior knowledge helps generate new data points and, consequently, enhances the robustness of their models. The results highlight that the studied method (a) is independent of the underlying model mechanism, (b) requires a minimum number of observations to improve an ML model's accuracy, which may be challenging to obtain in a small-data regime, (c) propagates outliers to the augmented set, degrading the model's performance, and (d) is sensitive to its hyperparameter values.
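The general flavor of graph-based augmentation can be conveyed by a toy two-variable sketch (X -> Y): fit each variable's mechanism from the observed data, then resample new points along the graph. This is purely illustrative, not the ADMG method studied in the paper, which also handles bidirected edges induced by latent confounders; note how bootstrapping roots and residuals makes finding (c), the propagation of outliers, immediate.

```python
import numpy as np

def augment_xy(x, y, n_new, seed=0):
    """Toy causal augmentation on the graph X -> Y: bootstrap the root X,
    then propagate through a crude fitted mechanism for Y given X."""
    rng = np.random.default_rng(seed)
    slope, intercept = np.polyfit(x, y, 1)   # crude linear mechanism Y | X
    resid = y - (slope * x + intercept)      # empirical noise of the mechanism
    x_new = rng.choice(x, size=n_new, replace=True)  # bootstrap the root
    y_new = slope * x_new + intercept + rng.choice(resid, size=n_new)
    return x_new, y_new

# Any outlier in x or resid is resampled into the augmented set, which is
# one way the degradation observed in finding (c) can arise.
```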