Bach, Francis
Minimum Volume Conformal Sets for Multivariate Regression
Braun, Sacha, Aolaritei, Liviu, Jordan, Michael I., Bach, Francis
Conformal prediction provides a principled framework for constructing predictive sets with finite-sample validity. While much of the focus has been on univariate response variables, existing multivariate methods either impose rigid geometric assumptions or rely on flexible but computationally expensive approaches that do not explicitly optimize prediction set volume. We propose an optimization-driven framework based on a novel loss function that directly learns minimum-volume covering sets while ensuring valid coverage. This formulation naturally induces a new nonconformity score for conformal prediction, which adapts to the residual distribution and covariates. Our approach optimizes over prediction sets defined by arbitrary norm balls, including single and multi-norm formulations. Additionally, by jointly optimizing both the predictive model and predictive uncertainty, we obtain prediction sets that are tight, informative, and computationally efficient, as demonstrated in our experiments on real-world datasets.
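As a rough illustration of the norm-ball construction described above, the sketch below uses plain split-conformal calibration with a scaled norm score; the predictor f, the scale function r, and all names are illustrative assumptions, and the paper's joint optimization of model and set volume is not shown.

    import numpy as np

    def split_conformal_ball(f, r, X_cal, Y_cal, alpha=0.1, norm_ord=2):
        # Nonconformity score: scaled norm of the multivariate residual.
        scores = np.linalg.norm(Y_cal - f(X_cal), ord=norm_ord, axis=1) / r(X_cal)
        n = len(scores)
        # Finite-sample-valid quantile level for split conformal prediction.
        level = np.ceil((n + 1) * (1 - alpha)) / n
        q = np.quantile(scores, min(level, 1.0), method="higher")
        # Prediction set at x: the norm ball {y : ||y - f(x)|| <= q * r(x)}.
        def predict_set(X_test):
            return f(X_test), q * r(X_test)   # (centers, radii)
        return predict_set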
E-Values Expand the Scope of Conformal Prediction
Gauthier, Etienne, Bach, Francis, Jordan, Michael I.
Conformal prediction is a powerful framework for distribution-free uncertainty quantification. The standard approach to conformal prediction relies on comparing the ranks of prediction scores: under exchangeability, the rank of a future test point cannot be too extreme relative to a calibration set. This rank-based method can be reformulated in terms of p-values. In this paper, we explore an alternative approach based on e-values, known as conformal e-prediction. E-values offer key advantages that cannot be achieved with p-values, enabling new theoretical and practical capabilities. In particular, we present three applications that leverage the unique strengths of e-values: batch anytime-valid conformal prediction, fixed-size conformal sets with data-dependent coverage, and conformal prediction under ambiguous ground truth. Overall, these examples demonstrate that e-value-based constructions provide a flexible expansion of the toolbox of conformal prediction.
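For reference, the rank-based construction mentioned above admits a one-line p-value implementation; this is a generic split-conformal sketch, not one of the e-value constructions introduced in the paper.

    import numpy as np

    def conformal_p_value(cal_scores, test_score):
        # Rank-based conformal p-value: under exchangeability, P(p <= alpha) <= alpha.
        cal_scores = np.asarray(cal_scores)
        return (1 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1)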
Optimal Denoising in Score-Based Generative Models: The Role of Data Regularity
Beyler, Eliot, Bach, Francis
Score-based generative models achieve state-of-the-art sampling performance by denoising a distribution perturbed by Gaussian noise. In this paper, we focus on a single deterministic denoising step and compare the optimal denoiser for the quadratic loss, which we call "full-denoising", to the alternative "half-denoising" introduced by Hyvärinen (2024). We show that measuring performance in terms of distance between distributions tells a more nuanced story, with different assumptions on the data leading to very different conclusions. We prove that half-denoising is better than full-denoising for sufficiently regular densities, while full-denoising is better for singular densities such as mixtures of Dirac measures or densities supported on a low-dimensional subspace. In the latter case, we prove that full-denoising can alleviate the curse of dimensionality under a linear manifold hypothesis.
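For context, the two denoisers can be written via the Tweedie formula; the "half-denoising" expression below is a sketch that simply halves the score step, under the standard Gaussian-smoothing setup (an assumption, not a statement of the paper's exact definitions):

$$\hat{x}_{\mathrm{full}}(y) = \mathbb{E}[x \mid y] = y + \sigma^2 \nabla_y \log p_\sigma(y), \qquad \hat{x}_{\mathrm{half}}(y) = y + \tfrac{\sigma^2}{2}\, \nabla_y \log p_\sigma(y),$$

where $p_\sigma$ denotes the density of $y = x + \sigma \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \mathrm{Id})$.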
Convergence of Shallow ReLU Networks on Weakly Interacting Data
Dana, Léo, Bach, Francis, Pillaud-Vivien, Loucas
Understanding the properties of models used in machine learning is crucial for providing guarantees to downstream users. Of particular importance, the convergence of the training process under gradient methods is one of the first issues to address in order to understand these models. While this question is well understood for linear models and convex optimization problems (Bottou et al., 2018; Bach, 2024), this is not the case for neural networks, which are the most widely used models in large-scale machine learning. This paper focuses on providing quantitative convergence guarantees for a one-hidden-layer neural network. Theoretically, the global convergence analysis of neural networks has seen two main achievements in recent years: (i) the identification of the lazy regime, induced by a particular initialization, where convergence is always guaranteed at the cost of the network behaving essentially like a linear model (Jacot et al., 2018; Arora et al., 2019; Chizat et al., 2019), and (ii) the proof that, with an infinite number of hidden units, a two-layer neural network converges towards the global minimizer of the loss (Mei et al., 2018; Chizat and Bach, 2018; Rotskoff and Vanden-Eijnden, 2018). However, neural networks are trained in practice outside of these regimes: they are known to perform feature learning, and experimentally reach a global minimum with a large but finite number of neurons. Quantifying the regimes in which neural networks converge to a global minimum of their loss remains an important open question.
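A minimal sketch of the setting studied, a one-hidden-layer ReLU network trained by full-batch gradient descent on a squared loss, is given below; the data, width, and step size are illustrative assumptions, and the sketch does not reproduce the paper's weakly-interacting-data regime or its analysis.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m = 64, 5, 512                             # samples, input dim, hidden units
    X = rng.standard_normal((n, d))
    y = np.sin(X @ rng.standard_normal(d))           # arbitrary smooth target

    W = rng.standard_normal((m, d)) / np.sqrt(d)     # hidden-layer weights
    a = rng.standard_normal(m) / np.sqrt(m)          # output-layer weights
    lr = 0.05

    for step in range(2000):
        Z = X @ W.T                                  # (n, m) pre-activations
        H = np.maximum(Z, 0.0)                       # ReLU features
        err = H @ a - y                              # residuals of the squared loss
        grad_a = H.T @ err / n
        grad_W = ((err[:, None] * (Z > 0)) * a).T @ X / n
        a -= lr * grad_a
        W -= lr * grad_W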
Building Bridges between Regression, Clustering, and Classification
Stewart, Lawrence, Bach, Francis, Berthet, Quentin
Regression, the task of predicting a continuous scalar target y based on some features x, is one of the most fundamental tasks in machine learning and statistics. It has been observed and theoretically analyzed that the classical approach, mean-squared error minimization, can lead to suboptimal results when training neural networks. In this work, we propose a new method to improve the training of these models on regression tasks with continuous scalar targets. Our method is based on casting this task in a different fashion, using a target encoder and a prediction decoder, inspired by approaches in classification and clustering. We showcase the performance of our method on a wide range of real-world datasets.
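One hedged illustration of an encoder/decoder pair for scalar targets is a soft histogram over bins, trained with a classification-style loss and decoded by the bin-center expectation; the bin count and bandwidth below are assumptions for illustration, not the paper's exact construction.

    import numpy as np

    def make_encoder_decoder(y_min, y_max, n_bins=64, bandwidth=0.05):
        centers = np.linspace(y_min, y_max, n_bins)
        def encode(y):
            # Soft one-hot target over bin centers (Gaussian kernel, then normalize).
            logits = -((y[:, None] - centers) ** 2) / (2 * bandwidth ** 2)
            p = np.exp(logits - logits.max(axis=1, keepdims=True))
            return p / p.sum(axis=1, keepdims=True)
        def decode(p):
            # Map a predicted distribution over bins back to a scalar by its mean.
            return p @ centers
        return encode, decode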
Spectral structure learning for clinical time series
Lerner, Ivan, Burgun, Anita, Bach, Francis
We develop and evaluate a structure learning algorithm for clinical time series. Clinical time series are multivariate time series observed in multiple patients and irregularly sampled, which challenges existing structure learning algorithms. We assume that our time series are realizations of StructGP, a k-dimensional multi-output or multi-task stationary Gaussian process (GP), with independent patients sharing the same covariance function. StructGP encodes ordered conditional relations between time series, represented in a directed acyclic graph. We implement an adapted NOTEARS algorithm, which, based on a differentiable definition of acyclicity, recovers the graph by solving a series of continuous optimization problems. Simulation results show that, up to mean degree 3 and 20 tasks, we reach a median recall of 0.93 [IQR, 0.86-0.97] while keeping a median precision of 0.71 [IQR, 0.57-0.84] for recovering directed edges. We further show that the regularization path is key to identifying the graph. With StructGP, we propose a model of time series dependencies that flexibly adapts to different time series regularities, while enabling us to learn these dependencies from observations.
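For reference, the differentiable acyclicity function at the core of NOTEARS (Zheng et al., 2018) can be sketched as follows; the adaptation to StructGP covariances described above is not shown, and the function name is ours.

    import numpy as np
    from scipy.linalg import expm

    def notears_acyclicity(W):
        # h(W) = tr(exp(W * W)) - d: equal to zero iff the weighted adjacency
        # matrix W encodes a directed acyclic graph (elementwise square inside exp).
        d = W.shape[0]
        return np.trace(expm(W * W)) - d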
Forecasting time series with constraints
Doumèche, Nathan, Bach, Francis, Bedek, Éloi, Biau, Gérard, Boyer, Claire, Goude, Yannig
Time series forecasting presents unique challenges that limit the effectiveness of traditional machine learning algorithms. To address these limitations, various approaches have incorporated linear constraints into learning algorithms, such as generalized additive models and hierarchical forecasting. In this paper, we propose a unified framework for integrating and combining linear constraints in time series forecasting. Within this framework, we show that the exact minimizer of the constrained empirical risk can be computed efficiently using linear algebra alone. This approach allows for highly scalable implementations optimized for GPUs. We validate the proposed methodology through extensive benchmarking on real-world tasks, including electricity demand forecasting and tourism forecasting, achieving state-of-the-art performance.
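As a generic illustration of computing an exact constrained minimizer with linear algebra alone, the sketch below solves an equality-constrained least-squares problem through its KKT system; this is a standard construction given as an assumption-laden analogue, not necessarily the paper's exact formulation.

    import numpy as np

    def constrained_least_squares(X, y, C, b):
        # Minimize ||X w - y||^2 subject to C w = b via the KKT linear system.
        n_features = X.shape[1]
        n_constraints = C.shape[0]
        KKT = np.block([
            [2 * X.T @ X, C.T],
            [C, np.zeros((n_constraints, n_constraints))],
        ])
        rhs = np.concatenate([2 * X.T @ y, b])
        sol = np.linalg.solve(KKT, rhs)
        return sol[:n_features]   # the constrained empirical risk minimizer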
An Uncertainty Principle for Linear Recurrent Neural Networks
François, Alexandre, Orvieto, Antonio, Bach, Francis
We consider linear recurrent neural networks, which have become a key building block of sequence modeling due to their ability to perform stable and effective long-range modeling. In this paper, we aim to characterize this ability on a simple but core copy task, whose goal is to build a linear filter of order $S$ that approximates the filter that looks $K$ time steps in the past (which we refer to as the shift-$K$ filter), where $K$ is larger than $S$. Using classical signal models and a quadratic cost, we fully characterize the problem by providing lower bounds on the approximation error, as well as explicit filters that achieve these lower bounds up to constants. The optimal performance highlights an uncertainty principle: the optimal filter has to average values around the $K$-th time step in the past over a range (width) that is proportional to $K/S$.
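One possible formalization of this copy task, assuming a stationary signal model (the exact signal classes in the paper may differ), is the quadratic filter-approximation problem

$$\min_{w \in \mathbb{R}^S} \; \mathbb{E}\Big[\Big(\sum_{s=1}^{S} w_s\, x_{t-s} - x_{t-K}\Big)^{2}\Big], \qquad K > S,$$

where the expectation is taken over the stationary signal $(x_t)_t$.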
Statistical Collusion by Collectives on Learning Platforms
Gauthier, Etienne, Bach, Francis, Jordan, Michael I.
As platforms increasingly rely on learning algorithms, collectives may form and seek ways to influence these platforms to align with their own interests. This can be achieved by the coordinated submission of altered data. To evaluate the potential impact of such behavior, it is essential to understand the computations that collectives must perform to impact platforms in this way. In particular, collectives need to make a priori assessments of the effect of collective action before acting, as they may face potential risks when modifying their data. Moreover, they need to develop implementable coordination algorithms based on quantities that can be inferred from observed data. We develop a framework that provides a theoretical and algorithmic treatment of these issues and present experimental results in a product evaluation domain.
Sampling Binary Data by Denoising through Score Functions
Bach, Francis, Saremi, Saeed
Gaussian smoothing and a probabilistic framework for denoising via the empirical Bayes formalism, i.e., the Tweedie-Miyasawa formula (TMF), are the two key ingredients in the success of score-based generative models in Euclidean spaces. Smoothing holds the key for easing the problem of learning and sampling in high dimensions, denoising is needed for recovering the original signal, and TMF ties these together via the score function of noisy data. In this work, we extend this paradigm to the problem of learning and sampling the distribution of binary data on the Boolean hypercube by adopting Bernoulli noise, instead of Gaussian noise, as a smoothing device. We first derive a TMF-like expression for the optimal denoiser for the Hamming loss, where a score function naturally appears. Sampling noisy binary data is then achieved using a Langevin-like sampler, which we theoretically analyze for different noise levels. At high Bernoulli noise levels sampling becomes easy, akin to log-concave sampling in Euclidean spaces. In addition, we extend the sequential multi-measurement sampling of Saremi et al. (2024) to the binary setting, where we can bring the "effective noise" down by sampling multiple noisy measurements at a fixed noise level, without the need for continuous-time stochastic processes. We validate our formalism and theoretical findings by experiments on synthetic data and binarized images.
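A minimal sketch of the Bernoulli smoothing device, and of the generic Hamming-loss-optimal thresholding rule given posterior marginals, is shown below; the paper's score-based (TMF-like) expression for those marginals is not reproduced here, and the function names are illustrative.

    import numpy as np

    def bernoulli_smooth(x, p, rng):
        # Flip each bit of x in {0,1}^d independently with probability p.
        flips = rng.random(x.shape) < p
        return np.where(flips, 1 - x, x)

    def hamming_optimal_denoise(posterior_marginals):
        # Under Hamming loss, the optimal estimate thresholds each
        # posterior marginal P(x_i = 1 | y) at 1/2, coordinatewise.
        return (posterior_marginals > 0.5).astype(int)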