
Collaborating Authors

 Destercke, Sébastien


Reducing Aleatoric and Epistemic Uncertainty through Multi-modal Data Acquisition

arXiv.org Artificial Intelligence

To generate accurate and reliable predictions, modern AI systems need to combine data from multiple modalities, such as text, images, audio, spreadsheets, and time series. Multi-modal data introduces new opportunities and challenges for disentangling uncertainty: it is commonly assumed in the machine learning community that epistemic uncertainty can be reduced by collecting more data, while aleatoric uncertainty is irreducible. However, this assumption is challenged in modern AI systems when information is obtained from different modalities. This paper introduces an innovative data acquisition framework where uncertainty disentanglement leads to actionable decisions, allowing sampling in two directions: sample size and data modality. The main hypothesis is that aleatoric uncertainty decreases as the number of modalities increases, while epistemic uncertainty decreases by collecting more observations. We provide proof-of-concept implementations on two multi-modal datasets to showcase our data acquisition framework, which combines ideas from active learning, active feature acquisition and uncertainty quantification.
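
A minimal sketch of the kind of decision this framework supports, assuming an ensemble-based entropy decomposition of total uncertainty into aleatoric and epistemic parts; the thresholds and the mapping from uncertainty type to acquisition action are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def decompose_uncertainty(member_probs):
    """member_probs: (n_members, n_classes) class probabilities from an ensemble."""
    mean_p = member_probs.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + 1e-12))         # entropy of the mean prediction
    aleatoric = -np.mean(np.sum(member_probs * np.log(member_probs + 1e-12), axis=1))
    epistemic = total - aleatoric                             # mutual information (disagreement)
    return total, aleatoric, epistemic

def acquisition_action(member_probs, epi_threshold=0.05, ale_threshold=0.3):
    """Toy decision rule: high epistemic uncertainty -> label more observations,
    high aleatoric uncertainty -> acquire an additional modality."""
    _, aleatoric, epistemic = decompose_uncertainty(member_probs)
    if epistemic > epi_threshold:
        return "collect more observations"
    if aleatoric > ale_threshold:
        return "acquire another modality"
    return "no acquisition needed"

probs = np.array([[0.6, 0.4], [0.55, 0.45], [0.7, 0.3]])      # three ensemble members, two classes
print(acquisition_action(probs))
```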


Guaranteed confidence-band enclosures for PDE surrogates

arXiv.org Artificial Intelligence

We propose a method for obtaining statistically guaranteed confidence bands for functional machine learning techniques: surrogate models which map between function spaces, motivated by the need to build reliable PDE emulators. The method constructs nested confidence sets on a low-dimensional representation (an SVD) of the surrogate model's prediction error, and then maps these sets to the prediction space using set-propagation techniques. The result is conformal-like, coverage-guaranteed prediction sets for functional surrogate models. We use zonotopes as the basis of the set construction, due to their well-studied set-propagation and verification properties. The method is model-agnostic and can thus be applied to complex Sci-ML models, including Neural Operators, as well as in simpler settings. We also describe a technique to capture the truncation error of the SVD, ensuring the guarantees of the method.
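
A rough sketch of the pipeline on a fixed grid, under two simplifying assumptions: axis-aligned boxes stand in for the zonotopes used in the paper, and the per-coefficient quantiles ignore the joint-coverage and calibration details behind the actual guarantee; the truncation term is likewise a crude residual bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_grid, rank, alpha = 200, 64, 5, 0.1

# Surrogate-minus-truth errors on a calibration set of functions, discretised on a grid.
errors = rng.normal(size=(n_cal, n_grid)) * np.linspace(0.5, 1.5, n_grid)

U, s, Vt = np.linalg.svd(errors, full_matrices=False)
basis = Vt[:rank]                                    # leading grid-space modes of the error

coeffs = errors @ basis.T                            # (n_cal, rank) coordinates of each error
radii = np.quantile(np.abs(coeffs), 1 - alpha, axis=0)    # per-coefficient box radius

# Crude bound on the part of the error living outside the truncated basis.
truncation = np.quantile(np.abs(errors - coeffs @ basis).max(axis=1), 1 - alpha)

band = np.abs(basis).T @ radii + truncation          # pointwise half-width of the band
print(band.shape, band.max())
```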


Multi-label Chaining with Imprecise Probabilities

arXiv.org Artificial Intelligence

We present two different strategies to extend the classical multi-label chaining approach to handle imprecise probability estimates. These estimates use convex sets of distributions (or credal sets) to describe our uncertainty, rather than a single precise distribution. The main reasons for using such estimates are (1) to make cautious predictions (or no decision at all) when high uncertainty is detected in the chaining and (2) to make better precise predictions by avoiding the biases caused by early decisions in the chaining. We adapt both strategies to the case of the naive credal classifier, showing that these adaptations are computationally efficient. Our experimental results on missing labels, which investigate how reliable the predictions of both approaches are, indicate that our approaches produce relevant cautiousness on those hard-to-predict instances where the precise models fail.
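
A toy illustration of the cautious-prediction side of the idea, assuming interval estimates of P(label = 1) obtained from simple counts via the imprecise Dirichlet model rather than from the naive credal classifier used in the paper; the prediction abstains whenever the interval straddles 1/2.

```python
def idm_interval(positives, total, s=2.0):
    """Imprecise Dirichlet model lower/upper probability of the positive label."""
    lower = positives / (total + s)
    upper = (positives + s) / (total + s)
    return lower, upper

def cautious_label(positives, total, s=2.0):
    lower, upper = idm_interval(positives, total, s)
    if lower > 0.5:
        return 1          # confidently relevant
    if upper < 0.5:
        return 0          # confidently irrelevant
    return None           # abstain: uncertainty too high to commit

for counts in [(9, 10), (1, 10), (5, 10), (2, 3)]:
    print(counts, "->", cautious_label(*counts))
```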


Copula-based conformal prediction for Multi-Target Regression

arXiv.org Artificial Intelligence

The most common supervised task in machine learning is to learn a single-task, single-output prediction model. However, such a setting can be ill-adapted to some problems and applications. On the one hand, producing a single output can be undesirable when data is scarce and when producing reliable, possibly set-valued predictions is important (for instance in the medical domain, where examples are very hard to collect for specific targets, and where predictions are used for critical decisions). Such an issue can be solved by using conformal prediction approaches [1]. It was initially proposed as a transductive online learning approach to provide set predictions (in the classification case) or interval predictions (in the case of regression) with a statistical guarantee depending on the probability of error tolerated by the user, but was then extended to handle inductive processes [2]. On the other hand, there are many situations where there are multiple, possibly correlated output variables to predict at once, and it is then natural to try to leverage such correlations to improve predictions. Such learning tasks are commonly called multi-task in the literature [3]. Most research work on conformal prediction for multi-task learning focuses on the problem of multi-label prediction [4, 5], where each task is a binary classification one. Conformal prediction for multi-target regression has been less explored, with only a few studies dealing with it: Kuleshov et al. [6] provide a theoretical framework to use conformal predictors within manifolds (e.g., to provide a mono-dimensional embedding of the multi-variate output), while Neeven and Smirnov [7] use a straightforward multi-target extension of a conformal single-output k-nearest neighbor regressor [8] to provide weather forecasts.
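
As a concrete (and deliberately simplified) sketch, the snippet below builds split-conformal intervals per target and adjusts the per-target significance level with an independence-copula correction; the paper's copula modelling of dependence between targets is not reproduced here.

```python
import numpy as np

def multi_target_conformal_intervals(cal_residuals, test_preds, alpha=0.1):
    """cal_residuals: (n_cal, d) absolute residuals on a calibration set.
    test_preds: (n_test, d) point predictions. Returns lower/upper interval bounds."""
    n_cal, d = cal_residuals.shape
    alpha_t = 1.0 - (1.0 - alpha) ** (1.0 / d)        # independence-copula per-target level
    # Finite-sample split-conformal quantile, applied per target.
    level = min(1.0, np.ceil((n_cal + 1) * (1 - alpha_t)) / n_cal)
    q = np.quantile(cal_residuals, level, axis=0)
    return test_preds - q, test_preds + q

rng = np.random.default_rng(1)
cal = np.abs(rng.normal(size=(500, 3)))               # calibration residuals, 3 targets
preds = rng.normal(size=(4, 3))                       # point predictions for 4 test instances
lo, hi = multi_target_conformal_intervals(cal, preds)
print(hi - lo)                                        # per-target interval widths
```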


Epistemic Uncertainty Sampling

arXiv.org Machine Learning

Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions as well as the measures used to quantify the degree of uncertainty, such as entropy, are almost exclusively of a probabilistic nature. In this paper, we advocate a distinction between two different types of uncertainty, referred to as epistemic and aleatoric, in the context of active learning. Roughly speaking, these notions capture the reducible and the irreducible part of the total uncertainty in a prediction, respectively. We conjecture that, in uncertainty sampling, the usefulness of an instance is better reflected by its epistemic than by its aleatoric uncertainty. This leads us to suggest the principle of "epistemic uncertainty sampling", which we instantiate by means of a concrete approach for measuring epistemic and aleatoric uncertainty. In experimental studies, epistemic uncertainty sampling does indeed show promising performance.
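
A small pool-based simulation of the principle, using the disagreement (mutual-information) term of a random-forest ensemble as a stand-in for the paper's relative-likelihood-based epistemic uncertainty measure; the dataset and query budget are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
labelled = list(range(20))
pool = list(range(20, 400))

for _ in range(10):
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[labelled], y[labelled])
    member_probs = np.stack([t.predict_proba(X[pool]) for t in clf.estimators_])  # (T, n, C)
    mean_p = member_probs.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + 1e-12), axis=1)
    aleatoric = -np.mean(np.sum(member_probs * np.log(member_probs + 1e-12), axis=2), axis=0)
    epistemic = total - aleatoric                 # disagreement between ensemble members
    best = int(np.argmax(epistemic))              # most epistemically uncertain instance
    labelled.append(pool.pop(best))               # "query" its label and add it to the labelled set

print(labelled[-10:])                             # indices of the queried instances
```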


Querying Partially Labelled Data to Improve a K-nn Classifier

AAAI Conferences

When learning from instances whose output labels may be partial, the problem of knowing which of these output labels should be made precise to improve the accuracy of predictions arises. This problem can be seen as the intersection of two tasks: learning from partial labels, and active learning, where the goal is to provide the labels of additional instances to improve the model accuracy. In this paper, we propose querying strategies of partial labels for the well-known K-nn classifier. We propose different criteria of increasing complexity, using, among other things, the amount of ambiguity that partial labels introduce in the K-nn decision rule. We then show that our strategies usually outperform simple baseline schemes, and that more complex strategies provide a faster improvement of the model accuracy.
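
One possible baseline-style criterion in this spirit, shown as a sketch below: score each partially labelled instance by how ambiguous its candidate label set still is and by how often it appears among other instances' K nearest neighbours, then query the highest-scoring ones first. This is an illustrative influence-times-ambiguity heuristic, not the criteria proposed in the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def query_order(X, candidate_sets, k=5):
    """Rank partially labelled instances by (K-nn influence) x (label ambiguity)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                         # idx[:, 0] is the point itself
    influence = np.bincount(idx[:, 1:].ravel(), minlength=len(X))
    ambiguity = np.array([len(s) - 1 for s in candidate_sets])   # 0 when already precise
    score = influence * ambiguity
    return np.argsort(-score)                         # most useful labels to query first

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
candidate_sets = [set(rng.choice(3, size=rng.integers(1, 4), replace=False)) for _ in range(30)]
print(query_order(X, candidate_sets, k=3)[:5])
```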


Building an interpretable fuzzy rule base from data using Orthogonal Least Squares: Application to a depollution problem

arXiv.org Artificial Intelligence

In many fields where human understanding plays a crucial role, such as bioprocesses, the capacity of extracting knowledge from data is of critical importance. Within this framework, fuzzy learning methods, if properly used, can greatly help human experts. Amongst these methods, the aim of orthogonal transformations, which have been proven to be mathematically robust, is to build rules from a set of training data and to select the most important ones by linear regression or rank-revealing techniques. The OLS algorithm is a good representative of those methods. However, it was originally designed with only numerical performance in mind. We therefore propose some modifications of the original method to take interpretability into account. After recalling the original algorithm, this paper presents the changes made to it, then discusses some results obtained on benchmark problems. Finally, the algorithm is applied to a real-world fault detection depollution problem.
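
A sketch of the classical OLS forward selection that the paper starts from: candidate regressors (standing in here for fuzzy-rule firing strengths) are orthogonalised one at a time and ranked by their error reduction ratio, stopping once enough of the output variance is explained. The interpretability-oriented modifications discussed in the paper are not reproduced here, and the data are synthetic.

```python
import numpy as np

def ols_select(P, y, tol=0.95):
    """Forward selection of columns of P by error reduction ratio (ERR)."""
    n, m = P.shape
    selected, W = [], []
    explained, yy = 0.0, float(y @ y)
    for _ in range(m):
        best_err, best_j, best_w = -1.0, None, None
        for j in range(m):
            if j in selected:
                continue
            w = P[:, j].copy()
            for wk in W:                              # Gram-Schmidt against selected columns
                w -= (wk @ P[:, j]) / (wk @ wk) * wk
            err = (w @ y) ** 2 / ((w @ w) * yy)       # error reduction ratio of candidate j
            if err > best_err:
                best_err, best_j, best_w = err, j, w
        selected.append(best_j)
        W.append(best_w)
        explained += best_err
        if explained >= tol:                          # stop once enough variance is explained
            break
    return selected, explained

rng = np.random.default_rng(3)
P = rng.random((100, 8))                              # firing strengths of 8 candidate rules
y = 2 * P[:, 1] - P[:, 4] + 0.05 * rng.normal(size=100)
print(ols_select(P, y))
```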