
Collaborating Authors

 Mourtada, Jaouad


Finite-sample performance of the maximum likelihood estimator in logistic regression

arXiv.org Machine Learning

Logistic regression is a classical model for describing the probabilistic dependence of binary responses on multivariate covariates. We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression, assessed in terms of logistic risk. We consider two questions: first, that of the existence of the MLE (which occurs when the dataset is not linearly separated), and second, that of its accuracy when it exists. These properties depend both on the dimension of the covariates and on the signal strength. In the case of Gaussian covariates and a well-specified logistic model, we obtain sharp non-asymptotic guarantees for the existence and excess logistic risk of the MLE. We then generalize these results in two ways: first, to non-Gaussian covariates satisfying a certain two-dimensional margin condition, and second, to the general case of statistical learning with a possibly misspecified logistic model. Finally, we consider the case of a Bernoulli design, where the behavior of the MLE is highly sensitive to the parameter direction.
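As a concrete illustration of the setting above (a minimal numerical sketch, not code from the paper; the data-generating parameters below are arbitrary), the MLE can be computed by gradient descent on the logistic loss. On a linearly separated dataset the same iteration would diverge rather than converge, reflecting the non-existence of the MLE.

```python
import numpy as np

def logistic_mle(X, y, lr=0.1, steps=5000):
    """Gradient descent on the (mean) logistic loss.

    Converges to the MLE when it exists, i.e. when the dataset is not
    linearly separated; on separated data the iterates diverge.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # model probabilities P(y=1 | x)
        w -= lr * (X.T @ (p - y)) / n     # gradient step on the mean loss
    return w

# Well-specified logistic model with Gaussian covariates
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_star = np.array([1.0, -2.0, 0.5])       # illustrative "true" parameter
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ w_star))).astype(float)
w_hat = logistic_mle(X, y)
```

With 200 Gaussian samples in dimension 3 and this signal strength, the data are not separated with overwhelming probability, so the estimate stays finite and points roughly in the direction of the true parameter.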


Local Risk Bounds for Statistical Aggregation

arXiv.org Artificial Intelligence

In the problem of aggregation, the aim is to combine a given class of base predictors to achieve predictions nearly as accurate as the best one. In this flexible framework, no assumption is made on the structure of the class or the nature of the target. Aggregation has been studied in both sequential and statistical contexts. Despite some important differences between the two problems, the classical results in both cases feature the same global complexity measure. In this paper, we revisit and tighten classical results in the theory of aggregation in the statistical setting by replacing the global complexity with a smaller, local one. Some of our proofs build on the PAC-Bayes localization technique introduced by Catoni. Among other results, we prove localized versions of the classical bound for the exponential weights estimator due to Leung and Barron and deviation-optimal bounds for the Q-aggregation estimator. These bounds improve over the results of Dai, Rigollet and Zhang for fixed design regression and the results of Lecué and Rigollet for random design regression.
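For intuition, the exponential weights estimator mentioned above can be sketched in a few lines. This toy version uses the squared loss and a fixed temperature beta, both illustrative choices rather than the paper's exact setting:

```python
import numpy as np

def exponential_weights(preds, y, beta=1.0):
    """Aggregate base predictors with weights exp(-beta * n * empirical risk).

    preds: (M, n) array, predictions of M base predictors on n points.
    Returns the aggregated prediction and the weight vector.
    """
    risks = np.mean((preds - y) ** 2, axis=1)            # empirical risk of each predictor
    w = np.exp(-beta * len(y) * (risks - risks.min()))   # shift by the min for stability
    w /= w.sum()
    return w @ preds, w
```

Predictors whose empirical risk is close to the smallest one dominate the mixture, so the aggregate performs nearly as well as the best predictor in the class.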


Universal coding, intrinsic volumes, and metric complexity

arXiv.org Machine Learning

We study sequential probability assignment in the Gaussian setting, where the goal is to predict, or equivalently compress, a sequence of real-valued observations almost as well as the best Gaussian distribution with mean constrained to a given subset of $\mathbf{R}^n$. First, in the case of a convex constraint set $K$, we express the hardness of the prediction problem (the minimax regret) in terms of the intrinsic volumes of $K$; specifically, it equals the logarithm of the Wills functional from convex geometry. We then establish a comparison inequality for the Wills functional in the general nonconvex case, which underlines the metric nature of this quantity and generalizes the Slepian-Sudakov-Fernique comparison principle for the Gaussian width. Motivated by this inequality, we characterize the exact order of magnitude of the considered functional for a general nonconvex set, in terms of global covering numbers and local Gaussian widths. This implies metric isomorphic estimates for the log-Laplace transform of the intrinsic volume sequence of a convex body. As part of our analysis, we also characterize the minimax redundancy for a general constraint set. We finally relate and contrast our findings with classical asymptotic results in information theory.


Distribution-Free Robust Linear Regression

arXiv.org Machine Learning

We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order $d/n$ with an optimal sub-exponential tail. While existing approaches to linear regression for heavy-tailed distributions focus on proper estimators that return linear functions, we highlight that the improperness of our procedure is necessary for attaining nontrivial guarantees in the distribution-free setting.
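The median-of-means idea, one ingredient of the construction above, is simple to sketch in its basic one-dimensional form (an illustrative version, not the paper's estimator):

```python
import numpy as np

def median_of_means(x, k):
    """Split the sample into k blocks, average each block, and return
    the median of the block means. Robust to heavy tails: a handful of
    outliers can only corrupt a handful of blocks."""
    blocks = np.array_split(np.asarray(x, dtype=float), k)
    return float(np.median([b.mean() for b in blocks]))
```

On a sample where a few gross outliers ruin the empirical mean, the median of block means remains close to the bulk, which is the mechanism behind the sub-exponential tails mentioned above.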


Regularized ERM on random subspaces

arXiv.org Machine Learning

We study a natural extension of classical empirical risk minimization, where the hypothesis space is a random subspace of a given space. In particular, we consider possibly data-dependent subspaces spanned by a random subset of the data, recovering as a special case Nyström approaches for kernel methods. Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded. These statistical-computational tradeoffs have recently been explored for the least squares loss and for self-concordant loss functions, such as the logistic loss. Here, we work to extend these results to convex Lipschitz loss functions that might not be smooth, such as the hinge loss used in support vector machines. This extension requires developing new proofs that use different technical tools. Our main results show the existence of different settings, depending on how hard the learning problem is, for which computational efficiency can be improved with no loss in performance. Theoretical results are illustrated with simple numerical experiments.
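To make the construction concrete, here is a toy sketch of a random-subspace (Nyström-style) estimator for kernel ridge regression. It uses the squared loss for simplicity, whereas the paper's focus is on non-smooth convex Lipschitz losses such as the hinge loss; the Gaussian kernel and all constants are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_krr(X, y, m, lam, gamma=1.0, seed=0):
    """Kernel ridge regression restricted to the span of m random landmarks."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)  # random subset of the data
    Knm = gaussian_kernel(X, X[idx], gamma)          # (n, m) cross-kernel
    Kmm = gaussian_kernel(X[idx], X[idx], gamma)     # (m, m) landmark kernel
    # Normal equations of ridge regression in the m-dimensional subspace
    a = np.linalg.solve(Knm.T @ Knm + lam * len(X) * Kmm, Knm.T @ y)
    return lambda Z: gaussian_kernel(Z, X[idx], gamma) @ a

rng = np.random.default_rng(0)
X = 3.0 * rng.random((200, 1))
y = np.sin(2.0 * X[:, 0])
f = nystrom_krr(X, y, m=40, lam=1e-3)
```

Solving an m x m system instead of an n x n one is where the computational savings come from; the statistical question studied in the paper is how small m can be before accuracy degrades.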


Asymptotics of Ridge(less) Regression under General Source Condition

arXiv.org Machine Learning

Understanding the generalisation properties of artificial deep neural networks (ANNs) has recently motivated a number of statistical questions. These models perform well in practice despite perfectly fitting (interpolating) the data, a property that seems at odds with classical statistical theory [49]. This has motivated the investigation of the generalisation performance of methods that achieve zero training error (interpolators) [32, 9, 11, 10, 8] and, in the context of linear least squares, of the unique least norm solution to which gradient descent converges [22, 5, 37, 8, 21, 38, 20, 39]. Overparameterized linear models, where the number of variables exceeds the number of points, are arguably the simplest and most natural setting in which interpolation can be studied. Moreover, in certain regimes ANNs can be approximated by suitable linear models [24, 17, 18, 2, 13]. The learning curve (test error versus model capacity) of interpolators has been shown to exhibit a characteristic "Double Descent" [1, 7] shape, where the test error decreases after peaking at the "interpolating" threshold, that is, the model capacity required to interpolate the data. The regime beyond this threshold naturally captures the setting of ANNs [49] and has thus motivated its investigation [36, 44, 39].
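The minimum-norm interpolator at the heart of this line of work is easy to write down explicitly (a self-contained sketch with arbitrary synthetic data, not tied to the paper's source condition):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                    # overparameterized: more variables than points
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimum-l2-norm solution among all interpolators; this is also the
# solution gradient descent reaches from zero initialization.
w = np.linalg.pinv(X) @ y
```

"Ridgeless" regression refers to the limit of the ridge solution as the regularization parameter tends to zero, which recovers exactly this minimum-norm interpolator.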


AMF: Aggregated Mondrian Forests for Online Learning

arXiv.org Machine Learning

Introduced by Breiman (2001), Random Forests (RF) are among the algorithms of choice in many supervised learning applications. The appeal of these methods comes from their remarkable accuracy on a variety of tasks, the small number (or even the absence) of parameters to tune, their reasonable computational cost at training and prediction time, and their suitability in high-dimensional settings. Most commonly used RF algorithms, such as the original random forest procedure (Breiman, 2001), extra-trees (Geurts et al., 2006), or conditional inference forests (Hothorn et al., 2010), are batch algorithms that require the whole dataset to be available at once. Several online random forest variants have been proposed to overcome this issue and handle data that arrive sequentially. Utgoff (1989) was the first to extend Quinlan's ID3 batch decision tree algorithm (see Quinlan, 1986) to an online setting. Later on, Domingos and Hulten (2000) introduced Hoeffding Trees, which can be easily updated: since observations arrive sequentially, a cell is split when (i) enough observations have fallen into it, and (ii) the best split in the cell is statistically relevant (a generic Hoeffding inequality being used to assess the quality of the best split). Since random forests are known to exhibit better empirical performance than individual decision trees, online random forests have been proposed (see, e.g., Saffari et al., 2009; Denil et al., 2013).


Anytime Hedge achieves optimal regret in the stochastic regime

arXiv.org Machine Learning

This paper is about a surprising fact: we prove that the anytime Hedge algorithm with decreasing learning rate, one of the simplest algorithms for the problem of prediction with expert advice, is actually both worst-case optimal and adaptive to the easier stochastic and adversarial-with-a-gap regimes. This runs counter to the common belief in the literature that this algorithm is overly conservative and that only new adaptive algorithms can simultaneously achieve minimax regret and adapt to the difficulty of the problem. Moreover, our analysis exhibits qualitative differences with other variants of the Hedge algorithm based on the so-called "doubling trick", as well as with the fixed-horizon version (with constant learning rate).
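The algorithm in question is short enough to state in full. Below is a sketch of anytime Hedge with the decreasing learning rate eta_t = sqrt(ln(M)/t); the simulated losses are an arbitrary illustration of a stochastic regime, not an experiment from the paper.

```python
import numpy as np

def anytime_hedge(losses):
    """Run Hedge with decreasing learning rate eta_t = sqrt(ln(M)/t).

    losses: (T, M) array of expert losses in [0, 1].
    Returns the algorithm's cumulative (expected) loss and the
    cumulative losses of the M experts.
    """
    T, M = losses.shape
    L = np.zeros(M)                          # cumulative expert losses
    total = 0.0
    for t in range(1, T + 1):
        eta = np.sqrt(np.log(M) / t)
        w = np.exp(-eta * (L - L.min()))     # shift by the min for stability
        w /= w.sum()
        total += w @ losses[t - 1]           # expected loss of the mixture
        L += losses[t - 1]
    return total, L
```

In a stochastic regime where one expert is best by a margin, the regret total - L.min() stays well below the worst-case sqrt(T ln M) rate, which is the adaptivity the paper establishes.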


Minimax optimal rates for Mondrian trees and forests

arXiv.org Machine Learning

Introduced by Breiman (2001), Random Forests are widely used as classification and regression algorithms. While initially designed as batch algorithms, several variants have been proposed to handle online learning. One particular instance of such forests is the Mondrian Forest, whose trees are built using the so-called Mondrian process, which allows their construction to be easily updated in a streaming fashion. In this paper, we study Mondrian Forests in a batch setting and prove their consistency assuming a proper tuning of the lifetime sequence. A thorough theoretical study of Mondrian partitions allows us to derive an upper bound on the risk of Mondrian Forests, which turns out to match the minimax optimal rate for both Lipschitz and twice differentiable regression functions. These results are the first to show that some particular random forests achieve minimax rates in arbitrary dimension, paving the way to a refined theoretical analysis and thus a deeper understanding of these black-box algorithms.
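The Mondrian process underlying these forests admits a compact recursive description; here is a sketch in one dimension (an illustrative implementation, not the paper's code):

```python
import numpy as np

def mondrian_partition_1d(lo, hi, lifetime, rng, t=0.0):
    """Sample a 1-D Mondrian partition of [lo, hi] with the given lifetime.

    A cell of length L waits an Exp(L)-distributed time, then splits at a
    uniform point; recursion stops once the accumulated time exceeds the
    lifetime, so larger lifetimes yield finer partitions."""
    t += rng.exponential(1.0 / (hi - lo))  # waiting time with rate = cell length
    if t > lifetime:
        return [(lo, hi)]                  # this cell becomes a leaf
    s = rng.uniform(lo, hi)                # uniform split location
    return (mondrian_partition_1d(lo, s, lifetime, rng, t)
            + mondrian_partition_1d(s, hi, lifetime, rng, t))

rng = np.random.default_rng(42)
cells = mondrian_partition_1d(0.0, 1.0, lifetime=5.0, rng=rng)
```

A key property of the Mondrian process is that a partition with a given lifetime can be refined into one with a larger lifetime without restarting, which is what makes the streaming updates mentioned above possible.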


Universal consistency and minimax rates for online Mondrian Forests

Neural Information Processing Systems

We establish the consistency of the Mondrian Forest algorithm [LRT14, LRT16], a randomized classification algorithm that can be implemented online. First, we amend the original Mondrian Forest algorithm proposed in [LRT14], which uses a fixed lifetime parameter. Indeed, the fact that this parameter is fixed hinders the statistical consistency of the original procedure.