Cauchois, Maxime
Query-Adaptive Predictive Inference with Partial Labels
Cauchois, Maxime, Duchi, John
The cost and scarcity of fully supervised labels in statistical machine learning encourage using partially labeled data for model validation as a cheaper and more accessible alternative. Effectively collecting and leveraging weakly supervised data for large-space structured prediction tasks thus becomes an important part of an end-to-end learning system. We propose a new, computationally friendly methodology to construct predictive sets on top of black-box predictive models using only partially labeled data. To do so, we introduce "probe" functions as a way to describe weakly supervised instances and define a false discovery proportion-type loss, both of which seamlessly adapt to partial supervision and structured prediction -- ranking, matching, segmentation, multilabel or multiclass classification. Our experiments highlight the validity of our predictive set construction as well as the attractiveness of a more flexible, user-dependent loss framework.
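As a rough illustration of the probe-function idea (a sketch under assumptions, not the paper's exact procedure), the snippet below represents each partially labeled calibration example by a binary probe over candidate labels, measures the false-discovery-proportion-type loss of a score-thresholded prediction set, and picks the most permissive threshold whose average loss stays below a target level. The function names and the grid search are illustrative choices.

```python
import numpy as np

def calibrate_threshold(scores, probes, alpha, grid_size=200):
    """Choose a score threshold whose average FDP-type loss on the
    calibration data stays below the target level alpha.

    scores: (n, K) array of model scores over K candidate labels.
    probes: (n, K) binary array; probes[i, k] = 1 when candidate k is
            consistent with the partial label observed for example i.
    """
    taus = np.linspace(scores.min(), scores.max(), grid_size)
    for tau in taus:  # taus increase, so the first feasible one yields the largest sets
        sets = scores >= tau                     # (n, K) candidate prediction sets
        sizes = np.maximum(sets.sum(axis=1), 1)  # guard against empty sets
        # FDP-type loss: fraction of the set that the probe marks as inconsistent.
        fdp = (sets & (probes == 0)).sum(axis=1) / sizes
        if fdp.mean() <= alpha:
            return tau
    return scores.max()  # fall back to the most conservative threshold

def predict_set(score_row, tau):
    """Indices of candidate labels included in the prediction set for one query."""
    return np.flatnonzero(score_row >= tau)
```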
Predictive Inference with Weak Supervision
Cauchois, Maxime, Gupta, Suyash, Ali, Alnur, Duchi, John
Consider the typical supervised learning pipeline that we teach students learning statistical machine learning: we collect data in (X, Y) pairs, where Y is a label or target to be predicted; we pick a model and a loss measuring the fidelity of the model to observed data; we choose the model minimizing the loss and validate it on held-out data. This picture obscures what is becoming one of the major challenges in this endeavor: that of actually collecting high-quality labeled data [44, 13, 38]. Hand labeling large-scale training sets is often impractically expensive. Consider, as simple motivation, a ranking problem: a prediction is an ordered list of a set of items, yet available feedback is likely to be incomplete and partial, such as a top element (for example, in web search a user clicks on a single preferred link, or in a grocery store, an individual buys one kind of milk but provides no feedback on the other brands present). Leveraging such partial and weak feedback has therefore become a major focus, and researchers have developed methods to transform weak and noisy labels into datasets with strong, "gold-standard" labels [38, 56]. In this paper, we adopt this weakly labeled setting, but instead of considering model fitting and the construction of strong labels, we focus on validation, model confidence, and predictive inference, moving beyond point predictions and single labels. Our goal is to develop methods to rigorously quantify the confidence a practitioner should have in a model given only weak labels.
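To make the weak-supervision setting concrete, here is a minimal split-conformal-style sketch (an illustration under assumptions, not the paper's method): each calibration example supplies only the set of labels compatible with its weak feedback, and we conservatively upper-bound the unknown true label's nonconformity score by the worst score over that compatible set before taking the usual conformal quantile. All names here are hypothetical.

```python
import numpy as np

def conformal_quantile_weak(scores, compatible, alpha):
    """Split-conformal calibration when only weak labels are available.

    scores:     (n, K) nonconformity scores for each calibration point
                and each of the K candidate labels.
    compatible: (n, K) binary mask; compatible[i, k] = 1 if label k is
                consistent with the weak feedback on example i (we assume
                every row has at least one compatible label).
    alpha:      target miscoverage level.

    Because the true label is unknown, we upper-bound its score by the
    largest score over the compatible labels, which makes the resulting
    quantile, and hence the prediction sets, conservative.
    """
    n = scores.shape[0]
    worst_case = np.where(compatible == 1, scores, -np.inf).max(axis=1)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(worst_case, level, method="higher")

def prediction_set(test_scores, q_hat):
    """All candidate labels whose nonconformity score is below the quantile."""
    return np.flatnonzero(test_scores <= q_hat)
```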
The Lifecycle of a Statistical Model: Model Failure Detection, Identification, and Refitting
Ali, Alnur, Cauchois, Maxime, Duchi, John C.
The statistical machine learning community has demonstrated considerable resourcefulness over the years in developing highly expressive tools for estimation, prediction, and inference. The bedrock assumptions underlying these developments are that the data comes from a fixed population and displays little heterogeneity. But reality is significantly more complex: statistical models now routinely fail when released into real-world systems and scientific applications, where such assumptions rarely hold. Consequently, we pursue a different path vis-a-vis the well-worn trail of developing new methodology for estimation and prediction: we develop tools and theory for detecting and identifying regions of the covariate space (subpopulations) where model performance has begun to degrade, and we study intervening to fix these failures through refitting. We present empirical results with three real-world data sets -- including a time series involving forecasting the incidence of COVID-19 -- showing that our methodology generates interpretable results, is useful for tracking model performance, and can boost model performance through refitting. We complement these empirical results with theory proving that our methodology is minimax optimal for recovering anomalous subpopulations as well as for refitting to improve accuracy in a structured normal means setting.
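As a toy illustration of the detection step only (not the paper's minimax-optimal procedure), one might scan a collection of candidate subpopulations, here defined by a single group label, and flag those whose held-out loss sits several standard errors above a baseline measured at fit time. The grouping, threshold, and function names below are placeholder choices.

```python
import numpy as np

def flag_degraded_groups(losses, groups, baseline_mean, z_threshold=3.0):
    """Flag subpopulations whose held-out loss is anomalously high.

    losses:        per-example losses of the deployed model on fresh data.
    groups:        group identifier (e.g., a discretized covariate) per example.
    baseline_mean: average loss measured at training/validation time.
    z_threshold:   how many standard errors above baseline counts as degraded.
    """
    flagged = []
    for g in np.unique(groups):
        mask = groups == g
        if mask.sum() < 2:
            continue  # too few examples to estimate a standard error
        group_losses = losses[mask]
        se = group_losses.std(ddof=1) / np.sqrt(mask.sum())
        z = (group_losses.mean() - baseline_mean) / max(se, 1e-12)
        if z > z_threshold:
            flagged.append((g, group_losses.mean(), z))
    # Report the flagged groups from most to least degraded.
    return sorted(flagged, key=lambda t: -t[2])
```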
Robust Validation: Confident Predictions Even When Distributions Shift
Cauchois, Maxime, Gupta, Suyash, Ali, Alnur, Duchi, John C.
While the traditional viewpoint in machine learning and statistics assumes training and testing samples come from the same population, practice belies this fiction. One strategy---coming from robust statistics and optimization---is thus to build a model robust to distributional perturbations. In this paper, we take a different approach and describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions. We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population. The method, based on conformal inference, achieves (nearly) valid coverage in finite samples, under only the condition that the training data be exchangeable. An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it; we develop estimators and prove their consistency for protection and validity of uncertainty estimates under shifts. By experimenting on several large-scale benchmark datasets, including Recht et al.'s CIFAR-v4 and ImageNet-V2 datasets, we provide complementary empirical results that highlight the importance of robust predictive validity.
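For intuition about what coverage over an $f$-divergence ball involves, the sketch below instantiates the idea for a chi-squared ball (one member of the $f$-divergence family): it lower-bounds the probability an event can lose under the worst shift of a given radius, finds the inflated quantile level whose worst-case coverage still meets 1 - alpha, and uses that level as a split-conformal threshold. This is a simplified reconstruction under assumptions, not the paper's exact estimator or finite-sample correction.

```python
import numpy as np

def chi2_two_point(z, beta):
    """Chi-squared divergence between Bernoulli(z) and Bernoulli(beta),
    i.e. f(t) = (t - 1)^2 reduced to a single event and its complement."""
    f = lambda t: (t - 1.0) ** 2
    return beta * f(z / beta) + (1.0 - beta) * f((1.0 - z) / (1.0 - beta))

def worst_case_coverage(beta, rho, grid=2001):
    """Smallest probability an event with probability beta under training
    can have under any shift of chi-squared divergence at most rho."""
    zs = np.linspace(0.0, beta, grid)
    feasible = zs[chi2_two_point(zs, beta) <= rho]
    return feasible.min() if feasible.size else beta

def robust_quantile_level(alpha, rho, tol=1e-6):
    """Smallest nominal level whose worst-case coverage is still >= 1 - alpha,
    found by bisection (worst-case coverage is monotone in the nominal level)."""
    lo, hi = 1.0 - alpha, 1.0 - 1e-9
    if worst_case_coverage(hi, rho) < 1.0 - alpha:
        return 1.0  # no inflated level suffices; include (almost) everything
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst_case_coverage(mid, rho) >= 1.0 - alpha:
            hi = mid
        else:
            lo = mid
    return hi

def robust_conformal_threshold(cal_scores, alpha, rho):
    """Score threshold aiming at 1 - alpha coverage for any test distribution
    within a chi-squared ball of radius rho around the training population."""
    n = len(cal_scores)
    level = min(np.ceil((n + 1) * robust_quantile_level(alpha, rho)) / n, 1.0)
    return np.quantile(cal_scores, level, method="higher")
```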