AITopics | conditional risk

The loss function used to train a neural network is strongly connected to its output layer from a statistical point of view. This technical report analyzes common activation functions for a neural network output layer, such as linear, sigmoid, ReLU, and softmax, detailing their mathematical properties and their appropriate use cases. A strong statistical justification exists for selecting a suitable loss function when training a deep learning model. This report connects common loss functions such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and various Cross-Entropy losses to the statistical principle of Maximum Likelihood Estimation (MLE). Choosing a specific loss function is equivalent to assuming a specific probability distribution for the model output, highlighting the link between these functions and the Generalized Linear Models (GLMs) that underlie network output layers. Additional scenarios of practical interest are also considered, such as alternative output encodings, constrained outputs, and distributions with heavy tails.
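The link between loss functions and MLE described in this abstract can be checked numerically. The following is a minimal sketch, not code from the report: it assumes a Gaussian likelihood with a fixed scale `sigma` and a Laplace likelihood with a fixed scale `b` (both names, and the toy data, are illustrative assumptions). Under those assumptions the mean negative log-likelihood differs from a rescaled MSE or MAE only by a constant, so minimizing one minimizes the other.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def gaussian_nll(y_true, y_pred, sigma=1.0):
    """Mean negative log-likelihood assuming y ~ Normal(y_pred, sigma^2)."""
    log_norm = 0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(log_norm + (t - p) ** 2 / (2.0 * sigma ** 2)
               for t, p in zip(y_true, y_pred)) / len(y_true)

def laplace_nll(y_true, y_pred, b=1.0):
    """Mean negative log-likelihood assuming y ~ Laplace(y_pred, b)."""
    log_norm = math.log(2.0 * b)
    return sum(log_norm + abs(t - p) / b
               for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.5, 0.5]
y_pred = [0.8, 2.4, 3.0, 1.0]
sigma, b = 1.5, 0.7

# Gaussian NLL = MSE / (2 sigma^2) + constant that does not depend on y_pred.
gauss_gap = gaussian_nll(y_true, y_pred, sigma) - mse(y_true, y_pred) / (2.0 * sigma ** 2)

# Laplace NLL = MAE / b + constant that does not depend on y_pred.
laplace_gap = laplace_nll(y_true, y_pred, b) - mae(y_true, y_pred) / b
```

Here `gauss_gap` and `laplace_gap` are the prediction-independent constants; because they do not change with `y_pred`, the Gaussian assumption leads to MSE and the Laplace assumption (a heavier-tailed distribution) leads to MAE.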

artificial intelligence, bayesian inference, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2511.05131

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Evaluating model performance under worst-case subpopulations

Mike Li

Neural Information Processing Systems | Aug-22-2025, 00:42:25 GMT

A particularly problematic form of distribution shift comes from embedded power structures in data collection.

artificial intelligence, machine learning, worst-case subpopulation performance, (16 more...)

Neural Information Processing Systems

Country: Africa (0.05)

Genre: Research Report (0.68)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Supplementary Material for "Training Over-parameterized Models with Non-decomposable Objectives" Algorithm 2 Reductions-based Algorithm for Constraining Coverage (2)

Neural Information Processing Systems | Aug-16-2025, 05:14:06 GMT

These algorithms additionally incorporate the "two dataset" trick suggested by Cotter et al. We will find the following standard result to be useful in our proofs. We reproduce the proof from Narasimhan et al. We provide a proof for Proposition 4. The proof follows by setting D = G and applying Proposition 4.

artificial intelligence, exp, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Instructional Material (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Towards Calibrated Losses for Adversarial Robust Reject Option Classification

Shah, Vrund, Chaudhari, Tejas, Manwani, Naresh

arXiv.org Machine Learning | Oct-14-2024

Robustness towards adversarial attacks is a vital property for classifiers in several applications such as autonomous driving and medical diagnosis. In such scenarios, where the cost of misclassification is very high, knowing when to abstain from prediction also becomes crucial. A natural question is which surrogates can be used to ensure learning when the input points are adversarially perturbed and the classifier can abstain from prediction. This paper aims to characterize and design surrogates calibrated in the "Adversarial Robust Reject Option" setting. First, we propose an adversarial robust reject option loss $\ell_{d}^{\gamma}$ and analyze it for the hypothesis set of linear classifiers ($\mathcal{H}_{\textrm{lin}}$). Next, we provide a complete characterization result for any surrogate to be $(\ell_{d}^{\gamma},\mathcal{H}_{\textrm{lin}})$-calibrated. To demonstrate the difficulty in designing surrogates to $\ell_{d}^{\gamma}$, we show negative calibration results for convex surrogates and quasi-concave conditional risk cases (these gave positive calibration in the adversarial setting without reject option). We also empirically argue that Shifted Double Ramp Loss (DRL) and Shifted Double Sigmoid Loss (DSL) satisfy the calibration conditions. Finally, we demonstrate the robustness of shifted DRL and shifted DSL against adversarial perturbations on a synthetically generated dataset.

classifier, drl, reject option classifier, (16 more...)

arXiv.org Machine Learning

2410.10736

Country:

Asia > Middle East > Jordan (0.04)
Asia > India (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Bagging in overparameterized learning: Risk characterization and risk monotonization

Patil, Pratik, Du, Jin-Hong, Kuchibhotla, Arun Kumar

arXiv.org Machine Learning | Oct-24-2023

Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to analyze the prediction risk under squared error loss of bagged predictors using classical results on simple random sampling. Specializing the strategy, we derive the exact asymptotic risk of the bagged ridge and ridgeless predictors with an arbitrary number of bags under a well-specified linear model with arbitrary feature covariance matrices and signal vectors. Furthermore, we prescribe a generic cross-validation procedure to select the optimal subsample size for bagging and discuss its utility to eliminate the non-monotonic behavior of the limiting risk in the sample size (i.e., double or multiple descents). In demonstrating the proposed procedure for bagged ridge and ridgeless predictors, we thoroughly investigate the oracle properties of the optimal subsample size and provide an in-depth comparison between different bagging variants.
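The idea of bagging ridge predictors over subsamples can be illustrated with a toy sketch. This is not the authors' procedure: it assumes a single feature with no intercept (so the ridge coefficient has a closed form), draws each bag by simple random sampling without replacement, and uses an unscaled penalty `lam`. The function names, the penalty convention, and the data are all illustrative assumptions.

```python
import random

def ridge_1d(xs, ys, lam):
    """Closed-form ridge coefficient for one feature, no intercept:
    argmin_b sum((y - b*x)^2) + lam * b^2 = sum(x*y) / (sum(x^2) + lam)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)

def bagged_ridge_1d(xs, ys, lam, subsample_size, num_bags, seed=0):
    """Average the ridge coefficients fitted on num_bags subsamples,
    each of size subsample_size, drawn without replacement."""
    rng = random.Random(seed)
    n = len(xs)
    coefs = []
    for _ in range(num_bags):
        idx = rng.sample(range(n), subsample_size)
        coefs.append(ridge_1d([xs[i] for i in idx], [ys[i] for i in idx], lam))
    return sum(coefs) / len(coefs)

# Hypothetical noiseless data y = 2x, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]

full_fit = ridge_1d(xs, ys, lam=1.0)
bagged_fit = bagged_ridge_1d(xs, ys, lam=1.0, subsample_size=3, num_bags=50)
```

The subsample size plays the role of a tuning parameter here; the abstract describes selecting it by cross-validation, which this sketch does not implement.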

artificial intelligence, machine learning, predictor, (18 more...)

arXiv.org Machine Learning

2210.11445

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > Alameda County > Berkeley (0.13)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost

Neural Information Processing Systems | Apr-6-2023, 14:31:56 GMT

The machine learning problem of classifier design is studied from the perspective of probability elicitation in statistics. This shows that the standard approach of proceeding from the specification of a loss to the minimization of conditional risk is overly restrictive. It is shown that a better alternative is to start from the specification of a functional form for the minimum conditional risk, and derive the loss function. This has various consequences of practical interest, such as showing that 1) the widely adopted practice of relying on convex loss functions is unnecessary, and 2) many new losses can be derived for classification problems. These points are illustrated by the derivation of a new loss which is not convex, but does not compromise the computational tractability of classifier design, and is robust to the contamination of data with outliers. A new boosting algorithm, SavageBoost, is derived for the minimization of this loss.
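To make the terms "conditional risk" and "minimum conditional risk" concrete, here is a minimal sketch using the standard definition for binary labels in {-1, +1}: with eta = P(y = +1 | x) and a margin loss phi, the conditional risk of a score f is eta * phi(f) + (1 - eta) * phi(-f). The example uses the familiar logistic loss, not the new non-convex loss derived in the paper, and all function names are illustrative assumptions.

```python
import math

def logistic_loss(v):
    """Logistic margin loss phi(v) = log(1 + exp(-v))."""
    return math.log1p(math.exp(-v))

def conditional_risk(eta, f, loss=logistic_loss):
    """Expected loss of score f at a point where P(y = +1 | x) = eta."""
    return eta * loss(f) + (1.0 - eta) * loss(-f)

def binary_entropy(eta):
    """Binary entropy in nats."""
    return -eta * math.log(eta) - (1.0 - eta) * math.log(1.0 - eta)

eta = 0.8
# For the logistic loss the minimizing score is the log-odds of eta.
f_star = math.log(eta / (1.0 - eta))
min_risk = conditional_risk(eta, f_star)
```

For the logistic loss, `min_risk` coincides with the binary entropy of `eta`. In the approach the abstract describes, one would instead begin from a chosen functional form for this minimum conditional risk and derive the loss from it.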

classification, outlier, savageboost, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Simple and near-optimal algorithms for hidden stratification and multi-group learning

Tosh, Christopher, Hsu, Daniel

arXiv.org Machine Learning | Dec-22-2021

Much of the success of modern machine learning has been measured by improvements in accuracy for various classification tasks. Across domains as diverse as image classification and text translation, machine learning models are achieving incredible levels of accuracy; in some cases, they have outperformed humans in visual recognition tasks (Ewerth et al., 2017). However, accuracy is an aggregate statistic that often obscures the underlying structure of mistaken predictions. Oakden-Rayner et al. (2020) recently raised this concern in the context of medical image analysis. Consider the problem of diagnosing an image as being indicative of lung cancer or not.
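The point that aggregate accuracy can hide structured errors can be shown with a small sketch. It is not taken from the paper: the labels, predictions, and the group names "A" and "B" are hypothetical, and the helper functions are illustrative. It compares overall accuracy with accuracy computed separately for each group.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately within each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical data: group "A" is large and easy, group "B" is small and hard.
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
groups = ["A"] * 8 + ["B"] * 2

overall = accuracy(y_true, y_pred)
per_group = accuracy_by_group(y_true, y_pred, groups)
worst_group = min(per_group, key=per_group.get)
```

In this toy case the overall accuracy looks acceptable while every example in the smaller group is misclassified, which is the kind of failure an aggregate statistic obscures.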

algorithm, predictor, probability, (16 more...)

arXiv.org Machine Learning

2112.12181

Country: