AITopics | erm model

Recent work demonstrates that deep neural networks trained using Empirical Risk Minimization (ERM) can generalize under distribution shift, outperforming specialized training algorithms for domain generalization. The goal of this paper is to further understand this phenomenon. In particular, we study the extent to which the seminal domain adaptation theory of Ben-David et al. (2007) explains the performance of ERMs. Perhaps surprisingly, we find that this theory does not provide a tight explanation of the out-of-domain generalization observed across a large number of ERM models trained on three popular domain generalization datasets. This motivates us to investigate other possible measures--that, however, lack theory--which could explain generalization in this setting. Our investigation reveals that measures relating to the Fisher information, predictive entropy, and maximum mean discrepancy are good predictors of the out-of-distribution generalization of ERM models. We hope that our work helps galvanize the community towards building a better understanding of when deep networks trained with ERM generalize out-of-distribution.

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)
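
The abstract above names predictive entropy and maximum mean discrepancy (MMD) among the measures that predict out-of-distribution generalization, but this listing does not give the paper's exact definitions. The following is a minimal NumPy sketch of the two textbook quantities, assuming softmax outputs for entropy and an RBF kernel for MMD; the function names, the `bandwidth` parameter, and the biased estimator are illustrative choices, not the authors' implementation.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Mean Shannon entropy (in nats) of per-example class probabilities.

    probs: array of shape (n_examples, n_classes); rows are assumed to sum to 1.
    """
    probs = np.asarray(probs, dtype=float)
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())


def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples under an RBF kernel.

    x, y: arrays of shape (n, d) and (m, d), e.g. features from two domains.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def kernel(a, b):
        # Pairwise squared Euclidean distances via broadcasting.
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())
```

Under these assumptions, a uniform prediction over k classes has entropy log(k), and the MMD estimate is zero when both samples are identical and grows as the two feature distributions move apart.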

No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems

Neural Information Processing Systems | Aug-16-2025, 22:19:21 GMT

We then use these approximate subclass labels as a form of noisy supervision in a distributionally robust optimization objective.

data mining, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Health & Medicine > Therapeutic Area > Dermatology (0.46)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
(4 more...)
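
The entry above mentions using approximate subclass labels as noisy supervision in a distributionally robust optimization objective. As a rough illustration only, the sketch below computes a worst-group loss, the quantity such objectives typically minimize, from per-example losses and (possibly estimated) subclass ids. The function name and inputs are hypothetical, and this is not the paper's training procedure.

```python
import numpy as np


def worst_group_loss(losses, groups):
    """Return (worst mean loss, worst group id, per-group mean losses).

    losses: per-example losses, shape (n,).
    groups: per-example subclass ids, shape (n,); may come from clustering
            and therefore be noisy.
    """
    losses = np.asarray(losses, dtype=float)
    groups = np.asarray(groups)
    # Average the loss inside each group separately.
    group_means = {int(g): float(losses[groups == g].mean()) for g in np.unique(groups)}
    # The robust objective focuses on the group with the highest average loss.
    worst = max(group_means, key=group_means.get)
    return group_means[worst], worst, group_means
```

Averaging over all examples would hide a small, poorly served subclass; taking the maximum over group means is what makes the objective sensitive to it.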

We are glad that all reviewers appreciated the soundness of our work, the importance of the hidden stratification (HS)

Neural Information Processing Systems | Aug-16-2025, 22:01:55 GMT

... ERM model to obtain a feature representation and then trains a second, robust model. ... With tuning of learning rate schedules and other hyperparameters (HPs), GEORGE's cost could be further reduced. ... D.4, we define "inherent hardness" as the minimum possible worst-case subclass ... We hope that building on this method may also be of independent interest. ... Our results are fairly insensitive (no significant performance drop) to reasonable variation in these HPs. ... Additional classification metrics (ISIC omitted for space).

artificial intelligence, machine learning, subclass, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

An Empirical Investigation of Domain Generalization with Empirical Risk Minimizers

Neural Information Processing Systems | May-27-2025, 06:19:16 GMT

Recent work demonstrates that deep neural networks trained using Empirical Risk Minimization (ERM) can generalize under distribution shift, outperforming specialized training algorithms for domain generalization. The goal of this paper is to further understand this phenomenon. In particular, we study the extent to which the seminal domain adaptation theory of Ben-David et al. (2007) explains the performance of ERMs. Perhaps surprisingly, we find that this theory does not provide a tight explanation of the out-of-domain generalization observed across a large number of ERM models trained on three popular domain generalization datasets. This motivates us to investigate other possible measures--that, however, lack theory--which could explain generalization in this setting.

artificial intelligence, generalization, machine learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

Invariant Learning with Annotation-free Environments

Le, Phuong Quynh, Seifert, Christin, Schlötterer, Jörg

arXiv.org Artificial Intelligence | Apr-23-2025

Invariant learning is a promising approach to improve domain generalization compared to Empirical Risk Minimization (ERM). However, most invariant learning methods rely on the assumption that training examples are pre-partitioned into different known environments. We instead infer environments without the need for additional annotations, motivated by observations of the properties within the representation space of a trained ERM model. We show the preliminary effectiveness of our approach on the ColoredMNIST benchmark, achieving performance comparable to methods requiring explicit environment labels and on par with an annotation-free method that poses strong restrictions on the ERM reference model.

artificial intelligence, inductive learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2504.15686

Genre: Research Report (0.70)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

Correct-N-Contrast: A Contrastive Approach for Improving Robustness to Spurious Correlations

Zhang, Michael, Sohoni, Nimit S., Zhang, Hongyang R., Finn, Chelsea, Ré, Christopher

arXiv.org Artificial Intelligence | Dec-11-2024

Spurious correlations pose a major challenge for robust machine learning. Models trained with empirical risk minimization (ERM) may learn to rely on correlations between class labels and spurious attributes, leading to poor performance on data groups without these correlations. This is particularly challenging to address when spurious attribute labels are unavailable. To improve worst-group performance on spuriously correlated data without training attribute labels, we propose Correct-N-Contrast (CNC), a contrastive approach to directly learn representations robust to spurious correlations. As ERM models can be good spurious attribute predictors, CNC works by (1) using a trained ERM model's outputs to identify samples with the same class but dissimilar spurious features, and (2) training a robust model with contrastive learning to learn similar representations for same-class samples. To support CNC, we introduce new connections between worst-group error and a representation alignment loss that CNC aims to minimize. We empirically observe that worst-group error closely tracks with alignment loss, and prove that the alignment loss over a class helps upper-bound the class's worst-group vs. average error gap. On popular benchmarks, CNC reduces alignment loss drastically, and achieves state-of-the-art worst-group accuracy by 3.6% average absolute lift. CNC is also competitive with oracle methods that require group labels.

erm model, representation, spurious correlation, (14 more...)

arXiv.org Artificial Intelligence

2203.01517

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (0.67)
Semiconductors & Electronics (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
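
Step (1) of the Correct-N-Contrast abstract above, using a trained ERM model's outputs to find same-class samples with dissimilar spurious features, can be sketched as follows. This assumes the ERM model's predicted label serves as a proxy for the spurious attribute; that proxy, the function name, and the pair format are assumptions made for illustration, and the contrastive training of step (2) is not shown.

```python
import numpy as np


def cnc_positive_pairs(labels, erm_preds):
    """Return (anchor, positive) index pairs for contrastive training.

    A pair (i, j) is selected when i and j share the true class label but
    the ERM model predicts different labels for them, which is taken here
    as a sign that their spurious features differ.
    """
    labels = np.asarray(labels)
    erm_preds = np.asarray(erm_preds)
    same_class = labels[:, None] == labels[None, :]
    different_pred = erm_preds[:, None] != erm_preds[None, :]
    # An example never pairs with itself, since its prediction equals itself.
    anchors, positives = np.nonzero(same_class & different_pred)
    return list(zip(anchors.tolist(), positives.tolist()))
```

For example, with `labels = [0, 0, 1, 1]` and `erm_preds = [0, 1, 1, 1]`, only examples 0 and 1 share a class while receiving different ERM predictions, so they form the only pairs.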

Out of spuriousity: Improving robustness to spurious correlations without group annotations

Le, Phuong Quynh, Schlötterer, Jörg, Seifert, Christin

arXiv.org Artificial Intelligence | Jul-20-2024

Machine learning models are known to learn spurious correlations, i.e., features having strong relations with class labels but no causal relation. Relying on those correlations leads to poor performance in the data groups without these correlations and poor generalization ability. To improve the robustness of machine learning models to spurious correlations, we propose an approach to extract a subnetwork from a fully trained network that does not rely on spurious correlations. The subnetwork is found by the assumption that data points with the same spurious attribute will be close to each other in the representation space when training with ERM, then we employ supervised contrastive loss in a novel way to force models to unlearn the spurious connections. The increase in the worst-group performance of our approach contributes to strengthening the hypothesis that there exists a subnetwork in a fully trained dense network that is responsible for using only invariant features in classification tasks, therefore erasing the influence of spurious features even in the setup of multi spurious attributes and no prior knowledge of attributes labels.

accuracy, spurious correlation, spurious feature, (14 more...)

arXiv.org Artificial Intelligence

2407.14974

Country: North America > United States > California (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
