Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization
Kaidi Cao, Yining Chen, Junwei Lu, Nikos Arechiga, Adrien Gaidon, Tengyu Ma
In real-world machine learning applications, even well-curated training datasets have various types of heterogeneity. Two main types of heterogeneity are: (1) data imbalance: the input or label distribution often has a long-tailed density, and (2) heteroskedasticity: the labels given inputs have varying levels of uncertainty across subsets of the data, stemming from sources such as the intrinsic ambiguity of the data or annotation errors. Many deep learning algorithms have been proposed for imbalanced datasets (e.g., see [Wang et al., 2017, Cao et al., 2019, Cui et al., 2019, Liu et al., 2019] and the references therein). However, heteroskedasticity, a classical notion studied extensively in the statistical community [Pintore et al., 2006, Wang et al., 2013, Tibshirani et al., 2014], has so far been under-explored in deep learning. This paper focuses on addressing heteroskedasticity and its interaction with data imbalance in deep learning. Heteroskedasticity is often studied in regression analysis and refers to the property that the distribution of the error varies across inputs. In this work, we mostly focus on classification, though the developed technique also applies to regression. Here, heteroskedasticity reflects how the uncertainty in the conditional distribution p(y | x), or the entropy of y | x, varies as a function of x.
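
As a minimal illustration of this notion of heteroskedasticity in classification (a hypothetical sketch, not code from the paper), the snippet below builds a synthetic binary dataset in which the label-flip probability, and hence the entropy of y | x, grows with |x|, so some regions of the input space carry far noisier labels than others:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic inputs; the true class depends on the sign of x.
n = 10_000
x = rng.uniform(-3, 3, size=n)
clean_y = (x > 0).astype(int)

# Label-flip probability increases with |x|, so the entropy of y | x
# varies across the input space (heteroskedastic label noise):
# near x = 0 labels are almost deterministic, near |x| = 3 they
# approach a coin flip.
flip_prob = 0.5 * np.abs(x) / 3.0
flips = rng.random(n) < flip_prob
y = np.where(flips, 1 - clean_y, clean_y)

# Empirical flip rates in a low-uncertainty and a high-uncertainty region.
low_unc = np.abs(x) < 0.5
high_unc = np.abs(x) > 2.5
print("flip rate for |x| < 0.5:", (y[low_unc] != clean_y[low_unc]).mean())
print("flip rate for |x| > 2.5:", (y[high_unc] != clean_y[high_unc]).mean())

A model trained on y without accounting for this structure treats both regions identically, which is the kind of mismatch the paper's adaptive regularization is meant to address.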
Jun-28-2020