Our Evaluation Metric Needs an Update to Encourage Generalization
Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral
–arXiv.org Artificial Intelligence
Models that surpass human performance on several popular benchmarks display significant degradation in performance on exposure to Out of Distribution (OOD) data. Recent research has shown that models overfit to spurious biases and 'hack' datasets, in lieu of learning generalizable features like humans. In order to stop the inflation in model performance - and thus overestimation in AI systems' capabilities - we propose a simple and novel evaluation metric, WOOD Score, that encourages generalization during evaluation.

Several approaches have been proposed to address this issue at various levels: (i) Data - filtering of biases (Bras et al., 2020; Li & Vasconcelos, 2019; Li et al., 2018; Wang et al., 2018), quantifying data quality, controlling data quality, using active learning, and avoiding the creation of low quality data (Mishra et al., 2020; Nie et al., 2019; Gardner et al., 2020; Kaushik et al., 2019), and (ii) Model - utilizing prior knowledge of biases to train a naive model exploiting biases, and then subsequently training an ensemble of the naive …
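The abstract does not give the WOOD Score formula, but the core idea - an evaluation score that up-weights OOD performance so bias-exploiting models cannot inflate their results - can be illustrated with a minimal sketch. The weighting scheme below (a simple convex blend of in-distribution and OOD accuracy, with an assumed `ood_weight`) is a hypothetical illustration of the general technique, not the paper's actual metric.

```python
# Hypothetical sketch of an OOD-weighted evaluation score.
# NOTE: this blend and the ood_weight value are illustrative assumptions;
# the paper's actual WOOD Score definition is not given in this abstract.

def weighted_ood_score(iid_accuracy: float, ood_accuracy: float,
                       ood_weight: float = 0.7) -> float:
    """Blend in-distribution and OOD accuracy, up-weighting the OOD term.

    A model that 'hacks' dataset biases scores high on iid_accuracy but
    low on ood_accuracy, so this blend deflates its overall score.
    """
    assert 0.0 <= ood_weight <= 1.0
    return (1 - ood_weight) * iid_accuracy + ood_weight * ood_accuracy

# A bias-exploiting model: strong in-distribution, weak OOD.
hacker = weighted_ood_score(0.95, 0.55)   # 0.3*0.95 + 0.7*0.55 = 0.67
# A generalizing model: slightly weaker in-distribution, robust OOD.
general = weighted_ood_score(0.88, 0.84)  # 0.3*0.88 + 0.7*0.84 = 0.852
```

Under such a weighting, the generalizing model outscores the bias-exploiting one despite a lower in-distribution accuracy, which is the behavior the proposed metric aims to encourage.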
Jul-14-2020