Multi-View Knowledge Distillation from Crowd Annotations for Out-of-Domain Generalization
Wright, Dustin, Augenstein, Isabelle
arXiv.org Artificial Intelligence
Selecting an effective training signal for tasks in natural language processing is difficult: expert annotations are expensive, and crowd-sourced annotations may not be reliable. At the same time, recent work in NLP has demonstrated that learning from a distribution over labels acquired from crowd annotations can be effective. However, there are many ways to acquire such a distribution, and the performance of any one method can fluctuate with the task and the amount of available crowd annotations, making it difficult to know a priori which distribution is best. This paper systematically analyzes this problem in the out-of-domain setting, complementing prior NLP work, which has focused on in-domain evaluation, and proposes new methods for acquiring soft labels from crowd annotations by aggregating the distributions produced by existing methods. In particular, we propose to aggregate multiple views of crowd annotations via temperature scaling and by finding their Jensen-Shannon centroid. We demonstrate that these aggregation methods lead to the most consistent performance across four NLP tasks on out-of-domain test sets, mitigating the fluctuations in performance seen with the individual distributions. Additionally, aggregation results in the most consistently well-calibrated uncertainty estimation. We argue that aggregating different views of crowd annotations is an effective and minimal intervention for acquiring soft labels that induce robust classifiers despite the inconsistency of the individual soft-labeling methods.
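To make the two aggregation ideas named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each "view" is a probability vector over the same label set for one item, reads temperature scaling as softmax(log p / T), and approximates the Jensen-Shannon centroid numerically with exponentiated-gradient steps on the mean JS divergence. The function names, the step count, the step size, and the example views are all hypothetical choices made for illustration; the abstract does not specify how the centroid is computed.

```python
import numpy as np

EPS = 1e-12  # guards against log(0); an illustrative choice


def temperature_scale(p, T):
    """Flatten (T > 1) or sharpen (T < 1) a distribution via softmax(log(p) / T)."""
    logits = np.log(np.clip(p, EPS, 1.0)) / T
    logits -= logits.max()  # numerical stability
    w = np.exp(logits)
    return w / w.sum()


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    p = np.clip(p, EPS, 1.0)
    q = np.clip(q, EPS, 1.0)
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))


def mean_js(views, q):
    """Average JS divergence from q to every view: the objective being minimized."""
    return float(np.mean([js_divergence(p, q) for p in views]))


def js_centroid(views, steps=200, lr=1.0):
    """Approximate the distribution minimizing the mean JS divergence to all views.

    Uses exponentiated-gradient (mirror-descent) updates started from the
    arithmetic mean. This is an assumed numerical scheme, not necessarily
    the procedure used in the paper.
    """
    views = np.asarray(views, dtype=float)
    q = np.clip(views.mean(axis=0), EPS, 1.0)
    q /= q.sum()
    for _ in range(steps):
        # d/dq_k JS(p, q) = 0.5 * log(2 q_k / (p_k + q_k)), averaged over views
        grad = 0.5 * np.mean(np.log(2.0 * q / (views + q)), axis=0)
        q = np.clip(q * np.exp(-lr * grad), EPS, None)
        q /= q.sum()  # project back onto the simplex
    return q


# Hypothetical soft labels for one item from three soft-labeling methods.
views = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.2, 0.2, 0.6]])
scaled = np.array([temperature_scale(v, T=2.0) for v in views])
print("average of temperature-scaled views:", scaled.mean(axis=0))
print("approximate JS centroid:", js_centroid(views))
```

Starting the centroid search from the arithmetic mean keeps every label with non-zero support in play, and comparing `mean_js` at the mean and at the returned centroid gives a quick check that the iterations did not increase the objective.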
May-23-2023