SEAL: Interactive Tool for Systematic Error Analysis and Labeling

Nazneen Rajani, Weixin Liang, Lingjiao Chen, Meg Mitchell, James Zou

Oct-11-2022 – arXiv.org Artificial Intelligence

With the advent of Transformers, large language models (LLMs) have saturated well-known NLP benchmarks and leaderboards with high aggregate performance. However, these models often fail systematically on tail data or rare groups that are not obvious in aggregate evaluation. Identifying such problematic data groups is even more challenging when there are no explicit labels (e.g., ethnicity, gender), and it is further compounded for NLP datasets by the lack of visual features to characterize failure modes (e.g., Asian males, animals indoors, waterbirds on land). This paper introduces an interactive Systematic Error Analysis and Labeling (SEAL) tool that uses a two-step approach: it first identifies high-error slices of data and then, in the second step, introduces methods to give human-understandable semantics to those underperforming slices. We explore a variety of methods for producing coherent semantics for the error groups, using language models for semantic labeling and a text-to-image model for generating visual features. The SEAL toolkit and demo screencast are available at https://huggingface.co/spaces/nazneen/seal.
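
The abstract describes the two steps only at a high level. The following is a minimal sketch of the idea, not the SEAL implementation, and it rests on several assumptions: each evaluation example already has an embedding and a per-example loss; step one uses plain k-means as a stand-in clustering method and ranks clusters by mean loss; step two only builds a prompt that could be sent to a language model to name the worst slice. The helper names (`kmeans`, `rank_slices`, `labeling_prompt`), the prompt wording, and the toy data are all hypothetical.

```python
# Hypothetical sketch of two-step slice analysis; not the authors' code.
# Assumes each evaluation example already has an embedding and a loss.
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means with farthest-first initialization; returns a cluster id per point."""
    # Deterministic init: start at the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def rank_slices(
    embeddings: List[List[float]], losses: List[float], k: int
) -> List[Tuple[int, float, List[int]]]:
    """Step 1: cluster examples, then sort clusters by mean loss (highest first)."""
    assign = kmeans(embeddings, k)
    slices = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:
            mean_loss = sum(losses[i] for i in idx) / len(idx)
            slices.append((c, mean_loss, idx))
    slices.sort(key=lambda s: s[1], reverse=True)
    return slices


def labeling_prompt(texts: Sequence[str], max_examples: int = 5) -> str:
    """Step 2 (prompt only): ask a language model what a slice has in common."""
    shown = "\n".join(f"- {t}" for t in texts[:max_examples])
    return (
        "The following examples share a common theme.\n"
        f"{shown}\n"
        "In a few words, the common theme is:"
    )


# Invented toy data: two well-separated groups, the second with much higher loss.
embeddings = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
losses = [0.1, 0.2, 0.1, 2.0, 1.5, 2.5]
texts = ["great film", "loved it", "fun movie", "not bad at all", "hardly boring", "never dull"]
slices = rank_slices(embeddings, losses, k=2)
worst = slices[0]
print(worst[0], round(worst[1], 2), worst[2])
print(labeling_prompt([texts[i] for i in worst[2]]))
```

With real data, the embeddings would presumably come from the model under evaluation and the prompt would be sent to an LLM for a label. The number of clusters and the clustering method are tuning choices that the abstract does not specify.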

large language model, machine learning, natural language

Country:
- North America > United States
  - Virginia (0.04)
  - Oregon (0.04)
  - Maine (0.04)
  - New York > New York County
    - New York City (0.04)
  - California
    - Santa Clara County > Palo Alto (0.04)
    - San Diego County > San Diego (0.04)
  - Arizona > Maricopa County
    - Phoenix (0.04)
- Europe
  - France (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Workflow (0.46)
- Research Report (0.40)

Industry:
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Law (0.93)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
- Health & Medicine > Therapeutic Area (0.68)
- Information Technology (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)