Unsupervised Layer-wise Score Aggregation for Textual OOD Detection

Darrin, Maxime, Staerman, Guillaume, Gomes, Eduardo Dadalto Câmara, Cheung, Jackie CK, Piantanida, Pablo, Colombo, Pierre

May-29-2023–arXiv.org Artificial Intelligence

Out-of-distribution (OOD) detection for text applications is a rapidly growing field due to new robustness and security requirements driven by an increased number of AI-based systems. Existing textual OOD detectors often rely on an anomaly score (e.g., Mahalanobis distance) computed on the embedding output of the last layer of the encoder. In this work, we begin by showing that the performance of existing methods varies greatly depending on the task and on the choice of layer output. More importantly, we show that the usual choice (the last layer) is rarely the best one, and thus far better results could be achieved if the best layer were chosen. To leverage this key observation, we propose a data-driven, unsupervised method to combine layer-wise anomaly scores. In addition, we extend classical textual OOD benchmarks by including classification tasks with a greater number of classes (up to 77), which reflects more realistic settings. On this augmented benchmark, we show that the proposed post-aggregation methods achieve robust and consistent results while removing manual feature selection altogether. Their performance comes close to that of an oracle that selects the best layer.
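To make the idea of combining layer-wise anomaly scores concrete, here is a minimal sketch under stated assumptions, not the method proposed in the paper. It computes one Mahalanobis score per layer against a reference set, then aggregates with a deliberately simple stand-in: standardize each layer's score using reference statistics and average across layers. The function names (`fit_layer_gaussians`, `layer_scores`, `aggregate`), the single-Gaussian (not class-conditional) fit, and the synthetic embeddings are all illustrative assumptions.

```python
import numpy as np


def fit_layer_gaussians(ref_embeddings):
    """Fit one Gaussian per layer on reference (in-distribution) embeddings.

    ref_embeddings: array of shape (n_layers, n_samples, dim).
    Returns a list of (mean, precision_matrix) pairs, one per layer.
    """
    params = []
    for layer in ref_embeddings:
        mean = layer.mean(axis=0)
        # Small ridge term keeps the covariance invertible.
        cov = np.cov(layer, rowvar=False) + 1e-6 * np.eye(layer.shape[1])
        params.append((mean, np.linalg.inv(cov)))
    return params


def layer_scores(embeddings, params):
    """Squared Mahalanobis distance per sample and layer.

    Returns an array of shape (n_samples, n_layers).
    """
    cols = []
    for layer, (mean, prec) in zip(embeddings, params):
        diff = layer - mean
        cols.append(np.einsum("ij,jk,ik->i", diff, prec, diff))
    return np.stack(cols, axis=1)


def aggregate(scores, ref_scores):
    """Stand-in aggregator (an assumption, not the paper's):
    z-score each layer with reference statistics, then average."""
    mu = ref_scores.mean(axis=0)
    sigma = ref_scores.std(axis=0) + 1e-12
    return ((scores - mu) / sigma).mean(axis=1)


# Synthetic stand-ins for encoder layer outputs (hypothetical data).
rng = np.random.default_rng(0)
n_layers, dim = 4, 8
ref = rng.normal(size=(n_layers, 500, dim))
ind = rng.normal(size=(n_layers, 200, dim))
ood = rng.normal(size=(n_layers, 200, dim))
ood[1] += 3.0  # the shift is visible only in layer 1, not in the last layer

params = fit_layer_gaussians(ref)
ref_s = layer_scores(ref, params)
ind_agg = aggregate(layer_scores(ind, params), ref_s)
ood_agg = aggregate(layer_scores(ood, params), ref_s)
print("mean aggregated score, in-distribution:", ind_agg.mean())
print("mean aggregated score, OOD:", ood_agg.mean())
```

In this toy setup the OOD shift is planted in an intermediate layer, so a detector that reads only the last layer has no signal to work with, while the aggregated score still separates the two sets. No labels or OOD samples are used to fit the aggregator, which is what keeps the sketch unsupervised.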

data mining, machine learning, natural language

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
    - Oregon > Multnomah County
      - Portland (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Spain > Catalonia (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
- Asia
  - Indonesia > Bali (0.04)
  - China > Sichuan Province
    - Chengdu (0.04)

Genre:
- Research Report (0.82)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning
      - Statistical Learning (1.00)
      - Performance Analysis > Accuracy (1.00)
      - Neural Networks (0.93)
