Oddballness: universal anomaly detection with language models

Graliński, Filip, Staruch, Ryszard, Jurkiewicz, Krzysztof

Sep-4-2024–arXiv.org Artificial Intelligence

We present a new method for detecting anomalies in text (and, more generally, in sequences of any data) using language models in a fully unsupervised manner. The method uses the probabilities (likelihoods) generated by a language model but, instead of focusing on low-likelihood tokens, relies on a new metric introduced in this paper: oddballness. Oddballness measures how "strange" a given token is according to the language model. We demonstrate on grammatical error detection tasks (a specific case of text anomaly detection) that oddballness outperforms simply considering low-likelihood events when a fully unsupervised setup is assumed.
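The abstract does not give the formula for oddballness, so the sketch below is only an illustration of the contrast it draws between "low likelihood" and "strange". It assumes one plausible scoring rule: the total probability mass by which other outcomes exceed the observed token's probability. The function names, the thresholds, and the scoring rule itself are assumptions made for this example and should not be read as the paper's definition.

```python
# Illustrative sketch only: the scoring rule below is an ASSUMED stand-in
# for oddballness, not a formula taken from the abstract.

def oddballness_like(probs, index):
    """Sum of positive gaps between every outcome's probability and the
    probability of the observed outcome at `index` (assumed rule)."""
    p = probs[index]
    return sum(max(0.0, q - p) for q in probs)


def is_low_likelihood(probs, index, threshold=0.05):
    """Baseline: flag the observed outcome if its probability is small.
    The threshold value is hypothetical."""
    return probs[index] < threshold


def is_oddball(probs, index, threshold=0.5):
    """Flag the observed outcome if the assumed score is large.
    The threshold value is hypothetical."""
    return oddballness_like(probs, index) > threshold


# Case 1: 100 equally likely outcomes. Every outcome has low probability,
# yet none is more surprising than any other.
uniform = [0.01] * 100

# Case 2: one dominant outcome and one unlikely alternative.
peaked = [0.9, 0.1]

print(is_low_likelihood(uniform, 0), is_oddball(uniform, 0))
print(is_low_likelihood(peaked, 1), is_oddball(peaked, 1))
```

Under these assumptions, the uniform case is flagged by the likelihood baseline but not by the oddballness-like score, because no alternative was more probable than the observed outcome. The unlikely outcome in the peaked case is flagged by the score even though its probability (0.1) sits above the baseline threshold.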

Genre:
- Research Report (0.65)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Anomaly Detection (1.00)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.31)
