Local Outlier Factor


Short Ticketing Detection Framework Analysis Report

Miao, Yuyang, Xing, Huijun, Mandic, Danilo P., Constantinides, Tony G.

arXiv.org Artificial Intelligence

Each year, fare evasion costs the UK railway system approximately 240 million pounds [1]. Short ticketing, where passengers buy tickets for shorter, cheaper journeys but travel beyond the permitted destination, is a specific and often undetected aspect of this broader issue. A simple but practical example: a passenger travelling from Seaside Station to International Terminus Station via Commuter Hub Station and Financial District Station might purchase two separate tickets (Seaside Station to Commuter Hub Station, and Financial District Station to International Terminus Station) instead of a ticket for the complete journey, potentially saving money while committing ticket fraud and causing revenue loss for the Train Operating Companies (TOCs). To address this problem, this comprehensive report provides an in-depth analysis of the short ticketing detection framework developed by researchers Yuyang Miao and Huijun Xing at Imperial College London, an unsupervised machine learning approach. The work is based on a dataset collected from the UK railway system, covering entry and exit data for 100 stations over seven days, with approximately 6.5 million records.


THEMIS: Unlocking Pretrained Knowledge with Foundation Model Embeddings for Anomaly Detection in Time Series

Lorik, Yadav Mahesh, Sarveswaran, Kaushik, Sundaramahalingam, Nagaraj, Venugopalan, Aravindakumar

arXiv.org Artificial Intelligence

Time series anomaly detection is a crucial task in many domains but poses substantial challenges. Because time series data exhibit seasonality, trends, noise, and evolving patterns (concept drift), it is difficult to establish a general notion of what constitutes normal behavior. Anomalies themselves vary, ranging from single outliers to contextual or collective anomalies, and are typically rare, leaving datasets heavily imbalanced. Further complexity arises from the high dimensionality of modern time series, real-time detection requirements, the choice of appropriate detection thresholds, and the need for interpretable results. Meeting these multifaceted challenges requires robust, flexible, and interpretable approaches. This paper presents THEMIS, a new framework for time series anomaly detection that exploits pretrained knowledge from foundation models. THEMIS extracts embeddings from the encoder of the Chronos time series foundation model and applies outlier detection techniques, such as Local Outlier Factor and spectral decomposition of the self-similarity matrix, to spot anomalies in the data. Our experiments show that this modular method achieves SOTA results on the MSL dataset and performs competitively on the SMAP and SWAT$^*$ datasets. Notably, THEMIS exceeds models trained specifically for anomaly detection, offering hyperparameter robustness and interpretability by default. This paper advocates pretrained representations from foundation models for efficient and adaptable anomaly detection on time series data.
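The core pipeline described here, encoder embeddings scored by an outlier detector, can be illustrated independently of Chronos. A minimal sketch assuming per-window encoder embeddings are already available as a NumPy array (the dimensions and the injected anomaly are illustrative, not from the paper):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Stand-in for foundation-model embeddings: one 64-d vector per time window.
embeddings = rng.normal(0.0, 1.0, size=(200, 64))
embeddings[150] += 8.0  # inject one anomalous window

# Score every window with LOF over the embedding space.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(embeddings)
scores = -lof.negative_outlier_factor_  # higher = more anomalous

print(int(np.argmax(scores)))  # → 150
```

Swapping the random array for real encoder outputs is the only change needed to apply the same scoring step to genuine time series windows.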


Unsupervised Outlier Detection in Audit Analytics: A Case Study Using USA Spending Data

Li, Buhe, Kaplan, Berkay, Lazirko, Maksym, Kogan, Aleksandr

arXiv.org Artificial Intelligence

This study investigates the effectiveness of unsupervised outlier detection methods in audit analytics, utilizing USA spending data from the U.S. Department of Health and Human Services (DHHS) as a case example. We employ and compare multiple outlier detection algorithms, including Histogram-based Outlier Score (HBOS), Robust Principal Component Analysis (PCA), Minimum Covariance Determinant (MCD), and K-Nearest Neighbors (KNN) to identify anomalies in federal spending patterns. The research addresses the growing need for efficient and accurate anomaly detection in large-scale governmental datasets, where traditional auditing methods may fall short. Our methodology involves data preparation, algorithm implementation, and performance evaluation using precision, recall, and F1 scores. Results indicate that a hybrid approach, combining multiple detection strategies, enhances the robustness and accuracy of outlier identification in complex financial data. This study contributes to the field of audit analytics by providing insights into the comparative effectiveness of various outlier detection models and demonstrating the potential of unsupervised learning techniques in improving audit quality and efficiency. The findings have implications for auditors, policymakers, and researchers seeking to leverage advanced analytics in governmental financial oversight and risk management.
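The hybrid idea, pooling the votes of several detectors, can be sketched with scikit-learn stand-ins. HBOS and robust PCA are not in scikit-learn, so this illustration substitutes LOF alongside the MCD-based EllipticEnvelope and a kNN distance; the data and the rank-averaging combination rule are illustrative assumptions, not the paper's method:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

def rank_normalize(s):
    # Map raw scores to [0, 1] by rank so heterogeneous detectors are comparable.
    return np.argsort(np.argsort(s)) / (len(s) - 1)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
X[42] = 9.0  # one grossly anomalous spending record

# Robust-covariance (MCD) distance via EllipticEnvelope.
s_mcd = -EllipticEnvelope(random_state=0).fit(X).decision_function(X)

# Distance to the 5th nearest neighbour (kNN detector).
dist, _ = NearestNeighbors(n_neighbors=6).fit(X).kneighbors(X)
s_knn = dist[:, -1]

# Local Outlier Factor as the third vote.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
s_lof = -lof.negative_outlier_factor_

# Hybrid score: average of rank-normalized detector scores.
hybrid = (rank_normalize(s_mcd) + rank_normalize(s_knn) + rank_normalize(s_lof)) / 3
print(int(np.argmax(hybrid)))  # → 42
```

Rank normalization sidesteps the fact that each detector reports scores on its own scale, which is one simple way to make an ensemble of dissimilar detectors robust.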


Real-Time Anomaly Detection with Synthetic Anomaly Monitoring (SAM)

Luzio, Emanuele, Ponti, Moacir Antonelli

arXiv.org Artificial Intelligence

Anomaly detection is essential for identifying rare and significant events across diverse domains such as finance, cybersecurity, and network monitoring. This paper presents Synthetic Anomaly Monitoring (SAM), an innovative approach that applies synthetic control methods from causal inference to improve both the accuracy and interpretability of anomaly detection processes. By modeling normal behavior through the treatment of each feature as a control unit, SAM identifies anomalies as deviations within this causal framework. We conducted extensive experiments comparing SAM with established benchmark models, including Isolation Forest, Local Outlier Factor (LOF), k-Nearest Neighbors (kNN), and One-Class Support Vector Machine (SVM), across five diverse datasets, including Credit Card Fraud, HTTP Dataset CSIC 2010, and KDD Cup 1999, among others. Our results demonstrate that SAM consistently delivers robust performance, highlighting its potential as a powerful tool for real-time anomaly detection in dynamic and complex environments.
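One reading of the synthetic-control idea, modeling each feature from the remaining features and flagging large deviations from that model, can be sketched as follows. The linear models, data, and max-residual scoring rule here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
# Correlated "normal" data: each feature is a noisy mix of two latent factors.
Z = rng.normal(size=(500, 2))
W = rng.normal(size=(2, 4))
X_train = Z @ W + 0.1 * rng.normal(size=(500, 4))

# Synthetic-control-style step: predict each feature from all the others.
models, sigmas = [], []
for j in range(X_train.shape[1]):
    others = np.delete(X_train, j, axis=1)
    m = LinearRegression().fit(others, X_train[:, j])
    resid = X_train[:, j] - m.predict(others)
    models.append(m)
    sigmas.append(resid.std() + 1e-12)

def anomaly_score(x):
    # Largest standardized deviation of any feature from its synthetic control.
    scores = []
    for j, (m, s) in enumerate(zip(models, sigmas)):
        pred = m.predict(np.delete(x, j).reshape(1, -1))[0]
        scores.append(abs(x[j] - pred) / s)
    return max(scores)

normal_point = X_train[0]
broken_point = normal_point.copy()
broken_point[2] += 5.0  # one feature breaks away from its controls
print(anomaly_score(broken_point) > anomaly_score(normal_point))  # → True
```

Because each feature is judged against the others, the score also points at *which* feature deviated, which is one source of the interpretability the abstract emphasizes.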


Novelty-focused R&D landscaping using transformer and local outlier factor

Choi, Jaewoong

arXiv.org Artificial Intelligence

While numerous studies have explored the field of research and development (R&D) landscaping, the preponderance of these investigations has emphasized predictive analysis based on R&D outcomes, specifically patents and academic literature. However, the value of research proposals and novelty analysis has seldom been addressed. This study proposes a systematic approach to constructing and navigating the R&D landscape that can be utilized to guide organizations to respond in a reproducible and timely manner to the challenges presented by the increasing number of research proposals. At the heart of the proposed approach is the composite use of a transformer-based language model and the local outlier factor (LOF). The semantic meaning of the research proposals is captured with our further-trained transformers, thereby constructing a comprehensive R&D landscape. Subsequently, the novelty of newly selected research proposals within the annual landscape is quantified on a numerical scale utilizing the LOF, by assessing the dissimilarity of each proposal to others preceding and within the same year. A case study examining research proposals in the energy and resource sector in South Korea is presented. The systematic process and quantitative outcomes are expected to be useful decision-support tools, providing future insights regarding R&D planning and roadmapping.
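The scoring step, quantifying a new proposal's dissimilarity to the existing landscape with LOF, can be sketched with scikit-learn's novelty mode, assuming transformer embeddings of the proposals are already available. The random vectors below stand in for real embeddings; the dimensions and offsets are illustrative:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(3)
# Stand-ins for transformer embeddings of research proposals (32-d here).
past_proposals = rng.normal(0.0, 1.0, size=(300, 32))   # prior landscape
routine_proposal = rng.normal(0.0, 1.0, size=(1, 32))   # close to the landscape
novel_proposal = rng.normal(6.0, 1.0, size=(1, 32))     # far from anything seen

# novelty=True lets LOF score points that were not part of the fitted landscape.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(past_proposals)

# score_samples returns the negative LOF: lower means more outlying,
# so its negation serves as a numerical novelty scale.
novelty = lambda x: -lof.score_samples(x)[0]
print(novelty(novel_proposal) > novelty(routine_proposal))  # → True
```

Fitting on one year's landscape and scoring the next year's submissions mirrors the annual comparison described in the abstract.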


Towards Trustworthy Automated Driving through Qualitative Scene Understanding and Explanations

Belmecheri, Nassim, Gotlieb, Arnaud, Lazaar, Nadjib, Spieker, Helge

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) methods nowadays are at the center of automated driving and connected mobility, including perception and scene understanding [1, 2, 3]. However, passing control to an AI-based system and trusting its decisions requires the ability to request explanations for these decisions [4]. Societal acceptance of automated driving significantly depends on these AI models' trustworthiness, transparency, and reliability [5]. Still, this is an open challenge, as many of the state-of-the-art machine learning (ML) models are opaque and not inherently explainable by themselves [6]. In recent years, several explainable AI methods with a focus on automated driving have been proposed. Following [6], they fall into three main categories: a) Vision-based explainable AI related to highlighting the area of an image that influences a perception model towards a certain output [4]; b) Feature-based importance scores quantify the influence of each input feature on the model output; and c) Textual-based explainable AI that aims to formulate explanations as intelligible arguments using natural language processing [7]. Unfortunately, automated support for multisensor and video-based scene explanation is still restricted to quantitative analysis, e.g., saliency heatmaps [4]. In this work, we exploit qualitative methods for scene understanding by using Qualitative Explainable Graphs (QXG) and, based on this representation, we propose a method for action explanation through simple classification models.


Quantum Normalizing Flows for Anomaly Detection

Rosenhahn, Bodo, Hirche, Christoph

arXiv.org Artificial Intelligence

A Normalizing Flow computes a bijective mapping from an arbitrary distribution to a predefined (e.g. normal) distribution. Such a flow can be used to address different tasks, e.g. anomaly detection, once such a mapping has been learned. In this work we introduce Normalizing Flows for Quantum architectures, describe how to model and optimize such a flow and evaluate our method on example datasets. Our proposed models show competitive performance for anomaly detection compared to classical methods, e.g. based on isolation forests, the local outlier factor (LOF) or single-class SVMs, while being fully executable on a quantum computer.
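The classical baselines named above are all available in scikit-learn; a minimal sketch comparing them on synthetic data (this is the classical reference side only, not the paper's quantum evaluation):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(size=(200, 2)), [[7.0, 7.0]]])  # last row: the anomaly

detectors = {
    "isolation_forest": IsolationForest(random_state=0),
    "lof": LocalOutlierFactor(n_neighbors=20),
    "ocsvm": OneClassSVM(nu=0.05),
}
results = {}
for name, det in detectors.items():
    labels = det.fit_predict(X)   # all three APIs use -1 to flag outliers
    results[name] = labels[-1]    # label of the planted anomaly
print(results)
```

All three detectors share the same `fit_predict` convention, which is what makes such head-to-head comparisons straightforward.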


HLoOP -- Hyperbolic 2-space Local Outlier Probabilities

Allietta, Clémence, Condomines, Jean-Philippe, Tourneret, Jean-Yves, Lochin, Emmanuel

arXiv.org Machine Learning

Hyperbolic geometry has recently garnered considerable attention in machine learning due to its capacity to embed hierarchical graph structures with low distortion for further downstream processing. This paper introduces a simple framework to detect local outliers for datasets grounded in hyperbolic 2-space, referred to as HLoOP (Hyperbolic Local Outlier Probability). Within a Euclidean space, well-known techniques for local outlier detection are based on the Local Outlier Factor (LOF) and its variant, the LoOP (Local Outlier Probability), which incorporates probabilistic concepts to model the outlier level of a data vector. The developed HLoOP combines nearest-neighbor search and density-based outlier scoring with a probabilistic, statistically oriented approach. The method consists of computing the Riemannian distance of a data point to its nearest neighbors following a Gaussian probability density function expressed in a hyperbolic space. This is achieved by defining a Gaussian cumulative distribution in this space. The HLoOP algorithm is tested on the WordNet dataset, yielding promising results. Code and data will be made available on request for reproducibility.
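The transplantation of a Euclidean LoOP-style score to hyperbolic 2-space can be illustrated by swapping in the Poincaré-disk geodesic distance. The sketch below is a simplified illustration of that idea, not the paper's Gaussian construction; the parameters (`k`, `lam`) and the synthetic data are hypothetical:

```python
import numpy as np
from scipy.special import erf

def poincare_dist(u, v):
    # Geodesic distance in the Poincaré disk model of hyperbolic 2-space.
    diff = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * diff / denom)

def hloop_scores(X, k=10, lam=3.0):
    n = len(X)
    D = np.array([[poincare_dist(X[i], X[j]) for j in range(n)] for i in range(n)])
    knn = np.argsort(D, axis=1)[:, 1:k + 1]  # skip self at index 0
    # Probabilistic set distance of each point to its neighbourhood.
    pdist = lam * np.sqrt((D[np.arange(n)[:, None], knn] ** 2).mean(axis=1))
    # Probabilistic local outlier factor and its normalisation (as in LoOP).
    plof = pdist / pdist[knn].mean(axis=1) - 1
    nplof = lam * np.sqrt((plof ** 2).mean())
    return np.maximum(0, erf(plof / (nplof * np.sqrt(2))))

rng = np.random.default_rng(5)
X = 0.1 * rng.normal(size=(100, 2))   # cluster near the disk centre
X[7] = np.array([0.95, 0.0])          # near the boundary: hyperbolically far away
scores = hloop_scores(X)
print(int(np.argmax(scores)))  # → 7
```

The only hyperbolic ingredient is the distance function; the LoOP machinery on top of it is unchanged, which is the spirit of the HLoOP construction.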


Anomaly/Outlier Detection using Local Outlier Factors - DataScienceCentral.com

#artificialintelligence

Outliers are patterns in data that do not conform to expected behavior. Detecting such patterns is of prime importance in credit card fraud, stock trading, etc., and identifying anomalous or outlying observations also matters when training any supervised machine learning model. This brings us to two very important questions: what is a local outlier, and why do we need the concept? In a multivariate dataset where the rows are generated independently from a probability distribution, using the centroid of the data alone might not be sufficient to tag all the outliers. Measures like the Mahalanobis distance might be able to identify extreme observations but won't be able to label all possible outlier observations, in particular points that are anomalous only relative to their local neighborhood.
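The local-density point can be made concrete with scikit-learn's LocalOutlierFactor: a point sitting just outside a tight cluster is flagged strongly even though, measured against the global spread of the data, it looks unremarkable (the two-cluster data here are synthetic):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(6)
dense = rng.normal([0, 0], 0.1, size=(100, 2))    # tight cluster
sparse = rng.normal([5, 5], 1.0, size=(100, 2))   # loose cluster
local_outlier = np.array([[0.6, 0.6]])            # close to the dense cluster,
                                                  # but far for *its* density
X = np.vstack([dense, sparse, local_outlier])     # outlier is row index 200

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
scores = -lof.negative_outlier_factor_            # higher = more outlying

print(int(np.argmax(scores)))  # → 200 (the locally anomalous point)
```

Because LOF compares each point's density to that of its own neighbours rather than to a single global centroid, it catches exactly the observations that centroid- or Mahalanobis-based measures tend to miss.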


An In-depth Guide to Local Outlier Factor (LOF) for Outlier Detection in Python

#artificialintelligence
