Local Outlier Factor


Short Ticketing Detection Framework Analysis Report

Miao, Yuyang, Xing, Huijun, Mandic, Danilo P., Constantinides, Tony G.

arXiv.org Artificial Intelligence

Each year, fare evasion costs the UK railway system approximately 240 million pounds [1]. Short ticketing, where passengers buy tickets for shorter, cheaper journeys but travel beyond the permitted destination, is a specific and often undetected aspect of this broader issue. A simple but practical example: a passenger travelling from Seaside Station to International Terminus Station via Commuter Hub Station and Financial District Station might purchase two separate tickets (Seaside Station to Commuter Hub Station, and Financial District Station to International Terminus Station) instead of a ticket for the complete journey, potentially saving money while committing ticket fraud and causing revenue loss for the Train Operating Companies (TOCs). To address this problem, this comprehensive report provides an in-depth analysis of the short ticketing detection framework developed by researchers Yuyang Miao and Huijun Xing at Imperial College London, an unsupervised machine learning approach. The work is based on a dataset collected from the UK railway system, covering entry and exit data for 100 stations over seven days, with approximately 6.5 million records.


THEMIS: Unlocking Pretrained Knowledge with Foundation Model Embeddings for Anomaly Detection in Time Series

Lorik, Yadav Mahesh, Sarveswaran, Kaushik, Sundaramahalingam, Nagaraj, Venugopalan, Aravindakumar

arXiv.org Artificial Intelligence

Time series anomaly detection is a crucial task in many domains but poses substantial challenges. Because time series data exhibit seasonality, trends, noise, and evolving patterns (concept drift), it is difficult to establish a general notion of what constitutes normal behavior. Anomalies themselves vary, ranging from single outliers to contextual or collective anomalies, and are typically rare, leaving datasets heavily imbalanced. Further complexity arises from the high dimensionality of modern time series, real-time detection requirements, the choice of appropriate detection thresholds, and the need for interpretable results. Meeting these multifaceted challenges requires robust, flexible, and interpretable approaches. This paper presents THEMIS, a new framework for time series anomaly detection that exploits pretrained knowledge from foundation models. THEMIS extracts embeddings from the encoder of the Chronos time series foundation model and applies outlier detection techniques, such as Local Outlier Factor and spectral decomposition of the self-similarity matrix, to spot anomalies in the data. Our experiments show that this modular method achieves SOTA results on the MSL dataset and performs competitively on the SMAP and SWAT$^*$ datasets. Notably, THEMIS exceeds models trained specifically for anomaly detection, offering hyperparameter robustness and interpretability by default. This paper advocates pretrained representations from foundation models for efficient and adaptable anomaly detection on time series data.
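The core pipeline described here, encoder embeddings scored by an outlier detector, can be illustrated independently of Chronos. A minimal sketch assuming per-window encoder embeddings are already available as a NumPy array (the dimensions and the injected anomaly are illustrative, not from the paper):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Stand-in for foundation-model embeddings: one 64-d vector per time window.
embeddings = rng.normal(0.0, 1.0, size=(200, 64))
embeddings[150] += 8.0  # inject one anomalous window

# Score every window with LOF over the embedding space.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(embeddings)
scores = -lof.negative_outlier_factor_  # higher = more anomalous

print(int(np.argmax(scores)))  # → 150
```

Swapping the random array for real encoder outputs is the only change needed to apply the same scoring step to genuine time series windows.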


Unsupervised Outlier Detection in Audit Analytics: A Case Study Using USA Spending Data

Li, Buhe, Kaplan, Berkay, Lazirko, Maksym, Kogan, Aleksandr

arXiv.org Artificial Intelligence

This study investigates the effectiveness of unsupervised outlier detection methods in audit analytics, utilizing USA spending data from the U.S. Department of Health and Human Services (DHHS) as a case example. We employ and compare multiple outlier detection algorithms, including Histogram-based Outlier Score (HBOS), Robust Principal Component Analysis (PCA), Minimum Covariance Determinant (MCD), and K-Nearest Neighbors (KNN) to identify anomalies in federal spending patterns. The research addresses the growing need for efficient and accurate anomaly detection in large-scale governmental datasets, where traditional auditing methods may fall short. Our methodology involves data preparation, algorithm implementation, and performance evaluation using precision, recall, and F1 scores. Results indicate that a hybrid approach, combining multiple detection strategies, enhances the robustness and accuracy of outlier identification in complex financial data. This study contributes to the field of audit analytics by providing insights into the comparative effectiveness of various outlier detection models and demonstrating the potential of unsupervised learning techniques in improving audit quality and efficiency. The findings have implications for auditors, policymakers, and researchers seeking to leverage advanced analytics in governmental financial oversight and risk management.
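The hybrid idea, pooling the votes of several detectors, can be sketched with scikit-learn stand-ins. HBOS and robust PCA are not in scikit-learn, so this illustration substitutes LOF alongside the MCD-based EllipticEnvelope and a kNN distance; the data and the rank-averaging combination rule are illustrative assumptions, not the paper's method:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

def rank_normalize(s):
    # Map raw scores to [0, 1] by rank so heterogeneous detectors are comparable.
    return np.argsort(np.argsort(s)) / (len(s) - 1)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
X[42] = 9.0  # one grossly anomalous spending record

# Robust-covariance (MCD) distance via EllipticEnvelope.
s_mcd = -EllipticEnvelope(random_state=0).fit(X).decision_function(X)

# Distance to the 5th nearest neighbour (kNN detector).
dist, _ = NearestNeighbors(n_neighbors=6).fit(X).kneighbors(X)
s_knn = dist[:, -1]

# Local Outlier Factor as the third vote.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
s_lof = -lof.negative_outlier_factor_

# Hybrid score: average of rank-normalized detector scores.
hybrid = (rank_normalize(s_mcd) + rank_normalize(s_knn) + rank_normalize(s_lof)) / 3
print(int(np.argmax(hybrid)))  # → 42
```

Rank normalization sidesteps the fact that each detector reports scores on its own scale, which is one simple way to make an ensemble of dissimilar detectors robust.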


Real-Time Anomaly Detection with Synthetic Anomaly Monitoring (SAM)

Luzio, Emanuele, Ponti, Moacir Antonelli

arXiv.org Artificial Intelligence

Anomaly detection is essential for identifying rare and significant events across diverse domains such as finance, cybersecurity, and network monitoring. This paper presents Synthetic Anomaly Monitoring (SAM), an innovative approach that applies synthetic control methods from causal inference to improve both the accuracy and interpretability of anomaly detection processes. By modeling normal behavior through the treatment of each feature as a control unit, SAM identifies anomalies as deviations within this causal framework. We conducted extensive experiments comparing SAM with established benchmark models, including Isolation Forest, Local Outlier Factor (LOF), k-Nearest Neighbors (kNN), and One-Class Support Vector Machine (SVM), across five diverse datasets, including Credit Card Fraud, HTTP Dataset CSIC 2010, and KDD Cup 1999, among others. Our results demonstrate that SAM consistently delivers robust performance, highlighting its potential as a powerful tool for real-time anomaly detection in dynamic and complex environments.
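One reading of the synthetic-control idea, modeling each feature from the remaining features and flagging large deviations from that model, can be sketched as follows. The linear models, data, and max-residual scoring rule here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
# Correlated "normal" data: each feature is a noisy mix of two latent factors.
Z = rng.normal(size=(500, 2))
W = rng.normal(size=(2, 4))
X_train = Z @ W + 0.1 * rng.normal(size=(500, 4))

# Synthetic-control-style step: predict each feature from all the others.
models, sigmas = [], []
for j in range(X_train.shape[1]):
    others = np.delete(X_train, j, axis=1)
    m = LinearRegression().fit(others, X_train[:, j])
    resid = X_train[:, j] - m.predict(others)
    models.append(m)
    sigmas.append(resid.std() + 1e-12)

def anomaly_score(x):
    # Largest standardized deviation of any feature from its synthetic control.
    scores = []
    for j, (m, s) in enumerate(zip(models, sigmas)):
        pred = m.predict(np.delete(x, j).reshape(1, -1))[0]
        scores.append(abs(x[j] - pred) / s)
    return max(scores)

normal_point = X_train[0]
broken_point = normal_point.copy()
broken_point[2] += 5.0  # one feature breaks away from its controls
print(anomaly_score(broken_point) > anomaly_score(normal_point))  # → True
```

Because each feature is judged against the others, the score also points at *which* feature deviated, which is one source of the interpretability the abstract emphasizes.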


Novelty-focused R&D landscaping using transformer and local outlier factor

Choi, Jaewoong

arXiv.org Artificial Intelligence

While numerous studies have explored the field of research and development (R&D) landscaping, the preponderance of these investigations has emphasized predictive analysis based on R&D outcomes, specifically patents and academic literature. However, the value of research proposals and novelty analysis has seldom been addressed. This study proposes a systematic approach to constructing and navigating the R&D landscape that can be utilized to guide organizations to respond in a reproducible and timely manner to the challenges presented by the increasing number of research proposals. At the heart of the proposed approach is the composite use of a transformer-based language model and the local outlier factor (LOF). The semantic meaning of the research proposals is captured with our further-trained transformers, thereby constructing a comprehensive R&D landscape. Subsequently, the novelty of newly selected research proposals within the annual landscape is quantified on a numerical scale utilizing the LOF, by assessing the dissimilarity of each proposal to others preceding and within the same year. A case study examining research proposals in the energy and resource sector in South Korea is presented. The systematic process and quantitative outcomes are expected to be useful decision-support tools, providing future insights regarding R&D planning and roadmapping.
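The scoring step, quantifying a new proposal's dissimilarity to the existing landscape with LOF, can be sketched with scikit-learn's novelty mode, assuming transformer embeddings of the proposals are already available. The random vectors below stand in for real embeddings; the dimensions and offsets are illustrative:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(3)
# Stand-ins for transformer embeddings of research proposals (32-d here).
past_proposals = rng.normal(0.0, 1.0, size=(300, 32))   # prior landscape
routine_proposal = rng.normal(0.0, 1.0, size=(1, 32))   # close to the landscape
novel_proposal = rng.normal(6.0, 1.0, size=(1, 32))     # far from anything seen

# novelty=True lets LOF score points that were not part of the fitted landscape.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(past_proposals)

# score_samples returns the negative LOF: lower means more outlying,
# so its negation serves as a numerical novelty scale.
novelty = lambda x: -lof.score_samples(x)[0]
print(novelty(novel_proposal) > novelty(routine_proposal))  # → True
```

Fitting on one year's landscape and scoring the next year's submissions mirrors the annual comparison described in the abstract.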


Towards Trustworthy Automated Driving through Qualitative Scene Understanding and Explanations

Belmecheri, Nassim, Gotlieb, Arnaud, Lazaar, Nadjib, Spieker, Helge

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) methods nowadays are at the center of automated driving and connected mobility, including perception and scene understanding [1, 2, 3]. However, passing control to an AI-based system and trusting its decisions requires the ability to request explanations for these decisions [4]. Societal acceptance of automated driving significantly depends on these AI models' trustworthiness, transparency, and reliability [5]. Still, this is an open challenge, as many of the state-of-the-art machine learning (ML) models are opaque and not inherently explainable by themselves [6]. In recent years, several explainable AI methods with a focus on automated driving have been proposed. Following [6], they fall into three main categories: a) Vision-based explainable AI related to highlighting the area of an image that influences a perception model towards a certain output [4]; b) Feature-based importance scores quantify the influence of each input feature on the model output; and c) Textual-based explainable AI that aims to formulate explanations as intelligible arguments using natural language processing [7]. Unfortunately, automated support for multisensor and video-based scene explanation is still restricted to quantitative analysis, e.g., saliency heatmaps [4]. In this work, we exploit qualitative methods for scene understanding by using Qualitative Explainable Graphs (QXG) and, based on this representation, we propose a method for action explanation through simple classification models.


Quantum Normalizing Flows for Anomaly Detection

Rosenhahn, Bodo, Hirche, Christoph

arXiv.org Artificial Intelligence

A Normalizing Flow computes a bijective mapping from an arbitrary distribution to a predefined (e.g. normal) distribution. Such a flow can be used to address different tasks, e.g. anomaly detection, once such a mapping has been learned. In this work we introduce Normalizing Flows for Quantum architectures, describe how to model and optimize such a flow and evaluate our method on example datasets. Our proposed models show competitive performance for anomaly detection compared to classical methods, e.g. based on isolation forests, the local outlier factor (LOF) or single-class SVMs, while being fully executable on a quantum computer.
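The classical baselines named above are all available in scikit-learn; a minimal sketch comparing them on synthetic data (this is the classical reference side only, not the paper's quantum evaluation):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(size=(200, 2)), [[7.0, 7.0]]])  # last row: the anomaly

detectors = {
    "isolation_forest": IsolationForest(random_state=0),
    "lof": LocalOutlierFactor(n_neighbors=20),
    "ocsvm": OneClassSVM(nu=0.05),
}
results = {}
for name, det in detectors.items():
    labels = det.fit_predict(X)   # all three APIs use -1 to flag outliers
    results[name] = labels[-1]    # label of the planted anomaly
print(results)
```

All three detectors share the same `fit_predict` convention, which is what makes such head-to-head comparisons straightforward.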


HLoOP -- Hyperbolic 2-space Local Outlier Probabilities

Allietta, Clémence, Condomines, Jean-Philippe, Tourneret, Jean-Yves, Lochin, Emmanuel

arXiv.org Machine Learning

Hyperbolic geometry has recently garnered considerable attention in machine learning due to its capacity to embed hierarchical graph structures with low distortion for further downstream processing. This paper introduces a simple framework to detect local outliers for datasets grounded in hyperbolic 2-space, referred to as HLoOP (Hyperbolic Local Outlier Probability). Within a Euclidean space, well-known techniques for local outlier detection are based on the Local Outlier Factor (LOF) and its variant, the LoOP (Local Outlier Probability), which incorporates probabilistic concepts to model the outlier level of a data vector. The developed HLoOP combines nearest-neighbor search and density-based outlier scoring with a probabilistic, statistically oriented approach. The method consists of computing the Riemannian distance of a data point to its nearest neighbors following a Gaussian probability density function expressed in a hyperbolic space. This is achieved by defining a Gaussian cumulative distribution in this space. The HLoOP algorithm is tested on the WordNet dataset, yielding promising results. Code and data will be made available on request for reproducibility.
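The transplantation of a Euclidean LoOP-style score to hyperbolic 2-space can be illustrated by swapping in the Poincaré-disk geodesic distance. The sketch below is a simplified illustration of that idea, not the paper's Gaussian construction; the parameters (`k`, `lam`) and the synthetic data are hypothetical:

```python
import numpy as np
from scipy.special import erf

def poincare_dist(u, v):
    # Geodesic distance in the Poincaré disk model of hyperbolic 2-space.
    diff = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * diff / denom)

def hloop_scores(X, k=10, lam=3.0):
    n = len(X)
    D = np.array([[poincare_dist(X[i], X[j]) for j in range(n)] for i in range(n)])
    knn = np.argsort(D, axis=1)[:, 1:k + 1]  # skip self at index 0
    # Probabilistic set distance of each point to its neighbourhood.
    pdist = lam * np.sqrt((D[np.arange(n)[:, None], knn] ** 2).mean(axis=1))
    # Probabilistic local outlier factor and its normalisation (as in LoOP).
    plof = pdist / pdist[knn].mean(axis=1) - 1
    nplof = lam * np.sqrt((plof ** 2).mean())
    return np.maximum(0, erf(plof / (nplof * np.sqrt(2))))

rng = np.random.default_rng(5)
X = 0.1 * rng.normal(size=(100, 2))   # cluster near the disk centre
X[7] = np.array([0.95, 0.0])          # near the boundary: hyperbolically far away
scores = hloop_scores(X)
print(int(np.argmax(scores)))  # → 7
```

The only hyperbolic ingredient is the distance function; the LoOP machinery on top of it is unchanged, which is the spirit of the HLoOP construction.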


Anomaly/Outlier Detection using Local Outlier Factors - DataScienceCentral.com

#artificialintelligence

Outliers are patterns in data that do not conform to expected behavior. Detecting such patterns is of prime importance in credit card fraud, stock trading, etc., and identifying anomalous or outlying observations also matters when training any supervised machine learning model. This brings us to two very important questions: what is a local outlier, and why do we need the concept? In a multivariate dataset where the rows are generated independently from a probability distribution, using the centroid of the data alone might not be sufficient to tag all the outliers. Measures like the Mahalanobis distance might be able to identify extreme observations but won't be able to label all possible outlier observations, in particular points that are anomalous only relative to their local neighborhood.
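The local-density point can be made concrete with scikit-learn's LocalOutlierFactor: a point sitting just outside a tight cluster is flagged strongly even though, measured against the global spread of the data, it looks unremarkable (the two-cluster data here are synthetic):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(6)
dense = rng.normal([0, 0], 0.1, size=(100, 2))    # tight cluster
sparse = rng.normal([5, 5], 1.0, size=(100, 2))   # loose cluster
local_outlier = np.array([[0.6, 0.6]])            # close to the dense cluster,
                                                  # but far for *its* density
X = np.vstack([dense, sparse, local_outlier])     # outlier is row index 200

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
scores = -lof.negative_outlier_factor_            # higher = more outlying

print(int(np.argmax(scores)))  # → 200 (the locally anomalous point)
```

Because LOF compares each point's density to that of its own neighbours rather than to a single global centroid, it catches exactly the observations that centroid- or Mahalanobis-based measures tend to miss.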


An In-depth Guide to Local Outlier Factor (LOF) for Outlier Detection in Python

#artificialintelligence
