AITopics

arXiv.org Artificial Intelligence, Mar-9-2025

Towards Superior Quantization Accuracy: A Layer-sensitive Approach

Zhang, Feng, Liu, Yanbin, Li, Weihua, Lv, Jie, Wang, Xiaodan, Bai, Quan

Large Vision and Language Models have exhibited remarkable human-like intelligence in tasks such as natural language comprehension, problem-solving, logical reasoning, and knowledge retrieval. However, training and serving these models require substantial computational resources, posing a significant barrier to their widespread application and further research. To mitigate this challenge, various model compression techniques have been developed to reduce computational requirements. Nevertheless, existing methods often employ uniform quantization configurations, failing to account for the varying difficulty of quantizing different layers in large neural network models. This paper tackles this issue by leveraging layer-sensitivity features, such as activation sensitivity and weight-distribution kurtosis, to identify layers that are challenging to quantize accurately and allocate additional memory budget to them. The proposed methods, named SensiBoost and KurtBoost, respectively, demonstrate notable improvement in quantization accuracy, achieving up to 9% lower perplexity with only a 2% increase in memory budget on Llama models compared to the baseline.
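The kurtosis-based idea in this abstract can be illustrated with a minimal sketch. It is not the paper's KurtBoost implementation: the function names, the 4-bit and 8-bit widths, and the "boost the top fraction of layers" rule are all assumptions made for illustration. The abstract only states that weight-distribution kurtosis is used to find hard-to-quantize layers and give them extra memory budget.

```python
def kurtosis(values):
    """Pearson kurtosis (fourth standardized moment); about 3.0 for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return 0.0  # constant weights: no tail to speak of
    fourth = sum((v - mean) ** 4 for v in values) / n
    return fourth / (var ** 2)


def allocate_bits(layer_weights, base_bits=4, boost_bits=8, boost_fraction=0.25):
    """Hypothetical rule: the highest-kurtosis layers get boost_bits, the rest base_bits."""
    scores = {name: kurtosis(w) for name, w in layer_weights.items()}
    n_boost = max(1, int(len(scores) * boost_fraction))
    boosted = set(sorted(scores, key=scores.get, reverse=True)[:n_boost])
    return {name: (boost_bits if name in boosted else base_bits) for name in scores}


# Toy "layers": layer1 has a single large weight, i.e. a heavy-tailed distribution.
layers = {
    "layer0": [-1.0, -0.5, 0.0, 0.5, 1.0],
    "layer1": [0.0, 0.0, 0.0, 0.0, 10.0],
    "layer2": [-2.0, -1.0, 0.0, 1.0, 2.0],
    "layer3": [-1.0, 1.0, -1.0, 1.0, 0.0],
}
bit_plan = allocate_bits(layers)
print(bit_plan)
```

In this toy setup only the heavy-tailed layer receives the wider bit width, which keeps the extra memory cost small while targeting the layer where outlier weights would otherwise be clipped or coarsely rounded.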

bit budget, budget, kurtboost, (14 more...)

2503.06518

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Feb-8-2025

Feature Explosion: a generic optimization strategy for outlier detection algorithms

Li, Qi

Outlier detection tasks aim at discovering potential issues or opportunities and are widely used in cybersecurity, financial security, industrial inspection, etc. To date, thousands of outlier detection algorithms have been proposed. Clearly, in real-world scenarios, such a large number of algorithms is unnecessary. In other words, a large number of outlier detection algorithms are redundant. We believe the root cause of this redundancy lies in the current highly customized (i.e., non-generic) optimization strategies. Specifically, when researchers seek to improve the performance of existing outlier detection algorithms, they have to design separate optimized versions tailored to the principles of each algorithm, leading to an ever-growing number of outlier detection algorithms. To address this issue, in this paper, we introduce the concept of explosion from physics into the outlier detection task and propose a generic optimization strategy based on feature explosion, called OSD (Optimization Strategy for outlier Detection algorithms). In the future, when improving the performance of existing outlier detection algorithms, it will be sufficient to invoke the OSD plugin without the need to design customized optimized versions for them. We compared the performance of 14 outlier detection algorithms on 24 datasets before and after invoking the OSD plugin. The experimental results show that the performance of all outlier detection algorithms improves on almost all datasets. On average, OSD improves these outlier detection algorithms by 15% (AUC) and 63.7% (AP).

artificial intelligence, data mining, machine learning, (16 more...)

2502.05496

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Neural Information Processing Systems, Jan-18-2025, 13:07:39 GMT

BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs

benchmarking unsupervised outlier node detection, outlier detection algorithm, static attributed graph, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Sep-17-2024

Outlier Detection with Cluster Catch Digraphs

Shi, Rui, Billor, Nedret, Ceyhan, Elvan

This paper introduces a novel family of outlier detection algorithms based on Cluster Catch Digraphs (CCDs), specifically tailored to address the challenges of high dimensionality and varying cluster shapes, which deteriorate the performance of most traditional outlier detection methods. We propose the Uniformity-Based CCD with Mutual Catch Graph (U-MCCD), the Uniformity- and Neighbor-Based CCD with Mutual Catch Graph (UN-MCCD), and their shape-adaptive variants (SU-MCCD and SUN-MCCD), which are designed to detect outliers in data sets with arbitrary cluster shapes and high dimensions. We present the advantages and shortcomings of these algorithms and provide the motivation or need to define each particular algorithm. Through comprehensive Monte Carlo simulations, we assess their performance and demonstrate the robustness and effectiveness of our algorithms across various settings and contamination levels. We also illustrate the use of our algorithms on various real-life data sets. The U-MCCD algorithm efficiently identifies outliers while maintaining high true negative rates, and the SU-MCCD algorithm shows substantial improvement in handling non-uniform clusters. Additionally, the UN-MCCD and SUN-MCCD algorithms address the limitations of existing methods in high-dimensional spaces by utilizing Nearest Neighbor Distances (NND) for clustering and outlier detection. Our results indicate that these novel algorithms offer substantial advancements in the accuracy and adaptability of outlier detection, providing a valuable tool for various real-world applications. Keyword: Outlier detection, Graph-based clustering, Cluster catch digraphs, $k$-nearest-neighborhood, Mutual catch graphs, Nearest neighbor distance.

algorithm, outlier, simulation, (14 more...)

2409.11596

Country:

North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
North America > United States > New York > New York County > New York City (0.04)
(12 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Information Technology (0.92)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Artificial Intelligence, May-29-2024

Comparative Study of Neighbor-based Methods for Local Outlier Detection

Qi, Zhuang, Zhang, Junlin, Chen, Xiaming, Qi, Xin

The neighbor-based method has become a powerful tool for the outlier detection problem, which aims to infer the abnormal degree of a sample based on the compactness of the sample and its neighbors. However, existing methods commonly focus on designing different processes to locate outliers in the dataset, while the contributions of different types of neighbors to outlier detection have not been well discussed. To this end, this paper studies the neighbors used in existing outlier detection algorithms and introduces a taxonomy, which uses the three-level components of information, neighbor, and methodology to define hybrid methods. This taxonomy can serve as a paradigm in which a novel neighbor-based outlier detection method can be proposed by combining different components of the taxonomy. A large number of comparative experiments were conducted on synthetic and real-world datasets in terms of performance comparison and case study, and the results show that reverse K-nearest-neighbor-based methods achieve promising performance and that the dynamic selection method is suitable for high-dimensional spaces. Notably, it is verified that rationally selecting components from this taxonomy may create algorithms superior to existing methods.

dataset, neighbor, outlier, (13 more...)

2405.19247

Country:

Asia > China > Guangdong Province > Shantou (0.04)
Asia > China > Shandong Province (0.04)
Asia > China > Jiangsu Province (0.04)
Africa > Middle East > Egypt (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

arXiv.org Artificial Intelligence, Dec-20-2023

Outlier detection using flexible categorisation and interrogative agendas

Boersma, Marcel, Manoorkar, Krishna, Palmigiano, Alessandra, Panettiere, Mattia, Tzimoulis, Apostolos, Wijnberg, Nachoem

Categorization is one of the basic tasks in machine learning and data analysis. Building on formal concept analysis (FCA), the starting point of the present work is that different ways to categorize a given set of objects exist, which depend on the choice of the sets of features used to classify them, and different such sets of features may yield better or worse categorizations, relative to the task at hand. In turn, the (a priori) choice of a particular set of features over another might be subjective and express a certain epistemic stance (e.g. interests, relevance, preferences) of an agent or a group of agents, namely, their interrogative agenda. In the present paper, we represent interrogative agendas as sets of features, and explore and compare different ways to categorize objects w.r.t. different sets of features (agendas). We first develop a simple unsupervised FCA-based algorithm for outlier detection which uses categorizations arising from different agendas. We then present a supervised meta-learning algorithm to learn suitable (fuzzy) agendas for categorization as sets of features with different weights or masses. We combine this meta-learning algorithm with the unsupervised outlier detection algorithm to obtain a supervised outlier detection algorithm. We show that these algorithms perform on par with commonly used outlier detection algorithms on commonly used outlier detection datasets. These algorithms provide both local and global explanations of their results.

algorithm, detection, outlier degree, (13 more...)

2312.1201

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)
North America > United States > Ohio > Summit County > Akron (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.94)
Information Technology > Security & Privacy (0.46)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.45)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Machine Learning, Jun-20-2021

Outlier Detection and Spatial Analysis Algorithms

John, Jacob

Outlier detection is a significant area in data mining. It can be used either to pre-process the data prior to an analysis or after the processing phase (before visualization), depending on the effectiveness of the outlier and its importance. Outlier detection extends to several fields, such as detection of credit card fraud, network intrusions, machine failure prediction, potential terrorist attacks, and so on. Outliers are data points whose characteristics differ considerably from the rest. They deviate from the data set, causing inconsistencies, noise, and anomalies during analysis, and result in modification of the original points. However, a common misconception is that outliers have to be immediately eliminated or replaced from the data set. Such points could be considered useful if analyzed separately, as they could be obtained from a separate mechanism entirely, making them important to the research question. This study surveys the different methods of outlier detection for spatial analysis. Spatial data or geospatial data are those that exhibit geographic properties or attributes such as position or areas. An example would be weather data such as precipitation, temperature, wind velocity, and so on collected for a defined region.

detection, outlier, outlier detection, (11 more...)

2106.10669

Country:

Europe > United Kingdom (0.04)
Europe > France > Normandy (0.04)
Asia > India > Tamil Nadu > Vellore (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Law Enforcement & Public Safety > Terrorism (0.54)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)

arXiv.org Machine Learning, Feb-4-2021

RECol: Reconstruction Error Columns for Outlier Detection

Hees, Jörn, Herurkar, Dayananda, Meier, Mario

Detecting outliers or anomalies is a common data analysis task. As a sub-field of unsupervised machine learning, a large variety of approaches exist, but the vast majority treat the input features as independent and often fail to recognize even simple (linear) relationships in the input feature space. Hence, we introduce RECol, a generic data pre-processing approach to generate additional columns in a leave-one-out fashion: for each column, we try to predict its values based on the other columns, generating reconstruction error columns. We run experiments across a large variety of common baseline approaches and benchmark datasets with and without our RECol pre-processing method and show that the generated reconstruction error feature space generally seems to support common outlier detection methods and often considerably improves their ROC-AUC and PR-AUC values.
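The leave-one-out column reconstruction described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not name the predictor, so an in-sample linear least-squares fit (with a tiny ridge term for numerical stability) is assumed here, and the helper names `solve`, `fit_linear`, and `recol_columns` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(features, target, ridge=1e-8):
    """Least-squares fit of target on features plus an intercept (assumed predictor)."""
    rows = [f + [1.0] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def recol_columns(data):
    """For each column, predict it from the other columns; return absolute errors."""
    n_cols = len(data[0])
    errors = [[0.0] * n_cols for _ in data]
    for j in range(n_cols):
        features = [[v for c, v in enumerate(row) if c != j] for row in data]
        target = [row[j] for row in data]
        coef = fit_linear(features, target)
        for i, f in enumerate(features):
            pred = sum(w * v for w, v in zip(coef, f + [1.0]))
            errors[i][j] = abs(target[i] - pred)
    return errors


# Toy data: column 1 is twice column 0, except in the last row.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.0], [3.0, 12.0]]
errors = recol_columns(data)
# The reconstruction-error columns are appended to the original features.
augmented = [row + err for row, err in zip(data, errors)]
```

The last row is unremarkable in column 0 on its own, but it breaks the linear relationship between the two columns, so its reconstruction errors stand out. A downstream detector run on `augmented` can then pick up the violation even if it treats features independently.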

algorithm, outlier detection algorithm, recol, (11 more...)

2102.02791

Country:

North America > United States > New York > New York County > New York City (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Sep-20-2020

COPOD: Copula-Based Outlier Detection

Li, Zheng, Zhao, Yue, Botta, Nicola, Ionescu, Cezar, Hu, Xiyang

Outlier detection refers to the identification of rare items that are deviant from the general data distribution. Existing approaches suffer from high computational complexity, low predictive capability, and limited interpretability. As a remedy, we present a novel outlier detection algorithm called COPOD, which is inspired by copulas for modeling multivariate data distribution. COPOD first constructs an empirical copula, and then uses it to predict tail probabilities of each given data point to determine its level of "extremeness". Intuitively, we think of this as calculating an anomalous p-value. This makes COPOD parameter-free, highly interpretable, and computationally efficient. In this work, we make three key contributions: 1) propose a novel, parameter-free outlier detection algorithm with both great performance and interpretability, 2) perform extensive experiments on 30 benchmark datasets to show that COPOD outperforms in most cases and is also one of the fastest algorithms, and 3) release an easy-to-use Python implementation for reproducibility.
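The tail-probability idea in this abstract can be illustrated with a simplified sketch. It is an assumption-laden toy, not the released COPOD implementation: it estimates per-dimension left and right tail probabilities from empirical CDFs, sums their negative logs across dimensions, and takes the larger of the two sums as the score. Any further corrections the published method applies are omitted, and the function name `tail_scores` is hypothetical.

```python
import math


def tail_scores(data):
    """Score each row by how far it sits in the empirical tails across all dimensions."""
    n, d = len(data), len(data[0])
    columns = [[row[j] for row in data] for j in range(d)]
    scores = []
    for row in data:
        left = right = 0.0
        for j in range(d):
            # Empirical tail probabilities; each is at least 1/n because the
            # row itself is counted, so the logarithm is always defined.
            p_left = sum(v <= row[j] for v in columns[j]) / n
            p_right = sum(v >= row[j] for v in columns[j]) / n
            left += -math.log(p_left)
            right += -math.log(p_right)
        scores.append(max(left, right))
    return scores


# Toy data: the last row is the largest value in both dimensions.
points = [[1.0, 4.0], [2.0, 2.0], [3.0, 5.0], [4.0, 1.0], [5.0, 3.0], [9.0, 9.0]]
scores = tail_scores(points)
most_extreme = scores.index(max(scores))
```

Because the score depends only on ranks within each column, no distance metric or tuning parameter is needed, which matches the parameter-free framing in the abstract. The per-dimension terms also show which features drive a high score.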

copod, probability, tail probability, (16 more...)

2009.09463

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Germany > Brandenburg > Potsdam (0.04)
(2 more...)

Genre: Research Report (0.84)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)