Goto

Collaborating Authors

 Performance Analysis


Membership and Memorization in LLM Knowledge Distillation

arXiv.org Artificial Intelligence

Recent advances in Knowledge Distillation (KD) aim to mitigate the high computational demands of Large Language Models (LLMs) by transferring knowledge from a large ''teacher'' to a smaller ''student'' model. However, students may inherit the teacher's privacy when the teacher is trained on private data. In this work, we systematically characterize and investigate membership and memorization privacy risks inherent in six LLM KD techniques. Using instruction-tuning settings that span seven NLP tasks, together with three teacher model families (GPT-2, LLAMA-2, and OPT), and various size student models, we demonstrate that all existing LLM KD approaches carry membership and memorization privacy risks from the teacher to its students. However, the extent of privacy risks varies across different KD techniques. We systematically analyse how key LLM KD components (KD objective functions, student training data and NLP tasks) impact such privacy risks. We also demonstrate a significant disagreement between memorization and membership privacy risks of LLM KD techniques. Finally, we characterize per-block privacy risk and demonstrate that the privacy risk varies across different blocks by a large margin.


Privacy-Preserving Tabular Synthetic Data Generation Using TabularARGN

arXiv.org Artificial Intelligence

Synthetic data generation has become essential for securely sharing and analyzing sensitive data sets. Traditional anonymization techniques, however, often fail to adequately preserve privacy. We introduce the Tabular Auto-Regressive Generative Network (TabularARGN), a neural network architecture specifically designed for generating high-quality synthetic tabular data. Using a discretization-based auto-regressive approach, TabularARGN achieves high data fidelity while remaining computationally efficient. We evaluate TabularARGN against existing synthetic data generation methods, showing competitive results in statistical similarity, machine learning utility, and detection robustness. We further perform an in-depth privacy evaluation using systematic membership-inference attacks, highlighting the robustness and effective privacy-utility balance of our approach.


Segmented Confidence Sequences and Multi-Scale Adaptive Confidence Segments for Anomaly Detection in Nonstationary Time Series

arXiv.org Artificial Intelligence

As time series data become increasingly prevalent in domains such as manufacturing, IT, and infrastructure monitoring, anomaly detection must adapt to nonstationary environments where statistical properties shift over time. Traditional static thresholds are easily rendered obsolete by regime shifts, concept drift, or multi-scale changes. To address these challenges, we introduce and empirically evaluate two novel adaptive thresholding frameworks: Segmented Confidence Sequences (SCS) and Multi-Scale Adaptive Confidence Segments (MACS). Both leverage statistical online learning and segmentation principles for local, contextually sensitive adaptation, maintaining guarantees on false alarm rates even under evolving distributions. Our experiments across Wafer Manufacturing benchmark datasets show significant F1-score improvement compared to traditional percentile and rolling quantile approaches. This work demonstrates that robust, statistically principled adaptive thresholds enable reliable, interpretable, and timely detection of diverse real-world anomalies.


Semi-Supervised Supply Chain Fraud Detection with Unsupervised Pre-Filtering

arXiv.org Artificial Intelligence

Detecting fraud in modern supply chains is a growing challenge, driven by the complexity of global networks and the scarcity of labeled data. Traditional detection methods often struggle with class imbalance and limited supervision, reducing their effectiveness in real-world applications. This paper proposes a novel two-phase learning framework to address these challenges. In the first phase, the Isolation Forest algorithm performs unsupervised anomaly detection to identify potential fraud cases and reduce the volume of data requiring further analysis. In the second phase, a self-training Support Vector Machine (SVM) refines the predictions using both labeled and high-confidence pseudo-labeled samples, enabling robust semi-supervised learning. The proposed method is evaluated on the DataCo Smart Supply Chain Dataset, a comprehensive real-world supply chain dataset with fraud indicators. It achieves an F1-score of 0.817 while maintaining a false positive rate below 3.0%. These results demonstrate the effectiveness and efficiency of combining unsupervised pre-filtering with semi-supervised refinement for supply chain fraud detection under real-world constraints, though we acknowledge limitations regarding concept drift and the need for comparison with deep learning approaches.


From Label Error Detection to Correction: A Modular Framework and Benchmark for Object Detection Datasets

arXiv.org Artificial Intelligence

Object detection has advanced rapidly in recent years, driven by increasingly large and diverse datasets. However, label errors, defined as missing labels, incorrect classification or inaccurate localization, often compromise the quality of these datasets. This can have a significant impact on the outcomes of training and benchmark evaluations. Although several methods now exist for detecting label errors in object detection datasets, they are typically validated only on synthetic benchmarks or limited manual inspection. How to correct such errors systemically and at scale therefore remains an open problem. We introduce a semi-automated framework for label-error correction called REC$\checkmark$D (Rechecked). Building on existing detectors, the framework pairs their error proposals with lightweight, crowd-sourced microtasks. These tasks enable multiple annotators to independently verify each candidate bounding box, and their responses are aggregated to estimate ambiguity and improve label quality. To demonstrate the effectiveness of REC$\checkmark$D, we apply it to the class pedestrian in the KITTI dataset. Our crowdsourced review yields high-quality corrected annotations, which indicate a rate of at least 24% of missing and inaccurate annotations in original annotations. This validated set will be released as a new real-world benchmark for label error detection and correction. We show that current label error detection methods, when combined with our correction framework, can recover hundreds of errors in the time it would take a human to annotate bounding boxes from scratch. However, even the best methods still miss up to 66% of the true errors and with low quality labels introduce more errors than they find. This highlights the urgent need for further research, now enabled by our released benchmark.


Age-Diverse Deepfake Dataset: Bridging the Age Gap in Deepfake Detection

arXiv.org Artificial Intelligence

The challenges associated with deepfake detection are increasing significantly with the latest advancements in technology and the growing popularity of deepfake videos and images. Despite the presence of numerous detection models, demographic bias in the deepfake dataset remains largely unaddressed. This paper focuses on the mitigation of age-specific bias in the deepfake dataset by introducing an age-diverse deepfake dataset that will improve fairness across age groups. The dataset is constructed through a modular pipeline incorporating the existing deepfake datasets Celeb-DF, FaceForensics++, and UTKFace datasets, and the creation of synthetic data to fill the age distribution gaps. The effectiveness and generalizability of this dataset are evaluated using three deepfake detection models: XceptionNet, EfficientNet, and LipForensics. Evaluation metrics, including AUC, pAUC, and EER, revealed that models trained on the age-diverse dataset demonstrated fairer performance across age groups, improved overall accuracy, and higher generalization across datasets. This study contributes a reproducible, fairness-aware deepfake dataset and model pipeline that can serve as a foundation for future research in fairer deepfake detection. The complete dataset and implementation code are available at https://github.com/unishajoshi/age-diverse-deepfake-detection.


Are you In or Out (of gallery)? Wisdom from the Same-Identity Crowd

arXiv.org Artificial Intelligence

A central problem in one-to-many facial identification is that the person in the probe image may or may not have enrolled image(s) in the gallery; that is, may be In-gallery or Out-of-gallery. Past approaches to detect when a rank-one result is Out-of-gallery have mostly focused on finding a suitable threshold on the similarity score. W e take a new approach, using the additional enrolled images of the identity with the rank-one result to predict if the rank-one result is In-gallery / Out-of-gallery. Given a gallery of identities and images, we generate In-gallery and Out-of-gallery training data by extracting the ranks of additional enrolled images corresponding to the rank-one identity. W e then train a classifier to utilize this feature vector to predict whether a rank-one result is In-gallery or Out-of-gallery. Using two different datasets and four different matchers, we present experimental results showing that our approach is viable for mugshot quality probe images, and also, importantly, for probes degraded by blur, reduced resolution, atmospheric turbulence and sunglasses. W e also analyze results across demographic groups, and show that In-gallery / Out-of-gallery classification accuracy is similar across demographics. Our approach has the potential to provide an objective estimate of whether a one-to-many facial identification is Out-of-gallery, and thereby to reduce false positive identifications, wrongful arrests, and wasted investigative time. Interestingly, comparing the results of older deep CNN-based face matchers with newer ones suggests that the effectiveness of our Out-of-gallery detection approach emerges only with matchers trained using advanced margin-based loss functions.


A Study on Regularization-Based Continual Learning Methods for Indic ASR

arXiv.org Artificial Intelligence

Indias linguistic diversity poses significant challenges for developing inclusive Automatic Speech Recognition (ASR) systems. Traditional multilingual models, which require simultaneous access to all language data, are impractical due to the sequential arrival of data and privacy constraints. Continual Learning (CL) offers a solution by enabling models to learn new languages sequentially without catastrophically forgetting previously learned knowledge. This paper investigates CL for ASR on Indian languages using a subset of the IndicSUPERB benchmark. We employ a Conformer-based hybrid RNN-T/CTC model, initially pretrained on Hindi, which is then incrementally trained on eight additional Indian languages, for a total sequence of nine languages. We evaluate three prominent regularization- and distillation-based CL strategies: Elastic Weight Consolidation (EWC), Memory Aware Synapses (MAS), and Learning without Forgetting (LwF), selected for their suitability in no-replay, privacy-conscious scenarios. Performance is analyzed using Word Error Rate (WER) for both RNN-T and CTC paths on clean and noisy data, as well as knowledge retention via Backward Transfer. We also explore the impact of varying the number of training epochs (1, 2, 5, and 10) per task. Results, compared against naive fine-tuning, demonstrate CLs effectiveness in mitigating forgetting, making it a promising approach for scalable ASR in diverse Indian languages under realistic constraints. The code is available at: https://github.com/FrozenWolf-Cyber/Indic-CL-ASR


Classification is a RAG problem: A case study on hate speech detection

arXiv.org Artificial Intelligence

Robust content moderation requires classification systems that can quickly adapt to evolving policies without costly retraining. We present classification using Retrieval-Augmented Generation (RAG), which shifts traditional classification tasks from determining the correct category in accordance with pre-trained parameters to evaluating content in relation to contextual knowledge retrieved at inference. In hate speech detection, this transforms the task from "is this hate speech?" to "does this violate the hate speech policy?" Our Contextual Policy Engine (CPE) - an agentic RAG system - demonstrates this approach and offers three key advantages: (1) robust classification accuracy comparable to leading commercial systems, (2) inherent explainability via retrieved policy segments, and (3) dynamic policy updates without model retraining. Through three experiments, we demonstrate strong baseline performance and show that the system can apply fine-grained policy control by correctly adjusting protection for specific identity groups without requiring retraining or compromising overall performance. These findings establish that RAG can transform classification into a more flexible, transparent, and adaptable process for content moderation and wider classification problems.


Study of Robust Features in Formulating Guidance for Heuristic Algorithms for Solving the Vehicle Routing Problem

arXiv.org Artificial Intelligence

Combinatorial optimization problems, such as Vehicle Routing Problems (VRP), are important in real-world applications as they search for efficient solutions to minimize costs. Despite extensive research over decades, achieving optimal solutions remains a challenge (Laporte, 2009). Furthermore, the unique constraints of various problem variants demand specialized algorithms. The development of these algorithms is complex, making Machine Learning (ML) an attractive approach to improving the existing algorithms. Routing algorithms are typically divided into two categories: exact algorithms that offer global optimum but require many computational resources and heuristics methods for practical, real-world applications that mostly find a near-optimal solution. While most heuristics rely on human-designed strategies (Lucas et al., 2020), ML offers a new approach improving algorithm. Moreover, the selection of features influenced by these ML models plays a critical role in effectively enhancing heuristic performances (Arnold and S orensen, 2019b; Arnold and S orensen, 2019a; Lucas, Billot, and Sevaux, 2019). Understanding the predictions of an ML model can be as crucial as the accuracy of the prediction itself in many applications (Lundberg and Lee, 2017).