Goto

Collaborating Authors

 Performance Analysis


Quantum Boltzmann Machines using Parallel Annealing for Medical Image Classification

arXiv.org Artificial Intelligence

Exploiting the fact that samples drawn from a quantum annealer inherently follow a Boltzmann-like distribution, annealing-based Quantum Boltzmann Machines (QBMs) have gained increasing popularity in the quantum research community. While they harbor great promises for quantum speed-up, their usage currently stays a costly endeavor, as large amounts of QPU time are required to train them. This limits their applicability in the NISQ era. Following the idea of Noè et al. (2024), who tried to alleviate this cost by incorporating parallel quantum annealing into their unsupervised training of QBMs, this paper presents an improved version of parallel quantum annealing that we employ to train QBMs in a supervised setting. Saving qubits to encode the inputs, the latter setting allows us to test our approach on medical images from the MedMNIST data set (Yang et al., 2023), thereby moving closer to real-world applicability of the technology. Our experiments show that QBMs using our approach already achieve reasonable results, comparable to those of similarly-sized Convolutional Neural Networks (CNNs), with markedly smaller numbers of epochs than these classical models. Our parallel annealing technique leads to a speed-up of almost 70 % compared to regular annealing-based BM executions.


LLAMAPIE: Proactive In-Ear Conversation Assistants

arXiv.org Artificial Intelligence

We introduce LlamaPIE, the first real-time proactive assistant designed to enhance human conversations through discreet, concise guidance delivered via hearable devices. Unlike traditional language models that require explicit user invocation, this assistant operates in the background, anticipating user needs without interrupting conversations. We address several challenges, including determining when to respond, crafting concise responses that enhance conversations, leveraging knowledge of the user for context-aware assistance, and real-time, on-device processing. To achieve this, we construct a semi-synthetic dialogue dataset and propose a two-model pipeline: a small model that decides when to respond and a larger model that generates the response. We evaluate our approach on real-world datasets, demonstrating its effectiveness in providing helpful, unobtrusive assistance. User studies with our assistant, implemented on Apple Silicon M2 hardware, show a strong preference for the proactive assistant over both a baseline with no assistance and a reactive model, highlighting the potential of LlamaPie to enhance live conversations.


Enhancing Glass Defect Detection with Diffusion Models: Addressing Imbalanced Datasets in Manufacturing Quality Control

arXiv.org Artificial Intelligence

Visual defect detection in industrial glass manufacturing remains a critical challenge due to the low frequency of defective products, leading to imbalanced datasets that limit the performance of deep learning models and computer vision systems. This paper presents a novel approach using Denoising Diffusion Probabilistic Models (DDPMs) to generate synthetic defective glass product images for data augmentation, effectively addressing class imbalance issues in manufacturing quality control and automated visual inspection. The methodology significantly enhances image classification performance of standard CNN architectures (ResNet50V2, EfficientNetB0, and MobileNetV2) in detecting anomalies by increasing the minority class representation. Experimental results demonstrate substantial improvements in key machine learning metrics, particularly in recall for defective samples across all tested deep neural network architectures while maintaining perfect precision on the validation set. The most dramatic improvement was observed in ResNet50V2's overall classification accuracy, which increased from 78\% to 93\% when trained with the augmented data. This work provides a scalable, cost-effective approach to enhancing automated defect detection in glass manufacturing that can potentially be extended to other industrial quality assurance systems and industries with similar class imbalance challenges.


Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models

arXiv.org Artificial Intelligence

Large vision-language models (LVLMs) are vulnerable to harmful input compared to their language-only backbones. We investigated this vulnerability by exploring LVLMs internal dynamics, framing their inherent safety understanding in terms of three key capabilities. Specifically, we define these capabilities as safety perception, semantic understanding, and alignment for linguistic expression, and experimentally pinpointed their primary locations within the model architecture. The results indicate that safety perception often emerges before comprehensive semantic understanding, leading to the reduction in safety. Motivated by these findings, we propose \textbf{Self-Aware Safety Augmentation (SASA)}, a technique that projects informative semantic representations from intermediate layers onto earlier safety-oriented layers. This approach leverages the model's inherent semantic understanding to enhance safety recognition without fine-tuning. Then, we employ linear probing to articulate the model's internal semantic comprehension to detect the risk before the generation process. Extensive experiments on various datasets and tasks demonstrate that SASA significantly improves the safety of LVLMs, with minimal impact on the utility.


Data-Efficient Prediction-Powered Calibration via Cross-Validation

arXiv.org Machine Learning

--Calibration data are necessary to formally quantify the uncertainty of the decisions produced by an existing artificial intelligence (AI) model. T o overcome the common issue of scarce calibration data, a promising approach is to employ synthetic labels produced by a (generally different) predictive model. This paper introduces a novel approach that efficiently utilizes limited calibration data to simultaneously fine-tune a predictor and estimate the bias of the synthetic labels. The proposed method yields prediction sets with rigorous coverage guarantees for AI-generated decisions. N many AI applications, it is critical to quantify the uncertainty in model decisions by constructing prediction sets or confidence intervals.


Irredundant $k$-Fold Cross-Validation

arXiv.org Machine Learning

In traditional k-fold cross-validation, each instance is used ($k-1$) times for training and once for testing, leading to redundancy that lets many instances disproportionately influence the learning phase. We introduce Irredundant $k$-fold cross-validation, a novel method that guarantees each instance is used exactly once for training and once for testing across the entire validation procedure. This approach ensures a more balanced utilization of the dataset, mitigates overfitting due to instance repetition, and enables sharper distinctions in comparative model analysis. The method preserves stratification and remains model-agnostic, i.e., compatible with any classifier. Experimental results demonstrate that it delivers consistent performance estimates across diverse datasets -- comparable to $k$-fold cross-validation -- while providing less optimistic variance estimates because training partitions are non-overlapping, and significantly reducing the overall computational cost.


Behavior-Specific Filtering for Enhanced Pig Behavior Classification in Precision Livestock Farming

arXiv.org Artificial Intelligence

Precision Livestock Farming (PLF) has emerged as a critical field for monitoring and improving animal health and behavior[1]. Accurate and continuous tracking of livestock behavior is essential for identifying early signs of health issues an d enabling timely intervention. Traditional methods for monitoring pig behavior, such as manual observation, are labor - intensive, limited in scalability, and prone to inaccuracies [2]. Recent advancements in PLF have introduced automated systems that lev erage biosensors to track behavior in real time. These sensors, often attached to animals, collect data that is both costeffective and reliable, making them indispensable for modern livestock management [3,4].


HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection

arXiv.org Artificial Intelligence

The rapid evolution of face manipulation techniques poses a critical challenge for face forgery detection: cross-domain generalization. Conventional methods, which rely on simple classification objectives, often fail to learn domain-invariant representations. We propose HAMLET-FFD, a cognitively inspired Hierarchical Adaptive Multi-modal Learning framework that tackles this challenge via bidirectional cross-modal reasoning. Building on contrastive vision-language models such as CLIP, HAMLET-FFD introduces a knowledge refinement loop that iteratively assesses authenticity by integrating visual evidence with conceptual cues, emulating expert forensic analysis. A key innovation is a bidirectional fusion mechanism in which textual authenticity embeddings guide the aggregation of hierarchical visual features, while modulated visual features refine text embeddings to generate image-adaptive prompts. This closed-loop process progressively aligns visual observations with semantic priors to enhance authenticity assessment. By design, HAMLET-FFD freezes all pretrained parameters, serving as an external plugin that preserves CLIP's original capabilities. Extensive experiments demonstrate its superior generalization to unseen manipulations across multiple benchmarks, and visual analyses reveal a division of labor among embeddings, with distinct representations specializing in fine-grained artifact recognition.


Investigation of Accuracy and Bias in Face Recognition Trained with Synthetic Data

arXiv.org Artificial Intelligence

The use of synthetic data to train face recognition (FR) models has gained increasing attention in recent years, primarily, due to its potential to avoid ethical, legal, and licensing challenges associated with using real facial images, especially in the context of privacy regulations such as the GDPR [1]. Synthetic data offers the potential to generate large-scale datasets to train models for commercial use without infringing on individual privacy, hence, facilitating the development of safer FR systems. Moreover, it allows a more fine-grained control over the data generation, which potentially can help mitigating biases in FR systems. The main focus of the recent work is on generating synthetic face datasets that could be used to train FR models with performance approaching that of models trained on real data [1, 3-8]. However, several issues are still missing from the current research discourse pertaining to training FR models with synthetic data: Dual-generator framework: Synthetic face data generation often employs a two-stage process: a seed generator for creation of distinct identities and an augmentation generator for producing intra-class variations such as different poses, lighting conditions, and expressions. Although considerable effort is directed toward enhancing the diversity of seed identities [3, 4], the role of augmentation generators in influencing FR performance remains underexplored. Unfair dataset comparisons: Comparative studies between synthetic and real datasets frequently suffer from inconsistencies in dataset sizes and compositions. A synthetic dataset with 10 K identities and 64 images per identity gets compared with another dataset of 50 k identities and 20 images per identity or to a WebFace-12M's 1. 5 M identities and 12 M images [2].


Multilingual Self-Taught Faithfulness Evaluators

arXiv.org Artificial Intelligence

The growing use of large language models (LLMs) has increased the need for automatic evaluation systems, particularly to address the challenge of information hallucination. Although existing faithfulness evaluation approaches have shown promise, they are predominantly English-focused and often require expensive human-labeled training data for fine-tuning specialized models. As LLMs see increased adoption in multilingual contexts, there is a need for accurate faithfulness evaluators that can operate across languages without extensive labeled data. This paper presents Self-Taught Evaluators for Multilingual Faithfulness, a framework that learns exclusively from synthetic multilingual summarization data while leveraging cross-lingual transfer learning. Through experiments comparing language-specific and mixed-language fine-tuning approaches, we demonstrate a consistent relationship between an LLM's general language capabilities and its performance in language-specific evaluation tasks. Our framework shows improvements over existing baselines, including state-of-the-art English evaluators and machine translation-based approaches.