Goto

Collaborating Authors

 Overview


LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection

arXiv.org Artificial Intelligence

The rapid advancement of diffusion-based image generators has made it increasingly difficult to distinguish generated from real images. This erodes trust in digital media, making it critical to develop generated image detectors that remain reliable across different generators. While recent approaches leverage diffusion denoising cues, they typically rely on single-step reconstruction errors and overlook the sequential nature of the denoising process. In this work, we propose LATTE - LATent Trajectory Embedding - a novel approach that models the evolution of latent embeddings across multiple denoising steps. Instead of treating each denoising step in isolation, LATTE captures the trajectory of these representations, revealing subtle and discriminative patterns that distinguish real from generated images. Experiments on several benchmarks, such as GenImage, Chameleon, and Diffusion Forensics, show that LATTE achieves superior performance, especially in challenging cross-generator and cross-dataset scenarios, highlighting the potential of latent trajectory modeling. The code is available on the following link: https://github.com/AnaMVasilcoiu/LATTE-Diffusion-Detector.


Scalable LLM Math Reasoning Acceleration with Low-rank Distillation

arXiv.org Artificial Intelligence

While many existing efficient inference methods have been developed with excellent performance preservation on language tasks, they often severely degrade math performance. In this paper, we propose Caprese, a resource-efficient distillation method to recover lost capabilities from deploying efficient inference methods, focused primarily in feedforward blocks. With original weights unperturbed, roughly 1% of additional parameters, and only 20K synthetic training samples, we are able to recover much if not all of the math capabilities lost from efficient inference for thinking LLMs and without harm to language tasks for instruct LLMs. Moreover, Caprese slashes the number of active parameters ( 2B cut for Gemma 2 9B and Llama 3.1 8B) and integrates cleanly into existing model layers to reduce latency (>16% time-to-next-token reduction) while encouraging response brevity (up to 8.5% fewer tokens).


Conflict-Based Search and Prioritized Planning for Multi-Agent Path Finding Among Movable Obstacles

arXiv.org Artificial Intelligence

Abstract--This paper investigates Multi-Agent Path Finding Among Movable Obstacles (M-PAMO), which seeks collision-free paths for multiple agents from their start to goal locations among static and movable obstacles. M-PAMO arises in logistics and warehouses where mobile robots are among unexpected movable objects. Although Multi-Agent Path Finding (MAPF) and single-agent Path planning Among Movable Obstacles (PAMO) were both studied, M-PAMO remains under-explored. Movable obstacles lead to new fundamental challenges as the state space, which includes both agents and movable obstacles, grows exponentially with respect to the number of agents and movable obstacles. This paper makes a first attempt to adapt and fuse the popular Conflict-Based Search (CBS) and Prioritized Planning (PP) for MAPF, and a recent single-agent PAMO planner called PAMO*, together to address M-PAMO. We compare their performance with up to 20 agents and hundreds of movable obstacles, and show the pros and cons of these approaches.


EEG-based AI-BCI Wheelchair Advancement: Hybrid Deep Learning with Motor Imagery for Brain Computer Interface

arXiv.org Artificial Intelligence

This paper presents an Artificial Intelligence (AI) integrated novel approach to Brain - Computer Interface (BCI) - based wheelchair development, utilizing a motor imagery r ight - l eft - h and m ovement mechanism for control. The system is designed to simulate wheelchair navigation based on motor imagery right and left - hand movements using electroencephalogram (EEG) data. A pre - filtered dataset, obtained from an open - source EEG repository, was seg mented into arrays of 19x200 to capture the onset of hand movements. Th e data was acquired at a sampling frequency of 200Hz. The system integrates a Tkinter - based interface for simulating wheelchair movements, offering users a functional and intuitive control system. We propose a BiLSTM - BiGRU model that shows a superior test accuracy of 92. 26 % as compared with v arious machine learning baseline models, including XGBoost, EEGNet, and a transformer - based model . The Bi - LSTM - BiGRU attention - based model achieved a mean accuracy of 90.13 % through cross - validation, showcasing the potential of attention mechanisms in BCI applications. Keywords: Brain Computer Interface (BCI), BiLSTM - BiGRU, Raspberry Pi, E lectroencephalogram (EEG), Hybrid Deep learning 1. Introduction Brain - Computer Interfaces (BCIs) are advanced systems that establish direct communication between the human brain and external devices . In recent years, BCIs have been widely investigated for their potential to assist individuals with mobility impairments, offering novel pathways for restoring autonomy. This paper proposes a BCI - based wheelchair control system driven by electroencephalogra phy (EEG) signals associated with motor imagery. The proposed framework incorporates a variety of machine learning models with tailored hyperparameter optimization techniques, culminating in the deployment of a BiLSTM - BiGRU hybrid deep learning model for effective EEG signal classification.


Deep Joint Task Learning for Generic Object Extraction

Neural Information Processing Systems

This paper investigates how to extract objects-of-interest without relying on hand-craft features and sliding windows approaches, that aims to jointly solve two sub-tasks: (i) rapidly localizing salient objects from images, and (ii) accurately segmenting the objects based on the localizations. We present a general joint task learning framework, in which each task (either object localization or object segmentation) is tackled via a multi-layer convolutional neural network, and the two networks work collaboratively to boost performance. In particular, we propose to incorporate latent variables bridging the two networks in a joint optimization manner. The first network directly predicts the positions and scales of salient objects from raw images, and the latent variables adjust the object localizations to feed the second network that produces pixelwise object masks. An EM-type method is then studied for the joint optimization, iterating with two steps: (i) by using the two networks, it estimates the latent variables by employing an MCMC-based sampling method; (ii) it optimizes the parameters of the two networks unitedly via back propagation, with the fixed latent variables. Extensive experiments demonstrate that our joint learning framework significantly outperforms other state-of-the-art approaches in both accuracy and efficiency (e.g., 1000 times faster than competing approaches).


DocPruner: A Storage-Efficient Framework for Multi-Vector Visual Document Retrieval via Adaptive Patch-Level Embedding Pruning

arXiv.org Artificial Intelligence

Visual Document Retrieval (VDR), the task of retrieving visually-rich document pages using queries that combine visual and textual cues, is crucial for numerous real-world applications. Recent state-of-the-art methods leverage Large Vision-Language Models (LVLMs) in a multi-vector paradigm, representing each document as patch-level embeddings to capture fine-grained details. While highly effective, this approach introduces a critical challenge: prohibitive storage overhead, as storing hundreds of vectors per page makes large-scale deployment costly and impractical. To address this, we introduce DocPruner, the first framework to employ adaptive patch-level embedding pruning for VDR to effectively reduce the storage overhead. DocPruner leverages the intra-document patch attention distribution to dynamically identify and discard redundant embeddings for each document. This adaptive mechanism enables a significant 50-60% reduction in storage for leading multi-vector VDR models with negligible degradation in document retrieval performance. Extensive experiments across more than ten representative datasets validate that DocPruner offers a robust, flexible, and effective solution for building storage-efficient, large-scale VDR systems.


Optimizing Privacy-Preserving Primitives to Support LLM-Scale Applications

arXiv.org Artificial Intelligence

Privacy-preserving technologies have introduced a paradigm shift that allows for realizable secure computing in real-world systems. The significant barrier to the practical adoption of these primitives is the computational and communication overhead that is incurred when applied at scale. In this paper, we present an overview of our efforts to bridge the gap between this overhead and practicality for privacy-preserving learning systems using multi-party computation (MPC), zero-knowledge proofs (ZKPs), and fully homomorphic encryption (FHE). Through meticulous hardware/software/algorithm co-design, we show progress towards enabling LLM-scale applications in privacy-preserving settings. We demonstrate the efficacy of our solutions in several contexts, including DNN IP ownership, ethical LLM usage enforcement, and transformer inference.


Large Language Models for Software Testing: A Research Roadmap

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are starting to be profiled as one of the most significant disruptions in the Software Testing field. Specifically, they have been successfully applied in software testing tasks such as generating test code, or summarizing documentation. This potential has attracted hundreds of researchers, resulting in dozens of new contributions every month, hardening researchers to stay at the forefront of the wave. Still, to the best of our knowledge, no prior work has provided a structured vision of the progress and most relevant research trends in LLM-based testing. In this article, we aim to provide a roadmap that illustrates its current state, grouping the contributions into different categories, and also sketching the most promising and active research directions for the field. To achieve this objective, we have conducted a semi-systematic literature review, collecting articles and mapping them into the most prominent categories, reviewing the current and ongoing status, and analyzing the open challenges of LLM-based software testing. Lastly, we have outlined several expected long-term impacts of LLMs over the whole software testing field.


KIRETT -- A wearable device to support rescue operations using artificial intelligence to improve first aid

arXiv.org Artificial Intelligence

This short paper presents first steps in the scientific part of the KIRETT project, which aims to improve first aid during rescue operations using a wearable device. The wearable is used for computer-aided situation recognition by means of artificial intelligence. It provides contextual recommendations for actions and operations to rescue personnel and is intended to minimize damage to patients due to incorrect treatment, as well as increase the probability of survival. The paper describes a first overview of research approaches within the project.


RDD: Pareto Analysis of the Rate-Distortion-Distinguishability Trade-off

arXiv.org Artificial Intelligence

Extensive monitoring systems generate data that is usually compressed for network transmission. This compressed data might then be processed in the cloud for tasks such as anomaly detection. However, compression can potentially impair the detector's ability to distinguish between regular and irregular patterns due to information loss. Here we extend the information-theoretic framework introduced in [1] to simultaneously address the trade-off between the three features on which the effectiveness of the system depends: the effectiveness of compression, the amount of distortion it introduces, and the distinguishability between compressed normal signals and compressed anomalous signals. We leverage a Gaussian assumption to draw curves showing how moving on a Pareto surface helps administer such a trade-off better than simply relying on optimal rate-distortion compression and hoping that compressed signals can be distinguished from each other.