hist
Diffuse to Detect: A Generalizable Framework for Anomaly Detection with Diffusion Models Applications to UAVs and Beyond
Gong, Mingze, Du, Juan, You, Jianbang
Anomaly detection in complex, high-dimensional data, such as UA V sensor readings, is essential for operational safety but challenging for existing methods due to their limited sensitivity, scalability, and inability to capture intricate dependencies. We propose the Diffuse to Detect (DTD) framework, a novel approach that innovatively adapts diffusion models for anomaly detection, diverging from their conventional use in generative tasks with high inference time. By comparison, DTD employs a single-step diffusion process to predict noise patterns, enabling rapid and precise identification of anomalies without reconstruction errors. This approach is grounded in robust theoretical foundations that link noise prediction to the data distribution's score function, ensuring reliable deviation detection. By integrating Graph Neural Networks to model sensor relationships as dynamic graphs, DTD effectively captures spatial (inter-sensor) and temporal anomalies. Its two-branch architecture, with parametric neural network-based energy scoring for scalability and nonparametric statistical methods for interpretability, provides flexible trade-offs between computational efficiency and transparency. Extensive evaluations on UA V sensor data, multivariate time series, and images demonstrate DTD's superior performance over existing methods, underscoring its generality across diverse data modalities. This versatility, combined with its adaptability, positions DTD as a transformative solution for safety-critical applications, including industrial monitoring and beyond.
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
that not all comments are replied here and our replies have to be short due to space, but they'll be fully addressed in a
We appreciate the valuable comments, which urged us to embody explicit connections to practices of learning. We plead a reconsideration based on the improvement, as our contribution is truly innovative and nontrivial. Re: connection to learning, and when Cond.1&2 hold. Fig.1 shows large LR again produces stochasticity as our paper studies. Cond.1&2 use auxiliary random variables to define the needed f Re: weaken isotropic noise assumption?
Dependency Chain Analysis of ROS 2 DDS QoS Policies: From Lifecycle Tutorial to Static Verification
Lee, Sanghoon, Kang, Junha, Park, Kyung-Joon
Robot Operating System 2 (ROS 2) relies on the Data Distribution Service (DDS), which offers more than 20 Quality of Service (QoS) policies governing availability, reliability, and resource usage. Yet ROS 2 users lack clear guidance on safe policy combinations and validation processes prior to deployment, which often leads to trial-and-error tuning and unexpected runtime failures. To address these challenges, we analyze DDS Publisher-Subscriber communication over a life cycle divided into Discovery, Data Exchange, and Disassociation, and provide a user oriented tutorial explaining how 16 QoS policies operate in each phase. Building on this analysis, we derive a QoS dependency chain that formalizes inter-policy relationships and classifies 41 dependency violation rules, capturing constraints that commonly cause communication failures in practice. Finally, we introduce QoS Guard, a ROS 2 package that statically validates DDS XML profiles offline, flags conflicts, and enables safe, predeployment tuning without establishing a live ROS 2 session. Together, these contributions give ROS 2 users both conceptual insight and a concrete tool that enables early detection of misconfigurations, improving the reliability and resource efficiency of ROS 2 based robotic systems.
SPGrasp: Spatiotemporal Prompt-driven Grasp Synthesis in Dynamic Scenes
Mei, Yunpeng, Cao, Hongjie, Xia, Yinqiu, Xiao, Wei, Feng, Zhaohan, Wang, Gang, Chen, Jie
Real-time interactive grasp synthesis for dynamic objects remains challenging as existing methods fail to achieve low-latency inference while maintaining promptability. To bridge this gap, we propose SPGrasp (spatiotemporal prompt-driven dynamic grasp synthesis), a novel framework extending segment anything model v2 (SAMv2) for video stream grasp estimation. Our core innovation integrates user prompts with spatiotemporal context, enabling real-time interaction with end-to-end latency as low as 59 ms while ensuring temporal consistency for dynamic objects. In benchmark evaluations, SPGrasp achieves instance-level grasp accuracies of 90.6% on OCID and 93.8% on Jacquard. On the challenging GraspNet-1Billion dataset under continuous tracking, SPGrasp achieves 92.0% accuracy with 73.1 ms per-frame latency, representing a 58.5% reduction compared to the prior state-of-the-art promptable method RoG-SAM while maintaining competitive accuracy. Real-world experiments involving 13 moving objects demonstrate a 94.8% success rate in interactive grasping scenarios. These results confirm SPGrasp effectively resolves the latency-interactivity trade-off in dynamic grasp synthesis.
LUST: A Multi-Modal Framework with Hierarchical LLM-based Scoring for Learned Thematic Significance Tracking in Multimedia Content
This paper introduces the Learned User Significance Tracker (LUST), a framework designed to analyze video content and quantify the thematic relevance of its segments in relation to a user-provided textual description of significance. LUST leverages a multi-modal analytical pipeline, integrating visual cues from video frames with textual information extracted via Automatic Speech Recognition (ASR) from the audio track. The core innovation lies in a hierarchical, two-stage relevance scoring mechanism employing Large Language Models (LLMs). An initial "direct relevance" score, $S_{d,i}$, assesses individual segments based on immediate visual and auditory content against the theme. This is followed by a "contextual relevance" score, $S_{c,i}$, that refines the assessment by incorporating the temporal progression of preceding thematic scores, allowing the model to understand evolving narratives. The LUST framework aims to provide a nuanced, temporally-aware measure of user-defined significance, outputting an annotated video with visualized relevance scores and comprehensive analytical logs.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.05)
- Overview (0.68)
- Research Report (0.50)
Laplace-Net: Learning Dynamical Systems with External Forcing
Zimmering, Bernd, Coelho, Cecília, Gupta, Vaibhav, Maleshkova, Maria, Niggemann, Oliver
Modelling forced dynamical systems - where an external input drives the system state - is critical across diverse domains such as engineering, finance, and the natural sciences. In this work, we propose Laplace-Net, a decoupled, solver-free neural framework for learning forced and delay-aware systems. It leverages a Laplace transform-based approach to decompose internal dynamics, external inputs, and initial values into established theoretical concepts, enhancing interpretability. Laplace-Net promotes transferability since the system can be rapidly re-trained or fine-tuned for new forcing signals, providing flexibility in applications ranging from controller adaptation to long-horizon forecasting. Experimental results on eight benchmark datasets - including linear, non-linear, and delayed systems - demonstrate the method's improved accuracy and robustness compared to state-of-the-art approaches, particularly in handling complex and previously unseen inputs.
- Europe > Germany > Hamburg (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- South America > Colombia (0.04)
- (2 more...)
A Linear Theory of Multi-Winner Voting
We introduces a general linear framework that unifies the study of multi-winner voting rules and proportionality axioms, demonstrating that many prominent multi-winner voting rules-including Thiele methods, their sequential variants, and approval-based committee scoring rules-are linear. Similarly, key proportionality axioms such as Justified Representation (JR), Extended JR (EJR), and their strengthened variants (PJR+, EJR+), along with core stability, can fit within this linear structure as well. Leveraging PAC learning theory, we establish general and novel upper bounds on the sample complexity of learning linear mappings. Our approach yields near-optimal guarantees for diverse classes of rules, including Thiele methods and ordered weighted average rules, and can be applied to analyze the sample complexity of learning proportionality axioms such as approximate core stability. Furthermore, the linear structure allows us to leverage prior work to extend our analysis beyond worst-case scenarios to study the likelihood of various properties of linear rules and axioms. We introduce a broad class of distributions that extend Impartial Culture for approval preferences, and show that under these distributions, with high probability, any Thiele method is resolute, CORE is non-empty, and any Thiele method satisfies CORE, among other observations on the likelihood of commonly-studied properties in social choice. We believe that this linear theory offers a new perspective and powerful new tools for designing and analyzing multi-winner rules in modern social choice applications.