EfficientSAM3: Progressive Hierarchical Distillation for Video Concept Segmentation from SAM1, 2, and 3
Zeng, Chengxi, Jiang, Yuxuan, Zhang, Aaron
The Segment Anything Model 3 (SAM3) advances visual understanding with Promptable Concept Segmentation (PCS) across images and videos, but its unified architecture (shared vision backbone, DETR-style detector, dense-memory tracker) remains prohibitive for on-device use. W e present EfficientSAM3, a family of efficient models built on Progressive Hierarchical Distillation (PHD) that transfers capability from SAM3 to lightweight students in three stages: (1) Encoder Distillation aligns image features via prompt-in-the-loop training on SA-1B; (2) Temporal Memory Distillation replaces dense memory with a compact Perceiver-based module trained on SA-V to compress and retrieve spatiotemporal features efficiently; and (3) End-to-End Fine-Tuning refines the full pipeline on the official SAM3 PCS data to preserve concept-level performance. PHD yields a spectrum of student variants using RepViT, TinyViT, and EfficientViT backbones, enabling on-device concept segmentation and tracking while maintaining high fidelity to teacher behavior . W e benchmark on popular VOS datasets, and compare with varies of releated work, achieing strong performance-efficiency trade-offs.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.88)
- Information Technology > Artificial Intelligence > Natural Language (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
I've Changed My Mind: Robots Adapting to Changing Human Goals during Collaboration
Ghose, Debasmita, Gitelson, Oz, Jin, Ryan, Abawe, Grace, Vazquez, Marynel, Scassellati, Brian
I've Changed My Mind: Robots Adapting to Changing Human Goals during Collaboration Abstract --For effective human-robot collaboration, a robot must align its actions with human goals, even as they change mid-task. Prior approaches often assume fixed goals, reducing goal prediction to a one-time inference. However, in real-world scenarios, humans frequently shift goals, making it challenging for robots to adapt without explicit communication. We propose a method for detecting goal changes by tracking multiple candidate action sequences and verifying their plausibility against a policy bank. Upon detecting a change, the robot refines its belief in relevant past actions and constructs Receding Horizon Planning (RHP) trees to actively select actions that assist the human while encouraging Differentiating Actions to reveal their updated goal. We evaluate our approach in a collaborative cooking environment with up to 30 unique recipes and compare it to three comparable human goal prediction algorithms. Our method outperforms all baselines, quickly converging to the correct goal after a switch, reducing task completion time and improving collaboration efficiency. N real-world scenarios, humans often change their goals in response to evolving circumstances, new information, or spontaneous decisions. Previous work often addresses changing human goals by relying on explicit communication [1], [2], [3]. While effective, relying on communication assumes humans can and will communicate with the robot, which is often impractical due to physical, situational, or cognitive constraints [4], [5], [6], [7], [8].
Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis
Zanbaghi, Shahin, Rostampour, Ryan, Abid, Farhan, Jarmakani, Salim Al
Large Language Models (LLMs) can be backdoored to exhibit malicious behavior under specific deployment conditions while appearing safe during training a phenomenon known as "sleeper agents." Recent work by Hubinger et al. demonstrated that these backdoors persist through safety training, yet no practical detection methods exist. We present a novel dual-method detection system combining semantic drift analysis with canary baseline comparison to identify backdoored LLMs in real-time. Our approach uses Sentence-BERT embeddings to measure semantic deviation from safe baselines, complemented by injected canary questions that monitor response consistency. Evaluated on the official Cadenza-Labs dolphin-llama3-8B sleeper agent model, our system achieves 92.5% accuracy with 100% precision (zero false positives) and 85% recall. The combined detection method operates in real-time (<1s per query), requires no model modification, and provides the first practical solution to LLM backdoor detection. Our work addresses a critical security gap in AI deployment and demonstrates that embedding-based detection can effectively identify deceptive model behavior without sacrificing deployment efficiency.
- North America > Canada > Ontario > Essex County > Windsor (0.04)
- Europe > France (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
FOOTPASS: A Multi-Modal Multi-Agent Tactical Context Dataset for Play-by-Play Action Spotting in Soccer Broadcast Videos
Ochin, Jeremie, Chekroun, Raphael, Stanciulescu, Bogdan, Manitsaris, Sotiris
Soccer video understanding has motivated the creation of datasets for tasks such as temporal action localization, spatiotemporal action detection (STAD), or multiobject tracking (MOT). The annotation of structured sequences of events (who does what, when, and where) used for soccer analytics requires a holistic approach that integrates both STAD and MOT. However, current action recognition methods remain insufficient for constructing reliable play-by-play data and are typically used to assist rather than fully automate annotation. Parallel research has advanced tactical modeling, trajectory forecasting, and performance analysis, all grounded in game-state and play-by-play data. This motivates leveraging tactical knowledge as a prior to support computer-vision-based predictions, enabling more automated and reliable extraction of play-by-play data. We introduce Footovision Play-by-Play Action Spotting in Soccer Dataset (FOOTPASS), the first benchmark for play-by-play action spotting over entire soccer matches in a multi-modal, multi-agent tactical context. It enables the development of methods for player-centric action spotting that exploit both outputs from computer-vision tasks (e.g., tracking, identification) and prior knowledge of soccer, including its tactical regularities over long time horizons, to generate reliable play-by-play data streams. These streams form an essential input for data-driven sports analytics.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Mexico > Gulf of Mexico (0.04)
- (8 more...)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Leisure & Entertainment > Sports > Football (1.00)
CARE-RAG - Clinical Assessment and Reasoning in RAG
Potluri, Deepthi, Mathew, Aby Mammen, DeWitt, Jeffrey B, Rasgon, Alexander L., Hao, Yide, Hong, Junyuan, Ding, Ying
Access to the right evidence does not guarantee that large language models (LLMs) will reason with it correctly. This gap between retrieval and reasoning is especially concerning in clinical settings, where outputs must align with structured protocols. We study this gap using Written Exposure Therapy (WET) guidelines as a testbed. In evaluating model responses to curated clinician-vetted questions, we find that errors persist even when authoritative passages are provided. To address this, we propose an evaluation framework that measures accuracy, consistency, and fidelity of reasoning. Our results highlight both the potential and the risks: retrieval-augmented generation (RAG) can constrain outputs, but safe deployment requires assessing reasoning as rigorously as retrieval.
- Europe > Austria > Vienna (0.14)
- North America > United States > Texas > Travis County > Austin (0.05)
- North America > United States > Michigan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Mind the Motions: Benchmarking Theory-of-Mind in Everyday Body Language
Lee, Seungbeen, Jeong, Jinhong, Kim, Donghyun, Son, Yejin, Yu, Youngjae
Our ability to interpret others' mental states through nonverbal cues (NVCs) is fundamental to our survival and social cohesion. While existing Theory of Mind (ToM) benchmarks have primarily focused on false-belief tasks and reasoning with asymmetric information, they overlook other mental states beyond belief and the rich tapestry of human nonverbal communication. We present Motion2Mind, a framework for evaluating the ToM capabilities of machines in interpreting NVCs. Leveraging an expert-curated body-language reference as a proxy knowledge base, we build Motion2Mind, a carefully curated video dataset with fine-grained nonverbal cue annotations paired with manually verified psychological interpretations. It encompasses 222 types of nonverbal cues and 397 mind states. Our evaluation reveals that current AI systems struggle significantly with NVC interpretation, exhibiting not only a substantial performance gap in Detection, as well as patterns of over-interpretation in Explanation compared to human annotators.
- Europe > Monaco (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Information Technology (0.93)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
An Agent-Based Framework for the Automatic Validation of Mathematical Optimization Models
Zadorojniy, Alexander, Wasserkrug, Segev, Farchi, Eitan
Recently, using Large Language Models (LLMs) to generate optimization models from natural language descriptions has became increasingly popular. However, a major open question is how to validate that the generated models are correct and satisfy the requirements defined in the natural language description. In this work, we propose a novel agent-based method for automatic validation of optimization models that builds upon and extends methods from software testing to address optimization modeling . This method consists of several agents that initially generate a problem-level testing API, then generate tests utilizing this API, and, lastly, generate mutations specific to the optimization model (a well-known software testing technique assessing the fault detection power of the test suite). In this work, we detail this validation framework and show, through experiments, the high quality of validation provided by this agent ensemble in terms of the well-known software testing measure called mutation coverage.
- North America > United States > Oregon > Washington County > Beaverton (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Formal Models and Convergence Analysis for Context-Aware Security Verification
Traditional security scanners fail when facing new attack patterns they haven't seen before. They rely on fixed rules and predetermined signatures, making them blind to novel threats. We present a fundamentally different approach: instead of memorizing specific attack patterns, we learn what makes systems genuinely secure. Our key insight is simple yet powerful: context determines vulnerability. A SQL query that's safe in one environment becomes dangerous in another. By modeling this context-vulnerability relationship, we achieve something remarkable: our system detects attacks it has never seen before. We introduce context-aware verification that learns from genuine system behavior. Through reconstruction learning on secure systems, we capture their essential characteristics. When an unknown attack deviates from these patterns, our system recognizes it, even without prior knowledge of that specific attack type. We prove this capability theoretically, showing detection rates improve exponentially with context information I(W;C). Our framework combines three components: (1) reconstruction learning that models secure behavior, (2) multi-scale graph reasoning that aggregates contextual clues, and (3) attention mechanisms guided by reconstruction differences. Extensive experiments validate our approach: detection accuracy jumps from 58 percent to 82 percent with full context, unknown attack detection improves by 31 percent, and our system maintains above 90 percent accuracy even against completely novel attack vectors.
Machine Learning Epidemic Predictions Using Agent-based Wireless Sensor Network Models
Nwokoye, Chukwunonso Henry, Oluchi, Blessing, Waldron, Sharna, Ezzeh, Peace
Given Name Surname line 2: dept. Abstract -- The lack of epidemiological data in wireless sensor networks (WSNs) is a fundamental difficulty in constructing robust models to forecast and mitigate threats like viruses and worms. Many studies have looked at different epidemic models for WSNs, focusing on the manner in which malware infections spread given the network's specific properties, including energy limits and node mobili ty. In this study, an agent - based realization of the susceptible - exposed - infected - recovered - vaccinated (SEIRV) mathematical model was employed for machine learning (ML) predictions. Using tools such as Netlogo's BehaviorSpace and Python, two epidemic synth etic datasets were generated and prepared for the application of several ML algorithms. Posed as a regression problem, the infected and recovered nodes were predicted, and the performance of these algorithms is compared using the error metrics of the train and the test sets. The predictions performed quite well, with low error metrics and high R values (0.997, 1.000, 0.999, 1.000), indicating an effective fit to the training set. The validation values were lowered (0.992, 0.998, 0.971, and 0.999), as is ty pical when evaluating model performance on unknown data. Judging from the recorded performances, support vector, linear, Lasso, Ridge, and ElasticNet regression were among the worst performing algorithms, while Random Forest, XGBoost, Decision Trees, and K nearest neighbor had the best model performances. In recent years, the globe as we know it has been changing due to bre akthroughs in numerous linked innovations including smart electrical grids [1], the IoT, long - term evolution, 5G connectivity [2] and cyber physical systems [3] such as wireless sensor networks (WSN).
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > Canada > Ontario > Durham Region > Oshawa (0.04)
- (2 more...)
HybSpecNet: A Critical Analysis of Architectural Instability in Hybrid-Domain Spectral GNNs
Spectral Graph Neural Networks offer a principled approach to graph filtering but face a fundamental "Stability-vs-Adaptivity" trade-off. This trade-off is dictated by the choice of spectral domain. Filters in the finite [-1, 1] domain (e.g., ChebyNet) are numerically stable at high polynomial degrees (K) but are static and low-pass, causing them to fail on heterophilic graphs. Conversely, filters in the semi-infinite [0, infty) domain (e.g., KrawtchoukNet) are highly adaptive and achieve SOTA results on heterophily by learning non-low-pass responses. However, as we demonstrate, these adaptive filters can also suffer from numerical instability, leading to catastrophic performance collapse at high K. In this paper, we propose to resolve this trade-off by designing a hybrid-domain GNN, HybSpecNet, which combines a stable `ChebyNet` branch with an adaptive `KrawtchoukNet` branch. We first demonstrate that a "naive" hybrid architecture, which fuses the branches via concatenation, successfully unifies performance at low K, achieving strong results on both homophilic and heterophilic benchmarks. However, we then prove that this naive architecture fails the stability test. Our K-ablation experiments show that this architecture catastrophically collapses at K=25, exactly mirroring the collapse of its unstable `KrawtchoukNet` branch. We identify this critical finding as "Instability Poisoning," where `NaN`/`Inf` gradients from the adaptive branch destroy the training of the model. Finally, we propose and validate an advanced architecture that uses "Late Fusion" to completely isolate the gradient pathways. We demonstrate that this successfully solves the instability problem, remaining perfectly stable up to K=30 while retaining its SOTA performance across all graph types. This work identifies a critical architectural pitfall in hybrid GNN design and provides the robust architectural solution.
- North America > United States > Wisconsin (0.05)
- North America > United States > Texas (0.05)
- Asia > Middle East > Republic of Türkiye > Antalya Province > Antalya (0.04)