Anticipation




ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation

Neural Information Processing Systems

Temporal action segmentation and long-term action anticipation are two popular vision tasks for the temporal analysis of actions in videos. Despite their apparent relevance and potential complementarity, these two problems have been investigated as separate and distinct tasks. In this work, we tackle the two problems, action segmentation and action anticipation, jointly using a unified diffusion model dubbed ActFusion. The key idea behind the unification is to train the model to handle both visible and invisible parts of the sequence in an integrated manner; the visible part serves temporal segmentation, and the invisible part serves future anticipation. To this end, we introduce a new anticipative masking strategy during training in which a late part of the video frames is masked as invisible, and learnable tokens replace these frames so the model learns to predict the invisible future. Experimental results demonstrate the bi-directional benefits between action segmentation and anticipation. ActFusion achieves state-of-the-art performance across the standard benchmarks of 50 Salads, Breakfast, and GTEA, outperforming task-specific models on both tasks with a single unified model through joint learning.
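The anticipative masking strategy is concrete enough to sketch. Below is a minimal, hypothetical illustration (not the authors' released code): the features of a late span of frames are swapped for a shared learnable token, so one model is supervised on the visible prefix for segmentation and on the masked suffix for anticipation. The class name, tensor shapes, and mask-ratio handling are assumptions.

```python
import torch
import torch.nn as nn

class AnticipativeMasking(nn.Module):
    """Replace a late portion of frame features with a learnable token.

    A minimal sketch of the masking idea described in the abstract;
    dimensions and the mask-ratio scheme are assumptions.
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        # One shared learnable token stands in for every invisible frame.
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))

    def forward(self, frames: torch.Tensor, mask_ratio: float):
        # frames: (batch, time, feat_dim)
        b, t, _ = frames.shape
        n_visible = int(t * (1.0 - mask_ratio))
        masked = frames.clone()
        # The suffix [n_visible:] becomes "the future" to be anticipated.
        masked[:, n_visible:, :] = self.mask_token
        visible = torch.zeros(b, t, dtype=torch.bool)
        visible[:, :n_visible] = True
        return masked, visible

# Usage: losses on visible frames train segmentation; losses on the
# masked suffix train anticipation, all with a single model.
feats = torch.randn(2, 100, 256)
masker = AnticipativeMasking(256)
masked_feats, visible_mask = masker(feats, mask_ratio=0.3)
```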


Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views

Ma, Junyi, Bao, Wentao, Xu, Jingyi, Sun, Guanzhong, Zheng, Yu, Zhang, Erhang, Chen, Xieyuanli, Wang, Hesheng

arXiv.org Artificial Intelligence

Forecasting how human hands move in egocentric views is critical for applications like augmented reality and human-robot policy transfer. Recently, several hand trajectory prediction (HTP) methods have been developed to generate future possible hand waypoints, but they still suffer from insufficient prediction targets, inherent modality gaps, entangled hand-head motion, and limited validation in downstream tasks. To address these limitations, we present a universal hand motion forecasting framework considering multi-modal input, multi-dimensional and multi-target prediction patterns, and multi-task affordances for downstream applications. We harmonize multiple modalities by vision-language fusion, global context incorporation, and task-aware text embedding injection, to forecast hand waypoints in both 2D and 3D spaces. A novel dual-branch diffusion model is proposed to concurrently predict human head and hand movements, capturing their motion synergy in egocentric vision. By introducing target indicators, the prediction model can forecast the specific joint waypoints of the wrist or the fingers, in addition to the widely studied hand center points. We further enable Uni-Hand to predict hand-object interaction states (contact/separation) to better support downstream tasks. As the first work to incorporate downstream task evaluation in the literature, we build novel benchmarks to assess the real-world applicability of hand motion forecasting algorithms. The experimental results on multiple publicly available datasets and our newly proposed benchmarks demonstrate that Uni-Hand achieves state-of-the-art performance in multi-dimensional and multi-target hand motion forecasting. Extensive validation on multiple downstream tasks also demonstrates impressive human-robot policy transfer for robotic manipulation and effective feature enhancement for action anticipation/recognition.
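The target-indicator idea can be sketched as simple conditioning. The snippet below is a hypothetical illustration, not Uni-Hand's architecture: a learned embedding of the requested target (hand center, wrist, or a finger joint) is concatenated with fused sequence features before a shared waypoint head. All names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical target vocabulary; the paper's exact targets may differ.
TARGETS = {"center": 0, "wrist": 1, "index_tip": 2}

class TargetConditionedHead(nn.Module):
    def __init__(self, feat_dim: int = 256, out_dim: int = 3):
        super().__init__()
        self.target_emb = nn.Embedding(len(TARGETS), feat_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim * 2, feat_dim), nn.GELU(),
            nn.Linear(feat_dim, out_dim),  # one 2D/3D waypoint per step
        )

    def forward(self, seq_feats: torch.Tensor, target: str):
        # seq_feats: (batch, horizon, feat_dim) fused multi-modal features
        b, h, d = seq_feats.shape
        idx = torch.full((b,), TARGETS[target], dtype=torch.long)
        cond = self.target_emb(idx)[:, None, :].expand(b, h, d)
        return self.mlp(torch.cat([seq_feats, cond], dim=-1))

head = TargetConditionedHead()
waypoints = head(torch.randn(2, 12, 256), target="wrist")  # (2, 12, 3)
```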


Foundation Priors

Misra, Sanjog

arXiv.org Machine Learning

Foundation models, and in particular large language models, can generate highly informative responses, prompting growing interest in using these "synthetic" outputs as data in empirical research and decision-making. This paper introduces the idea of a foundation prior, under which model-generated outputs are treated not as real observations but as draws from the prior predictive distribution induced by the foundation prior. As such, synthetic data reflects both the model's learned patterns and the user's subjective priors, expectations, and biases. We model the subjectivity of the generative process by making explicit the dependence of synthetic outputs on the user's anticipated data distribution, the prompt-engineering process, and the trust placed in the foundation model. We derive the foundation prior as an exponential-tilted, generalized Bayesian update of the user's primitive prior, where a trust parameter governs the weight assigned to synthetic data. We then show how synthetic data and the associated foundation prior can be incorporated into standard statistical and econometric workflows, and discuss their use in applications such as refining complex models, informing latent constructs, guiding experimental design, and augmenting random-coefficient and partially linear specifications. By treating generative outputs as structured, explicitly subjective priors rather than as empirical observations, the framework offers a principled way to harness foundation models in empirical work while avoiding the conflation of synthetic "facts" with real data.
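The exponential-tilted update admits a compact statement. The following is a plausible reconstruction from the abstract's description, not the paper's exact notation: with primitive prior $\pi_0$, synthetic output $s$, and trust parameter $\tau \ge 0$,

```latex
\pi_{F}(\theta \mid s) \;\propto\; \pi_{0}(\theta)\, p(s \mid \theta)^{\tau}
```

so that $\tau = 0$ ignores the synthetic data and recovers the primitive prior, $\tau = 1$ treats the synthetic output as if it were a real observation, and intermediate values down-weight synthetic evidence relative to real data.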


Intention-Guided Cognitive Reasoning for Egocentric Long-Term Action Anticipation

Chu, Qiaohui, Zhang, Haoyu, Liu, Meng, Feng, Yisen, Shi, Haoxiang, Nie, Liqiang

arXiv.org Artificial Intelligence

Long-term action anticipation from egocentric video is critical for applications such as human-computer interaction and assistive technologies, where anticipating user intent enables proactive and context-aware AI assistance. However, existing approaches suffer from three key limitations: 1) underutilization of fine-grained visual cues from hand-object interactions, 2) neglect of semantic dependencies between verbs and nouns, and 3) lack of explicit cognitive reasoning, limiting generalization and long-term forecasting ability. To overcome these challenges, we propose INSIGHT, a unified two-stage framework for egocentric action anticipation. In the first stage, INSIGHT focuses on extracting semantically rich features from hand-object interaction regions and enhances action representations using a verb-noun co-occurrence matrix. In the second stage, it introduces a reinforcement learning-based module that simulates explicit cognitive reasoning through a structured process: visual perception (think), intention inference (reason), and action anticipation (answer). Extensive experiments on Ego4D, EPIC-Kitchens-55, and EGTEA Gaze+ benchmarks show that INSIGHT achieves state-of-the-art performance, demonstrating its effectiveness and strong generalization capability.

Introduction: In real-world applications such as human-computer interaction (Azam and Desai 2024; Plizzari et al. 2024), augmented reality (Abreu et al. 2024; Xu et al. 2024), and assistive systems for visually impaired individuals (Lee et al. 2024; Xiao et al. 2025), AI agents must accurately interpret user intent and demonstrate effective long-term planning capabilities within egocentric vision scenarios.
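The verb-noun co-occurrence enhancement in the first stage can be illustrated with a small sketch. The following is a hypothetical re-ranking scheme, not the paper's exact formulation: co-occurrence counts from training labels form a matrix that boosts jointly plausible (verb, noun) pairs; the normalization and fusion rule are assumptions.

```python
import numpy as np

def cooccurrence_matrix(annotations, n_verbs, n_nouns):
    """Count how often each (verb, noun) pair appears in training labels."""
    C = np.zeros((n_verbs, n_nouns))
    for verb_id, noun_id in annotations:
        C[verb_id, noun_id] += 1
    # Row-normalize so C[v] approximates P(noun | verb).
    return C / np.clip(C.sum(axis=1, keepdims=True), 1, None)

def rerank(verb_probs, noun_probs, C, alpha=0.5):
    """Score each (verb, noun) action jointly, boosting plausible pairs."""
    joint = np.outer(verb_probs, noun_probs)        # independent scores
    return (1 - alpha) * joint + alpha * joint * C  # co-occurrence prior

C = cooccurrence_matrix([(0, 1), (0, 1), (2, 0)], n_verbs=3, n_nouns=2)
scores = rerank(np.array([0.7, 0.2, 0.1]), np.array([0.4, 0.6]), C)
```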


Predict and Resist: Long-Term Accident Anticipation under Sensor Noise

Liu, Xingcheng, Rao, Bin, Guan, Yanchen, Wang, Chengyue, Liao, Haicheng, Zhang, Jiaxun, Lin, Chengyu, Zhu, Meixin, Li, Zhenning

arXiv.org Artificial Intelligence

Accident anticipation is essential for proactive and safe autonomous driving, where even a brief advance warning can enable critical evasive actions. However, two key challenges hinder real-world deployment: (1) noisy or degraded sensory inputs from weather, motion blur, or hardware limitations, and (2) the need to issue timely yet reliable predictions that balance early alerts with false-alarm suppression. We propose a unified framework that integrates diffusion-based denoising with a time-aware actor-critic model to address these challenges. The diffusion module reconstructs noise-resilient image and object features through iterative refinement, preserving critical motion and interaction cues under sensor degradation. In parallel, the actor-critic architecture leverages long-horizon temporal reasoning and time-weighted rewards to determine the optimal moment to raise an alert, aligning early detection with reliability. Experiments on three benchmark datasets (DAD, CCD, A3D) demonstrate state-of-the-art accuracy and significant gains in mean time-to-accident, while maintaining robust performance under Gaussian and impulse noise. Qualitative analyses further show that our model produces earlier, more stable, and human-aligned predictions in both routine and highly complex traffic scenarios, highlighting its potential for real-world, safety-critical deployment.
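The time-weighted reward that trades early alerts against false alarms can be sketched directly. The function below is a hypothetical reward for the alert-timing decision; the exponential weighting and penalty constants are chosen for illustration, not taken from the paper.

```python
import math

def alert_reward(raised_alert: bool, accident_occurs: bool,
                 time_to_accident: float, decay: float = 0.1) -> float:
    """Reward an agent for when (and whether) it raises an accident alert."""
    if raised_alert and accident_occurs:
        # Reward grows with how far the alert precedes the accident.
        return math.exp(decay * time_to_accident)
    if raised_alert and not accident_occurs:
        return -1.0   # false alarm: penalize to suppress spurious alerts
    if not raised_alert and accident_occurs:
        return -2.0   # missed accident: worst outcome
    return 0.0        # correctly stayed silent

# An alert 3 s before impact earns more than one 0.5 s before it.
print(alert_reward(True, True, 3.0), alert_reward(True, True, 0.5))
```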


ROAR: Robust Accident Recognition and Anticipation for Autonomous Driving

Liu, Xingcheng, Guan, Yanchen, Liao, Haicheng, He, Zhengbing, Li, Zhenning

arXiv.org Artificial Intelligence

Accurate accident anticipation is essential for enhancing the safety of autonomous vehicles (AVs). However, existing methods often assume ideal conditions, overlooking challenges such as sensor failures, environmental disturbances, and data imperfections, which can significantly degrade prediction accuracy. Additionally, previous models have not adequately addressed the considerable variability in driver behavior and accident rates across different vehicle types. To overcome these limitations, this study introduces ROAR, a novel approach for accident detection and prediction. ROAR combines Discrete Wavelet Transform (DWT), a self-adaptive object-aware module, and dynamic focal loss to tackle these challenges. The DWT effectively extracts features from noisy and incomplete data, while the object-aware module improves accident prediction by focusing on high-risk vehicles and modeling the spatial-temporal relationships among traffic agents. Evaluated on three widely used datasets, the Dashcam Accident Dataset (DAD), the Car Crash Dataset (CCD), and the AnAn Accident Detection (A3D) dataset, our model consistently outperforms existing baselines in key metrics such as Average Precision (AP) and mean Time-to-Accident (mTTA). These results demonstrate the model's robustness in real-world conditions, particularly in handling sensor degradation, environmental noise, and imbalanced data distributions. This work offers a promising solution for reliable and accurate accident anticipation in complex traffic environments.

Introduction: Traffic accidents are a persistent global issue, causing significant harm to both individuals and society. With the rise of autonomous driving, the need to proactively address this challenge has never been more pressing [1].
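The DWT-based feature extraction lends itself to a short sketch. Below is a hypothetical denoising pass over a per-frame feature sequence using PyWavelets; the wavelet choice, decomposition level, and soft-threshold rule are assumptions, not ROAR's configuration.

```python
import numpy as np
import pywt

def dwt_denoise(signal: np.ndarray, wavelet: str = "db4",
                level: int = 2, threshold: float = 0.1) -> np.ndarray:
    """Soft-threshold detail coefficients to suppress impulse noise."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Shrink detail coefficients, where high-frequency noise concentrates,
    # while keeping the approximation (the low-frequency motion trend).
    denoised = [coeffs[0]] + [
        pywt.threshold(c, threshold * np.max(np.abs(c)), mode="soft")
        for c in coeffs[1:]
    ]
    out = pywt.waverec(denoised, wavelet)
    return out[: len(signal)]  # waverec may pad by one sample

noisy = np.sin(np.linspace(0, 6, 128)) + 0.3 * np.random.randn(128)
clean = dwt_denoise(noisy)
```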



MsFIN: Multi-scale Feature Interaction Network for Traffic Accident Anticipation

Wu, Tongshuai, Lu, Chao, Song, Ze, Lin, Yunlong, Fan, Sizhe, Chen, Xuemei

arXiv.org Artificial Intelligence

With the widespread deployment of dashcams and advancements in computer vision, developing accident prediction models from the dashcam perspective has become critical for proactive safety interventions. However, two key challenges persist: modeling feature-level interactions among traffic participants (often occluded in dashcam views) and capturing complex, asynchronous multi-temporal behavioral cues preceding accidents. To address these two challenges, a Multi-scale Feature Interaction Network (MsFIN) is proposed for early-stage accident anticipation from dashcam videos. MsFIN has three layers for multi-scale feature aggregation, temporal feature processing, and multi-scale feature post-fusion, respectively. For multi-scale feature aggregation, a Multi-scale Module is designed to extract scene representations at short-term, mid-term, and long-term temporal scales. Meanwhile, the Transformer architecture is leveraged to facilitate comprehensive feature interactions. Temporal feature processing captures the sequential evolution of scene and object features under causal constraints. In the multi-scale feature post-fusion stage, the network fuses scene and object features across multiple temporal scales to generate a comprehensive risk representation. Experiments on the DAD and DADA datasets show that MsFIN significantly outperforms state-of-the-art models with single-scale feature extraction in both prediction correctness and earliness. Ablation studies validate the effectiveness of each module in MsFIN, highlighting how the network achieves superior performance through multi-scale feature fusion and contextual interaction modeling.
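The short/mid/long-term aggregation can be sketched as pooling over nested trailing windows. The module below is a hypothetical stand-in for MsFIN's Multi-scale Module; the window lengths and mean pooling are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class MultiScalePool(nn.Module):
    """Summarize frame features over short-, mid-, and long-term windows."""

    def __init__(self, windows=(4, 16, 64)):
        super().__init__()
        self.windows = windows

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, dim); pool the trailing w frames at each
        # scale, then concatenate the per-scale summaries for fusion.
        scales = [feats[:, -w:, :].mean(dim=1) for w in self.windows]
        return torch.cat(scales, dim=-1)  # (batch, dim * len(windows))

pool = MultiScalePool()
fused = pool(torch.randn(2, 100, 256))  # (2, 768)
```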