AITopics | human-object interaction detection

Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models

Neural Information Processing Systems Apr-24-2026, 05:31:16 GMT

Human-object interaction (HOI) detection aims to comprehend the intricate relationships between humans and objects by predicting <human, action, object> triplets, and it serves as the foundation for numerous computer vision tasks. The complexity and diversity of human-object interactions in the real world, however, pose significant challenges for both annotation and recognition, particularly for recognizing interactions in an open-world context. This study explores universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs). The proposed method is dubbed UniHOI. We conduct a deep analysis of the three hierarchical features inherent in visual HOI detectors and propose a method for high-level relation extraction aimed at VL foundation models, which we call HO prompt-based learning. Our design includes an HO Prompt-guided Decoder (HOPD), which facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image. Furthermore, we utilize an LLM (i.e.
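
The triplet output format named in the abstract above can be pictured as a small record type. The following is a minimal Python sketch and not code from UniHOI: the class name `HOITriplet`, its field names, the `(x1, y1, x2, y2)` box format, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class HOITriplet:
    """One detected interaction: who, does what, to which object."""
    human_box: Box
    action: str
    object_label: str
    object_box: Box
    score: float  # detector confidence, assumed to lie in [0, 1]


def as_phrase(triplet: HOITriplet) -> str:
    """Render the triplet as a short text phrase."""
    return f"human {triplet.action} {triplet.object_label}"


# Illustrative values only; they do not come from any dataset.
example = HOITriplet(
    human_box=(10.0, 20.0, 110.0, 220.0),
    action="ride",
    object_label="bicycle",
    object_box=(40.0, 120.0, 160.0, 260.0),
    score=0.87,
)
print(as_phrase(example))
```

Keeping the action and object as free-form strings, rather than indices into a fixed label set, is one simple way to leave room for the open-world categories the abstract mentions.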

large language model, machine learning, natural language, (13 more...)

Industry: Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

42b7c2f6d320d1fe1afa899a6319d6d7-Paper-Conference.pdf

Neural Information Processing Systems Feb-19-2026, 08:36:42 GMT

detection, human-object interaction detection, interaction, (14 more...)

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

c2e065133af98888ab11a549abed2cc3-Paper-Conference.pdf

Neural Information Processing Systems Feb-17-2026, 23:41:42 GMT

large language model, machine learning, natural language, (15 more...)

Country: Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
(3 more...)

8fd5bc08e744fe0dfe798c61d1575a22-Paper-Conference.pdf

Neural Information Processing Systems Feb-15-2026, 20:42:46 GMT

detection, machine learning, natural language, (16 more...)

Country:

Asia > China (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

9415416201aa201902d1743c7e65787b-Paper-Conference.pdf

Neural Information Processing Systems Feb-10-2026, 20:27:13 GMT

detection, human-object interaction detection, transformer, (13 more...)

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(2 more...)

2a54def490213ee10631b991c5acc6b5-Paper-Conference.pdf

Neural Information Processing Systems Feb-10-2026, 00:55:41 GMT

detection, diffusion model, human-object interaction detection, (14 more...)

Country: Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models

Neural Information Processing Systems Dec-27-2025, 16:35:02 GMT

The proposed method is dubbed UniHOI.

detection, interaction, proceedings, (8 more...)

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection

Neural Information Processing Systems Dec-25-2025, 17:07:03 GMT

Prior work has demonstrated the benefits of effective architecture design and integration of relevant cues for more accurate HOI detection. However, the design of an appropriate pre-training strategy for this task remains underexplored by existing approaches. To address this gap, we propose Relational Language-Image Pre-training (RLIP), a strategy for contrastive pre-training that leverages both entity and relation descriptions. To make effective use of such pre-training, we make three technical contributions: (1) a new Parallel entity detection and Sequential relation inference (ParSe) architecture that enables the use of both entity and relation descriptions during holistically optimized pre-training; (2) a synthetic data generation framework, Label Sequence Extension, that expands the scale of language data available within each minibatch; (3) ambiguity-suppression mechanisms, Relation Quality Labels and Relation Pseudo-Labels, to mitigate the influence of ambiguous/noisy samples in the pre-training data. Through extensive experiments, we demonstrate the benefits of these contributions, collectively termed RLIP-ParSe, for improved zero-shot, few-shot and fine-tuning HOI detection performance as well as increased robustness to learning from noisy annotations.
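
The abstract above does not spell out the contrastive objective, so the following is only a generic illustration of the idea of scoring a visual feature against text descriptions. It is a minimal Python sketch of a softmax (InfoNCE-style) contrastive loss over cosine similarities and should not be read as RLIP's actual loss. The function names, the `temperature` value, and the toy 2-D embeddings are assumptions.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def contrastive_loss(
    visual: Sequence[float],
    text_embeddings: Sequence[Sequence[float]],
    positive_index: int,
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Negative log-softmax of the matching description's similarity."""
    logits = [cosine(visual, t) / temperature for t in text_embeddings]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_denominator - logits[positive_index]


# Toy example: one region feature scored against two description embeddings.
region = [1.0, 0.0]
descriptions = [[1.0, 0.0], [0.0, 1.0]]  # e.g. "ride" vs. "hold" (hypothetical)
print(contrastive_loss(region, descriptions, positive_index=0))
print(contrastive_loss(region, descriptions, positive_index=1))
```

The loss is small when the feature is closest to its matching description and large otherwise, which is the behaviour a contrastive pre-training signal relies on. The same function could in principle be applied to entity descriptions and to relation descriptions separately.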

human-object interaction detection, name change, relational language-image pre-training, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Mitigating Long-Tail Bias in HOI Detection via Adaptive Diversity Cache

Jiang, Yuqiu, Qiao, Xiaozhen, Mei, Tianyu, Huang, Haojian, Chen, Yifan, Zheng, Ye, Sun, Zhe

arXiv.org Artificial Intelligence Nov-25-2025

Human-Object Interaction (HOI) detection is a fundamental task in computer vision, empowering machines to comprehend human-object relationships in diverse real-world scenarios. Recent advances in vision-language models (VLMs) have significantly improved HOI detection by leveraging rich cross-modal representations. However, most existing VLM-based approaches rely heavily on additional training or prompt tuning, resulting in substantial computational overhead and limited scalability, particularly in long-tailed scenarios where rare interactions are severely underrepresented. In this paper, we propose the Adaptive Diversity Cache (ADC) module, a novel training-free and plug-and-play mechanism designed to mitigate long-tail bias in HOI detection. ADC constructs class-specific caches that accumulate high-confidence and diverse feature representations during inference. The method incorporates frequency-aware cache adaptation that favors rare categories and is designed to enable robust prediction calibration without requiring additional training or fine-tuning. Extensive experiments on the HICO-DET and V-COCO datasets show that ADC consistently improves existing HOI detectors, achieving up to +8.57% mAP gain on rare categories and +4.39% on the full dataset, demonstrating its effectiveness in mitigating long-tail bias while preserving overall performance.
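
To make the cache idea in the abstract above more concrete, here is a minimal Python sketch of a per-class cache that admits only confident, non-redundant features and gives rare classes more slots. It is an illustration under stated assumptions, not the ADC implementation: the class name `DiversityCache`, every threshold and default value, the doubling rule for rare classes, and the linear score blend in `calibrate` are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DiversityCache:
    """Hypothetical class-specific feature cache filled during inference."""

    def __init__(self, class_frequency: Dict[str, int], rare_threshold: int = 10,
                 base_capacity: int = 4, min_confidence: float = 0.8,
                 max_similarity: float = 0.95) -> None:
        self.class_frequency = class_frequency  # training counts per class
        self.rare_threshold = rare_threshold
        self.base_capacity = base_capacity
        self.min_confidence = min_confidence
        self.max_similarity = max_similarity
        self.entries: Dict[str, List[List[float]]] = defaultdict(list)

    def capacity(self, label: str) -> int:
        # Assumed frequency-aware rule: rare classes get twice the slots.
        rare = self.class_frequency.get(label, 0) < self.rare_threshold
        return self.base_capacity * 2 if rare else self.base_capacity

    def maybe_add(self, label: str, feature: Sequence[float],
                  confidence: float) -> bool:
        """Store the feature only if it is confident and not redundant."""
        if confidence < self.min_confidence:
            return False
        slots = self.entries[label]
        if len(slots) >= self.capacity(label):
            return False
        if any(cosine(feature, kept) > self.max_similarity for kept in slots):
            return False  # too close to something already cached
        slots.append(list(feature))
        return True

    def calibrate(self, label: str, feature: Sequence[float], score: float,
                  weight: float = 0.5) -> float:
        """Blend the detector score with the best cache similarity."""
        slots = self.entries.get(label)
        if not slots:
            return score  # nothing cached for this class yet
        support = max(cosine(feature, kept) for kept in slots)
        return (1.0 - weight) * score + weight * max(support, 0.0)
```

Because the cache is only read and written at inference time, a sketch like this needs no gradient updates, which matches the training-free, plug-and-play framing of the abstract. How the real method selects, replaces, and weights cache entries is not described in the text above.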

detection, machine learning, natural language, (20 more...)

2511.18811

Country: Asia > China (0.94)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space

Yang, Panqi, Jing, Haodong, Zheng, Nanning, Ma, Yongqiang

arXiv.org Artificial Intelligence Nov-20-2025

In the field of human-object interaction (HOI), detection and generation are two dual tasks that have traditionally been addressed separately, hindering the development of comprehensive interaction understanding. To address this, we propose UniHOI, which jointly models HOI detection and generation via a unified token space, thereby effectively promoting knowledge sharing and enhancing generalization. Specifically, we introduce a symmetric interaction-aware attention module and a unified semi-supervised learning paradigm, enabling effective bidirectional mapping between images and interaction semantics even under limited annotations. Extensive experiments demonstrate that UniHOI achieves state-of-the-art performance in both HOI detection and generation. In particular, UniHOI improves accuracy by 4.9% on long-tailed HOI detection and boosts interaction metrics by 42.0% on open-vocabulary generation tasks.

artificial intelligence, hoi detection, machine learning, (14 more...)

2511.15046

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)
