AITopics | zero-shot hoi detection

Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection

Neural Information Processing Systems | Jun-21-2026, 18:55:13 GMT

Zero-shot Human-Object Interaction detection aims to localize humans and objects in an image and recognize their interaction, even when specific verb-object pairs are unseen during training. Recent works have shown promising results using prompt learning with pretrained vision-language models such as CLIP, which align natural language prompts with visual features in a shared embedding space. However, existing approaches still fail to handle the visual complexity of interaction, including (1) intra-class visual diversity, where instances of the same verb appear in diverse poses and contexts, and (2) inter-class visual entanglement, where distinct verbs yield visually similar patterns. To address these challenges, we propose VDRP, a framework for Visual Diversity and Region-aware Prompt learning. First, we introduce a visual diversity-aware prompt learning strategy that injects group-wise visual variance into the context embedding.

large language model, machine learning, natural language, (18 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

8fd5bc08e744fe0dfe798c61d1575a22-Paper-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 20:42:46 GMT

detection, machine learning, natural language, (16 more...)

Country:

Asia > China (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

654f61ecd998c9095d30d42c03b832aa-Paper-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 11:55:39 GMT

detection, hoi class, proceedings, (13 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Singapore (0.04)
North America > United States > Mississippi (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

654f61ecd998c9095d30d42c03b832aa-Paper-Conference.pdf

Neural Information Processing Systems | Oct-10-2025, 04:39:16 GMT

detection, hoi class, proceedings, (13 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Singapore (0.04)
North America > United States > Mississippi (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

8fd5bc08e744fe0dfe798c61d1575a22-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 01:22:17 GMT

detection, machine learning, natural language, (16 more...)

Country:

Asia > China (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection

Neural Information Processing Systems | May-27-2025, 03:42:31 GMT

Detecting Human-Object Interactions (HOI) in zero-shot settings, where models must handle unseen classes, poses significant challenges. Existing methods that align visual encoders with large Vision-Language Models (VLMs) to tap into their extensive knowledge require large, computationally expensive models and encounter training difficulties. Adapting VLMs with prompt learning offers an alternative to direct alignment. However, fine-tuning on task-specific datasets often leads to overfitting to seen classes and suboptimal performance on unseen classes, due to the absence of unseen class labels. To address these challenges, we introduce a novel prompt learning-based framework for Efficient Zero-Shot HOI detection (EZ-HOI).

large language model, natural language, zero-shot hoi detection, (6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
