
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation

Neural Information Processing Systems

Benefiting from language flexibility and compositionality, humans naturally tend to use language to command embodied agents to perform complex tasks such as navigation and object manipulation. In this work, we aim to fill in the last mile of embodied agents: object manipulation by following human guidance, e.g., "move the red mug next to the box while keeping it upright." To this end, we introduce an Automatic Manipulation Solver (AMSolver) system and build a Vision-and-Language Manipulation benchmark (VLMbench) on top of it, containing varied language instructions for categorized robotic manipulation tasks. Specifically, modular rule-based task templates automatically generate robot demonstrations with language instructions, covering diverse object shapes and appearances, action types, and motion constraints. We also develop a keypoint-based model, 6D-CLIPort, that handles multi-view observations and language input and outputs a sequence of 6-degree-of-freedom (DoF) actions. We hope the new simulator and benchmark will facilitate future research on language-guided robotic manipulation.


Counterfactual Vision-and-Language Navigation: Unravelling the Unseen

Neural Information Processing Systems

The task of vision-and-language navigation (VLN) requires an agent to follow text instructions to find its way through simulated household environments. A prominent challenge is to train an agent capable of generalising to new environments at test time, rather than one that simply memorises trajectories and visual details observed during training. We propose a new learning strategy that learns both from observations and from generated counterfactual environments. We describe an effective algorithm to generate counterfactual observations on the fly for VLN, as linear combinations of existing environments. Simultaneously, we encourage the agent's actions to remain stable between original and counterfactual environments through our novel training objective, effectively removing the spurious features that otherwise bias the agent. Our experiments show that this technique provides significant improvements in generalisation on benchmarks for Room-to-Room navigation and Embodied Question Answering.
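The two ingredients described above (counterfactual observations as linear combinations of environment features, plus an action-stability term) can be sketched schematically. The function names and the squared-L2 penalty below are illustrative assumptions, not the paper's exact objective:

```python
import numpy as np

def counterfactual_mix(obs_a, obs_b, alpha):
    """Counterfactual observation as a linear combination of two
    environments' visual features (a schematic of the paper's idea)."""
    return alpha * obs_a + (1.0 - alpha) * obs_b

def action_consistency_penalty(probs_orig, probs_cf):
    """Encourage the policy's action distribution to stay stable between
    the original and counterfactual observations (squared L2 used here as
    a simple stand-in for the paper's training objective)."""
    return float(np.sum((probs_orig - probs_cf) ** 2))

# Toy usage: two 4-dim feature vectors, mixed with alpha = 0.8.
obs_a = np.array([1.0, 0.0, 0.5, 0.2])
obs_b = np.array([0.0, 1.0, 0.5, 0.8])
mixed = counterfactual_mix(obs_a, obs_b, alpha=0.8)
print(mixed)
```

Training would then add the penalty between the policy's action distributions on `obs_a` and `mixed` to the usual navigation loss.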


UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective

Xu, Furui, Wang, Shaobo, Zhang, Jiajun, Sun, Chenghao, Tang, Haixiang, Zhang, Linfeng

arXiv.org Artificial Intelligence

The growing scale of datasets in deep learning has introduced significant computational challenges. Dataset pruning addresses this challenge by constructing a compact but informative coreset from the full dataset that achieves comparable performance. Previous approaches typically establish scoring metrics based on specific criteria to identify representative samples. However, these methods predominantly rely on sample scores obtained from the model's performance during the training (i.e., fitting) phase. As scoring models achieve near-optimal performance on training data, such fitting-centric approaches induce a dense distribution of sample scores within a narrow numerical range. This concentration reduces the distinction between samples and hinders effective selection. To address this challenge, we conduct dataset pruning from the perspective of generalization, i.e., scoring samples with models not exposed to them during training. We propose a plug-and-play framework, UNSEEN, which can be integrated into existing dataset pruning methods. Additionally, conventional score-based methods are single-step and rely on models trained solely on the complete dataset, providing a limited perspective on the importance of samples. To address this limitation, we scale UNSEEN to multi-step scenarios, proposing an incremental selection technique that uses scoring models trained on varying coresets and dynamically optimizes the quality of the coreset. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art (SOTA) methods on CIFAR-10, CIFAR-100, and ImageNet-1K. Notably, on ImageNet-1K, UNSEEN achieves lossless performance while reducing training data by 30%.
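The core idea of scoring each sample with a model that never saw it during training can be sketched as a cross-validation-style loop. The nearest-centroid "model" and distance-based score below are stand-in assumptions for illustration, not the paper's networks or metric:

```python
import numpy as np

def unseen_scores(X, y, n_folds=5, seed=0):
    """Score every sample with a model trained without it (a minimal
    sketch of generalization-based scoring; the nearest-centroid 'model'
    is a hypothetical stand-in for a trained network)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    scores = np.empty(len(X))
    for k in range(n_folds):
        held_out = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # "Train" the stand-in model: class centroids from samples
        # outside this fold only.
        centroids = {c: X[train][y[train] == c].mean(axis=0)
                     for c in np.unique(y[train])}
        # Score held-out samples by distance to their own class centroid:
        # larger distance = harder / more informative under an unseen model.
        for i in held_out:
            scores[i] = np.linalg.norm(X[i] - centroids[y[i]])
    return scores
```

Sorting these scores then yields a candidate coreset; the multi-step variant would repeat the loop with models trained on the coresets selected so far.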


Rethinking what Matters: Effective and Robust Multilingual Realignment for Low-Resource Languages

Nguyen, Quang Phuoc, Anugraha, David, Gaschi, Felix, Cheng, Jun Bin, Lee, En-Shiun Annie

arXiv.org Artificial Intelligence

Realignment is a promising strategy to improve cross-lingual transfer in multilingual language models. However, empirical results are mixed and often unreliable, particularly for typologically distant or low-resource languages (LRLs) compared to English. Moreover, word-alignment tools often rely on high-quality parallel data, which can be scarce or noisy for many LRLs. In this work, we conduct an extensive empirical study to investigate whether realignment truly benefits from using all available languages, or whether strategically selected subsets can offer comparable or even improved cross-lingual transfer, and we study the impact on LRLs. Our controlled experiments show that realignment can be particularly effective for LRLs and that using carefully selected, linguistically diverse subsets can match full multilingual alignment, and even outperform it for unseen LRLs. This indicates that effective realignment does not require exhaustive language coverage and can reduce data collection overhead, while remaining both efficient and robust when guided by informed language selection.


UniCoD: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning

Zhang, Jianke, Hu, Yucheng, Guo, Yanjiang, Chen, Xiaoyu, Liu, Yichen, Chen, Wenna, Lu, Chaochao, Chen, Jianyu

arXiv.org Artificial Intelligence

Building generalist robot policies that can handle diverse tasks in open-ended environments is a central challenge in robotics. To leverage knowledge from large-scale pretraining, prior work on vision-language-action (VLA) models has typically built generalist policies either on top of vision-language understanding models (VLMs) or on top of generative models. However, both semantic understanding from vision-language pretraining and visual dynamics modeling from visual-generation pretraining are crucial for embodied robots. Recent unified models of generation and understanding have demonstrated strong capabilities in both comprehension and generation through large-scale pretraining. We posit that robotic policy learning can likewise benefit from the combined strengths of understanding, planning, and continuous future representation learning. Building on this insight, we introduce UniCoD, which acquires the ability to dynamically model high-dimensional visual features through pretraining on over 1M internet-scale instructional manipulation videos. Subsequently, UniCoD is fine-tuned on data collected from the robot embodiment, enabling the learning of mappings from predictive representations to action tokens. Extensive experiments show our approach consistently outperforms baseline methods by 9% and 12% in simulation environments and real-world out-of-distribution tasks, respectively.


A Task Details

Neural Information Processing Systems

Table 5: All task variations except shape used in VLMbench.

Table 6: All object models used in VLMbench.

  Object type         Number of classes   Classes
  Basic model         3                   cube (1), triangular prism (1), cylinder (1)
  Special model       9                   star (1), moon (1), cross (1), flower (1), letter 't' (1), pencil (1), basket (1), box container (1), shape sorter (1)
  Planar model        6                   rectangle (1), circle (1), triangle (1), star (1), cross (1), flower (1)
  Functional model    2                   mug (6), sponge (1)
  Articulated model   2                   door with one rotatable handle (2), cabinet with three vertical drawers (3)

In VLMbench, there are eight task categories, including "Pick & Place objects", "Stack objects", and "Drop ...". When building an instance-level task with one variation, the other variations also change randomly (for example, in the demonstrations of "Pick & Place objects"). The dataset contains five types of objects, shown in Table 6; visualizations can be found on the project website.

Pick & Place objects: The object can be placed anywhere, with any orientation, inside the container. When the detector is triggered, the task is considered a success.
Instruction templates: High-level instructions: "Pick up [target object description] and place it into [target container description]." Low-level instructions: "Move to the top of [target object description]"; "Move the object into [target container description]"; ...
Variations and scene settings: All objects randomly change color, size, and position in each demonstration.
Color: There are two same-shape objects and two same-shape containers in the scene initialization. All colors are randomly sampled from the color library. The object description is "[color] object"; the container description is "[color] container."
Size: There are two same-shape objects and two same-shape containers in the scene initialization. One object and one container are randomly magnified while the others are randomly shrunk.
Relative position: There are two same-shape objects and two same-shape containers in the scene initialization. The object description is "[front/rear/left/right] object"; the container description is ...

Stack objects: The number of objects varies from two to the length of the object library.
Instruction templates: High-level instructions: "Stack [below object description] and [above object description] ..." Low-level instructions: "Move to the top of [above object description]"; "Move the object on [below object description]"; "Release the ..."
Object models: In the seen settings, five object models are used: star, triangular prism, cylinder, cube, moon.
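The rule-based templates above amount to filling slots with sampled variation attributes. A minimal hypothetical sketch of such slot filling (the color library, template string, and function names are illustrative, not AMSolver's actual API):

```python
import random

# Hypothetical color library and template, illustrating how a rule-based
# template could produce a color-variation instruction.
COLOR_LIBRARY = ["red", "green", "blue", "yellow"]

HIGH_LEVEL_TEMPLATE = (
    "Pick up the {object_desc} and place it into the {container_desc}."
)

def sample_instruction(rng: random.Random) -> str:
    """Sample one instruction by filling the template's description slots
    with "[color] object" / "[color] container" descriptions."""
    object_color, container_color = rng.sample(COLOR_LIBRARY, 2)
    return HIGH_LEVEL_TEMPLATE.format(
        object_desc=f"{object_color} object",
        container_desc=f"{container_color} container",
    )

instr = sample_instruction(random.Random(0))
print(instr)
```

The same pattern extends to size and relative-position variations by swapping in the corresponding descriptions.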



Estimating the Unseen: Improved Estimators for Entropy and other Properties

Neural Information Processing Systems

Recently, [Valiant and Valiant] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear-sized sample. Specifically, given a sample consisting of independent draws from any distribution over at most n distinct elements, these properties can be estimated accurately using a sample of size O(n / log n). We propose a novel modification of this approach and show: 1) theoretically, our estimator is optimal (to constant factors, over worst-case instances), and 2) in practice, it performs exceptionally well for a variety of estimation tasks, on a variety of natural distributions, for a wide range of parameters. Perhaps unsurprisingly, the key step in this approach is to first use the sample to characterize the unseen portion of the distribution. This goes beyond such tools as the Good-Turing frequency estimation scheme, which estimates the total probability mass of the unobserved portion of the distribution: we seek to estimate the shape of the unobserved portion of the distribution. This approach is robust, general, and theoretically principled; we expect that it may be fruitfully used as a component within larger machine learning and data analysis systems.
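The Good-Turing baseline mentioned above, which estimates only the total unseen mass rather than its shape, is easy to state concretely, as is the naive plug-in entropy it is meant to improve upon. A minimal sketch:

```python
import math
from collections import Counter

def good_turing_unseen_mass(sample):
    """Good-Turing estimate of the total probability mass of unobserved
    elements: (# elements seen exactly once) / (sample size)."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)

def plugin_entropy(sample):
    """Naive plug-in (empirical) entropy in nats; it ignores the unseen
    portion of the distribution and so underestimates entropy on small
    samples, which is the gap the paper's estimator targets."""
    n = len(sample)
    counts = Counter(sample)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

For example, in the sample ["a", "a", "b", "c"], two of the four draws are singletons, so the Good-Turing unseen mass is 0.5.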