AITopics | objectness

Towards 3DObjectness Learning in an Open World

Neural Information Processing SystemsJun-19-2026, 05:27:56 GMT

Recent advancements in 3D object detection and novel category detection have made significant progress, yet research on learning generalized 3D objectness remains insufficient. In this paper, we delve into learning open-world 3D objectness, which focuses on detecting all objects in a 3D scene, including novel objects unseen during training. Traditional closed-set 3D detectors struggle to generalize to openworld scenarios, while directly incorporating 3D open-vocabulary models for openworld ability struggles with vocabulary expansion and semantic overlap. To achieve generalized 3D object discovery, we propose OP3Det, a class-agnostic OpenWorld Prompt-free 3DDetector to detect any objects within 3D scenes without relying on hand-crafted text prompts. We introduce the strong generalization and zero-shot capabilities of 2D foundation models, utilizing both 2D semantic priors and 3D geometric priors for class-agnostic proposals to broaden 3D object discovery. Then, by integrating complementary information from point cloud and RGB image in the cross-modal mixture of experts, OP3Det dynamically routes uni-modal and multi-modal features to learn generalized 3D objectness. Extensive experiments demonstrate the extraordinary performance of OP3Det, which significantly surpasses existing open-world 3D detectors by up to 16.0% in AR and achieves a 13.5% improvement compared to closed-world 3D detectors.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Towards 3D Objectness Learning in an Open World

Neural Information Processing SystemsJun-13-2026, 02:13:07 GMT

Recent advancements in 3D object detection and novel category detection have made significant progress, yet research on learning generalized 3D objectness remains insufficient. In this paper, we delve into learning open-world 3D objectness, which focuses on detecting all objects in a 3D scene, including novel objects unseen during training. Traditional closed-set 3D detectors struggle to generalize to open-world scenarios, while directly incorporating 3D open-vocabulary models for open-world ability struggles with vocabulary expansion and semantic overlap. To achieve generalized 3D object discovery, We propose OP3Det, a class-agnostic Open-World Prompt-free 3D Detector to detect any objects within 3D scenes without relying on hand-crafted text prompts. We introduce the strong generalization and zero-shot capabilities of 2D foundation models, utilizing both 2D semantic priors and 3D geometric priors for class-agnostic proposals to broaden 3D object discovery. Then, by integrating complementary information from point cloud and RGB image in the cross-modal mixture of experts, OP3Det dynamically routes uni-modal and multi-modal features to learn generalized 3D objectness. Extensive experiments demonstrate the extraordinary performance of OP3Det, which significantly surpasses existing open-world 3D detectors by up to 16.0% in AR and achieves a 13.5% improvement compared to closed-world 3D detectors.

artificial intelligence, name change, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.77)

Add feedback

SubmodBoxes: Near-Optimal Search for a Set of Diverse Object Proposals

Qing Sun, Dhruv Batra

Neural Information Processing SystemsFeb-18-2026, 21:26:21 GMT

A number of problems in Computer Vision and Machine Learning involve searching for a set of bounding boxes or rectangular windows.

artificial intelligence, marginal gain, submodboxe, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.31)

Add feedback

6d9cb7de5e8ac30bd5e8734bc96a35c1-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 06:33:15 GMT

computer vision, segmentation, video, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.67)

Add feedback

20885c72ca35d75619d6a378edea9f76-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 19:25:51 GMT

Object detection has achieved promising success, but requires large-scale fullyannotated data, which is time-consuming and labor-extensive. Therefore, we consider object detection with mixedsupervision, which learns novelobject categories using weak annotations with thehelpoffullannotations ofexistingbase objectcategories.

artificial intelligence, category, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

The Emergence of Objectness: Learning Zero-shot Segmentation from Videos

Neural Information Processing SystemsDec-24-2025, 06:32:55 GMT

Humans can easily detect and segment moving objects simply by observing how they move, even without knowledge of object semantics. Inspired by this, we develop a zero-shot unsupervised approach for learning object segmentations. The model comprises two visual pathways: an appearance pathway that segments individual RGB images into coherent object regions, and a motion pathway that predicts the flow vector for each region between consecutive video frames. The two pathways jointly reconstruct a new representation called segment flow. This decoupled representation of appearance and motion is trained in a self-supervised manner to reconstruct one frame from another.When pretrained on an unlabeled video corpus, the model can be useful for a variety of applications, including 1) primary object segmentation from a single image in a zero-shot fashion; 2) moving object segmentation from a video with unsupervised test-time adaptation; 3) image semantic segmentation by supervised fine-tuning on a labeled image dataset. We demonstrate encouraging experimental results on all of these tasks using pretrained models.

learning zero-shot segmentation, name change, objectness, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)

Add feedback

A Neural Affinity Framework for Abstract Reasoning: Diagnosing the Compositional Gap in Transformer Architectures via Procedural Task Taxonomy

Ingram, Miguel, Merritt, Arthur Joseph III

arXiv.org Artificial IntelligenceDec-9-2025

Responding to Hodel et al.'s (2024) call for a formal definition of task relatedness in re-arc, we present the first 9-category taxonomy of all 400 tasks, validated at 97.5% accuracy via rule-based code analysis. We prove the taxonomy's visual coherence by training a CNN on raw grid pixels (95.24% accuracy on S3, 36.25% overall, 3.3x chance), then apply the taxonomy diagnostically to the original ARC-AGI-2 test set. Our curriculum analysis reveals 35.3% of tasks exhibit low neural affinity for Transformers--a distributional bias mirroring ARC-AGI-2. To probe this misalignment, we fine-tuned a 1.7M-parameter Transformer across 302 tasks, revealing a profound Compositional Gap: 210 of 302 tasks (69.5%) achieve >80% cell accuracy (local patterns) but <10% grid accuracy (global synthesis). This provides direct evidence for a Neural Affinity Ceiling Effect, where performance is bounded by architectural suitability, not curriculum. Applying our framework to Li et al.'s independent ViTARC study (400 specialists, 1M examples each) confirms its predictive power: Very Low affinity tasks achieve 51.9% versus 77.7% for High affinity (p<0.001), with a task at 0% despite massive data. The taxonomy enables precise diagnosis: low-affinity tasks (A2) hit hard ceilings, while high-affinity tasks (C1) reach 99.8%. These findings indicate that progress requires hybrid architectures with affinity-aligned modules. We release our validated taxonomy,

accuracy, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2512.07109

Genre: Research Report > Experimental Study (0.48)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning (0.67)

Add feedback

6d9cb7de5e8ac30bd5e8734bc96a35c1-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 01:23:37 GMT

artificial intelligence, machine learning, segmentation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.67)

Add feedback

The Emergence of Objectness: Learning Zero-shot Segmentation from Videos

Neural Information Processing SystemsMay-26-2025, 21:27:28 GMT

Humans can easily detect and segment moving objects simply by observing how they move, even without knowledge of object semantics. Inspired by this, we develop a zero-shot unsupervised approach for learning object segmentations. The model comprises two visual pathways: an appearance pathway that segments individual RGB images into coherent object regions, and a motion pathway that predicts the flow vector for each region between consecutive video frames. The two pathways jointly reconstruct a new representation called segment flow. This decoupled representation of appearance and motion is trained in a self-supervised manner to reconstruct one frame from another.When pretrained on an unlabeled video corpus, the model can be useful for a variety of applications, including 1) primary object segmentation from a single image in a zero-shot fashion; 2) moving object segmentation from a video with unsupervised test-time adaptation; 3) image semantic segmentation by supervised fine-tuning on a labeled image dataset.

artificial intelligence, large language model, natural language, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)

Add feedback

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning

Wen, Xin, Zhao, Bingchen, Chen, Yilun, Pang, Jiangmiao, Qi, Xiaojuan

arXiv.org Artificial IntelligenceMar-10-2025

Pre-trained vision models (PVMs) are fundamental to modern robotics, yet their optimal configuration remains unclear. Through systematic evaluation, we find that while DINO and iBOT outperform MAE across visuomotor control and perception tasks, they struggle when trained on non-(single-)object-centric (NOC) data--a limitation strongly correlated with their diminished ability to learn object-centric representations. This investigation indicates that the ability to form object-centric representations from the non-object-centric robotics dataset is the key to success for PVMs. Motivated by this discovery, we designed SlotMIM, a method that induces object-centric representations by introducing a semantic bottleneck to reduce the number of prototypes to encourage the emergence of objectness as well as cross-view consistency regularization for encouraging multiview invariance. Our experiments encompass pre-training on object-centric, scene-centric, web-crawled, and ego-centric data. Across all settings, our approach learns transferrable representations and achieves significant improvements over prior work in image recognition, scene understanding, and robot learning evaluations. When scaled up with million-scale datasets, our method also demonstrates superior data efficiency and scalability. Our code and models are publicly available at https://github.com/CVMI-Lab/SlotMIM.

dataset, representation, slotmim, (17 more...)

arXiv.org Artificial Intelligence

2503.0696

Country: