Supplementary Material for Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
A.1 Computing horizontal and vertical angles between two objects

For two objects A and B, let their center coordinates be (x_A, y_A, z_A) and (x_B, y_B, z_B). We compute the Euclidean distance between A and B as d = √((x_B − x_A)² + (y_B − y_A)² + (z_B − z_A)²). The horizontal angle is the angle of the displacement from A to B projected onto the xy-plane, and the vertical angle is the elevation of that displacement above the plane.

A.2 Rotation augmentation

When performing rotation augmentation, we rotate the whole point cloud by different angles. Specifically, we randomly select one angle among [0°, 90°, 180°, 270°].

A.3 Training losses

We use auxiliary losses, all of which are cross-entropy losses. When training the teacher model, all the losses are used.
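The quantities in A.1 and the rotation scheme in A.2 can be sketched as follows. The specific angle conventions (arctan2 in the xy-plane for the horizontal angle, arcsin of the normalized height difference for the vertical angle) are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def pairwise_spatial_features(center_a, center_b):
    """Distance and horizontal/vertical angles from object A to object B.

    `center_a`, `center_b` are (x, y, z) object-center coordinates.
    Angle conventions here are an assumption for illustration.
    """
    delta = np.asarray(center_b, dtype=float) - np.asarray(center_a, dtype=float)
    d = np.linalg.norm(delta)                             # Euclidean distance
    theta_h = np.arctan2(delta[1], delta[0])              # angle in the xy-plane
    theta_v = np.arcsin(delta[2] / d) if d > 0 else 0.0   # elevation above the plane
    return d, theta_h, theta_v

def rotate_z(points, angle_deg):
    """Rotate an (N, 3) point cloud about the z-axis, as in the rotation
    augmentation; `angle_deg` is drawn from {0, 90, 180, 270}."""
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                    [np.sin(t),  np.cos(t), 0.0],
                    [0.0,        0.0,       1.0]])
    return points @ rot.T
```

For example, an object directly above another has a vertical angle of π/2 and a horizontal angle of 0 under these conventions.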
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Localizing objects in 3D scenes based on natural language requires understanding and reasoning about spatial relations. In particular, it is often crucial to distinguish similar objects referred to by the text, such as "the leftmost chair" and "a chair next to the window". In this work we propose a language-conditioned transformer model for grounding 3D objects and their spatial relations.
BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts
Suzhe Xu, Jialin Peng, Chengyuan Zhang
Segmentation is a fundamental task in computer vision, with prompt-driven methods gaining prominence due to their flexibility. The recent Segment Anything Model (SAM) has demonstrated powerful point-prompt segmentation capabilities, while text-based segmentation models offer rich semantic understanding. However, existing approaches rarely explore how to effectively combine these complementary modalities for optimal segmentation performance. This paper presents BiPrompt-SAM, a novel dual-modal prompt segmentation framework that fuses the advantages of point and text prompts through an explicit selection mechanism. Specifically, we leverage SAM's inherent ability to generate multiple mask candidates, combined with a semantic guidance mask from text prompts, and explicitly select the most suitable candidate based on similarity metrics. This approach can be viewed as a simplified Mixture of Experts (MoE) system, where the point and text modules act as distinct "experts," and the similarity scoring serves as a rudimentary "gating network." We conducted extensive evaluations on both the Endovis17 medical dataset and RefCOCO series natural image datasets. On Endovis17, BiPrompt-SAM achieved 89.55% mDice and 81.46% mIoU, comparable to state-of-the-art specialized medical segmentation models. On the RefCOCO series datasets, our method attained 87.1%, 86.5%, and 85.8% IoU, significantly outperforming existing approaches. Experiments demonstrate that our explicit dual-selection method effectively combines the spatial precision of point prompts with the semantic richness of text prompts, particularly excelling in scenarios involving semantically complex objects, multiple similar objects, and partial occlusions. BiPrompt-SAM not only provides a simple yet effective implementation but also offers a new perspective on multi-modal prompt fusion.
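The explicit selection step described above can be sketched as follows. This is a minimal illustration, assuming IoU as the similarity metric between each point-prompt mask candidate and the text-derived guidance mask; the paper's actual scoring may differ.

```python
import numpy as np

def select_mask(candidates, text_mask):
    """Pick the point-prompt mask candidate most similar to the text-guided mask.

    `candidates`: list of boolean (H, W) masks, e.g. SAM's multiple mask outputs
    for a single point prompt; `text_mask`: boolean (H, W) mask from a
    text-grounded segmentation model. IoU similarity is an assumption here.
    """
    def iou(a, b):
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        return inter / union if union else 0.0

    scores = [iou(m, text_mask) for m in candidates]
    best = int(np.argmax(scores))          # rudimentary "gating": argmax similarity
    return candidates[best], scores[best]
```

In MoE terms, the candidate generator and the text model are the experts, and the argmax over similarity scores plays the role of the gating network.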
Modelling and unsupervised learning of symmetric deformable object categories
James Thewlis, Hakan Bilen, Andrea Vedaldi
We propose a new approach to model and learn, without manual supervision, the symmetries of natural objects, such as faces or flowers, given only images as input. It is well known that objects that have a symmetric structure do not usually result in symmetric images due to articulation and perspective effects. This is often tackled by seeking the intrinsic symmetries of the underlying 3D shape, which is very difficult to do when the latter cannot be recovered reliably from data. We show that, if only raw images are given, it is possible to look instead for symmetries in the space of object deformations. We can then learn symmetries from an unstructured collection of images of the object as an extension of the recently-introduced object frame representation, modified so that object symmetries reduce to the obvious symmetry groups in the normalized space. We also show that our formulation provides an explanation of the ambiguities that arise in recovering the pose of symmetric objects from their shape or images and we provide a way of discounting such ambiguities in learning.
UMB: Understanding Model Behavior for Open-World Object Detection
Open-World Object Detection (OWOD) is a challenging task that requires the detector to identify unlabeled objects and continually learn new knowledge on top of existing knowledge. Existing methods primarily focus on recalling unknown objects while neglecting to explore the reasons behind those predictions. This paper aims to understand the model's behavior in predicting the unknown category. First, we model the text attribute and the positive sample probability, obtaining their empirical probability, which can be seen as the detector's estimate of the likelihood that a target with certain known attributes is predicted as foreground. Then, we jointly decide whether the current object should be categorized as unknown based on the empirical, the in-distribution, and the out-of-distribution probability. Finally, based on the decision-making process, we can infer the similarity of an unknown object to known classes and identify the attribute with the most significant impact on the decision. This additional information helps us understand the model's prediction behavior on the unknown class. The evaluation results on the Real-World Object Detection (RWD) benchmark, which consists of five real-world application datasets, show that we surpass the previous state-of-the-art (SOTA) with an absolute gain of 5.3 mAP for unknown classes, reaching 20.5 mAP. Our code is available at https://github.com/xxyzll/UMB.
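The joint decision described above can be sketched as a toy rule. The thresholds, the AND-combination, and the attribute-attribution helper below are all illustrative assumptions, not UMB's actual formulation.

```python
def is_unknown(p_emp, p_id, p_ood, tau_emp=0.5, tau_ood=0.5):
    """Toy joint decision: flag a box as 'unknown' when the detector deems it
    foreground-like (high empirical probability), it fits no known class well
    (low in-distribution probability), and it looks out-of-distribution.
    The combination rule and thresholds are assumptions for illustration.
    """
    return p_emp >= tau_emp and p_ood >= tau_ood and p_id < (1.0 - tau_ood)

def most_influential_attribute(attr_probs):
    """Hypothetical helper mirroring the attribution step: return the text
    attribute with the largest empirical foreground probability."""
    return max(attr_probs, key=attr_probs.get)
```

For instance, a proposal with high empirical and OOD probability but low in-distribution probability would be routed to the unknown class, and the attribute with the highest empirical probability would be reported as the main driver of that decision.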
We present a slot-wise, object-based transition model that decomposes a scene into objects, aligns them (with respect to a slot-wise object memory) to maintain a consistent order across time, and predicts how those objects evolve over successive frames. The model is trained end-to-end without supervision using transition losses at the level of the object-structured representation rather than pixels. Thanks to the introduction of our novel alignment module, the model deals properly with two issues that are not handled satisfactorily by other transition models, namely object persistence and object identity. We show that the combination of an object-level loss and correct object alignment over time enables the model to outperform a state-of-the-art baseline, and allows it to deal well with object occlusion and re-appearance in partially observable environments.
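The alignment of slots against a slot-wise memory can be sketched as a matching problem. The snippet below is a minimal illustration that exhaustively searches permutations to maximize total cosine similarity, which is tractable only for small slot counts; the paper's alignment module is assumed to be a learned or more efficient mechanism.

```python
import itertools
import numpy as np

def align_slots(memory, slots):
    """Reorder `slots` ((K, D) array) to best match the slot-wise `memory`
    ((K, D) array), maximizing total cosine similarity over permutations.

    Exhaustive search is a sketch for small K, not the paper's mechanism.
    """
    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    sim = normalize(memory) @ normalize(slots).T   # (K, K) cosine similarities
    k = sim.shape[0]
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(k)):  # try every slot ordering
        score = sum(sim[i, perm[i]] for i in range(k))
        if score > best_score:
            best_perm, best_score = perm, score
    return slots[list(best_perm)], best_perm
```

Keeping each object in the same slot index across frames is what lets an object-level transition loss compare "the same object" at t and t+1, which is the persistence/identity issue the alignment module addresses.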
Supplementary Material
We printed a checkerboard with a 9x10 grid of blocks, each measuring 87 mm x 87 mm.

Parameter              Value
Model Architecture     Panoptic-PolarNet
Train Batch Size       2
Val Batch Size         2
Test Batch Size        1
post proc threshold    0.1
post proc nms kernel   5
post proc top k        100
center loss            MSE
offset loss            L1
center loss weight     100
offset loss weight     10
enable SAP             True
SAP start epoch        30
SAP rate               0.01

Table 3: Parameters for the panoptic segmentation model

Task                       Model               mIoU (%)
Semantic Segmentation      Cylinder3D          67.8
Panoptic Segmentation      Panoptic-PolarNet   59.5
4D Panoptic Segmentation   4D-StOP             58.8

Table 6: Models of various tasks used in our experiments and their performance on SemanticKITTI

The results reveal a significant variance in performance across different categories. The dataset is divided into 17 and 6 categories, respectively; for example, 'Ground' and 'Roads' are treated as separate categories, as opposed to grouping anything related to ground into a single category. Overall, the performance across these tasks underscores the challenges posed by our dataset. With our dataset, future work can focus on improving models' capacity to handle such diverse conditions. The raw data, processed data, and framework code can be found on our website.