AITopics | voxel space

752df938681b2cf15e5fc9689f0bcf3a-Paper-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 20:38:53 GMT

computer vision, detection, voxel space, (12 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
South America > Brazil (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Unifying Voxel-based Representation with Transformer for 3D Object Detection

Neural Information Processing Systems | Dec-24-2025, 11:28:19 GMT

In this work, we present a unified framework for multi-modality 3D object detection, named UVTR. The proposed method aims to unify multi-modality representations in the voxel space for accurate and robust single- or cross-modality 3D detection. To this end, a modality-specific space is first designed to represent different inputs in the voxel feature space. Unlike previous work, our approach preserves the voxel space without height compression to alleviate semantic ambiguity and enable spatial connections. To make full use of the inputs from different sensors, cross-modality interaction is then proposed, including knowledge transfer and modality fusion. In this way, geometry-aware expressions in point clouds and context-rich features in images are well utilized for better performance and robustness. A transformer decoder is applied to efficiently sample features from the unified space with learnable positions, which facilitates object-level interactions. In general, UVTR presents an early attempt to represent different modalities in a unified framework.
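The contrast between height compression and a preserved voxel space, and the idea of sampling features at 3D positions, can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the `(C, Z, Y, X)` layout, the function names, and the nearest-neighbour lookup, which stands in for whatever sampling the paper actually uses. The positions are fixed inputs here, whereas the abstract describes them as learnable.

```python
import numpy as np


def compress_height(voxel_feats):
    """Fold the height axis into channels: (C, Z, Y, X) -> (C * Z, Y, X).

    This is the bird's-eye-view style compression that the abstract says
    UVTR avoids; after it, features at different heights share one cell.
    """
    c, z, y, x = voxel_feats.shape
    return voxel_feats.reshape(c * z, y, x)


def sample_voxel_features(voxel_feats, positions):
    """Look up features at normalized (x, y, z) positions in [0, 1].

    Hypothetical helper: nearest-neighbour sampling from the full voxel
    volume. Returns an (N, C) array, one feature vector per position.
    """
    c, z, y, x = voxel_feats.shape
    sizes = np.array([x, y, z])
    # Map normalized coordinates to integer voxel indices.
    idx = np.round(positions * (sizes - 1)).astype(int)
    idx = np.clip(idx, 0, sizes - 1)
    return voxel_feats[:, idx[:, 2], idx[:, 1], idx[:, 0]].T


if __name__ == "__main__":
    feats = np.random.rand(16, 5, 32, 32)   # toy voxel feature volume
    queries = np.random.rand(10, 3)          # stand-in for learnable positions
    print(compress_height(feats).shape)      # height merged into channels
    print(sample_voxel_features(feats, queries).shape)
```

Keeping the `Z` axis means each query position can read a feature tied to a specific height, which is the property the abstract credits with reducing semantic ambiguity.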

name change, transformer, unifying voxel-based representation, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Unifying Voxel-based Representation with Transformer for 3D Object Detection

Yanwei Li

Neural Information Processing Systems | Aug-15-2025, 23:44:26 GMT

Detecting 3D objects with multi-modality sensors (i.e., LiDAR and camera) is regarded as a fundamental task in real-world scenes. For accurate object detection, data from different modalities are utilized to provide complementary knowledge, like accurate positions from point clouds and rich context from images.

artificial intelligence, detection, machine learning, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
South America > Brazil (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation

Dao, Alan, Buppodom, Norapat

arXiv.org Artificial Intelligence | Mar-27-2025

Comprehending 3D environments is vital for intelligent systems in domains like robotics and autonomous navigation. Voxel grids offer a structured representation of 3D space, but extracting high-level semantic meaning remains challenging. This paper proposes a novel approach utilizing a Vision-Language Model (VLM) to extract "voxel semantics" (object identity, color, and location) from voxel data. Critically, instead of employing complex 3D networks, our method processes the voxel space by systematically slicing it along a primary axis (e.g., the Z-axis, analogous to CT scan slices). These 2D slices are then formatted and sequentially fed into the image encoder of a standard VLM. The model learns to aggregate information across slices and correlate spatial patterns with semantic concepts provided by the language component. This slice-based strategy aims to leverage the power of pre-trained 2D VLMs for efficient 3D semantic understanding directly from voxel representations.
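The slicing step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: the `(X, Y, Z)` grid layout, the integer labels, and the function name `slice_voxel_grid` are assumptions, and the formatting of slices for a particular VLM image encoder is left out.

```python
import numpy as np


def slice_voxel_grid(grid, axis=2):
    """Cut a 3D voxel grid into an ordered list of 2D slices.

    With the assumed (X, Y, Z) layout, axis=2 slices along Z, so each
    slice is one horizontal layer, similar to a single CT scan image.
    """
    return [np.take(grid, i, axis=axis) for i in range(grid.shape[axis])]


if __name__ == "__main__":
    # Toy 8x8x4 grid where nonzero values mark occupied voxels
    # (for example a color or class id per voxel).
    grid = np.zeros((8, 8, 4), dtype=np.uint8)
    grid[2:5, 2:5, 1] = 3  # a small object on the second layer

    slices = slice_voxel_grid(grid)
    for z, layer in enumerate(slices):
        # Each 2D layer would be passed to the image encoder in order.
        print(z, layer.shape, int(layer.sum()))
```

Keeping the slices in their original order preserves the position along the sliced axis, which the model needs in order to recover object location across layers.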

information, natural language, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2503.21214

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.87)

FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real

Liu, Weiheng, Wan, Yuxuan, Wang, Jilong, Kuang, Yuxuan, Shi, Xuesong, Li, Haoran, Zhao, Dongbin, Zhang, Zhizheng, Wang, He

arXiv.org Artificial Intelligence | Feb-25-2025

Object fetching from cluttered shelves is an important capability for robots to assist humans in real-world scenarios. Achieving this task demands robotic behaviors that prioritize safety by minimizing disturbances to surrounding objects, an essential but highly challenging requirement due to restricted motion space, limited fields of view, and complex object dynamics. In this paper, we introduce FetchBot, a sim-to-real framework designed to enable zero-shot generalizable and safety-aware object fetching from cluttered shelves in real-world settings. To address data scarcity, we propose an efficient voxel-based method for generating diverse simulated cluttered shelf scenes at scale and train a dynamics-aware reinforcement learning (RL) policy to generate object fetching trajectories within these scenes. This RL policy, which leverages oracle information, is subsequently distilled into a vision-based policy for real-world deployment. Considering that sim-to-real discrepancies stem mostly from texture variations and rarely from geometric dimensions, we propose to use depth information estimated by full-fledged depth foundation models as the input for the vision-based policy to mitigate the sim-to-real gap. To tackle the challenge of limited views, we design a novel architecture for learning multi-view representations, allowing for comprehensive encoding of cluttered shelf scenes. This enables FetchBot to effectively minimize collisions while fetching objects from varying positions and depths, ensuring robust and safety-aware operation. Both simulation and real-robot experiments demonstrate FetchBot's superior generalization ability, particularly in handling a broad range of real-world scenarios, includ

arxiv preprint arxiv, representation, scenario, (14 more...)

arXiv.org Artificial Intelligence

2502.17894

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Unifying Voxel-based Representation with Transformer for 3D Object Detection

Neural Information Processing Systems | Oct-11-2024, 15:45:25 GMT

In this work, we present a unified framework for multi-modality 3D object detection, named UVTR. The proposed method aims to unify multi-modality representations in the voxel space for accurate and robust single- or cross-modality 3D detection. To this end, the modality-specific space is first designed to represent different inputs in the voxel feature space. Different from previous work, our approach preserves the voxel space without height compression to alleviate semantic ambiguity and enable spatial connections. To make full use of the inputs from different sensors, the cross-modality interaction is then proposed, including knowledge transfer and modality fusion.

object detection, transformer, unifying voxel-based representation, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning (0.42)

A Reduced-Dimension fMRI Shared Response Model

Po-Hsuan Chen, Janice Chen

Neural Information Processing Systems | Mar-13-2024, 02:45:24 GMT

Multi-subject fMRI data is critical for evaluating the generality and validity of findings across subjects, and its effective utilization helps improve analysis sensitivity. We develop a shared response model for aggregating multi-subject fMRI data that accounts for different functional topographies among anatomically aligned datasets. Our model demonstrates improved sensitivity in identifying a shared response for a variety of datasets and anatomical brain regions of interest. Furthermore, by removing the identified shared response, it allows improved detection of group differences. The ability to identify what is shared and what is not shared opens the model to a wide range of multi-subject fMRI studies.
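To make the idea of a shared response concrete, here is a simplified deterministic sketch. It assumes each subject's data matrix `X_i` (voxels by time) is approximated as `W_i @ S`, where `W_i` has orthonormal columns and `S` is a low-dimensional time course common to all subjects. That factorization, the alternating updates, and the name `fit_srm` are assumptions made for illustration and are not taken from the abstract, which does not spell out the model or its fitting procedure.

```python
import numpy as np


def fit_srm(data, k, n_iter=10, seed=0):
    """Fit a toy shared response model by alternating least squares.

    data: list of (voxels_i, time) arrays, one per subject; every
          subject needs at least k voxels and the same number of time points.
    Returns (ws, s): per-subject maps with orthonormal columns and the
    shared (k, time) response.
    """
    rng = np.random.default_rng(seed)
    # Start from random orthonormal maps (QR of a Gaussian matrix).
    ws = [np.linalg.qr(rng.standard_normal((x.shape[0], k)))[0] for x in data]
    for _ in range(n_iter):
        # Shared response: average of each subject's projected data.
        s = np.mean([w.T @ x for w, x in zip(ws, data)], axis=0)
        # Per-subject map: orthogonal Procrustes solution via SVD.
        new_ws = []
        for x in data:
            u, _, vt = np.linalg.svd(x @ s.T, full_matrices=False)
            new_ws.append(u @ vt)
        ws = new_ws
    s = np.mean([w.T @ x for w, x in zip(ws, data)], axis=0)
    return ws, s


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three synthetic subjects with different voxel counts, 50 time points.
    data = [rng.standard_normal((v, 50)) for v in (30, 40, 35)]
    ws, s = fit_srm(data, k=5)
    # Removing the shared part leaves subject-specific residuals, the
    # quantity the abstract suggests examining for group differences.
    residuals = [x - w @ s for x, w in zip(data, ws)]
    print(s.shape, [r.shape for r in residuals])
```

Because each subject gets its own map `W_i`, the sketch allows for different voxel counts and functional topographies per subject while still estimating one common response.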

correlation, dataset, srm, (15 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.68)

Understanding coordinate systems and DICOM for deep learning medical image analysis

#artificialintelligence | Jul-15-2021, 08:02:18 GMT

An introduction to several concepts in deep learning for medical imaging, such as coordinate systems and DICOM data extraction, from a machine learning perspective.

coordinate system, medical image, transformation, (14 more...)

#artificialintelligence

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

An Introduction to Biomedical Image Analysis with TensorFlow and DLTK

#artificialintelligence | Jul-4-2018, 15:56:02 GMT

A class imbalance during training will have a larger impact on rare phenomena (e.g.

artificial intelligence, database, machine learning, (16 more...)

#artificialintelligence

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.96)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

A Reduced-Dimension fMRI Shared Response Model

Chen, Po-Hsuan (Cameron), Chen, Janice, Yeshurun, Yaara, Hasson, Uri, Haxby, James, Ramadge, Peter J.

Neural Information Processing Systems | Dec-31-2015

Multi-subject fMRI data is critical for evaluating the generality and validity of findings across subjects, and its effective utilization helps improve analysis sensitivity. We develop a shared response model for aggregating multi-subject fMRI data that accounts for different functional topographies among anatomically aligned datasets. Our model demonstrates improved sensitivity in identifying a shared response for a variety of datasets and anatomical brain regions of interest. Furthermore, by removing the identified shared response, it allows improved detection of group differences. The ability to identify what is shared and what is not shared opens the model to a wide range of multi-subject fMRI studies.

artificial intelligence, machine learning, srm, (17 more...)

Neural Information Processing Systems

Industry: