Ren, Yuan
HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation
Liu, Yibo, Jiang, Zhaodong, Xu, Binbin, Wu, Guile, Ren, Yuan, Cao, Tongtong, Liu, Bingbing, Yang, Rui Heng, Rasouli, Amir, Shan, Jinjun
This work focuses on model-free zero-shot 6D object pose estimation for robotics applications. While existing methods can estimate the precise 6D pose of objects, they rely heavily on curated CAD models or reference images, the preparation of which is a time-consuming and labor-intensive process. Moreover, in real-world scenarios, 3D models or reference images may not be available in advance, and instant robot reaction is desired. In this work, we propose a novel framework named HIPPo, which eliminates the need for curated CAD models and reference images by harnessing image-to-3D priors from Diffusion Models, enabling model-free zero-shot 6D pose estimation. Specifically, we construct HIPPo Dreamer, a rapid image-to-mesh model built on a multiview Diffusion Model and a 3D reconstruction foundation model. HIPPo Dreamer can generate a 3D mesh of any unseen object from a single glance in just a few seconds. Then, as more observations are acquired, we continuously refine the diffusion-prior mesh by jointly optimizing object geometry and appearance. This is achieved by a measurement-guided scheme that gradually replaces the plausible diffusion priors with more reliable online observations. Consequently, HIPPo can instantly estimate and track the 6D pose of a novel object and maintain a complete mesh for immediate robotic applications. Thorough experiments on various benchmarks show that HIPPo outperforms state-of-the-art methods in 6D object pose estimation when prior reference images are limited.
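A minimal sketch of the measurement-guided idea described above, under my own assumptions rather than the HIPPo code: each mesh vertex carries a confidence score, and online observations gradually dominate the diffusion-prior geometry as their confidence grows. The function name and the weighting rule are illustrative.

```python
# Illustrative sketch (not the authors' code): confidence-weighted fusion of a
# diffusion-prior mesh with online observations at corresponding vertices.
import numpy as np

def fuse_prior_with_observation(prior_vertices, prior_conf,
                                observed_vertices, observed_conf):
    """Blend prior mesh vertices with observed ones, weighted by confidence.

    prior_vertices, observed_vertices: (N, 3) arrays of corresponding points.
    prior_conf, observed_conf: (N,) confidence scores in [0, 1].
    Observations gradually replace the prior as their confidence grows.
    """
    w_obs = observed_conf / (observed_conf + prior_conf + 1e-8)
    fused = (1.0 - w_obs)[:, None] * prior_vertices + w_obs[:, None] * observed_vertices
    fused_conf = np.maximum(prior_conf, observed_conf)  # keep the stronger evidence
    return fused, fused_conf

# Toy usage: a plausible prior pulled toward a more confident measurement.
prior = np.random.randn(100, 3)
obs = prior + 0.05 * np.random.randn(100, 3)
fused, conf = fuse_prior_with_observation(prior, np.full(100, 0.3),
                                          obs, np.full(100, 0.8))
```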
UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations
Ren, Yuan, Wu, Guile, Li, Runhao, Yang, Zheyuan, Liu, Yibo, Chen, Xingxin, Cao, Tongtong, Liu, Bingbing
Urban scene reconstruction is crucial for real-world autonomous driving simulators. Although existing methods have achieved photorealistic reconstruction, they mostly focus on pinhole cameras and neglect fisheye cameras. In fact, how to effectively simulate fisheye cameras in driving scenes remains an unsolved problem. In this work, we propose UniGaussian, a novel approach that learns a unified 3D Gaussian representation from multiple camera models for urban scene reconstruction in autonomous driving. Our contributions are two-fold. First, we propose a new differentiable rendering method that distorts 3D Gaussians using a series of affine transformations tailored to fisheye camera models. This addresses the compatibility issue of 3D Gaussian splatting with fisheye cameras, which is hindered by light-ray distortion caused by lenses or mirrors. Moreover, our method maintains real-time rendering while remaining differentiable. Second, built on this differentiable rendering method, we design a new framework that learns a unified Gaussian representation from multiple camera models. By applying affine transformations to accommodate different camera models and regularizing the shared Gaussians with supervision from different modalities, our framework learns a unified 3D Gaussian representation from input data of multiple sources and achieves holistic driving scene understanding. As a result, our approach models multiple sensors (pinhole and fisheye cameras) and modalities (depth, semantics, normals, and LiDAR point clouds). Our experiments show that our method achieves superior rendering quality and fast rendering speed for driving scene simulation.
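A minimal sketch of one way a fisheye-compatible splatting step could look; this is an assumption on my part, not the paper's implementation. The nonlinear fisheye projection (here an equidistant model) is locally linearized at each Gaussian mean, so the 3D covariance is pushed to image space through an affine (Jacobian) transform.

```python
# Illustrative sketch: local affine approximation of an equidistant fisheye
# projection applied to a 3D Gaussian (mean, covariance).
import numpy as np

def equidistant_project(p, f):
    """Equidistant fisheye model: image radius r = f * theta (an assumed model)."""
    x, y, z = p
    theta = np.arctan2(np.hypot(x, y), z)
    phi = np.arctan2(y, x)
    return np.array([f * theta * np.cos(phi), f * theta * np.sin(phi)])

def project_gaussian(mean, cov3d, f, eps=1e-4):
    """Project a 3D Gaussian to 2D via a first-order (affine) approximation."""
    # Numerical Jacobian of the projection at the mean (2x3).
    J = np.zeros((2, 3))
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        J[:, i] = (equidistant_project(mean + d, f) - equidistant_project(mean - d, f)) / (2 * eps)
    mean2d = equidistant_project(mean, f)
    cov2d = J @ cov3d @ J.T  # affine push-forward of the covariance
    return mean2d, cov2d

mu = np.array([0.3, -0.2, 2.0])
cov = np.diag([0.01, 0.02, 0.05])
print(project_gaussian(mu, cov, f=300.0))
```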
AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction
Khan, Mustafa, Fazlali, Hamidreza, Sharma, Dhruv, Cao, Tongtong, Bai, Dongfeng, Ren, Yuan, Liu, Bingbing
Realistic scene reconstruction and view synthesis are essential for advancing autonomous driving systems by simulating safety-critical scenarios. 3D Gaussian Splatting excels at real-time rendering and static scene reconstruction but struggles with modeling driving scenarios due to complex backgrounds, dynamic objects, and sparse views. We propose AutoSplat, a framework employing Gaussian splatting to achieve highly realistic reconstructions of autonomous driving scenes. By imposing geometric constraints on Gaussians representing the road and sky regions, our method enables multi-view consistent simulation of challenging scenarios, including lane changes. Leveraging 3D templates, we introduce a reflected Gaussian consistency constraint to supervise both the visible and unseen sides of foreground objects. Moreover, to model the dynamic appearance of foreground objects, we estimate residual spherical harmonics for each foreground Gaussian. Extensive experiments on Pandaset and KITTI demonstrate that AutoSplat outperforms state-of-the-art methods in scene reconstruction and novel view synthesis across diverse driving scenarios. Visit our project page at https://autosplat.github.io/.
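A minimal sketch of what a geometric constraint on road-region Gaussians might look like, assuming a simple flatness penalty; the function name and the choice of penalizing the vertical scale are illustrative and not taken from the AutoSplat code.

```python
# Illustrative sketch: penalize the vertical scale of road-region Gaussians so
# the splats stay thin and ground-aligned, which helps multi-view consistency.
import numpy as np

def road_flatness_loss(scales, road_mask, up_axis=2):
    """scales: (N, 3) per-Gaussian scales; road_mask: (N,) bool for road Gaussians.

    Returns the mean squared scale along the vertical axis for road Gaussians,
    driving them toward flat, disk-like shapes.
    """
    road_scales = scales[road_mask]
    return float(np.mean(road_scales[:, up_axis] ** 2)) if len(road_scales) else 0.0

# Toy usage with random scales and a made-up road mask.
scales = np.abs(np.random.randn(1000, 3)) * 0.1
mask = np.random.rand(1000) < 0.3  # pretend ~30% of Gaussians lie on the road
print(road_flatness_loss(scales, mask))
```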
Outlier-Aware Training for Low-Bit Quantization of Structural Re-Parameterized Networks
Niu, Muqun, Ren, Yuan, Li, Boyu, Ding, Chenchen
Lightweight design of Convolutional Neural Networks (CNNs) requires co-design of model architectures and compression techniques. As a design paradigm that separates training from inference, a structural re-parameterized (SR) network such as the representative RepVGG revitalizes the simple VGG-like network with accuracy comparable to more advanced and often more complicated networks. However, the merging process in SR networks introduces outliers into the weights, making their distribution distinct from that of conventional networks and thus harder to quantize. To address this, we propose an operator-level training improvement called Outlier-Aware Batch Normalization (OABN). Additionally, to meet the demands of limited bitwidths while preserving inference accuracy, we develop a clustering-based non-uniform quantization framework for Quantization-Aware Training (QAT) named ClusterQAT. Integrating OABN with ClusterQAT substantially improves the quantized performance of RepVGG, particularly when the bitwidth falls below 8.
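A minimal sketch of clustering-based non-uniform quantization in the spirit of ClusterQAT; this is my own toy illustration, not the paper's training code. Weights, including the large outliers produced by SR merging, are clustered with 1-D k-means into 2^b centroids and each weight is mapped to its nearest centroid.

```python
# Illustrative sketch: non-uniform weight quantization via a k-means codebook.
import numpy as np

def cluster_quantize(weights, bitwidth=4, iters=20, seed=0):
    """Quantize a weight tensor to 2**bitwidth centroids using 1-D k-means."""
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    k = 2 ** bitwidth
    centroids = rng.choice(flat, size=k, replace=False)
    for _ in range(iters):
        # Assign each weight to its nearest centroid, then update centroids.
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(k):
            members = flat[assign == c]
            if members.size:
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(weights.shape), centroids

# Toy usage: mostly small weights plus a few large outliers, as SR merging can produce.
w = np.concatenate([np.random.randn(10000) * 0.05, np.array([2.0, -1.8, 1.5])])
wq, codebook = cluster_quantize(w, bitwidth=3)
print(codebook)
```

Because the codebook places centroids where weights actually concentrate, the non-uniform scheme spends less resolution on empty regions between the bulk of the distribution and the outliers than a uniform grid would.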
MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D Vehicle Reconstruction in Autonomous Driving
Liu, Yibo, Zhu, Kelly, Wu, Guile, Ren, Yuan, Liu, Bingbing, Liu, Yang, Shan, Jinjun
Reconstructing 3D vehicles from noisy and sparse partial point clouds is of great significance to autonomous driving. Most existing 3D reconstruction methods cannot be directly applied to this problem because they are designed for dense inputs with negligible noise. In this work, we propose a novel framework, dubbed MV-DeepSDF, which estimates the optimal Signed Distance Function (SDF) shape representation from multi-sweep point clouds to reconstruct vehicles in the wild. Although there have been some SDF-based implicit modeling methods, they focus only on single-view reconstruction, resulting in low fidelity. In contrast, we first analyze multi-sweep consistency and complementarity in the latent feature space and propose to transform the implicit-space shape estimation problem into an element-to-set feature extraction problem. Then, we devise a new architecture to extract individual element-level representations and aggregate them into a set-level predicted latent code. This set-level latent code expresses the optimal 3D shape in the implicit space and can subsequently be decoded to a continuous SDF of the vehicle. In this way, our approach learns consistent and complementary information across multiple sweeps for 3D vehicle reconstruction. We conduct thorough experiments on two real-world autonomous driving datasets (Waymo and KITTI) to demonstrate the superiority of our approach over state-of-the-art alternatives both qualitatively and quantitatively.
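A minimal sketch of the element-to-set idea described above, under assumed shapes and a deliberately simple encoder and mean-pooling aggregator (not the MV-DeepSDF architecture): each sweep yields an element-level latent code, and a permutation-invariant pool produces the set-level code that an SDF decoder would consume.

```python
# Illustrative sketch: per-sweep (element-level) encoding followed by a
# permutation-invariant aggregation into one set-level latent code.
import numpy as np

def encode_sweep(points, proj):
    """Toy per-sweep encoder: max-pool a linear point embedding (points: (M, 3))."""
    return np.max(points @ proj, axis=0)  # (latent_dim,)

def set_level_latent(sweeps, proj):
    """Aggregate element-level codes from all sweeps into one set-level code."""
    element_codes = np.stack([encode_sweep(s, proj) for s in sweeps])
    return element_codes.mean(axis=0)  # order of sweeps does not matter

rng = np.random.default_rng(0)
proj = rng.standard_normal((3, 256))
sweeps = [rng.standard_normal((int(rng.integers(200, 500)), 3)) for _ in range(4)]
z = set_level_latent(sweeps, proj)  # would be decoded by an SDF decoder
print(z.shape)
```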
S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds
Cheng, Ran, Agia, Christopher, Ren, Yuan, Li, Xinhai, Liu, Bingbing
With the increasing reliance of self-driving and similar robotic systems on robust 3D vision, the processing of LiDAR scans with deep convolutional neural networks has become a trend in academia and industry alike. Prior attempts at the challenging Semantic Scene Completion task - which entails inferring dense 3D structure and associated semantic labels from "sparse" representations - have been, to a degree, successful in small indoor scenes when provided with dense point clouds or dense depth maps, often fused with semantic segmentation maps from RGB images. However, the performance of these systems drops drastically when applied to large outdoor scenes characterized by dynamic and exponentially sparser conditions. Likewise, processing the entire sparse volume becomes infeasible due to memory limitations, and workarounds introduce computational inefficiency: practitioners are forced to divide the overall volume into multiple equal segments and infer on each individually, rendering real-time performance impossible. In this work, we formulate a method that subsumes the sparsity of large-scale environments and present S3CNet, a sparse-convolution-based neural network that predicts the semantically completed scene from a single, unified LiDAR point cloud. We show that our proposed method outperforms all counterparts on the 3D task, achieving state-of-the-art results on the SemanticKITTI benchmark. Furthermore, we propose a 2D variant of S3CNet with a multi-view fusion strategy to complement our 3D network, providing robustness to occlusions and extreme sparsity in distant regions. We conduct experiments on the 2D semantic scene completion task and compare our sparse 2D network against several leading LiDAR segmentation models adapted for bird's-eye-view segmentation on two open-source datasets.
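A minimal sketch of preparing sparse input for a sparse-convolution network such as S3CNet; the voxel size, bounds, and feature choice are assumptions for illustration, not the paper's settings. LiDAR points are quantized to voxel coordinates, and only occupied voxels are kept, each with a simple intensity feature.

```python
# Illustrative sketch: voxelize a LiDAR cloud into sparse (coords, features)
# pairs, the typical input format for sparse convolution libraries.
import numpy as np

def voxelize(points, voxel_size=0.2,
             bounds=((-25.6, 25.6), (-25.6, 25.6), (-2.0, 4.4))):
    """points: (N, 4) array of x, y, z, intensity. Returns sparse coords and features."""
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    keep = np.all((points[:, :3] >= lo) & (points[:, :3] < hi), axis=1)
    pts = points[keep]
    coords = ((pts[:, :3] - lo) / voxel_size).astype(np.int32)
    # One entry per occupied voxel; feature = mean intensity of the points inside.
    uniq, inv = np.unique(coords, axis=0, return_inverse=True)
    inv = inv.ravel()
    feats = np.zeros(len(uniq))
    np.add.at(feats, inv, pts[:, 3])
    feats /= np.bincount(inv)
    return uniq, feats[:, None]  # sparse coords (V, 3) and features (V, 1)

cloud = np.random.uniform(-20, 20, size=(5000, 4))
coords, feats = voxelize(cloud)
print(coords.shape, feats.shape)
```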
Soundness Preserving Approximation for TBox Reasoning
Ren, Yuan, Pan, Jeff Z., Zhao, Yuting (University of Aberdeen)
Large-scale ontology applications require efficient and robust description logic (DL) reasoning services. Expressive DLs usually have very high worst-case complexity, while tractable DLs are restricted in expressive power. This raises a new challenge: can users build their ontologies in expressive DLs and still enjoy efficient reasoning services comparable to those of tractable languages? In this paper, we present a soundness-preserving approximate reasoning framework for TBox reasoning in OWL2-DL. The ontologies are encoded into EL++ with additional data structures. A tractable algorithm is presented to classify such approximations by progressively realizing more inference patterns. Preliminary evaluation shows that our approach can efficiently classify existing large-scale benchmarks with high recall.
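A minimal sketch of pattern-based TBox classification by fixpoint saturation, loosely in the spirit of the approach above; this is a toy with two EL-style rules, not the paper's algorithm, and the concept and role names in the usage example are made up.

```python
# Illustrative sketch: saturate a tiny EL-style TBox by repeatedly applying
# inference patterns until no new subsumptions can be derived.
def classify(atomic_subs, exists_right, exists_left):
    """atomic_subs: set of (A, B) for axioms A ⊑ B.
    exists_right: set of (A, r, B) for axioms A ⊑ ∃r.B.
    exists_left: dict mapping (r, B) -> C for axioms ∃r.B ⊑ C.
    Returns the saturated set of derived subsumptions (A, C) meaning A ⊑ C."""
    concepts = {x for pair in atomic_subs for x in pair}
    concepts |= {a for (a, _, _) in exists_right} | {b for (_, _, b) in exists_right}
    concepts |= set(exists_left.values()) | {b for (_, b) in exists_left}
    derived = set(atomic_subs) | {(c, c) for c in concepts}
    changed = True
    while changed:
        new = set()
        # Rule 1: A ⊑ B and B ⊑ C  =>  A ⊑ C
        for (a, b) in derived:
            for (b2, c) in derived:
                if b == b2 and (a, c) not in derived:
                    new.add((a, c))
        # Rule 2: A ⊑ ∃r.B, B ⊑ B', and ∃r.B' ⊑ C  =>  A ⊑ C
        for (a, r, b) in exists_right:
            for (b2, bp) in derived:
                if b2 == b and (r, bp) in exists_left:
                    c = exists_left[(r, bp)]
                    if (a, c) not in derived:
                        new.add((a, c))
        changed = bool(new)
        derived |= new
    return derived

# Toy ontology (hypothetical names): Student ⊑ Person, Person ⊑ ∃attends.School,
# ∃attends.School ⊑ SchoolMember. The saturation derives Student ⊑ SchoolMember.
subs = {("Student", "Person")}
er = {("Person", "attends", "School")}
el = {("attends", "School"): "SchoolMember"}
print(("Student", "SchoolMember") in classify(subs, er, el))  # True
```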