CVRecon: Rethinking 3D Geometric Feature Learning For Neural Reconstruction

Feng, Ziyue, Yang, Liang, Guo, Pengsheng, Li, Bing

arXiv.org Artificial Intelligence

Recent advances in neural reconstruction using posed image sequences have made remarkable progress. However, due to the lack of depth information, existing volumetric-based techniques simply duplicate 2D image features of the object surface along the entire camera ray. We contend this duplication introduces noise in empty and occluded spaces, posing challenges for producing high-quality 3D geometry. Drawing inspiration from traditional multi-view stereo methods, we propose an end-to-end 3D neural reconstruction framework CVRecon, designed to exploit the rich geometric embedding in the cost volumes to facilitate 3D geometric feature learning. Furthermore, we present Ray-contextual Compensated Cost Volume (RCCV), a novel 3D geometric feature representation that encodes view-dependent information with improved integrity and robustness. Through comprehensive experiments, we demonstrate that our approach significantly improves the reconstruction quality in various metrics and recovers clear fine details of the 3D geometries. Our extensive ablation studies provide insights into the development of effective 3D geometric feature learning schemes. Project page: https://cvrecon.ziyue.cool/
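
To make the distinction concrete, the sketch below (an illustration, not the CVRecon code) contrasts naive duplication of a 2D feature along every depth hypothesis of a camera ray with a plane-sweep cost volume that stores a per-depth matching cost instead. The function names, tensor shapes, and the precomputed warping grids are assumptions made for this sketch.

```python
# Minimal sketch: feature duplication along a ray vs. a plane-sweep cost volume.
# Shapes, names, and the precomputed warping grids are illustrative assumptions.
import torch
import torch.nn.functional as F

def duplicate_along_ray(ref_feat, num_depths):
    """Naive back-projection: the same 2D feature is copied to every depth
    hypothesis along the ray, so empty/occluded space inherits the surface feature."""
    # ref_feat: (B, C, H, W) -> (B, C, D, H, W)
    return ref_feat.unsqueeze(2).expand(-1, -1, num_depths, -1, -1)

def plane_sweep_cost_volume(ref_feat, src_feat, warp_grids):
    """Plane-sweep cost volume: correlate reference features with source features
    warped onto each fronto-parallel depth plane.
    ref_feat, src_feat: (B, C, H, W)
    warp_grids: (D, B, H, W, 2) normalized sampling grids, one per depth plane
                (assumed precomputed from camera intrinsics/extrinsics).
    Returns: (B, D, H, W) matching costs (higher = better photo-consistency)."""
    costs = []
    for grid in warp_grids:                            # loop over D depth hypotheses
        warped = F.grid_sample(src_feat, grid, align_corners=True)
        costs.append((ref_feat * warped).mean(dim=1))  # per-pixel feature correlation
    return torch.stack(costs, dim=1)
```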


RePAST: Relative Pose Attention Scene Representation Transformer

Safin, Aleksandr, Duckworth, Daniel, Sajjadi, Mehdi S. M.

arXiv.org Artificial Intelligence

The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers. This leads to a model that is by definition invariant to the choice of any global reference frame, while still retaining the full capabilities of the original method. Empirical results show that adding this invariance to the model does not lead to a loss in quality. We believe that this is a step towards applying fully latent transformer-based rendering methods to large-scale scenes.
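
As an illustration of the idea (not the released RePAST model), the sketch below builds an attention layer whose logits depend only on pairwise relative camera poses, so multiplying every pose by a common global transform leaves the output unchanged. The bias-based injection, module names, and shapes are assumptions.

```python
# Minimal sketch of relative-pose-conditioned attention. All names, shapes,
# and the logit-bias injection scheme are illustrative assumptions.
import torch
import torch.nn as nn

class RelativePoseAttention(nn.Module):
    def __init__(self, dim, pose_dim=12):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Maps a flattened relative pose (top 3x4 of the 4x4 transform) to a
        # scalar bias added to the attention logits for that token pair.
        self.pose_bias = nn.Sequential(nn.Linear(pose_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, tokens, cam_to_world):
        # tokens:       (N, dim)  one token per view (simplified)
        # cam_to_world: (N, 4, 4) per-view camera poses
        q, k, v = self.q(tokens), self.k(tokens), self.v(tokens)
        # Pairwise relative poses: inv(T_i) @ T_j depends only on relative geometry.
        rel = torch.linalg.inv(cam_to_world)[:, None] @ cam_to_world[None, :]   # (N, N, 4, 4)
        bias = self.pose_bias(rel[..., :3, :].flatten(-2)).squeeze(-1)          # (N, N)
        logits = q @ k.t() / q.shape[-1] ** 0.5 + bias
        return logits.softmax(dim=-1) @ v
```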


Breakthrough AI Technique Enables Real-Time Rendering of Scenes in 3D From 2D Images

#artificialintelligence

To represent a 3D scene from a 2D image, a light field network encodes the 360-degree light field of the 3D scene into a neural network that directly maps each camera ray to the color observed by that ray. The new machine-learning system can generate a 3D scene from an image about 15,000 times faster than other methods. Humans are pretty good at looking at a single two-dimensional image and understanding the full three-dimensional scene that it captures. Artificial intelligence agents are not. Yet a machine that needs to interact with objects in the world -- like a robot designed to harvest crops or assist with surgery -- must be able to infer properties about a 3D scene from observations of the 2D images it's trained on.
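
The sketch below illustrates the light-field idea in that first sentence: a single network evaluation maps a camera ray directly to a color, with no per-ray sampling or integration. The Plücker-style ray parameterization and the network sizes are assumptions, not the authors' exact model.

```python
# Minimal sketch of a light field network: one forward pass per camera ray,
# which is why rendering can be orders of magnitude faster than sampling-based
# volume rendering. Parameterization and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LightFieldNetwork(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),   # RGB in [0, 1]
        )

    def forward(self, origins, directions):
        # origins, directions: (N, 3) per-ray inputs
        d = torch.nn.functional.normalize(directions, dim=-1)
        moment = torch.cross(origins, d, dim=-1)   # Plücker moment: o x d
        return self.mlp(torch.cat([d, moment], dim=-1))   # (N, 3) colors, one pass per ray
```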


NeRF

Communications of the ACM

We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. In this work, we address the long-standing problem of view synthesis in a new way. View synthesis is the problem of rendering new views of a scene from a given set of input images and their respective camera poses.
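
The classic volume-rendering step mentioned in the abstract composites per-sample densities σ_i and colors c_i along a ray into a pixel color via C = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i, with transmittance T_i = Π_{j<i} exp(−σ_j δ_j). A minimal sketch of that quadrature follows; the MLP producing σ and c is omitted and the tensor shapes are assumptions.

```python
# Minimal sketch of volume-rendering quadrature along a single camera ray.
import torch

def composite_ray(sigmas, colors, t_vals):
    """sigmas: (S,) densities; colors: (S, 3); t_vals: (S,) sample depths along one ray."""
    deltas = torch.diff(t_vals, append=t_vals[-1:] + 1e10)   # distances between samples
    alphas = 1.0 - torch.exp(-sigmas * deltas)                # opacity of each segment
    # Transmittance: probability the ray reaches sample i without being absorbed earlier.
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0)
    weights = trans * alphas
    return (weights[:, None] * colors).sum(dim=0)             # (3,) composited pixel color
```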


Technique enables real-time rendering of scenes in 3D

#artificialintelligence

Humans are pretty good at looking at a single two-dimensional image and understanding the full three-dimensional scene that it captures. Artificial intelligence agents are not. Yet a machine that needs to interact with objects in the world--like a robot designed to harvest crops or assist with surgery--must be able to infer properties about a 3D scene from observations of the 2D images it's trained on. While scientists have had success using neural networks to infer representations of 3D scenes from images, these machine learning methods aren't fast enough to make them feasible for many real-world applications. A new technique demonstrated by researchers at MIT and elsewhere is able to represent 3D scenes from images about 15,000 times faster than some existing models.


Active Safety Envelopes using Light Curtains with Probabilistic Guarantees

Ancha, Siddharth, Pathak, Gaurav, Narasimhan, Srinivasa G., Held, David

arXiv.org Artificial Intelligence

To safely navigate unknown environments, robots must accurately perceive dynamic obstacles. Instead of directly measuring the scene depth with a LiDAR sensor, we explore the use of a much cheaper and higher resolution sensor: programmable light curtains. Light curtains are controllable depth sensors that sense only along a surface that a user selects. We use light curtains to estimate the safety envelope of a scene: a hypothetical surface that separates the robot from all obstacles. We show that generating light curtains that sense random locations (from a particular distribution) can quickly discover the safety envelope for scenes with unknown objects. Importantly, we produce theoretical safety guarantees on the probability of detecting an obstacle using random curtains. We combine random curtains with a machine learning based model that forecasts and tracks the motion of the safety envelope efficiently. Our method accurately estimates safety envelopes while providing probabilistic safety guarantees that can be used to certify the efficacy of a robot perception system to detect and avoid dynamic obstacles. We evaluate our approach in a simulated urban driving environment and a real-world environment with moving pedestrians using a light curtain device and show that we can estimate safety envelopes efficiently and effectively. Project website: https://siddancha.github.io/projects/active-safety-envelopes-with-guarantees
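
For intuition only (the paper derives its guarantees analytically rather than by simulation), the sketch below Monte Carlo-estimates the quantity being bounded: the probability that a randomly placed curtain detects an obstacle. The curtain model, parameter names, and uniform depth distribution are assumptions made for illustration.

```python
# Minimal sketch: Monte Carlo estimate of the detection probability of a random curtain.
import numpy as np

def detection_probability(obstacle_depths, num_trials=10_000,
                          max_range=10.0, tolerance=0.1, rng=None):
    """obstacle_depths: (K,) true obstacle depth per camera column, in meters.
    A random curtain samples one sensing depth per column; a column detects its
    obstacle if the curtain passes within `tolerance` of it."""
    rng = np.random.default_rng(rng)
    K = len(obstacle_depths)
    hits = 0
    for _ in range(num_trials):
        curtain = rng.uniform(0.0, max_range, size=K)   # random sensing depth per column
        if np.any(np.abs(curtain - obstacle_depths) < tolerance):
            hits += 1
    return hits / num_trials   # fraction of random curtains that detect the obstacle

# Example: a single obstacle 3 m away across a 64-column curtain.
# print(detection_probability(np.full(64, 3.0)))
```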