CVRecon: Rethinking 3D Geometric Feature Learning For Neural Reconstruction
Feng, Ziyue, Yang, Liang, Guo, Pengsheng, Li, Bing
Recent advances in neural reconstruction using posed image sequences have made remarkable progress. However, due to the lack of depth information, existing volumetric-based techniques simply duplicate 2D image features of the object surface along the entire camera ray. We contend this duplication introduces noise in empty and occluded spaces, posing challenges for producing high-quality 3D geometry. Drawing inspiration from traditional multi-view stereo methods, we propose an end-to-end 3D neural reconstruction framework CVRecon, designed to exploit the rich geometric embedding in the cost volumes to facilitate 3D geometric feature learning. Furthermore, we present Ray-contextual Compensated Cost Volume (RCCV), a novel 3D geometric feature representation that encodes view-dependent information with improved integrity and robustness. Through comprehensive experiments, we demonstrate that our approach significantly improves the reconstruction quality in various metrics and recovers clear fine details of the 3D geometries. Our extensive ablation studies provide insights into the development of effective 3D geometric feature learning schemes. Project page: https://cvrecon.ziyue.cool/
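The abstract's key ingredient is the plane-sweep cost volume from traditional multi-view stereo. As a rough illustration of what such a volume contains (a toy sketch, not the paper's RCCV; shapes, the dot-product cost, and the pre-warped inputs are all simplifying assumptions), the snippet below correlates reference-view features with source-view features at each depth hypothesis:

```python
import numpy as np

def cost_volume(ref_feat, src_feats):
    """Toy plane-sweep cost volume (no real homography warping here).

    ref_feat:  (C, H, W) reference-view features
    src_feats: list of (D, C, H, W) source-view features assumed already
               warped into the reference view at each of D depth hypotheses
    Returns (D, H, W): per-depth matching cost (dot-product similarity),
    averaged over source views. High cost at the correct depth encodes the
    geometric evidence that volumetric feature duplication throws away.
    """
    D = src_feats[0].shape[0]
    cost = np.zeros((D,) + ref_feat.shape[1:])
    for warped in src_feats:
        cost += (warped * ref_feat[None]).sum(axis=1)  # dot product over channels
    return cost / len(src_feats)

C, H, W, D = 8, 4, 4, 5
rng = np.random.default_rng(0)
ref = rng.normal(size=(C, H, W))
srcs = [np.stack([ref] * D)]  # a perfect match at every depth, for illustration
cv = cost_volume(ref, srcs)
```

With real warped features, the cost would peak only at the depth hypothesis matching the true surface, which is the view-dependent signal CVRecon feeds into 3D feature learning.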
RePAST: Relative Pose Attention Scene Representation Transformer
Safin, Aleksandr, Duckworth, Daniel, Sajjadi, Mehdi S. M.
The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers. This leads to a model that is by definition invariant to the choice of any global reference frame, while still retaining the full capabilities of the original method. Empirical results show that adding this invariance to the model does not lead to a loss in quality. We believe that this is a step towards applying fully latent transformer-based rendering methods to large-scale scenes.
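The core idea, biasing attention scores with pairwise relative poses so that no global reference frame is ever chosen, can be sketched as below. The token granularity (one latent per view) and the learned scalar bias projection `Wb` are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def relpose_attention(tokens, poses, Wb):
    """Self-attention with pairwise relative camera pose injected as a bias.

    tokens: (N, d) one latent token per input view (toy granularity)
    poses:  (N, 4, 4) camera-to-world matrices
    Wb:     (12,) hypothetical learned projection of the flattened relative
            pose (top 3x4 of T_i^{-1} @ T_j) to a scalar attention bias
    Because only relative poses T_i^{-1} T_j enter the scores, the output is
    invariant to any global rigid transform G applied to every camera:
    (G T_i)^{-1} (G T_j) = T_i^{-1} T_j.
    """
    N, d = tokens.shape
    scores = tokens @ tokens.T / np.sqrt(d)
    for i in range(N):
        for j in range(N):
            rel = np.linalg.inv(poses[i]) @ poses[j]
            scores[i, j] += Wb @ rel[:3].ravel()
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))
poses = np.stack([np.eye(4)] * 3)
poses[1, :3, 3] = [1.0, 0.0, 0.0]
poses[2, :3, 3] = [0.0, 2.0, 0.0]
Wb = rng.normal(size=12) * 0.1
out = relpose_attention(tokens, poses, Wb)

# Re-expressing every camera in a different global frame (rotation +
# translation) leaves the output unchanged.
theta = 0.7
G = np.eye(4)
G[:2, :2] = [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
G[:3, 3] = [5.0, -3.0, 1.0]
out_shifted = relpose_attention(tokens, G @ poses, Wb)
```

The invariance check at the end is the property that lets the method scale to large scenes without ever designating a reference camera.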
Breakthrough AI Technique Enables Real-Time Rendering of Scenes in 3D From 2D Images
To represent a 3D scene from a 2D image, a light field network encodes the 360-degree light field of the 3D scene into a neural network that directly maps each camera ray to the color observed by that ray. The new machine-learning system can generate a 3D scene from an image about 15,000 times faster than other methods. Humans are pretty good at looking at a single two-dimensional image and understanding the full three-dimensional scene that it captures. Artificial intelligence agents are not. Yet a machine that needs to interact with objects in the world -- like a robot designed to harvest crops or assist with surgery -- must be able to infer properties about a 3D scene from observations of the 2D images it's trained on.
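A minimal sketch of the ray-to-color mapping described above, with an untrained toy MLP standing in for the learned light field network (all weights, layer sizes, and the origin-plus-direction ray parameterization are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# A light field network maps a camera ray directly to a color, so rendering
# a pixel is a single forward pass: no integration over samples along the
# ray, which is where the reported speedup over volumetric methods comes from.
W1, b1 = rng.normal(size=(32, 6)), np.zeros(32)
W2, b2 = rng.normal(size=(3, 32)), np.zeros(3)

def light_field(origin, direction):
    """Map a ray (origin o, unit direction d; 6 numbers) to RGB in [0, 1]."""
    x = np.concatenate([origin, direction])
    h = np.maximum(W1 @ x + b1, 0.0)               # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))    # sigmoid -> RGB

color = light_field(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
```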
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Mildenhall, Ben, Srinivasan, Pratul P., Tancik, Matthew, Barron, Jonathan T., Ramamoorthi, Ravi, Ng, Ren
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. In this work, we address the long-standing problem of view synthesis in a new way. View synthesis is the problem of rendering new views of a scene from a given set of input images and their respective camera poses.
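The "classic volume rendering" step reduces, per camera ray, to alpha-compositing the network's predicted densities and colors at sampled points. A minimal NumPy sketch of that quadrature (the sampling strategy and the network itself are omitted; inputs here are made-up values for illustration):

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Numerical quadrature of the volume rendering integral along one ray.

    sigmas: (N,) volume densities at the sampled points
    colors: (N, 3) view-dependent RGB at those points
    deltas: (N,) distances between adjacent samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)        # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))  # transmittance
    weights = trans * alphas                       # contribution of each sample
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# A dense (near-opaque) point midway along the ray dominates the pixel color.
sigmas = np.array([0.0, 50.0, 0.0])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
deltas = np.array([0.1, 0.1, 0.1])
rgb, w = composite(sigmas, colors, deltas)
```

Every operation here is differentiable, which is why a set of posed images suffices to optimize the scene representation by gradient descent.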
Technique enables real-time rendering of scenes in 3D
Humans are pretty good at looking at a single two-dimensional image and understanding the full three-dimensional scene that it captures. Artificial intelligence agents are not. Yet a machine that needs to interact with objects in the world--like a robot designed to harvest crops or assist with surgery--must be able to infer properties about a 3D scene from observations of the 2D images it's trained on. While scientists have had success using neural networks to infer representations of 3D scenes from images, these machine learning methods aren't fast enough to make them feasible for many real-world applications. A new technique demonstrated by researchers at MIT and elsewhere is able to represent 3D scenes from images about 15,000 times faster than some existing models.
Active Safety Envelopes using Light Curtains with Probabilistic Guarantees
Ancha, Siddharth, Pathak, Gaurav, Narasimhan, Srinivasa G., Held, David
To safely navigate unknown environments, robots must accurately perceive dynamic obstacles. Instead of directly measuring the scene depth with a LiDAR sensor, we explore the use of a much cheaper and higher resolution sensor: programmable light curtains. Light curtains are controllable depth sensors that sense only along a surface that a user selects. We use light curtains to estimate the safety envelope of a scene: a hypothetical surface that separates the robot from all obstacles. We show that generating light curtains that sense random locations (from a particular distribution) can quickly discover the safety envelope for scenes with unknown objects. Importantly, we produce theoretical safety guarantees on the probability of detecting an obstacle using random curtains. We combine random curtains with a machine learning based model that forecasts and tracks the motion of the safety envelope efficiently. Our method accurately estimates safety envelopes while providing probabilistic safety guarantees that can be used to certify the efficacy of a robot perception system to detect and avoid dynamic obstacles. We evaluate our approach in a simulated urban driving environment and a real-world environment with moving pedestrians using a light curtain device and show that we can estimate safety envelopes efficiently and effectively. Project website: https://siddancha.github.io/projects/active-safety-envelopes-with-guarantees
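As a toy illustration of why random curtains discover unknown obstacles quickly: if each camera column's depth range is discretized into bins and a random curtain independently senses one bin per column, the chance of missing an obstacle decays geometrically with the number of curtains. The uniform-per-bin model below is an assumption made for illustration; the paper derives its guarantees with dynamic programming over physically feasible curtains:

```python
def detect_prob_single(n_bins):
    """Chance one uniformly random curtain placement hits the obstacle's
    depth bin in a given column, with depth discretized into n_bins nodes."""
    return 1.0 / n_bins

def detect_prob_k_curtains(n_bins, k):
    """Probability that at least one of k independent random curtains
    detects an obstacle occupying a single depth bin in a given column:
    1 - (miss probability)^k."""
    return 1.0 - (1.0 - detect_prob_single(n_bins)) ** k

p = detect_prob_k_curtains(n_bins=20, k=60)
```

Even with a 1-in-20 hit rate per curtain, 60 curtains (a fraction of a second at light-curtain frame rates) detect the obstacle with probability above 95%, which is the flavor of probabilistic guarantee the method certifies.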