camera feature
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert). However, the domain gap between LiDAR and camera features, coupled with the inherent incompatibility in temporal fusion, significantly hinders the effectiveness of distillation-based enhancements for apprentices. Motivated by the success of uni-modal distillation, an apprentice-friendly expert model would predominantly rely on camera features, while still achieving comparable performance to multi-modal models. To this end, we introduce VCD, a framework to improve the camera-only apprentice model, including an apprentice-friendly multi-modal expert and temporal-fusion-friendly distillation supervision. The multi-modal expert VCD-E adopts the same structure as the camera-only apprentice in order to alleviate the feature disparity, and leverages LiDAR input as a depth prior to reconstruct the 3D scene, achieving performance on par with other heterogeneous multi-modal experts. Additionally, a fine-grained trajectory-based distillation module is introduced to individually rectify the motion misalignment for each object in the scene.
Kaninfradet3D: A Road-side Camera-LiDAR Fusion 3D Perception Model based on Nonlinear Feature Extraction and Intrinsic Correlation
Liu, Pei, Zheng, Nanfang, Li, Yiqun, Chen, Junlan, Pu, Ziyuan
With the development of AI-assisted driving, numerous methods have emerged for ego-vehicle 3D perception tasks, but there has been limited research on roadside perception. With its ability to provide a global view and a broader sensing range, the roadside perspective is worth developing. LiDAR provides precise three-dimensional spatial information, while cameras offer semantic information. These two modalities are complementary in 3D detection. However, adding camera data does not increase accuracy in some studies since the information extraction and fusion procedure is not sufficiently reliable. Recently, Kolmogorov-Arnold Networks (KANs), which are better suited for high-dimensional, complex data, have been proposed as replacements for MLPs. Both the camera and the LiDAR provide high-dimensional information, and employing KANs should enhance the extraction of valuable features to produce better fusion outcomes. This paper proposes Kaninfradet3D, which optimizes the feature extraction and fusion modules. To extract features from complex high-dimensional data, the model's encoder and fuser modules were improved using KAN Layers. Cross-attention was applied to enhance feature fusion, and visual comparisons verified that camera features were more evenly integrated. This addressed the issue of camera features being abnormally concentrated, which negatively impacted fusion. Compared to the benchmark, our approach shows improvements of +9.87 mAP and +10.64 mAP in the two viewpoints of the TUMTraf Intersection Dataset and an improvement of +1.40 mAP in the roadside end of the TUMTraf V2X Cooperative Perception Dataset. The results indicate that Kaninfradet3D can effectively fuse features, demonstrating the potential of applying KANs in roadside perception tasks.
BiCo-Fusion: Bidirectional Complementary LiDAR-Camera Fusion for Semantic- and Spatial-Aware 3D Object Detection
3D object detection is an important task that has been widely applied in autonomous driving. Recently, fusing multi-modal inputs, i.e., LiDAR and camera data, to perform this task has become a new trend. Existing methods, however, either ignore the sparsity of LiDAR features or fail to preserve the original spatial structure of LiDAR and the semantic density of camera features simultaneously due to the modality gap. To address these issues, this letter proposes a novel bidirectional complementary LiDAR-camera fusion framework, called BiCo-Fusion, that can achieve robust semantic- and spatial-aware 3D object detection. The key insight is to mutually fuse the multi-modal features to enhance the semantics of LiDAR features and the spatial awareness of the camera features, and to adaptively select features from both modalities to build a unified 3D representation. Specifically, we introduce Pre-Fusion, consisting of a Voxel Enhancement Module (VEM) to enhance the semantics of voxel features from 2D camera features and an Image Enhancement Module (IEM) to enhance the spatial characteristics of camera features from 3D voxel features. Both VEM and IEM are bidirectionally updated to effectively reduce the modality gap. We then introduce Unified Fusion, which adaptively weights and selects features from the enhanced LiDAR and camera features to build a unified 3D representation. Extensive experiments demonstrate the superiority of our BiCo-Fusion against the prior arts. Project page: https://t-ys.github.io/BiCo-Fusion/.
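The adaptive weighting described in the abstract above can be illustrated with a minimal sketch. The abstract does not specify how the weights are computed, so the scalar sigmoid gate, the function name `gated_fusion`, and the parameters `w_lidar`, `w_camera`, and `bias` are all assumptions made for illustration, not the BiCo-Fusion implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(lidar_feat, camera_feat, w_lidar, w_camera, bias=0.0):
    """Blend two equal-length feature vectors with one learned gate.

    Hypothetical illustration: the gate is a sigmoid over a linear score
    of both inputs; gate -> 1 favors LiDAR, gate -> 0 favors camera.
    """
    if len(lidar_feat) != len(camera_feat):
        raise ValueError("feature vectors must have the same length")
    score = bias
    score += sum(w * f for w, f in zip(w_lidar, lidar_feat))
    score += sum(w * f for w, f in zip(w_camera, camera_feat))
    gate = sigmoid(score)
    fused = [gate * l + (1.0 - gate) * c
             for l, c in zip(lidar_feat, camera_feat)]
    return gate, fused


# With all-zero weights the gate is exactly 0.5, so both inputs are averaged.
gate, fused = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

In a real detector the gate would typically be predicted per voxel or per channel by a small network; a single scalar is used here only to keep the idea visible.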
Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers
Gunn, James, Lenyk, Zygmunt, Sharma, Anuj, Donati, Andrea, Buburuzan, Alexandru, Redford, John, Mueller, Romain
Combining complementary sensor modalities is crucial to providing robust perception for safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art camera-lidar fusion methods for AD rely on monocular depth estimation which is a notoriously difficult task compared to using depth information from the lidar directly. Here, we find that this approach does not leverage depth as expected and show that naively improving depth estimation does not lead to improvements in object detection performance and that, strikingly, removing depth estimation altogether does not degrade object detection performance. This suggests that relying on monocular depth could be an unnecessary architectural bottleneck during camera-lidar fusion. In this work, we introduce a novel fusion method that bypasses monocular depth estimation altogether and instead selects and fuses camera and lidar features in a bird's-eye-view grid using a simple attention mechanism. We show that our model can modulate its use of camera features based on the availability of lidar features and that it yields better 3D object detection on the nuScenes dataset than baselines relying on monocular depth estimation.
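As a rough illustration of fusing features "using a simple attention mechanism", the sketch below lets one bird's-eye-view cell query attend over a handful of camera features. The abstract does not give the exact formulation, so standard scaled dot-product attention is assumed here, and the names `attend`, `query`, `keys`, and `values` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query.

    query:  feature of one BEV cell (assumed to come from the lidar branch)
    keys:   one key vector per camera feature
    values: one value vector per camera feature
    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, out


# A zero query scores every key equally, so the output is the mean value.
weights, out = attend([0.0, 0.0],
                      [[1.0, 0.0], [0.0, 1.0]],
                      [[2.0, 0.0], [0.0, 4.0]])
```

Because the weights are computed from the features themselves, no explicit depth estimate is needed to decide which camera features contribute to a given grid cell.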
MaskedFusion360: Reconstruct LiDAR Data by Querying Camera Features
Wagner, Royden, Klemp, Marvin, Lopez, Carlos Fernandez
In self-driving applications, LiDAR data provides accurate information about distances in 3D but lacks the semantic richness of camera data. Therefore, state-of-the-art methods for perception in urban scenes fuse data from both sensor types. In this work, we introduce a novel self-supervised method to fuse LiDAR and camera data for self-driving applications. We build upon masked autoencoders (MAEs) and train deep learning models to reconstruct masked LiDAR data from fused LiDAR and camera features. In contrast to related methods that use bird's-eye-view representations, we fuse features from dense spherical LiDAR projections and features from fish-eye camera crops with a similar field of view. Therefore, we reduce the learned spatial transformations to moderate perspective transformations and do not require additional modules to generate dense LiDAR representations. Code is available at: https://github.com/KIT-MRT/masked-fusion-360
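The masked-reconstruction objective mentioned above can be sketched in a few lines. This is a generic masked-autoencoder-style illustration, not the code from the linked repository: the helper names `make_mask` and `masked_mse`, the patch-level masking, and the mean-squared-error loss are assumptions.

```python
import random


def make_mask(num_patches, mask_ratio, seed=0):
    """Return a list of booleans; True marks a patch hidden from the encoder."""
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_mse(target, prediction, mask):
    """Mean squared error computed over the masked positions only."""
    errors = [(t - p) ** 2
              for t, p, m in zip(target, prediction, mask) if m]
    if not errors:
        return 0.0
    return sum(errors) / len(errors)


# Hide half of eight hypothetical LiDAR range patches.
mask = make_mask(num_patches=8, mask_ratio=0.5)

# Only the masked (second) position contributes to the loss.
loss = masked_mse([1.0, 2.0, 3.0], [1.0, 0.0, 3.0], [False, True, False])
```

Restricting the loss to masked positions forces the model to infer the hidden LiDAR values from the visible LiDAR patches and the camera features, which is what makes the objective self-supervised.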
Here's all the new stuff Google's Pixel 3 phone cameras can do
The Pixel 2 had arguably the best smartphone camera on the market, and Google wants to make sure it stays that way. During its Pixel unveiling today, it introduced a raft of new camera features for the Pixel 3 and Pixel 3XL smartphones, including an improved zoom, wider-angle camera, smile and blink detection, bokeh control and more -- all with just a single lens on the back. The quality is apparently good enough for Terrence Malick, who shot a video that was featured at the event, so it might be good enough for the rest of us, too. Some of the features are enabled with the fresh hardware, to be sure. There's a brand new 12.2-megapixel sensor on the back, with a sharper wide-angle lens to allow for zooming.
LG G7 ThinQ Revealed With Super Bright Display, Google Assistant And Smarter Cameras
LG Electronics has finally unveiled the LG G7 ThinQ in New York City. As expected, the LG G7 ThinQ features a notch on its display and impressive specs that could rival other Android flagships on the market. It features a tall 19.5:9 aspect ratio display and is covered by Corning Gorilla Glass 5. The G7 ThinQ also has a super bright display at 1,000 nits and supports 100 percent of the DCI-P3 color gamut. The display is said to be so bright that users should have no problem with visibility even when it's under direct sunlight.
LG V30S ThinQ Intelligent Camera Features Explained
LG Electronics took the wraps off of its upgraded V30 smartphone, aptly called V30S ThinQ, at Mobile World Congress 2018. The device has a striking resemblance to last year's model, but it has something new to offer: three intelligent camera features. After showcasing its AI platform ThinQ this past January, LG is now proudly showcasing its first mobile device that comes with such a technology. At the Barcelona event, LG unveiled the LG V30S ThinQ with smart features intended to provide an enhanced experience to users. "Many companies talk about AI but we're already delivering on our promise by integrating intelligent technology in the LG V30S ThinQ to features most commonly used by our customers for a whole new level of convenience never before available in a smartphone of this caliber," LG Mobile President Hwang Jeong-hwan said in a press release. What Hwang was referring to in his statement are the intelligent camera features of the new V30S ThinQ.
Samsung Galaxy S9 News Backs Up Camera Features, 3D Recognition Upgrades
Rumors continue to back up claims that Samsung will include 3D facial recognition technology on the upcoming Galaxy S9 smartphone. Most recently, Korean media shared reports that Samsung has ordered the components needed for 3D technology upgrades on the Galaxy S9 front camera. These upgrades are expected to improve both the facial recognition and iris scanning technology Samsung already uses on its predecessor devices. News of Samsung using 3D facial recognition technology surfaced after the announcement and release of the iPhone X, which features Face ID as its sole biometrics option. Meanwhile, Samsung's recent flagships, including the Galaxy S8 and Galaxy Note 8, include 2D facial recognition and iris scanning technology, in addition to a rear fingerprint scanner. Though Samsung has not confirmed any rumors about the Galaxy S9, the device will likely include this same biometrics configuration with the aforementioned upgrades.
iOS 11 camera features may include scene recognition
Smartphones may have effectively killed off dedicated point-and-shoot cameras, but Apple is looking to them for inspiration with iOS 11. Developers have dug through beta firmware for the HomePod, and tucked inside the code for Apple's smart speaker, there are hints that the next version of its mobile OS will feature something called "SmartCam." It will tune camera settings based on the scene it detects. If you've ever used a point-and-shoot camera, the feature should sound pretty familiar: different scene modes and photo settings depending on what you're shooting. The "smart" in its name suggests that maybe machine learning will play a role here as well, potentially analyzing the scene for you and picking the best settings. This might not use machine learning to improve photography a la what Google does with the Pixel, but it could make Apple's woefully basic camera app a little more full-featured.