CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection
Accurate and robust 3D object detection is a critical component of autonomous vehicles and robotics. While recent radar-camera fusion methods have made significant progress by fusing information in the bird's-eye view (BEV) representation, they often struggle to effectively capture the motion of dynamic objects, leading to limited performance in real-world scenarios. In this paper, we introduce CRT-Fusion, a novel framework that integrates temporal information into radar-camera fusion to address this challenge. Our approach comprises three key modules: Multi-View Fusion (MVF), Motion Feature Estimator (MFE), and Motion Guided Temporal Fusion (MGTF). The MFE module performs two tasks simultaneously: pixel-wise velocity estimation and BEV segmentation.
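To make the MFE module's dual-task design concrete, here is a minimal PyTorch sketch of a BEV head with a per-cell velocity-regression branch and a segmentation branch. The class name, channel sizes, and layer layout are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a dual-task BEV head in the spirit of the MFE module.
# Channel sizes and layer choices are assumptions, not the paper's design.
import torch
import torch.nn as nn

class MotionFeatureEstimator(nn.Module):
    def __init__(self, in_channels: int = 256, hidden: int = 128):
        super().__init__()
        # Shared trunk over fused radar-camera BEV features.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
        )
        # Branch 1: per-cell (vx, vy) velocity regression.
        self.velocity_head = nn.Conv2d(hidden, 2, 1)
        # Branch 2: per-cell occupancy logits for BEV segmentation.
        self.segmentation_head = nn.Conv2d(hidden, 1, 1)

    def forward(self, bev_features: torch.Tensor):
        x = self.trunk(bev_features)
        return self.velocity_head(x), self.segmentation_head(x)

# Toy usage: a 200x200 BEV grid with 256 fused channels.
mfe = MotionFeatureEstimator()
velocity, occupancy = mfe(torch.randn(1, 256, 200, 200))
print(velocity.shape, occupancy.shape)  # (1, 2, 200, 200) (1, 1, 200, 200)
```

Sharing one trunk and splitting into two light heads is the usual pattern for joint training of such tasks; in CRT-Fusion the estimated per-cell motion presumably guides the temporal alignment performed by MGTF.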
A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data
Chandrasekaran, Kavin, Grigorescu, Sorin, Dubbelman, Gijs, Jancura, Pavol
Cameras can be used to perceive the environment around the vehicle, while affordable radar sensors are popular in autonomous driving systems because, unlike cameras, they can withstand adverse weather conditions. However, radar point clouds are sparse, with low azimuth and elevation resolution, and they lack the semantic and structural information of the scene, generally resulting in lower radar detection performance. In this work, we directly use the raw range-Doppler (RD) spectrum of the radar data, thus avoiding radar signal processing. We independently process camera images within the proposed comprehensive image processing pipeline. Specifically, we first transform the camera images to the Bird's-Eye View (BEV) Polar domain and extract the corresponding features with our camera encoder-decoder architecture. The resulting feature maps are fused with Range-Azimuth (RA) features, recovered from the RD spectrum input by the radar decoder, to perform object detection. We evaluate our fusion strategy against other existing methods not only in terms of accuracy but also on computational complexity metrics on the RADIal dataset.
- Europe > Switzerland (0.04)
- Europe > Romania > Centru Development Region > Brașov County > Brașov (0.04)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Asia > Middle East > Israel (0.04)
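As a rough illustration of the fusion step described in the abstract above, the sketch below concatenates camera features already warped to the BEV polar domain with Range-Azimuth features recovered from the range-Doppler spectrum, then reduces them with a small convolutional block. The tensor shapes, channel counts, and concatenation-based fusion are assumptions for illustration, not the paper's exact design.

```python
# Hypothetical feature-level fusion of camera BEV-polar features with radar
# Range-Azimuth (RA) features. Shapes and channel counts are assumptions.
import torch
import torch.nn as nn

class PolarFusionBlock(nn.Module):
    def __init__(self, cam_ch: int = 64, radar_ch: int = 64, out_ch: int = 128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_ch + radar_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_polar: torch.Tensor, radar_ra: torch.Tensor):
        # Both feature maps share the same (range, azimuth) grid, so fusion
        # reduces to channel-wise concatenation followed by convolution.
        return self.fuse(torch.cat([cam_polar, radar_ra], dim=1))

# Toy usage on a 256-range-bin x 128-azimuth-bin polar grid.
block = PolarFusionBlock()
fused = block(torch.randn(1, 64, 256, 128), torch.randn(1, 64, 256, 128))
print(fused.shape)  # (1, 128, 256, 128)
```

Keeping both branches on a shared polar grid avoids a lossy radar-to-Cartesian resampling before fusion, which is one plausible reason to move the camera features into the BEV polar domain.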
YOLO-BEV: Generating Bird's-Eye View in the Same Way as 2D Object Detection
Liu, Chang, Zhou, Liguo, Huang, Yanliang, Knoll, Alois
Vehicle perception systems strive to achieve comprehensive and rapid visual interpretation of their surroundings for improved safety and navigation. We introduce YOLO-BEV, an efficient framework that harnesses a unique surround-camera setup to generate a 2D bird's-eye view of the vehicular environment. By strategically positioning eight cameras at 45-degree intervals, our system captures and integrates imagery into a coherent 3x3 grid format with the center left blank, providing an enriched spatial representation that facilitates efficient processing. In our approach, we employ YOLO's detection mechanism, favoring its inherent advantages of swift response and compact model structure. Instead of leveraging the conventional YOLO detection head, we augment it with a custom-designed detection head that translates the panoramically captured data into a unified bird's-eye view map of the ego car. Preliminary results validate the feasibility of YOLO-BEV in real-time vehicular perception tasks. With its streamlined architecture and potential for rapid deployment due to its minimized parameter count, YOLO-BEV stands as a promising tool that may reshape future perspectives in autonomous driving systems.
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- Europe > Switzerland (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (2 more...)
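The grid assembly described in the YOLO-BEV abstract can be sketched in a few lines of PyTorch: eight camera images captured at 45-degree intervals are tiled into a 3x3 mosaic around a blank center tile before being passed to the detector. The clockwise camera-to-tile mapping below is an assumption for illustration.

```python
# Hypothetical 3x3 mosaic assembly for eight surround cameras; the camera
# ordering (clockwise from the front camera) is an assumption.
import torch

def assemble_bev_grid(images: torch.Tensor) -> torch.Tensor:
    """images: (8, C, H, W), ordered clockwise starting at the front camera."""
    n, c, h, w = images.shape
    assert n == 8, "expects eight surrounding cameras"
    blank = torch.zeros(c, h, w, dtype=images.dtype)
    # Row-major tile layout; the center of the grid stays blank (the ego car).
    tiles = [images[7], images[0], images[1],   # front-left, front, front-right
             images[6], blank,     images[2],   # left,       ego,   right
             images[5], images[4], images[3]]   # rear-left,  rear,  rear-right
    rows = [torch.cat(tiles[i:i + 3], dim=2) for i in (0, 3, 6)]
    return torch.cat(rows, dim=1)  # (C, 3H, 3W)

grid = assemble_bev_grid(torch.rand(8, 3, 224, 224))
print(grid.shape)  # (3, 672, 672)
```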
Pishgu: Universal Path Prediction Network Architecture for Real-time Cyber-physical Edge Systems
Noghre, Ghazal Alinezhad, Katariya, Vinit, Pazho, Armin Danesh, Neff, Christopher, Tabkhi, Hamed
Path prediction is an essential task for many real-world Cyber-Physical Systems (CPS) applications, from autonomous driving and traffic monitoring/management to pedestrian/worker safety. These real-world CPS applications need a robust, lightweight path prediction method that can provide a universal network architecture for multiple subjects (e.g., pedestrians and vehicles) from different perspectives. However, most existing algorithms are tailor-made for a unique subject with a specific camera perspective and scenario. This article presents Pishgu, a universal lightweight network architecture, as a robust and holistic solution for path prediction. Pishgu's architecture can adapt to multiple path prediction domains with different subjects (vehicles, pedestrians), perspectives (bird's-eye, high-angle), and scenes (sidewalk, highway). Our proposed architecture captures the inter-dependencies among the subjects in each frame by taking advantage of Graph Isomorphism Networks and an attention module. We separately train and evaluate the efficacy of our architecture on three different CPS domains across multiple perspectives (vehicle bird's-eye view, pedestrian bird's-eye view, and human high-angle view). Pishgu outperforms state-of-the-art solutions in the vehicle bird's-eye view domain by 42% and 61%, and in the pedestrian high-angle view domain by 23% and 22%, in terms of ADE and FDE, respectively. Additionally, we analyze the domain-specific details of the various datasets to understand their effect on path prediction and model interpretation. Finally, we report latency and throughput for all three domains on multiple embedded platforms, showcasing the robustness and adaptability of Pishgu for real-world integration into CPS applications.
- North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (4 more...)
- Information Technology (0.89)
- Transportation > Ground > Road (0.66)
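To illustrate the interaction modeling the Pishgu abstract attributes to Graph Isomorphism Networks and attention, here is a minimal single-layer sketch: a GIN-style update aggregates the states of all other subjects in a frame (a fully connected graph is assumed), followed by multi-head attention. The dimensions, adjacency choice, and layer composition are illustrative, not Pishgu's actual architecture.

```python
# Hypothetical GIN-plus-attention interaction layer over the subjects in one
# frame. The fully connected graph and all dimensions are assumptions.
import torch
import torch.nn as nn

class GINInteraction(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (num_subjects, dim) state embeddings for one frame."""
        # GIN update: h_i = MLP((1 + eps) * x_i + sum_{j != i} x_j).
        neighbor_sum = x.sum(dim=0, keepdim=True) - x  # excludes self
        h = self.mlp((1 + self.eps) * x + neighbor_sum)
        # Attention re-weights subject-to-subject influence before prediction.
        out, _ = self.attn(h.unsqueeze(0), h.unsqueeze(0), h.unsqueeze(0))
        return out.squeeze(0)

layer = GINInteraction()
print(layer(torch.randn(6, 32)).shape)  # (6, 32)
```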
3D Parametric Room Representation with RoomPlan
More recently, the release of LiDAR sensor functionality in Apple's iPhone and iPad has ushered in a new era in scene understanding for the computer vision and developer communities. Fundamental research in scene understanding, combined with advances in ML, can now impact everyday experiences. A variety of methods address different parts of the challenge, such as depth estimation, 3D reconstruction, instance segmentation, object detection, and more. Among these problems, creating a 3D floor plan is becoming key for many applications in augmented reality, robotics, e-commerce, games, and real estate. To address automatic 3D floor-plan generation, Apple released RoomPlan in 2022.
Machine learning gives us a dog's-eye view
Dogs' minds are being read! Researchers have used fMRI (functional magnetic resonance imaging) scans of dogs' brains and a machine learning tool to reconstruct what the pooch is seeing. The results suggest that dogs are more interested in what is happening than in who or what is involved. The results of the experiment, conducted at Emory University in Georgia in the US, are published in the Journal of Visualized Experiments. Two unrestrained dogs were shown three 30-minute videos.
- North America > United States (0.25)
- Europe > United Kingdom > Scotland (0.05)
- Africa > Mozambique (0.05)
- Health & Medicine > Health Care Technology (0.59)
- Health & Medicine > Diagnostic Medicine > Imaging (0.56)
HDMapNet: An Online HD Map Construction and Evaluation Framework
Li, Qi, Wang, Yue, Wang, Yilun, Zhao, Hang
High-definition map (HD map) construction is a crucial problem for autonomous driving. This problem typically involves collecting high-quality point clouds, fusing multiple point clouds of the same scene, annotating map elements, and updating maps constantly. This pipeline, however, requires a vast amount of human effort and resources, which limits its scalability. Additionally, traditional HD maps are coupled with centimeter-level accurate localization, which is unreliable in many scenarios. In this paper, we argue that online map learning, which dynamically constructs HD maps based on local sensor observations, is a more scalable way to provide semantic and geometric priors to self-driving vehicles than traditional pre-annotated HD maps. Meanwhile, we introduce an online map learning method, titled HDMapNet. It encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird's-eye view. We benchmark HDMapNet on the nuScenes dataset and show that it performs better than baseline methods in all settings. Of note, our fusion-based HDMapNet outperforms existing methods by more than 50% in all metrics. To accelerate future research, we develop customized metrics to evaluate map learning performance, including both semantic-level and instance-level ones. By introducing this method and these metrics, we invite the community to study this novel map learning problem. We will release our code and evaluation kit to facilitate future development.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Instructional Material (0.68)
- Research Report (0.50)
- Transportation > Ground > Road (0.35)
- Education (0.34)
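To give a flavor of the semantic-level evaluation mentioned in the HDMapNet abstract, the sketch below computes per-class IoU between predicted and ground-truth rasterized BEV maps. The three-class setup and the rasterized inputs are assumptions for illustration; the authors' released evaluation kit defines the actual metrics.

```python
# Hypothetical semantic-level map metric: per-class IoU over rasterized BEV
# class maps. The class list and raster resolution are assumptions.
import torch

def semantic_map_iou(pred: torch.Tensor, gt: torch.Tensor, num_classes: int = 3):
    """pred, gt: (H, W) integer class maps with labels in [0, num_classes)."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        inter = (p & g).sum().item()
        union = (p | g).sum().item()
        ious.append(inter / union if union else float("nan"))
    return ious

# Toy usage on a 200 x 400 BEV raster with three map-element classes.
pred = torch.randint(0, 3, (200, 400))
gt = torch.randint(0, 3, (200, 400))
print(semantic_map_iou(pred, gt))
```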
Ring unveils the Floodlight Cam Wired Pro, with radar-powered bird's-eye view
Just a couple of months after Ring unwrapped its new, radar-enabled aerial view for the Video Doorbell Pro 2, the Amazon-owned smart home brand is now rolling out the clever technology to its updated wired floodlight. At the same time, Ring says it's bringing a color version of its pre-roll video feature to a fourth generation of its battery-powered video doorbell. Slated to ship on May 6 for $250 (you can preorder starting today), the Ring Floodlight Cam Wired Pro will boast both Bird's-Eye View and 3D Motion Detection, a pair of features powered by radar rather than infrared motion sensors. Meanwhile, the Ring Video Doorbell 4 is set to arrive April 28 for $200, and it will add color to the pre-roll functionality that debuted on last year's Video Doorbell 3 Plus. An upgrade to 2019's well-received Floodlight Cam, the revamped Floodlight Cam Wired Pro arrives with the same 1080p video resolution while adding HDR for a needed contrast boost, along with a 140-degree (horizontal) by 60-degree (vertical) field of view.
- Commercial Services & Supplies > Security & Alarm Services (0.60)
- Energy (0.38)
Ring's next video doorbell uses radar and includes a bird's-eye view
Ring unveiled a new advanced video doorbell that uses radar to better detect objects and provide a bird's-eye view. The Ring Video Doorbell Pro 2 uses the radar as part of its 3D motion detection and will allow users to select distance thresholds to decide when the camera will start recording. The radar also powers a new bird's-eye view to give homeowners an aerial view of their property and a map of when an event triggering the motion detection started. "With the introduction of these radar-based features, we're reinventing what our devices can do to give our customers a more precise picture of what is happening at home," said Jamie Siminoff, Ring's founder and chief inventor, in a statement. The pro doorbell will also add head-to-toe video and an array microphone for clearer audio.