box annotation
- Europe > Switzerland > Zürich > Zürich (0.41)
- North America > United States > Illinois > Cook County > Chicago (0.40)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- (5 more...)
- Information Technology (0.93)
- Energy (0.67)
- Government > Regional Government (0.67)
- Information Technology > Information Management (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
792dd774336314c3c27a04bb260cf2cf-Supplemental.pdf
Finally, we train our model for 8 hours on a single V100 GPU. We provide an illustration of our weakly supervised phrase grounding model in Figure 4b (this supplemental). Specifically, we create context-preserving negative captions for an image by substituting a noun in its original caption with negative nouns that are sampled from a pretrained BERT [17] model. For example, in the case where only one cross-attention layer is used, adding the sentence-level contrastive loss leads to a 2.5% improvement in the R@1 accuracy. These videos contain transcribed narrations that are either uploaded manually by users or are the output of an automatic speech recognition (ASR) system.
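The noun-substitution step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `candidate_nouns` list is a hypothetical stand-in for nouns sampled from a pretrained BERT fill-mask head.

```python
import random

def make_negative_caption(caption, noun, candidate_nouns, seed=0):
    """Context-preserving negative caption: replace one noun and keep the
    rest of the sentence intact. `candidate_nouns` stands in for the
    top-k predictions of a pretrained BERT fill-mask head (assumption)."""
    # Exclude the original noun so the caption becomes a true negative.
    negatives = [n for n in candidate_nouns if n != noun]
    replacement = random.Random(seed).choice(negatives)
    return caption.replace(noun, replacement, 1)
```

A real pipeline would mask the noun's position, run BERT's masked-language-model head over it, and keep high-probability predictions that differ from the original noun.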
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: quality, clarity, originality, and significance. This paper addresses the issue of object detection, in particular the challenge of obtaining bounding boxes at a scale comparable to that at which category labels exist for object categorization. The authors side-step this challenge by proposing to adapt object classifiers for the detection task. Their algorithm is fairly simple and straightforward, which is not a bad thing in itself. Their experimental protocol uses 100 categories for training (with both category labels and bounding boxes) and tests on 100 held-out categories.
A Supplementary
In this supplementary material, we provide the following additions to the main submission: A.1. We use ReLU as the activation function. We provide an illustration of our weakly supervised phrase grounding model in Figure 4b (this supplemental). To incorporate our proposed CoMMA into the model of Gupta et al. Finally, the sentence loss is weighted by a hyperparameter.
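The weighted loss combination mentioned above might look like the following sketch. The function names and the default weight are illustrative assumptions; the paper's hyperparameter value is not given here. Matched image–caption pairs on the batch diagonal act as positives in an InfoNCE-style sentence loss:

```python
import numpy as np

def sentence_contrastive_loss(image_emb, text_emb, tau=0.07):
    """InfoNCE-style sentence-level loss: the matched (image, caption)
    pair on each row's diagonal is the positive; all other captions in
    the batch serve as negatives. `tau` is an assumed temperature."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau
    # Numerically stable log-softmax over each row.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))

def total_loss(grounding_loss, image_emb, text_emb, lambda_sent=0.1):
    """Sentence loss weighted by a hyperparameter (value illustrative)."""
    return grounding_loss + lambda_sent * sentence_contrastive_loss(image_emb, text_emb)
```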
Dur360BEV: A Real-world 360-degree Single Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving
Wenke E, Chao Yuan, Li Li, Yixin Sun, Yona Falinie A. Gaus, Amir Atapour-Abarghouei, Toby P. Breckon
We present Dur360BEV, a novel spherical camera autonomous driving dataset equipped with a high-resolution 128-channel 3D LiDAR and an RTK-refined GNSS/INS system, along with a benchmark architecture designed to generate Bird-Eye-View (BEV) maps using only a single spherical camera. This dataset and benchmark address the challenges of BEV generation in autonomous driving, particularly by reducing hardware complexity through the use of a single 360-degree camera instead of multiple perspective cameras. Within our benchmark architecture, we propose a novel spherical-image-to-BEV module that leverages spherical imagery and a refined sampling strategy to project features from 2D to 3D. Our approach also includes an innovative application of focal loss, specifically adapted to address the extreme class imbalance often encountered in BEV segmentation tasks, which demonstrates improved segmentation performance on the Dur360BEV dataset. The results show that our benchmark not only simplifies the sensor setup but also achieves competitive performance.
- Research Report > New Finding (0.48)
- Overview > Innovation (0.48)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Robotics & Automation (0.94)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)