Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving

Choudhary, Tushar, Dewangan, Vikrant, Chandhok, Shivam, Priyadarshan, Shubham, Jain, Anushka, Singh, Arun K., Srivastava, Siddharth, Jatavallabhula, Krishna Murthy, Krishna, K. Madhava

Nov-14-2023–arXiv.org Artificial Intelligence

Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. Existing perception systems for autonomous driving have largely focused on a pre-defined (closed) set of object categories and driving scenarios. Talk2BEV instead blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to handle a variety of autonomous driving tasks, including visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries and the ability to ground these queries in the visual context embedded in the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark of 1,000 human-annotated BEV scenarios drawn from the nuScenes dataset, with more than 20,000 questions and ground-truth responses.

lvlm, talk2bev, vehicle

Country:
- North America > Canada
  - British Columbia (0.04)
- Europe > Estonia
  - Tartu County > Tartu (0.04)

Genre:
- Research Report (0.40)

Industry:
- Information Technology (1.00)
- Automobiles & Trucks (1.00)
- Transportation > Ground
  - Road (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Robots > Autonomous Vehicles (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.49)
