Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding

Kabir, Imran, Reza, Md Alimoor, Billah, Syed

Mar-16-2025–arXiv.org Artificial Intelligence

Large multimodal models (LMMs) are increasingly integrated into autonomous driving systems for user interaction. However, their limitations in fine-grained spatial reasoning pose challenges for system interpretability and user trust. We introduce Logic-RAG, a novel Retrieval-Augmented Generation (RAG) framework that improves LMMs' spatial understanding in driving scenarios. Logic-RAG constructs a dynamic knowledge base (KB) about object-object relationships in first-order logic (FOL) using a perception module, a query-to-logic embedder, and a logical inference engine. We evaluated Logic-RAG on visual-spatial queries using both synthetic and real-world driving videos. When using popular LMMs (GPT-4V, Claude 3.5) as proxies for an autonomous driving system, these models achieved only 55% accuracy on synthetic driving scenes and under 75% on real-world driving scenes. Augmenting them with Logic-RAG increased their accuracies to over 80% and 90%, respectively. An ablation study showed that even without logical inference, the fact-based context constructed by Logic-RAG alone improved accuracy by 15%. Logic-RAG is extensible: it allows seamless replacement of individual components with improved versions and enables domain experts to compose new knowledge in both FOL and natural language. In sum, Logic-RAG addresses critical spatial reasoning deficiencies in LMMs for autonomous driving applications. Code and data are available at https://github.com/Imran2205/LogicRAG.

large language model, logic & formal reasoning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

Mar-16-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Pennsylvania > Centre County
    - State College (0.04)
  - Iowa > Polk County
    - Des Moines (0.04)
- Asia
  - Singapore (0.04)
  - Indonesia > Bali (0.04)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Transportation (0.90)
- Information Technology (0.75)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning
    - Spatial Reasoning (1.00)
    - Logic & Formal Reasoning (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found