SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality

Lai, Yuzhi, Yuan, Shenghai, Li, Peizheng, Lou, Jun, Zell, Andreas

arXiv.org Artificial Intelligence 

Unlike existing systems that assume static or single-view settings, SEER-VAR dynamically separates cabin and road scenes via depth-guided vision-language grounding. Two SLAM branches track egocentric motion in each context, while a GPT-based module generates context-aware overlays such as dashboard cues and hazard alerts. To support evaluation, we introduce EgoSLAM-Drive, a real-world dataset featuring synchronized egocentric views, 6DoF ground-truth poses, and AR annotations across diverse driving scenarios. Experiments demonstrate that SEER-VAR achieves robust spatial alignment and perceptually coherent AR rendering across varied environments. As one of the first to explore LLM-based AR recommendation in egocentric driving, we address the lack of comparable systems through structured prompting and detailed user studies. Results show that SEER-VAR enhances perceived scene understanding, overlay relevance, and driver ease, providing an effective foundation for future research in this direction. Code and dataset will be made open source.