Comparative Multi-View Language Grounding
Mitra, Chancharik; Anwar, Abrar; Corona, Rodolfo; Klein, Dan; Darrell, Trevor; Thomason, Jesse
arXiv.org Artificial Intelligence
In this work, we consider the task of resolving object referents given a comparative language description. We present a Multi-view Approach to Grounding in Context (MAGiC) that uses transformers to reason pragmatically over both candidate objects, each seen from multiple image views, together with the language description. Past efforts connect vision and language for this task without fully considering the resulting referential context. MAGiC instead exploits the comparative information by jointly reasoning over multiple views of both object referent candidates and the referring language expression. We present an analysis demonstrating that comparative reasoning contributes to state-of-the-art performance on the SNARE object reference task.
Nov-13-2023
- Country:
  - North America > United States > California (0.28)
- Genre:
  - Research Report > New Finding (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.46)
    - Natural Language (1.00)
    - Representation & Reasoning > Object-Oriented Architecture (0.68)
    - Robots (0.94)
    - Vision (1.00)