Expert Systems
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that using regional information in a better way can significantly improve the performance. While visual representation is extensively studied in traditional VQA, it is under-explored in knowledge-based VQA even though these two tasks share the common spirit, i.e., rely on visual input to answer the question.
Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments
The ability of Language Models (LMs) to understand natural language makes them a powerful tool for parsing human instructions into task plans for autonomous robots. Unlike traditional planning methods that rely on domain-specific knowledge and handcrafted rules, LMs generalize from diverse data and adapt to various tasks with minimal tuning, acting as a compressed knowledge base.