DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning
Xiaohan Zhang, Zainab Altaweel, Yohei Hayamizu, Yan Ding, Saeid Amiri, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang
Prompting foundation models such as large language models (LLMs) and vision-language models (VLMs) requires extensive domain knowledge and manual effort, resulting in the so-called "prompt engineering" problem. To improve the performance of foundation models, one can provide examples explicitly [1] or implicitly [2], or encourage intermediate reasoning steps [3, 4]. Despite these efforts, the performance of foundation models on long-horizon reasoning tasks remains limited. Classical planning methods, including those based on the Planning Domain Definition Language (PDDL), ensure soundness, completeness, and efficiency in planning tasks [5]. However, such classical planners rely on predefined states and actions and do not perform well in open-world scenarios. We aim to combine the open-world scene understanding of VLMs with the strong long-horizon reasoning capabilities of classical planners. Our key idea is to extract domain knowledge from classical planners to prompt VLMs, enabling classical planning that is visually grounded and responsive to open-world situations. Given the natural connection between planning symbols and human language, this paper investigates how pre-trained VLMs can assist a robot in realizing the symbolic plans generated by classical planners, while avoiding the engineering effort of checking the outcome of each action.
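The abstract describes the approach only at a high level. As a rough illustration (not the authors' implementation), the sketch below shows one way a PDDL action's preconditions and effects could be turned into yes/no questions for a VLM so that action outcomes can be checked from camera images; the `PDDLAction` class, the question templates, and the `ask_vlm` callable are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class PDDLAction:
    name: str
    preconditions: List[str]   # grounded predicates, e.g. "door_open kitchen"
    effects: List[str]         # grounded predicates expected to hold afterwards

# Hand-written templates mapping predicate names to natural-language questions
# (hypothetical examples; a real system would cover its own domain predicates).
QUESTION_TEMPLATES: Dict[str, str] = {
    "hand_empty": "Is the robot's gripper empty?",
    "door_open": "Is the {0} door open?",
    "holding": "Is the robot holding the {0}?",
}

def predicate_to_question(predicate: str) -> str:
    """Turn a grounded predicate such as 'door_open kitchen' into a question."""
    name, *args = predicate.split()
    return QUESTION_TEMPLATES[name].format(*args)

def action_outcome_ok(
    action: PDDLAction,
    image_before,
    image_after,
    ask_vlm: Callable[[object, str], bool],
) -> bool:
    """Ask the VLM whether preconditions held before and effects hold after the action."""
    pre_ok = all(ask_vlm(image_before, predicate_to_question(p))
                 for p in action.preconditions)
    eff_ok = all(ask_vlm(image_after, predicate_to_question(e))
                 for e in action.effects)
    return pre_ok and eff_ok
```

In this framing, a failed precondition or effect check would signal the classical planner to replan, which is consistent with the abstract's goal of making symbolic planning visually grounded and responsive to open-world situations.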
arXiv.org Artificial Intelligence
Jun-25-2024