Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

Feb-8-2025, 18:31:47 GMT–Neural Information Processing Systems

Images contain rich relational knowledge that can help machines understand the world. Existing methods for visual knowledge extraction often rely on a predefined format (e.g., subject-verb-object tuples) or vocabulary (e.g., relation types), which restricts the expressiveness of the extracted knowledge. In this work, we take a first step toward a new paradigm: open visual knowledge extraction. To achieve this, we present OpenVik, which consists of two components: an open relational region detector that detects regions likely to contain relational knowledge, and a visual knowledge generator that produces format-free knowledge by prompting a large multimodality model with each detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge.
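
The two-stage design described above (detect relational regions, then prompt a generator on each region) can be pictured with the minimal Python sketch below. It is an illustration of the data flow only, not OpenVik's implementation: the detector and generator are stand-in callables rather than the actual models, and the names `Region`, `crop`, `extract_open_knowledge`, the score `threshold`, the default prompt text, and the deduplication step are all hypothetical assumptions. The data enhancement techniques are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical image representation: rows of grayscale pixel values.
Image = List[List[int]]


@dataclass(frozen=True)
class Region:
    """Hypothetical box: x0, y0 inclusive; x1, y1 exclusive; plus a relation score."""
    x0: int
    y0: int
    x1: int
    y1: int
    score: float


def crop(image: Image, r: Region) -> Image:
    """Return the sub-image covered by region r."""
    return [row[r.x0:r.x1] for row in image[r.y0:r.y1]]


def extract_open_knowledge(
    image: Image,
    detect: Callable[[Image], Sequence[Region]],  # stand-in for the region detector
    generate: Callable[[Image, str], str],        # stand-in for the multimodality model
    threshold: float = 0.5,                       # assumed cut-off, not from the paper
    prompt: str = "Describe the relation between the objects in this region.",
) -> List[str]:
    """Sketch: detect candidate regions, then prompt the generator on each crop."""
    knowledge: List[str] = []
    seen = set()
    # Visit the most confident regions first.
    for region in sorted(detect(image), key=lambda r: r.score, reverse=True):
        if region.score < threshold:
            continue  # skip regions unlikely to contain relational knowledge
        text = generate(crop(image, region), prompt).strip()
        if text and text not in seen:  # assumed step: drop empty and duplicate outputs
            seen.add(text)
            knowledge.append(text)
    return knowledge
```

Because the generator returns free text rather than a fixed tuple, the sketch keeps its outputs as plain strings; swapping in a real detector or multimodality model would only change the two callables.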

knowledge, open visual knowledge extraction, relation-oriented multimodality model prompting
