Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

Neural Information Processing Systems 

Existing methods for visual knowledge extraction often rely on a predefined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration of a new paradigm of open visual knowledge extraction.