Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models

Lin Li

Neural Information Processing Systems 

Then, it leverages large language models (LLMs) to generate description-based prompts (or visual cues) for each component.
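The following sketch is illustrative, not the authors' implementation. It shows one way to query an LLM once per component and collect the returned descriptions as visual cues. Several details are assumptions that do not come from this section: the component set (`subject`, `object`, `spatial`), the prompt templates, the convention that the LLM returns one cue per line, and the helper names `generate_visual_cues`, `PredicateCues`, and `fake_llm`, which are all hypothetical. A canned stand-in replaces a real LLM call so the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumed component set, for illustration only; the text just says "each component".
COMPONENTS = ("subject", "object", "spatial")

# Hypothetical prompt templates, one per component (not taken from the paper).
TEMPLATES = {
    "subject": "Describe the visual features of the subject in the relation '{predicate}'.",
    "object": "Describe the visual features of the object in the relation '{predicate}'.",
    "spatial": "Describe the spatial arrangement of subject and object in the relation '{predicate}'.",
}


@dataclass
class PredicateCues:
    """Visual cues for one relation category, grouped by component."""
    predicate: str
    cues: Dict[str, List[str]] = field(default_factory=dict)


def generate_visual_cues(predicate: str, llm: Callable[[str], str]) -> PredicateCues:
    """Query the LLM once per component and split each answer into individual cues."""
    result = PredicateCues(predicate)
    for component in COMPONENTS:
        prompt = TEMPLATES[component].format(predicate=predicate)
        answer = llm(prompt)
        # Assumption: the LLM answers with one description per line, optionally bulleted with "-".
        result.cues[component] = [
            line.strip("- ").strip() for line in answer.splitlines() if line.strip()
        ]
    return result


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned text."""
    if "spatial" in prompt:
        return "- subject is above the object\n- subject overlaps the object"
    return "- has legs\n- is of moderate size"


if __name__ == "__main__":
    cues = generate_visual_cues("riding", fake_llm)
    for component, descriptions in cues.cues.items():
        print(component, descriptions)
```

Passing the LLM in as a plain callable keeps the prompt-construction logic separate from any particular model client, so a real API wrapper could replace `fake_llm` without changing `generate_visual_cues`.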