Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models
