Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models: Supplementary Document
Neural Information Processing Systems
This supplementary document is organized as follows: the details of the simulated spatial image generation (mentioned in Sec. B), the implementation details (mentioned in Sec. D), the broader impacts of the proposed method (discussed in Sec. E), and the limitations of the proposed method (presented in Sec.).

We propose to simulate the spatial relationship between the subject and object by generating a finite set of spatial images, as mentioned in Sec. Each spatial image depicts the bounding boxes of the subject and object: the subject's bounding box is drawn as a red box, and the object's bounding box is drawn as a green box. We define four essential attributes, namely shape, size, relative position, and distance, to describe the spatial relationship between the subject and object.
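To make the idea of a spatial image concrete, the following is a minimal Python sketch, not the authors' implementation. It draws a red outline for the subject box and a green outline for the object box on a blank canvas, and derives coarse values for the four attributes. The `Box` class, the function names, the 64-pixel canvas, the attribute vocabulary ("tall", "wide", "left", and so on), and the normalization of distance by canvas size are all assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from math import hypot

RED, GREEN, WHITE = (255, 0, 0), (0, 255, 0), (255, 255, 255)


@dataclass(frozen=True)
class Box:
    """Axis-aligned box with inclusive pixel corners (hypothetical helper)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0 + 1

    @property
    def height(self) -> int:
        return self.y1 - self.y0 + 1

    @property
    def center(self) -> tuple:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def render_spatial_image(subject: Box, obj: Box, size: int = 64) -> list:
    """Return a size x size RGB image (nested lists) with the subject
    outlined in red and the object outlined in green."""
    img = [[WHITE for _ in range(size)] for _ in range(size)]
    for box, color in ((subject, RED), (obj, GREEN)):
        for x in range(box.x0, box.x1 + 1):   # top and bottom edges
            img[box.y0][x] = color
            img[box.y1][x] = color
        for y in range(box.y0, box.y1 + 1):   # left and right edges
            img[y][box.x0] = color
            img[y][box.x1] = color
    return img


def shape_of(box: Box) -> str:
    if box.width > box.height:
        return "wide"
    if box.width < box.height:
        return "tall"
    return "square"


def describe(subject: Box, obj: Box, size: int = 64) -> dict:
    """Coarse values for shape, size, relative position, and distance.
    The categories and thresholds are illustrative assumptions."""
    s_area = subject.width * subject.height
    o_area = obj.width * obj.height
    if s_area > o_area:
        rel_size = "larger"
    elif s_area < o_area:
        rel_size = "smaller"
    else:
        rel_size = "equal"

    (sx, sy), (ox, oy) = subject.center, obj.center
    dx, dy = sx - ox, sy - oy          # image y axis points downward
    if dx == 0 and dy == 0:
        position = "same center"
    elif abs(dx) >= abs(dy):
        position = "left" if dx < 0 else "right"
    else:
        position = "above" if dy < 0 else "below"

    return {
        "subject_shape": shape_of(subject),
        "object_shape": shape_of(obj),
        "size": rel_size,              # subject relative to object
        "position": position,          # subject relative to object
        "distance": hypot(dx, dy) / size,
    }


subject = Box(4, 8, 19, 47)     # tall box on the left
obj = Box(36, 24, 59, 39)       # wide box on the right
image = render_spatial_image(subject, obj)
attrs = describe(subject, obj)
```

A finite set of such images could then be produced by enumerating a small grid of box configurations, although how the set is actually constructed is left to the section the text refers to.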
Mar-27-2025, 15:12:33 GMT