Single-Stage Visual Relationship Learning using Conditional Queries

Oct-11-2024, 02:37:40 GMT–Neural Information Processing Systems

Research in scene graph generation (SGG) usually considers two-stage models, which first detect a set of entities and then combine them and label all possible relationships. While these models show promising results, their pipeline structure induces large parameter and computation overhead and typically hinders end-to-end optimization. To address this, recent research attempts to train single-stage models that are more computationally efficient. With the advent of DETR, a set-based detection model, one-stage models attempt to predict a set of subject-predicate-object triplets directly in a single shot. However, SGG is inherently a multi-task learning problem that requires modeling entity and predicate distributions simultaneously.

conditional query, multi-task learning problem, single-stage visual relationship learning

Industry:
- Education > Focused Education > Special Education (0.32)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)