

Joint Modeling of Visual Objects and Relations for Scene Graph Generation (Supplementary Material)

Neural Information Processing Systems

Now, we can exactly derive that $q(G) = \hat{p}(G \mid I)$. The definitions of the potential functions φ and ψ follow those in the JM-SGG model. Figure 1: The scene graphs generated by the JM-SGG model. In these examples, factor update is able to correct some wrong relation labels (e.g.


Joint Modeling of Visual Objects and Relations for Scene Graph Generation

Neural Information Processing Systems

In-depth scene understanding usually requires recognizing all the objects and their relations in an image, encoded as a scene graph. Most existing approaches for scene graph generation first recognize each object independently and then predict the relations between them independently. Though these approaches are very efficient, they ignore the dependencies among different objects as well as among their relations. In this paper, we propose a principled approach to jointly predict the entire scene graph by fully capturing the dependencies among different objects and among their relations. Specifically, we establish a unified conditional random field (CRF) to model the joint distribution of all the objects and their relations in a scene graph. We carefully design the potential functions to enable relational reasoning among different objects according to knowledge graph embedding methods. We further propose an efficient and effective inference algorithm based on mean-field variational inference, in which we first provide a warm initialization by independently predicting the objects and their relations according to the current model, followed by a few iterations of relational reasoning. Experimental results on both the relationship retrieval and zero-shot relationship retrieval tasks demonstrate the efficiency and efficacy of our proposed approach.
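The inference scheme described in the abstract (warm initialization from independent predictions, then a few rounds of relational reasoning) can be illustrated with a minimal mean-field sketch on a toy CRF. This is not the authors' JM-SGG implementation: the function names, the table-based potentials `obj_unary`, `rel_unary`, and `triple`, and the toy labels are all assumptions made for illustration. In particular, the triple potential here is a plain lookup table, whereas the paper designs its potentials with knowledge graph embedding methods.

```python
import math


def softmax(scores):
    """Normalize a list of log-scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field(obj_unary, rel_unary, triple, num_iters=3):
    """Hypothetical mean-field inference over object and relation labels.

    obj_unary[i][c]      : log-potential of object i having class c
    rel_unary[(s, o)][r] : log-potential of pair (s, o) having relation r
    triple[cs][r][co]    : log-potential of the triple (cs, r, co)
    Assumes no self-pairs (s != o).
    """
    n_cls, n_rel = len(triple), len(triple[0])

    # Warm initialization: independent predictions from the unary scores.
    q_obj = [softmax(u) for u in obj_unary]
    q_rel = {pair: softmax(u) for pair, u in rel_unary.items()}

    for _ in range(num_iters):
        # Update each relation marginal given the current object marginals.
        for (s, o), unary in rel_unary.items():
            logits = [
                unary[r] + sum(
                    q_obj[s][cs] * q_obj[o][co] * triple[cs][r][co]
                    for cs in range(n_cls) for co in range(n_cls)
                )
                for r in range(n_rel)
            ]
            q_rel[(s, o)] = softmax(logits)

        # Update each object marginal given relations and the other objects.
        for i, unary in enumerate(obj_unary):
            logits = list(unary)
            for (s, o), qr in q_rel.items():
                for c in range(n_cls):
                    if s == i:  # object i acts as the subject
                        logits[c] += sum(
                            qr[r] * q_obj[o][co] * triple[c][r][co]
                            for r in range(n_rel) for co in range(n_cls)
                        )
                    elif o == i:  # object i acts as the object
                        logits[c] += sum(
                            qr[r] * q_obj[s][cs] * triple[cs][r][c]
                            for r in range(n_rel) for cs in range(n_cls)
                        )
            q_obj[i] = softmax(logits)

    return q_obj, q_rel


# Toy example (all values are made up): object 0 is clearly a person,
# object 1 is ambiguous, and the pair weakly looks like "riding".
PERSON, HORSE = 0, 1
RIDING, NEAR = 0, 1
obj_unary = [[3.0, 0.0], [0.0, 0.0]]
rel_unary = {(0, 1): [1.0, 0.0]}
triple = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]  # indexed [cs][r][co]
triple[PERSON][RIDING][HORSE] = 2.0

q_obj, q_rel = mean_field(obj_unary, rel_unary, triple)
```

In this toy setup the ambiguous second object is pushed toward the "horse" class because the triple (person, riding, horse) is the only one with a positive score. That is the kind of dependency between objects and relations that purely independent prediction would miss.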

