 scene graph generation


Joint Modeling of Visual Objects and Relations for Scene Graph Generation (Supplementary Material)

Neural Information Processing Systems

Based on the formulation of the likelihood function $p_\Theta(G|I) = f_\Theta(G,I)/Z_\Theta(I)$, we can reformulate the gradient of the log-likelihood function as: $\nabla_\Theta \mathcal{L}(\Theta) = \mathbb{E}_{G \sim p_d}[\nabla_\Theta \log f_\Theta(G,I)] - \nabla_\Theta \log Z_\Theta(I)$. Theorem 2. In the initialization phase, the potential function $\psi_{\text{triplet}}(r, y_{o_h}, y_{o_t})$ for modeling label dependency is omitted in $p(G|I)$, yielding a simplified model distribution $\hat{p}(G|I)$. Now, we can exactly derive that $q(G) = \hat{p}(G|I)$. Theorem 3. In the update phase, we use the full expression of $p(G|I)$ with the potential function $\psi_{\text{triplet}}(r, y_{o_h}, y_{o_t})$ for modeling label dependency. In this case, maximizing $\mathcal{L}(q)$ is equivalent to minimizing the KL divergence term, and the minimum occurs when $q(y_o) = p(y_o, I)$.
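For any unnormalized model, the gradient of the log-partition function equals the expectation of the gradient of log f under the model itself, so the log-likelihood gradient is a data expectation minus a model expectation. The sketch below checks this numerically on a toy problem that is not from the paper: the image I is held fixed, G takes one of three values, and f(G) = exp(theta[G]). The function names, the parameter values, and the data distribution `p_data` are all assumptions made for illustration.

```python
import math

def log_partition(theta):
    """log Z = log sum_G exp(theta[G]), computed stably."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))

def model_probs(theta):
    """p(G) = f(G) / Z for the toy model f(G) = exp(theta[G])."""
    log_z = log_partition(theta)
    return [math.exp(t - log_z) for t in theta]

def log_likelihood(theta, p_data):
    """L(theta) = E_{G ~ p_data}[log f(G)] - log Z."""
    log_z = log_partition(theta)
    return sum(pd * (t - log_z) for pd, t in zip(p_data, theta))

def analytic_gradient(theta, p_data):
    # grad log f(G) is one-hot at G, so the data expectation is p_data
    # and grad log Z (the model expectation) is model_probs(theta).
    return [pd - pm for pd, pm in zip(p_data, model_probs(theta))]

def numeric_gradient(theta, p_data, eps=1e-6):
    """Central finite differences of the log-likelihood."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        diff = log_likelihood(up, p_data) - log_likelihood(down, p_data)
        grad.append(diff / (2 * eps))
    return grad

theta = [0.5, -1.0, 2.0]   # hypothetical parameters
p_data = [0.2, 0.5, 0.3]   # hypothetical data distribution over G

analytic = analytic_gradient(theta, p_data)
numeric = numeric_gradient(theta, p_data)
print(analytic)
print(numeric)
```

The two printed gradients should agree up to finite-difference error. In a real scene graph model the sum over all graphs G is intractable, which is why the model expectation has to be approximated rather than enumerated as it is here.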


LinkNet: Relational Embedding for Scene Graph

Neural Information Processing Systems

Objects and their relationships are critical content for image understanding. A scene graph provides a structured description that captures these properties of an image. However, reasoning about the relationships between objects is very challenging, and only a few recent works have attempted to solve the problem of generating a scene graph from an image. In this paper, we present a novel method that improves scene graph generation by explicitly modeling the inter-dependency among all object instances. We design a simple and effective relational embedding module that enables our model to jointly represent connections among all related objects, rather than focusing on an object in isolation. Our method significantly benefits two main parts of the scene graph generation task: object classification and relationship classification. Using it on top of a basic Faster R-CNN, our model achieves state-of-the-art results on the Visual Genome benchmark.



4D Panoptic Scene Graph Generation Jingkang Yang

Neural Information Processing Systems

Traditional 3D scene graph methods may recognize the static elements of this scene, such as identifying a booth situated on the ground. However, a more ideal, advanced, and dynamic perception is required for real-world scenarios.







Joint Modeling of Visual Objects and Relations for Scene Graph Generation (Supplementary Material)

Neural Information Processing Systems

Now, we can exactly derive that $q(G) = \hat{p}(G|I)$. The definitions of the potential functions $\phi$ and $\psi$ follow those in the JM-SGG model. Figure 1: The scene graphs generated by the JM-SGG model. In these examples, factor update is able to correct some wrong relation labels (e.g.