Joint Modeling of Visual Objects and Relations for Scene Graph Generation (Supplementary Material)

Apr-25-2026, 14:22:48 GMT–Neural Information Processing Systems

Based on the formulation of the likelihood function $p_\Theta(G|I) = f_\Theta(G,I)/Z_\Theta(I)$, we can reformulate the gradient of the log-likelihood function as: $\nabla_\Theta L(\Theta) = \mathbb{E}_{G \sim p_d}[\nabla_\Theta \log f_\Theta(G,I)] - \nabla_\Theta \log Z_\Theta(I)$.

Theorem 2. In the initialization phase, the potential function $\psi_{\mathrm{triplet}}(r, y_{o_h}, y_{o_t})$ for modeling label dependency is omitted in $p(G|I)$, yielding a simplified model distribution $\hat{p}(G|I)$. Now, we can exactly derive that $q(G) = \hat{p}(G|I)$.

Theorem 3. In the update phase, we use the full expression of $p(G|I)$ with the potential function $\psi_{\mathrm{triplet}}(r, y_{o_h}, y_{o_t})$ for modeling label dependency. In this case, maximizing $L(q)$ is equivalent to minimizing the KL divergence term, and the minimum occurs when $q(y_o) = p(y_o, I)$.
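The gradient expression at the start of this passage can be checked numerically on a toy model. The sketch below is not the authors' implementation. It assumes a hypothetical log-linear potential, $\log f_\Theta(G) = \Theta \cdot \phi(G)$, over four enumerable candidate graphs, with made-up features `FEATURES` and a made-up data distribution `P_DATA`; the image $I$ is held fixed and dropped from the notation. It relies on the standard identity $\nabla_\Theta \log Z_\Theta = \mathbb{E}_{G \sim p_\Theta}[\nabla_\Theta \log f_\Theta(G)]$ and compares the resulting gradient with central finite differences.

```python
import math

# Hypothetical toy setup (not from the paper): four candidate "scene graphs",
# each described by a 2-d feature vector phi(G).
FEATURES = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, -1.0)]
P_DATA = [0.1, 0.2, 0.6, 0.1]  # assumed data distribution p_d over the candidates


def log_f(theta, g):
    """Log of the unnormalized potential: log f_theta(G) = theta . phi(G)."""
    return sum(t * x for t, x in zip(theta, FEATURES[g]))


def log_z(theta):
    """Log partition function, computed exactly by enumerating all candidates."""
    return math.log(sum(math.exp(log_f(theta, g)) for g in range(len(FEATURES))))


def log_likelihood(theta):
    """L(theta) = E_{G~p_d}[log f_theta(G)] - log Z_theta."""
    data_term = sum(p * log_f(theta, g) for g, p in enumerate(P_DATA))
    return data_term - log_z(theta)


def analytic_grad(theta):
    """E_{p_d}[grad log f] - E_{p_theta}[grad log f]; here grad log f = phi(G)."""
    lz = log_z(theta)
    p_model = [math.exp(log_f(theta, g) - lz) for g in range(len(FEATURES))]
    return [
        sum((pd - pm) * FEATURES[g][k]
            for g, (pd, pm) in enumerate(zip(P_DATA, p_model)))
        for k in range(len(theta))
    ]


def numeric_grad(theta, eps=1e-6):
    """Central finite differences of L(theta), used only as a sanity check."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        up[k] += eps
        down = list(theta)
        down[k] -= eps
        grad.append((log_likelihood(up) - log_likelihood(down)) / (2 * eps))
    return grad


theta = [0.3, -0.7]
print("analytic:", analytic_grad(theta))
print("numeric: ", numeric_grad(theta))
```

In this toy setting the partition function is tractable by enumeration, so both terms are exact. For realistic scene graphs the model expectation would have to be approximated, which is outside what this sketch shows.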

Country:
- Asia > China (0.15)
- North America > Canada
  - Quebec (0.15)

Technology:
- Information Technology > Artificial Intelligence (0.49)
