Learning Human-Human Interactions in Images from Weak Textual Supervision
