Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Open in new window