Self-Training Vision Language BERTs with a Unified Conditional Model
