Tackling Vision Language Tasks Through Learning Inner Monologues

Open in new window