Unifying Vision-and-Language Tasks via Text Generation