OmniVL: One Foundation Model for Image-Language and Video-Language Tasks