Microsoft Proposes GODIVA, A Text-To-Video Machine Learning Framework


A collaboration between Microsoft Research Asia and Duke University has produced a machine learning system capable of generating video solely from a text prompt, without the use of Generative Adversarial Networks (GANs). The project is titled GODIVA (Generating Open-DomaIn Videos from nAtural Descriptions), and builds on some of the approaches used by OpenAI's DALL-E image synthesis system, revealed earlier this year.

Early results from GODIVA: frames from videos created from two prompts. The top two examples were generated from the prompt 'Play golf on grass', and the third, at the bottom, from the prompt 'A baseball game is played'.

GODIVA uses the Vector Quantised-Variational AutoEncoder (VQ-VAE) model, first introduced by researchers from Google's DeepMind project in 2017, which is also an essential component in DALL-E's transformational capabilities.

Earlier work: VQ-VAE infers frames from very limited supplied source material.
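For readers unfamiliar with the VQ-VAE component mentioned above, its central step is to replace each continuous encoder output with the nearest entry in a learned codebook, so that an image or video frame becomes a grid of discrete token indices. The following is a minimal Python sketch of that lookup step only. The toy codebook, the sample vectors and the function names are illustrative assumptions, not taken from the GODIVA or DALL-E code, and a real model would learn the codebook during training.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    vectors: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each vector to its nearest codebook entry.

    Returns the chosen codebook indices (the discrete "tokens") and the
    corresponding codebook vectors (the quantized representation).
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for v in vectors:
        # Pick the codebook row with the smallest distance to v.
        idx = min(range(len(codebook)),
                  key=lambda k: squared_distance(v, codebook[k]))
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized


# Hypothetical 3-entry codebook of 2-dimensional vectors (assumed values).
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

# Hypothetical encoder outputs for three positions in a frame.
encoder_outputs = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8]]

indices, quantized = quantize(encoder_outputs, CODEBOOK)
print(indices)  # [0, 1, 2]
```

In a system of this kind, it is the resulting index sequence, rather than raw pixels, that a downstream model would be trained to predict from the text prompt.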