

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

Girdhar, Rohit, Singh, Mannat, Brown, Andrew, Duval, Quentin, Azadi, Samaneh, Rambhatla, Sai Saketh, Shah, Akbar, Yin, Xi, Parikh, Devi, Misra, Ishan

arXiv.org Artificial Intelligence

We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions--adjusted noise schedules for diffusion, and multi-stage training--that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work--81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% over prior work.
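The factorized two-step generation described in the abstract can be sketched in Python. The function names, tensor shapes, and stand-in bodies below are illustrative assumptions, not the actual Emu Video implementation — in a real system each step would invoke a diffusion model:

```python
import numpy as np

def generate_image(prompt: str, height: int = 512, width: int = 512) -> np.ndarray:
    """Step 1 (stand-in): sample an image conditioned on the text prompt.
    A real system would run a text-to-image diffusion model here."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((height, width, 3))

def generate_video(prompt: str, first_frame: np.ndarray, num_frames: int = 16) -> np.ndarray:
    """Step 2 (stand-in): sample a video conditioned on both the text and the
    generated image. Here we simply tile the frame to show the data flow."""
    return np.stack([first_frame] * num_frames, axis=0)

def factorized_text_to_video(prompt: str) -> np.ndarray:
    # Factorized generation: text -> image, then (text, image) -> video.
    image = generate_image(prompt)
    return generate_video(prompt, image)

video = factorized_text_to_video("a teddy bear washing dishes")
print(video.shape)  # (16, 512, 512, 3)
```

The point of the factorization is that the second step is strongly conditioned on a concrete first frame, which (per the abstract) lets the model produce high-resolution video directly rather than through a deep cascade of upsampling models.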


ChatGPT-4 is coming this week and will be able to turn text into VIDEO

Daily Mail - Science & tech

ChatGPT, the revolutionary chatbot powered by artificial intelligence (AI), will soon be able to do much more than send human-like text messages. A Microsoft executive has revealed that the next version - set to be released this week - will be able to turn text prompts into unique videos. The tech giant has invested heavily in ChatGPT, and has already unveiled a host of new products which incorporate it as an AI assistant, like the search engine Bing. But this updated version, dubbed GPT-4 and tipped to launch on Thursday, will have 'multimodal models', according to Microsoft Germany CTO Andreas Braun. This means that it will be able to generate content in multiple formats, like audio clips, images and video clips, from a text prompt.


The 5 most important recent developments in AI

#artificialintelligence

From solving maths and science problems to translating with astonishing accuracy between hundreds of languages - not to mention generating images and videos based on a natural language prompt - AI is making strides pretty much across the board. In this article, I'll briefly discuss some of the most recent (and most exciting!) developments. So, without further ado, let's dive in! Released on 1 August 2022, Minerva is a language model capable of not only solving maths and science problems submitted in the form of natural language, but also of providing its reasoning behind the answer. So far, Google has built three versions of the model, each bigger than the last.


How AI Transformed the Art World in 2022

#artificialintelligence

The AI community has a new obsession. It's called 'generative artificial intelligence', and it refers to the idea of having computers take over creative tasks such as writing, filmmaking, and graphic design. AI art generators are paving a new path towards the freedom of artistic expression. In an extremely short period, they've allowed everybody with internet access and a keyboard to generate incredible art from simple text prompts. Considering the current state of things, it's too early to tell whether this new wave of apps will end up costing artists and illustrators their jobs. What seems clear, though, is that these tools are already being used in creative industries.


Meta Announces Video Generation AI Model Make-a-Video

#artificialintelligence

Meta AI recently announced Make-A-Video, a text-to-video generation AI model. Make-A-Video is trained using publicly available image-text pairs and video-only data and achieves state-of-the-art performance on the UCF-101 video-generation benchmark. The model and a set of experiments were described in a paper published on arXiv. Unlike some other text-to-video (T2V) models, Make-a-Video does not require a dataset of text-video pairs. Instead, it is based on existing text-image pair models, which generate single-frame images from a text description.


Make-A-Video: Text-to-Video Generation's Next... Generation? - NAB Amplify

#artificialintelligence

The inevitable has happened, albeit a little sooner than expected. After all the hoopla surrounding text-to-image AI generators in recent months, Meta is first out of the gate with a text-to-video version. Perhaps Meta wanted to establish some headline leadership in this space, since the results aren't ready for primetime. But as developments in text-to-image generation have shown, by the time you read this the technology will already have advanced. Meta is only giving the public a glimpse of the tech it calls Make-A-Video.


📝 📺 Edge#234: Inside Meta AI's Make-A-Video

#artificialintelligence

On Thursdays, we dive deep into one of the freshest research papers or technology frameworks that is worth your attention. Our goal is to keep you up to date with new developments in AI to complement the concepts we debate in other editions of our newsletter. Text-to-Video (T2V) is considered the next frontier for generative artificial intelligence (AI) models. While the text-to-image (T2I) space is experiencing a revolution with models like DALL-E, Stable Diffusion, and Midjourney, T2V remains a monumental challenge. Recently, researchers from Meta AI unveiled Make-A-Video, a T2V model able to create realistic short video clips from textual inputs.


La veille de la cybersécurité

#artificialintelligence

Not to be outdone by Meta's Make-A-Video, Google today detailed its work on Imagen Video, an AI system that can generate video clips given a text prompt (e.g., "a teddy bear washing dishes"). While the results aren't perfect -- the looping clips the system generates tend to have artifacts and noise -- Google claims that Imagen Video is a step toward a system with a "high degree of controllability" and world knowledge, including the ability to generate footage in a range of artistic styles. As my colleague Devin Coldewey noted in his piece about Make-A-Video, text-to-video systems aren't new. Earlier this year, a group of researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence released CogVideo, which can translate text into reasonably high-fidelity short clips. But Imagen Video appears to be a significant leap over the previous state-of-the-art, showing an aptitude for animating captions that existing systems would have trouble understanding. "It's definitely an improvement," Matthew Guzdial, an assistant professor at the University of Alberta studying AI and machine learning, told TechCrunch via email.


Meta enters the AI arms race with a creepy DALL-E 2 for video

#artificialintelligence

AI image generation has been let loose and it seems there's no going back. With DALL-E 2 now open to all, another player has entered the fray, not wanting to lose out - and it's none other than Facebook's parent company Meta. And while DALL-E 2 currently works its magic only with static images, Meta has revealed that it's working on a similar tool for video. As with AI image generators such as DALL-E 2, users will be able to type in a descriptive text prompt, and the tool will generate four output options. The tool, named Make-A-Video (give them a break, they were too busy with the tech to work on names), isn't yet public, but Meta AI has been taking requests on Twitter. The results are as creepy as they are astonishing.

