AITopics | c-vivit

Meet Phenaki: A Machine Learning-Based Model For Generating Videos From Text Prompts And Uses C-ViViT As Video Encoder

#artificialintelligence | Oct-15-2022, 22:05:33 GMT

Text-to-image generation is a hot topic in AI, largely thanks to the open-source release of Stable Diffusion. Do you want to see an image of "a teddy bear sleeping in a medieval bed drawn in Van Gogh style"? Pass a detailed prompt, and Stable Diffusion will generate a realistic image for you. The X-to-Y generation craze built on diffusion models is not limited to images: there is text-to-image, text-to-speech, image-to-image, and the list goes on.

c-vivit, phenaki, video, (10 more...)

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.06)
Europe > Austria (0.06)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.06)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Phenaki: Variable Length Video Generation From Open Domain Textual Description

Ruben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi Saffar, Santiago Castro, Julius Kunze, Dumitru Erhan

arXiv.org Artificial Intelligence | Oct-5-2022

We present Phenaki, a model capable of realistic video synthesis given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, the limited quantity of high-quality text-video data, and the variable length of videos. To address these issues, we introduce a new model for learning video representation which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text, we use a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, we demonstrate how joint training on a large corpus of image-text pairs together with a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts. To the best of our knowledge, this is the first time a paper studies generating videos from time-variable prompts. In addition, compared to the per-frame baselines, the proposed video encoder-decoder computes fewer tokens per video but results in better spatio-temporal consistency. It is now possible to generate realistic high-resolution images given a description [34, 35, 32, 38, 59], but generating high-quality videos from text remains challenging.
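The abstract's point that causal attention in time lets the tokenizer handle variable-length videos can be illustrated with a small sketch. This is not the paper's implementation: the helper names `causal_time_mask` and `prefix`, the flat frame-major token layout, and the choice to let a token see every token of its own and earlier frames are all assumptions made for illustration. The sketch only shows that such a mask for a short video is identical to the top-left corner of the mask for a longer one, so earlier frames are processed the same way regardless of how many frames follow.

```python
def causal_time_mask(num_frames, tokens_per_frame):
    """Build a boolean attention mask that is causal in time.

    mask[q][k] is True when query token q may attend to key token k.
    Assumption (illustrative only): tokens are laid out frame by frame,
    and a token may attend to any token in its own or an earlier frame.
    """
    n = num_frames * tokens_per_frame
    frame_of = [i // tokens_per_frame for i in range(n)]
    return [[frame_of[k] <= frame_of[q] for k in range(n)] for q in range(n)]


def prefix(mask, size):
    """Return the top-left size x size corner of a mask."""
    return [row[:size] for row in mask[:size]]


if __name__ == "__main__":
    short = causal_time_mask(num_frames=2, tokens_per_frame=2)
    longer = causal_time_mask(num_frames=4, tokens_per_frame=2)

    # Visualize the short mask: "x" = may attend, "." = blocked.
    for row in short:
        print("".join("x" if allowed else "." for allowed in row))

    # The first two frames see exactly the same context in both videos.
    print(prefix(longer, len(short)) == short)
```

Because no token ever looks at a later frame, appending frames never changes what the earlier tokens attend to, which is the property the abstract relies on for variable-length input.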

artificial intelligence, machine learning, natural language, (17 more...)

arXiv:2210.02399

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
