Congratulations to the #ICML2024 award winners

Jul-25-2024, 09:25:58 GMT – AIHub

VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs – including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting the ability to generate high-fidelity motions.
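One way to picture the "decoder-only, autoregressive" setup described above is as a single token stream over a shared vocabulary, trained with next-token prediction. The sketch below is only an illustration under stated assumptions: the vocabulary sizes, the offset scheme, and the helper names (`build_offsets`, `pack_sequence`, `next_token_pairs`) are hypothetical and are not taken from VideoPoet, whose actual tokenizers and training objectives are not described here.

```python
# Illustrative sketch only: vocabulary sizes and the offset scheme are
# assumptions made for this example, not VideoPoet's actual configuration.
VOCAB_SIZES = {"text": 100, "image": 50, "video": 50, "audio": 30}


def build_offsets(vocab_sizes):
    """Give each modality a disjoint id range inside one shared vocabulary."""
    offsets, start = {}, 0
    for name, size in vocab_sizes.items():
        offsets[name] = start
        start += size
    return offsets, start  # start is now the total shared vocabulary size


def pack_sequence(segments, offsets, vocab_sizes):
    """Concatenate (modality, local_token_ids) segments into one id sequence."""
    packed = []
    for modality, tokens in segments:
        size = vocab_sizes[modality]
        for token in tokens:
            if not 0 <= token < size:
                raise ValueError(f"token {token} out of range for {modality}")
            packed.append(offsets[modality] + token)
    return packed


def next_token_pairs(sequence):
    """Build (context, target) pairs: each token is predicted from its prefix."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


OFFSETS, TOTAL_VOCAB = build_offsets(VOCAB_SIZES)

# A text prompt followed by video tokens, packed into one stream.
example = pack_sequence([("text", [3, 7]), ("video", [0, 49])], OFFSETS, VOCAB_SIZES)
pairs = next_token_pairs(example)

if __name__ == "__main__":
    print(TOTAL_VOCAB, example)
    for context, target in pairs:
        print(context, "->", target)
```

Because every modality lands in the same id space, one decoder-only model can in principle be trained on any mix of tasks simply by changing which segments appear in the prefix and which are predicted.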



News Web Page


Country:
- Europe > Austria > Vienna (0.14)

Genre:
- Research Report > New Finding (0.49)
- Personal > Honors
  - Award (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)