Collaborating Authors

 Kahatapitiya, Kumara


MarDini: Masked Autoregressive Diffusion for Video Generation at Scale

arXiv.org Artificial Intelligence

We introduce MarDini, a new family of video diffusion models that integrate the advantages of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR handles temporal planning, while the DM focuses on spatial generation in an asymmetric network design: i) a MAR-based planning model containing most of the parameters generates planning signals for each masked frame using low-resolution input; ii) a lightweight generation model uses these signals to produce high-resolution frames via diffusion denoising. MarDini's MAR enables video generation conditioned on any number of masked frames at any frame position: a single model can handle video interpolation (e.g., masking middle frames), image-to-video generation (e.g., masking from the second frame onward), and video expansion (e.g., masking half the frames). The efficient design allocates most of the computational resources to the low-resolution planning model, making computationally expensive but important spatio-temporal attention feasible at scale. MarDini sets a new state-of-the-art for video interpolation; meanwhile, within a few inference steps, it efficiently generates videos on par with those of much more expensive advanced image-to-video models.
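
To make the masking scheme concrete, here is a minimal sketch (in PyTorch) of how such task-defining frame masks and the asymmetric low-/high-resolution split could look. The function names, shapes, and the stand-in linear "planner" are hypothetical illustrations under the abstract's description, not MarDini's actual code.

```python
# Illustrative sketch of MarDini-style frame masking (not the authors' code).
# A single boolean mask over T frame slots selects which frames are generated,
# so one model covers interpolation, image-to-video, and expansion.
import torch

def make_mask(task: str, num_frames: int) -> torch.Tensor:
    """Return a boolean mask; True marks frames the model must generate."""
    mask = torch.zeros(num_frames, dtype=torch.bool)
    if task == "interpolation":        # first and last frames are given
        mask[1:-1] = True
    elif task == "image_to_video":     # only the first frame is given
        mask[1:] = True
    elif task == "expansion":          # first half given, second half generated
        mask[num_frames // 2:] = True
    else:
        raise ValueError(f"unknown task: {task}")
    return mask

# Asymmetric two-model pipeline (sizes are hypothetical placeholders): a heavy
# planner runs on low-resolution frames; a light diffusion model would then
# denoise at high resolution, conditioned on the per-frame planning signals.
T, lo = 16, 32
frames_lo = torch.randn(T, 3, lo, lo)          # low-resolution inputs
mask = make_mask("interpolation", T)
frames_lo[mask] = 0.0                          # hide the frames to be generated

planner = torch.nn.Linear(3 * lo * lo, 128)    # stand-in for the MAR planner
plan = planner(frames_lo.flatten(1))           # one planning signal per frame
print(plan.shape, mask.sum().item(), "frames to generate")
```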


Object-Centric Diffusion for Efficient Video Editing

arXiv.org Artificial Intelligence

Diffusion-based video editing has reached impressive quality and can transform the global style, local structure, or attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames, in the form of diffusion inversion and/or cross-frame attention. In this paper, we conduct an analysis of such inefficiencies and suggest simple yet effective modifications that allow significant speed-ups whilst maintaining quality. Moreover, we introduce Object-Centric Diffusion (OCD) to further reduce latency by allocating computations more towards foreground edited regions, which are arguably more important for perceptual quality. We achieve this with two novel proposals: i) Object-Centric Sampling, which decouples the diffusion steps spent on salient regions from those spent on the background, allocating most of the model capacity to the former, and ii) Object-Centric 3D Token Merging, which reduces the cost of cross-frame attention by fusing redundant tokens in unimportant background regions. Both techniques are readily applicable to a given video editing model without retraining, and can drastically reduce its memory and computational cost. We evaluate our proposals on inversion-based and control-signal-based editing pipelines, and show a latency reduction of up to 10x for a comparable synthesis quality.
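
As a rough illustration of the token-merging idea, the sketch below averages background tokens into a few coarse tokens so that cross-frame attention runs on a much shorter sequence. The saliency mask, the naive chunk-and-average merge (the paper's 3D token merging is more sophisticated), and all names are assumptions for illustration only.

```python
# Illustrative sketch of object-centric token merging (not the OCD code):
# background tokens, marked by a hypothetical saliency mask, are averaged
# into a few coarse tokens, shrinking the attention sequence length.
import torch

def merge_background_tokens(tokens, fg_mask, num_bg_clusters=4):
    """tokens: (N, D); fg_mask: (N,) bool, True = foreground (kept verbatim)."""
    fg = tokens[fg_mask]                           # edited region: keep all tokens
    bg = tokens[~fg_mask]                          # background: compress heavily
    # Naive merge: chunk background tokens and average each chunk.
    chunks = bg.chunk(num_bg_clusters, dim=0)
    merged = torch.stack([c.mean(dim=0) for c in chunks if len(c) > 0])
    return torch.cat([fg, merged], dim=0)

tokens = torch.randn(1024, 64)                     # e.g. 32x32 patch tokens, D=64
fg_mask = torch.zeros(1024, dtype=torch.bool)
fg_mask[:128] = True                               # pretend 128 tokens are foreground
reduced = merge_background_tokens(tokens, fg_mask)
print(tokens.shape, "->", reduced.shape)           # attention now sees 132 tokens
```

Since attention cost grows quadratically with sequence length, even this crude compression of the 896 background tokens into 4 yields a large saving while leaving the foreground untouched.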


Token Turing Machines

arXiv.org Artificial Intelligence

Models for handling longer sequence lengths are themselves often not sufficient, since we do not want to run our entire transformer model for each time step when a new observation (e.g., a new frame) is provided. This necessitates developing models with explicit memories, enabling a model to fuse relevant past history with the current observation to make a prediction at the current time step. Another desideratum for such models, to scale to long sequence lengths, is that the computational cost at each time step should be constant, regardless of the length of the previous history. In this paper, we propose Token Turing Machines (TTMs), a sequential, auto-regressive model with external memory and constant computational time complexity at each step. Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history (i.e., frames). This memory is efficiently addressed, read and written using a Transformer as the processing unit/controller at each step. The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step. We show that TTM outperforms other alternatives, such as other Transformer models designed for long sequences and recurrent neural networks, on two real-world sequential visual understanding tasks.
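
The per-step loop can be sketched as follows, assuming a TokenLearner-style soft summarization for the read and write operations. The module names, token counts, and the single Transformer layer used as the controller are illustrative stand-ins rather than the paper's implementation.

```python
# Minimal sketch of a Token Turing Machine-style step (illustrative only).
# Memory is a fixed set of m tokens; each step reads a bounded number of
# tokens from [memory ++ new observation], processes them with a Transformer
# controller, and writes a fixed-size summary back as the new memory, so
# per-step compute is constant regardless of how long the history is.
import torch
import torch.nn as nn

class TTMStep(nn.Module):
    def __init__(self, dim=64, mem_tokens=16, read_tokens=8):
        super().__init__()
        self.read = nn.Linear(dim, read_tokens)    # read summarization weights
        self.write = nn.Linear(dim, mem_tokens)    # write summarization weights
        self.process = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)

    def summarize(self, x, proj):
        # TokenLearner-style pooling: soft weights over the input tokens.
        w = proj(x).softmax(dim=1)                 # (B, N, k)
        return torch.einsum("bnk,bnd->bkd", w, x)  # (B, k, d)

    def forward(self, memory, obs):
        all_tokens = torch.cat([memory, obs], dim=1)       # (B, m+n, d)
        read = self.summarize(all_tokens, self.read)       # bounded read set
        out = self.process(read)                           # controller step
        new_memory = self.summarize(
            torch.cat([memory, obs, out], dim=1), self.write)
        return new_memory, out

step = TTMStep()
memory = torch.zeros(1, 16, 64)                            # initial empty memory
for _ in range(5):                                         # stream of observations
    obs = torch.randn(1, 32, 64)                           # e.g. tokens of a frame
    memory, out = step(memory, obs)
print(memory.shape, out.shape)                             # fixed sizes every step
```

Because the read set and the memory have fixed sizes, each step touches at most m + n tokens no matter how many frames have been seen, which is exactly the constant per-step cost the abstract argues for.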