AITopics | Liew, Jun Hao

Collaborating Authors

Liew, Jun Hao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

Yan, Hanshu, Liu, Xingchao, Pan, Jiachun, Liew, Jun Hao, Liu, Qiang, Feng, Jiashi

arXiv.org Artificial Intelligence, May-29-2024

We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in few-step generation. Moreover, through dedicated parameterizations, PeRFlow models inherit knowledge from the pretrained diffusion models. As a result, training converges fast and the resulting models transfer well, serving as universal plug-and-play accelerators compatible with various workflows built on the pretrained diffusion models. Code for training and inference is publicly released: https://github.com/magic-research/piecewise-rectified-flow
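The few-step benefit of piecewise linear flows can be illustrated with a toy sketch. Assuming a 1-D state, a time axis running from t = 0 to t = 1, and equal-width windows (all assumptions, not details taken from the paper), the hypothetical `sample_windows` takes exactly one Euler step per window. When trajectories are straight inside every window, as reflow aims to make them, that sampler reaches the same endpoint as a fine-grained solver; on a curved flow it does not. The reflow training itself is not shown.

```python
import math
from typing import Callable

# v(x, t): velocity of a 1-D toy flow at state x and time t in [0, 1].
Velocity = Callable[[float, float], float]


def sample_windows(v: Velocity, x0: float, num_windows: int) -> float:
    """Integrate from t = 0 to t = 1 with exactly one Euler step per equal window.

    If every trajectory is a straight line inside each window (a piecewise
    linear flow), one step per window lands exactly on the true endpoint.
    """
    x = x0
    for k in range(num_windows):
        t0, t1 = k / num_windows, (k + 1) / num_windows
        x = x + (t1 - t0) * v(x, t0)
    return x


def piecewise_straight(x: float, t: float) -> float:
    """Hypothetical idealized flow: a constant velocity inside each of 4 windows."""
    k = min(int(t * 4), 3)
    return (-1.0, 2.0, 0.5, -3.0)[k]


def curved(x: float, t: float) -> float:
    """Hypothetical curved flow dx/dt = -x; its trajectories are not straight."""
    return -x


if __name__ == "__main__":
    # Straight within windows: 4 steps match a 10,000-step reference.
    print(sample_windows(piecewise_straight, 1.0, 4),
          sample_windows(piecewise_straight, 1.0, 10_000))
    # Curved: 4 steps miss the exact endpoint exp(-1).
    print(sample_windows(curved, 1.0, 4), math.exp(-1.0))
```

For the piecewise-straight field both runs give 0.625 (up to floating-point rounding in the 10,000-step run), while four steps on the curved field give 0.31640625 against the exact exp(-1) ≈ 0.3679. That gap is what straightening each window is meant to remove.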

artificial intelligence, diffusion model, machine learning

arXiv.org Artificial Intelligence

2405.07510

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Workflow (0.50)
Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

Zhu, Lianghui, Huang, Zilong, Liao, Bencheng, Liew, Jun Hao, Yan, Hanshu, Feng, Jiashi, Wang, Xinggang

arXiv.org Artificial Intelligence, May-28-2024

Diffusion models with large-scale pre-training have achieved significant success in visual content generation, particularly exemplified by Diffusion Transformers (DiT). However, DiT models face challenges with scalability and with the efficiency cost of quadratic complexity. In this paper, we aim to leverage the long-sequence modeling capability of Gated Linear Attention (GLA) Transformers, extending its applicability to diffusion models. We introduce Diffusion Gated Linear Attention Transformers (DiG), a simple, adoptable solution with minimal parameter overhead that follows the DiT design but offers superior efficiency and effectiveness. In addition to performing better than DiT, DiG-S/2 trains 2.5× faster than DiT-S/2 and saves 75.7% GPU memory at a resolution of 1792 × 1792. Moreover, we analyze the scalability of DiG across a range of computational complexities. DiG models with increased depth/width or more input tokens consistently exhibit decreasing FID. We further compare DiG with other subquadratic-time diffusion models. At the same model size, DiG-XL/2 is 4.2× faster than the recent Mamba-based diffusion model at 1024 resolution, and 1.8× faster than DiT with CUDA-optimized FlashAttention-2 at 2048 resolution. All these results demonstrate its superior efficiency among the latest diffusion models. Code is released at https://github.com/hustvl/DiG.
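DiG builds on Gated Linear Attention. A minimal sketch of the causal recurrent form of a single GLA head, as it is commonly written, shows where the efficiency comes from: a fixed-size (d_k × d_v) state is decayed by a data-dependent gate and updated once per token, so cost grows linearly with sequence length instead of quadratically. The function names, passing the gates in directly (rather than computing them from the input), and the omission of normalization, output gating, multiple heads and anything DiG-specific (such as how image patch tokens are ordered or scanned) are all simplifying assumptions. A quadratic-time reference is included to check that both forms agree.

```python
from typing import List

Vector = List[float]


def gla_recurrent(q: List[Vector], k: List[Vector], v: List[Vector],
                  alpha: List[Vector]) -> List[Vector]:
    """Causal gated linear attention, recurrent form (single head, no normalization).

    q, k, alpha: T vectors of length d_k; v: T vectors of length d_v.
    alpha[t] holds per-key-dimension gates in (0, 1), assumed precomputed.
    State S (d_k x d_v) is updated as
        S_t = diag(alpha_t) S_{t-1} + outer(k_t, v_t),   o_t = q_t S_t
    so the cost is O(T * d_k * d_v), linear in the sequence length T.
    """
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs: List[Vector] = []
    for q_t, k_t, v_t, a_t in zip(q, k, v, alpha):
        for i in range(d_k):
            for j in range(d_v):
                # Decay row i of the memory by its gate, then write the new key-value pair.
                state[i][j] = a_t[i] * state[i][j] + k_t[i] * v_t[j]
        # Read out: o_t[j] = sum_i q_t[i] * S_t[i][j].
        outputs.append([sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


def gla_quadratic(q: List[Vector], k: List[Vector], v: List[Vector],
                  alpha: List[Vector]) -> List[Vector]:
    """Reference O(T^2) form: o_t = sum_{s<=t} (q_t * prod_{r=s+1..t} alpha_r) . k_s * v_s."""
    d_k, d_v = len(k[0]), len(v[0])
    outputs: List[Vector] = []
    for t in range(len(q)):
        o = [0.0] * d_v
        for s in range(t + 1):
            # Cumulative gate decay applied to key s by the time query t reads it.
            decay = [1.0] * d_k
            for r in range(s + 1, t + 1):
                decay = [decay[i] * alpha[r][i] for i in range(d_k)]
            score = sum(q[t][i] * decay[i] * k[s][i] for i in range(d_k))
            o = [o[j] + score * v[s][j] for j in range(d_v)]
        outputs.append(o)
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    k = [[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0], [2.0], [3.0]]
    alpha = [[0.9, 0.5], [0.8, 0.7], [0.6, 0.9]]  # hypothetical gate values
    print(gla_recurrent(q, k, v, alpha))
    print(gla_quadratic(q, k, v, alpha))
```

On the small example both functions return approximately [[1.0], [2.1], [6.06]]. The recurrent form only ever holds one state matrix, which is the property that lets linear-attention models avoid the memory growth of full attention at high resolutions.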

artificial intelligence, arxiv preprint arxiv, machine learning

arXiv.org Artificial Intelligence

2405.18428

Country: Europe > Germany (0.14)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

Wang, Weimin, Liu, Jiawei, Lin, Zhijie, Yan, Jiangqiao, Chen, Shuo, Low, Chetwin, Hoang, Tuyen, Wu, Jie, Liew, Jun Hao, Yan, Hanshu, Zhou, Daquan, Feng, Jiashi

arXiv.org Artificial Intelligence, Jan-9-2024

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2, which integrates a text-to-image model, a video motion generator, a reference image embedding module, and a frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architectural designs, MagicVideo-V2 can generate aesthetically pleasing, high-resolution videos with remarkable fidelity and smoothness. In a large-scale user evaluation, it outperforms leading text-to-video systems such as Runway, Pika 1.0, Morph, Moon Valley, and the Stable Video Diffusion model.

artificial intelligence, machine learning, magicvideo-v2

arXiv.org Artificial Intelligence

2401.04468

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method

Pan, Jiachun, Yan, Hanshu, Liew, Jun Hao, Feng, Jiashi, Tan, Vincent Y. F.

arXiv.org Artificial Intelligence, Dec-19-2023

Training-free guided sampling in diffusion models leverages off-the-shelf pretrained networks, such as an aesthetic evaluation model, to guide the generation process. Current training-free guided sampling algorithms obtain the guidance energy function from a one-step estimate of the clean image. However, this one-step estimate can be inaccurate, especially in the early stages of the generation process, and because the off-the-shelf networks are trained on clean images, the guidance in the early time steps becomes inaccurate as well. To overcome this problem, we propose Symplectic Adjoint Guidance (SAG), which computes the gradient guidance in two inner stages. First, SAG estimates the clean image via n function calls, where n is a flexible hyperparameter that can be tuned to meet specific image quality requirements. Second, SAG uses the symplectic adjoint method to obtain the gradients accurately and with low memory cost. Extensive experiments demonstrate that SAG generates higher-quality images than the baselines in both guided image and video generation tasks.
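The first inner stage of SAG, estimating the clean image with n function calls instead of one, can be illustrated on a toy problem. In this sketch the data are the two values -1 and +1, the noise predictor `eps_model` is the exact optimal predictor for that data, the cosine schedule is hypothetical, and the n calls are spent on deterministic DDIM-style steps; all of these are assumptions for illustration, and the symplectic adjoint gradient stage is not shown. At an early, noisy time step the one-call estimate is the posterior mean, which sits between the two clean values, whereas the multi-call estimate lands on a clean value, the kind of input an off-the-shelf network trained on clean images expects.

```python
import math


def alpha_bar(t: float) -> float:
    """Hypothetical cosine noise schedule: alpha_bar(0) = 1 (clean data)."""
    return math.cos(0.5 * math.pi * t) ** 2


def posterior_mean(x_t: float, t: float) -> float:
    """Exact E[x0 | x_t] for toy data x0 in {-1, +1} with equal probability."""
    a = alpha_bar(t)
    return math.tanh(math.sqrt(a) * x_t / (1.0 - a))


def eps_model(x_t: float, t: float) -> float:
    """Stand-in for a trained noise predictor: the optimal one for the toy data."""
    a = alpha_bar(t)
    return (x_t - math.sqrt(a) * posterior_mean(x_t, t)) / math.sqrt(1.0 - a)


def one_step_estimate(x_t: float, t: float) -> float:
    """Usual one-call clean-image estimate x0_hat = (x_t - sqrt(1 - a) * eps) / sqrt(a)."""
    a = alpha_bar(t)
    return (x_t - math.sqrt(1.0 - a) * eps_model(x_t, t)) / math.sqrt(a)


def n_step_estimate(x_t: float, t: float, n: int) -> float:
    """Clean-image estimate from n deterministic DDIM-style steps (n calls to eps_model).

    With n = 1 this reduces to one_step_estimate.
    """
    x = x_t
    for i in range(n):
        t_cur, t_next = t * (1 - i / n), t * (1 - (i + 1) / n)
        a_cur, a_next = alpha_bar(t_cur), alpha_bar(t_next)
        eps = eps_model(x, t_cur)
        x0_hat = (x - math.sqrt(1.0 - a_cur) * eps) / math.sqrt(a_cur)
        # Move to the next (less noisy) time, keeping the predicted noise direction.
        x = math.sqrt(a_next) * x0_hat + math.sqrt(1.0 - a_next) * eps
    return x


if __name__ == "__main__":
    x_t, t = 0.3, 0.8  # an early, very noisy time step
    print(one_step_estimate(x_t, t))    # blurry: between the clean values -1 and +1
    print(n_step_estimate(x_t, t, 50))  # close to the clean value +1
```

With n = 1 the loop reduces to the one-step formula, so n trades extra function calls for a sharper estimate, which matches the role the abstract gives to this hyperparameter.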

artificial intelligence, guidance, machine learning

arXiv.org Artificial Intelligence

2312.12030

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing

Shi, Yujun, Xue, Chuhui, Liew, Jun Hao, Pan, Jiachun, Yan, Hanshu, Zhang, Wenqing, Tan, Vincent Y. F., Bai, Song

arXiv.org Artificial Intelligence, Dec-10-2023

Accurate and controllable image editing is a challenging task that has attracted significant attention recently. Notably, DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision. However, because it relies on generative adversarial networks (GANs), its generality is limited by the capacity of pretrained GAN models. In this work, we extend this editing framework to diffusion models and propose a novel approach, DragDiffusion. By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images. Our approach optimizes the diffusion latents to achieve precise spatial control. The supervision signal for this optimization comes from the diffusion model's UNet features, which are known to contain rich semantic and geometric information. Moreover, we introduce two additional techniques, namely LoRA fine-tuning and latent-MasaCtrl, to further preserve the identity of the original image. Lastly, we present a challenging benchmark dataset called DragBench, the first benchmark for evaluating interactive point-based image editing methods. Experiments across a wide range of challenging cases (e.g., images with multiple objects, diverse object categories, and various styles) demonstrate the versatility and generality of DragDiffusion. Code: https://github.com/Yujun-Shi/DragDiffusion.

artificial intelligence, editing, machine learning

arXiv.org Artificial Intelligence

2306.14435

Country:

Europe > Netherlands (0.14)
Europe > Germany (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Media > Photography (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models

Pan, Jiachun, Liew, Jun Hao, Tan, Vincent Y. F., Feng, Jiashi, Yan, Hanshu

arXiv.org Artificial Intelligence, Jul-20-2023

Existing customization methods require access to multiple reference examples to align pretrained diffusion probabilistic models (DPMs) with user-provided concepts. This paper addresses the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, naïve gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method, AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and the augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, fine-tuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.
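The adjoint sensitivity method at the heart of AdjointDPM can be sketched on a toy problem. Assuming a scalar, autonomous ODE dx/dt = f(x, θ) standing in for the probability-flow ODE, a scalar θ standing in for conditioning signals, network weights or initial noise, and plain fixed-step RK4 in place of the paper's exponential-integration reparameterization (all assumptions), the hypothetical `adjoint_gradients` recovers dL/dx0 and dL/dθ by integrating an augmented ODE backward in time. Only the final state is kept from the forward pass, so memory does not grow with the number of solver steps.

```python
import math
from typing import Callable, Tuple

State = Tuple[float, ...]
Scalar2 = Callable[[float, float], float]  # (x, theta) -> float


def rk4(deriv: Callable[[State], State], state: State, t0: float, t1: float,
        steps: int) -> State:
    """Fixed-step RK4 for an autonomous ODE; integrates backward when t1 < t0."""
    h = (t1 - t0) / steps
    s = state
    for _ in range(steps):
        k1 = deriv(s)
        k2 = deriv(tuple(si + 0.5 * h * ki for si, ki in zip(s, k1)))
        k3 = deriv(tuple(si + 0.5 * h * ki for si, ki in zip(s, k2)))
        k4 = deriv(tuple(si + h * ki for si, ki in zip(s, k3)))
        s = tuple(si + h / 6.0 * (a + 2 * b + 2 * c + d)
                  for si, a, b, c, d in zip(s, k1, k2, k3, k4))
    return s


def forward(f: Scalar2, x0: float, theta: float, T: float, steps: int = 200) -> float:
    """Solve dx/dt = f(x, theta) from 0 to T, keeping only the final state."""
    (x_T,) = rk4(lambda s: (f(s[0], theta),), (x0,), 0.0, T, steps)
    return x_T


def adjoint_gradients(f: Scalar2, f_x: Scalar2, f_theta: Scalar2,
                      dloss_dx: Callable[[float], float],
                      x0: float, theta: float, T: float,
                      steps: int = 200) -> Tuple[float, float]:
    """Return (dL/dx0, dL/dtheta) for L = loss(x(T)) without storing the trajectory.

    The augmented state (x, a, g) is integrated backward from T to 0:
        dx/dt = f(x, theta)        x is re-solved backward rather than cached
        da/dt = -a * df/dx         adjoint a(t) = dL/dx(t), starting at dL/dx(T)
        dg/dt = -a * df/dtheta     g starts at 0, and g(0) = dL/dtheta
    """
    x_T = forward(f, x0, theta, T, steps)

    def augmented(s: State) -> State:
        x, a, _ = s
        return (f(x, theta), -a * f_x(x, theta), -a * f_theta(x, theta))

    _, a0, g0 = rk4(augmented, (x_T, dloss_dx(x_T), 0.0), T, 0.0, steps)
    return a0, g0


if __name__ == "__main__":
    theta, x0, T = 0.7, 1.5, 1.0
    # Toy dynamics dx/dt = theta * x and loss L = 0.5 * x(T)**2 (all hypothetical).
    grads = adjoint_gradients(
        f=lambda x, th: th * x,
        f_x=lambda x, th: th,
        f_theta=lambda x, th: x,
        dloss_dx=lambda x_T: x_T,
        x0=x0, theta=theta, T=T,
    )
    # Closed forms: dL/dx0 = x0 * exp(2*theta*T), dL/dtheta = x0**2 * T * exp(2*theta*T).
    print(grads)
    print((x0 * math.exp(2 * theta * T), x0 ** 2 * T * math.exp(2 * theta * T)))
```

For dx/dt = θx with L = x(T)²/2, the two printed tuples should agree to many digits. Re-solving x backward is where numerical error can creep in on harder ODEs; reducing such error in both the forward and backward passes is the stated motivation for the non-stiff reparameterization described in the abstract.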

adjointdpm, artificial intelligence, machine learning

arXiv.org Artificial Intelligence

2307.10711

Country: North America > United States (0.14)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.81)
