Goto

Collaborating Authors

 Park, Yong-Hyun


$\textit{Jump Your Steps}$: Optimizing Sampling Schedule of Discrete Diffusion Models

arXiv.org Artificial Intelligence

Diffusion models have seen notable success in continuous domains, leading to the development of discrete diffusion models (DDMs) for discrete variables. Despite recent advances, DDMs face the challenge of slow sampling speeds. While parallel sampling methods like τ-leaping accelerate this process, they introduce Compounding Decoding Error (CDE), where discrepancies arise between the true distribution and the approximation from parallel token generation, leading to degraded sample quality. In this work, we present Jump Your Steps (JYS), an approach that optimizes the allocation of discrete sampling timesteps to minimize CDE without extra computational cost. More precisely, we derive a practical upper bound on CDE and propose an efficient algorithm for searching for the optimal sampling schedule. Extensive experiments across image, music, and text generation show that JYS significantly improves sampling quality, establishing it as a versatile framework for enhancing DDM performance for fast sampling.

Diffusion models (Sohl-Dickstein et al., 2015; Song et al., 2021b; Ho et al., 2020; Song et al., 2021a; Karras et al., 2022) have achieved remarkable success in generation tasks within the continuous domain. However, certain modalities, such as text and music, inherently possess discrete features. Nevertheless, like their continuous counterparts, DDMs encounter a significant bottleneck in sampling speed due to their progressive refinement process. In contrast to continuous-domain diffusion models, where sampling dynamics are driven by sample-wise differential equations (Song et al., 2021b), allowing for the direct application of well-established numerical methods to accelerate generation, enhancing speed in DDMs poses a significant challenge. To address this, researchers have proposed fast and efficient samplers, including notable methods such as the τ-leaping (Campbell et al., 2022; Lezama et al., 2022; Sun et al., 2023) and k-Gillespie algorithms (Zhao et al., 2024), which facilitate parallel sampling of multiple tokens in a single step. However, this parallel but independent sampling introduces Compounding Decoding Error (CDE) (Lezama et al., 2022), which arises from a mismatch between the training and inference distributions of intermediate latents during parallel sampling. Specifically, while each token is generated according to its marginal distribution, the joint distribution deviates from the learned distribution. To mitigate this issue, the predictor-corrector (PC) sampler (Campbell et al., 2022) has been proposed.
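To make the source of CDE concrete, here is a minimal sketch of τ-leaping-style parallel sampling; the `denoiser` interface and the `schedule` argument are illustrative assumptions, not the authors' implementation, but the schedule is exactly the object that JYS would optimize.

```python
import torch

def parallel_decode(denoiser, x_T, schedule):
    """Minimal sketch of tau-leaping-style parallel sampling for a DDM.

    Assumed interface (hypothetical): denoiser(x, t, t_next) returns
    per-token categorical logits for x_{t_next} given x_t, with shape
    (batch, seq, vocab). `schedule` is a decreasing list of timesteps.
    """
    x = x_T
    for t, t_next in zip(schedule[:-1], schedule[1:]):
        logits = denoiser(x, t, t_next)            # (batch, seq, vocab)
        probs = torch.softmax(logits, dim=-1)
        # Every position is sampled in parallel from its own marginal,
        # so the joint over tokens factorizes -- the mismatch with the
        # learned joint distribution is what compounds into CDE.
        x = torch.distributions.Categorical(probs=probs).sample()
    return x
```

Fewer, better-placed steps in `schedule` reduce how often this factorized update is applied where it hurts most, which is the intuition behind optimizing the sampling schedule.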


Upsample Guidance: Scale Up Diffusion Models without Training

arXiv.org Artificial Intelligence

Diffusion models have demonstrated superior performance across various generative tasks, including images, videos, and audio. However, they encounter difficulties in directly generating high-resolution samples. Previously proposed solutions to this issue involve modifying the architecture, further training, or partitioning the sampling process into multiple stages. These methods have the limitation that they cannot directly utilize pre-trained models as-is, requiring additional work. In this paper, we introduce upsample guidance, a technique that adapts a pretrained diffusion model (e.g., $512^2$) to generate higher-resolution images (e.g., $1536^2$) by adding only a single term in the sampling process. Remarkably, this technique does not necessitate any additional training or reliance on external models. We demonstrate that upsample guidance can be applied to various models, such as pixel-space, latent-space, and video diffusion models. We also observed that proper selection of the guidance scale can improve image quality, fidelity, and prompt alignment.
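The abstract only names the "single term"; the following is a schematic sketch of the idea, not the paper's exact formula. The function name, the `w` scale, and the pooling/interpolation choices are assumptions, and the paper's time rescaling and signal-magnitude corrections are omitted.

```python
import torch.nn.functional as F

def upsample_guided_eps(model, x_t, t, scale=2, w=0.3):
    """Schematic of upsample guidance (illustrative, not the paper's
    exact formulation).

    The added term pulls the noise prediction at the target (high)
    resolution toward an upsampled prediction computed on a downsampled
    copy of the sample, so a model trained at low resolution can still
    steer high-resolution generation. `w` is the guidance scale.
    """
    eps = model(x_t, t)                          # native high-res prediction
    x_low = F.avg_pool2d(x_t, scale)             # downsample current sample
    eps_low = model(x_low, t)                    # prediction at trained res
    eps_up = F.interpolate(eps_low, scale_factor=scale,
                           mode="nearest")       # bring back to high res
    return eps + w * (eps_up - eps)              # the single added term
```

The appeal is that the base `model` is used unchanged; only the sampler's noise estimate is modified at inference time.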


Resolution Chromatography of Diffusion Models

arXiv.org Artificial Intelligence

Diffusion models generate high-resolution images through iterative stochastic processes. In particular, denoising is one of the most popular approaches, predicting the noise in a sample and removing it at each time step. It has been commonly observed that the resolution of generated samples changes over time, starting off blurry and coarse and becoming sharper and finer. In this paper, we introduce "resolution chromatography," which indicates the signal generation rate of each resolution; this concept is very helpful for mathematically explaining the coarse-to-fine behavior of the generation process, understanding the role of the noise schedule, and designing time-dependent modulation. Using resolution chromatography, we determine which resolution level becomes dominant at a specific time step, and we experimentally verify our theory with text-to-image diffusion models. We also propose some direct applications of the concept: upscaling pre-trained models to higher resolutions and time-dependent prompt composing. Our theory not only enables a better understanding of numerous pre-existing techniques for manipulating image generation but also suggests the potential for designing better noise schedules.
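As a rough numerical illustration of the concept (an assumed form, not the paper's definition): under a variance-preserving diffusion the noise is white, so a band's signal-to-noise ratio is $\bar{\alpha}_t \, p_k / (1 - \bar{\alpha}_t)$ for band power $p_k$, and coarse bands, having more power, emerge first. The soft "resolved" indicator and the $1/f^2$ spectrum below are modeling assumptions.

```python
import numpy as np

def resolution_chromatography(alpha_bar, band_power):
    """Sketch: per-resolution signal emergence under a noise schedule.

    alpha_bar:  (T,) cumulative signal level \\bar{alpha}_t of a VP
                diffusion, values in (0, 1).
    band_power: (K,) assumed signal power per resolution band; natural
                images are often modeled with a ~1/f^2 spectrum.
    """
    a = np.asarray(alpha_bar)[:, None]             # (T, 1)
    snr = a / (1.0 - a) * np.asarray(band_power)   # (T, K) band-wise SNR
    resolved = snr / (1.0 + snr)                   # soft emergence indicator
    return np.gradient(resolved, axis=0)           # per-band generation rate

# Example: coarse bands (large power) peak earlier along the schedule.
rates = resolution_chromatography(np.linspace(0.01, 0.99, 100),
                                  1.0 / np.array([1, 2, 4, 8]) ** 2)
```

Each column of `rates` traces when its resolution band is being generated, which is the coarse-to-fine profile the abstract describes.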