AITopics | Park, Taesung

Collaborating Authors

Park, Taesung

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Distilling Diffusion Models into Conditional GANs

Kang, Minguk, Zhang, Richard, Barnes, Connelly, Paris, Sylvain, Kwak, Suha, Park, Jaesik, Shechtman, Eli, Zhu, Jun-Yan, Park, Taesung

arXiv.org Artificial IntelligenceJun-13-2024

We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space, utilizing an ensemble of augmentations. Furthermore, we adapt a diffusion model to construct a multi-scale discriminator with a text alignment loss to build an effective conditional GAN-based formulation. E-LatentLPIPS converges more efficiently than many existing distillation methods, even accounting for dataset construction costs. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models -- DMD, SDXL-Turbo, and SDXL-Lightning -- on the zero-shot COCO benchmark.

artificial intelligence, deep learning, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2405.05967

Country:

Asia > South Korea (0.46)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Lazy Diffusion Transformer for Interactive Image Editing

Nitzan, Yotam, Wu, Zongze, Zhang, Richard, Shechtman, Eli, Cohen-Or, Daniel, Park, Taesung, Gharbi, Michaël

arXiv.org Artificial IntelligenceApr-18-2024

We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications using binary masks and text prompts. Our generator operates in two phases. First, a context encoder processes the current canvas and user mask to produce a compact global context tailored to the region to generate. Second, conditioned on this context, a diffusion-based transformer decoder synthesizes the masked pixels in a "lazy" fashion, i.e., it only generates the masked region. This contrasts with previous works that either regenerate the full canvas, wasting time and computation, or confine processing to a tight rectangular crop around the mask, ignoring the global image context altogether. Our decoder's runtime scales with the mask size, which is typically small, while our encoder introduces negligible overhead. We demonstrate that our approach is competitive with state-of-the-art inpainting methods in terms of quality and fidelity while providing a 10x speedup for typical user interactions, where the editing mask represents 10% of the image.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2404.12382

Genre: Research Report (0.64)

Industry: Media > Photography (0.61)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

One-Step Image Translation with Text-to-Image Models

Parmar, Gaurav, Park, Taesung, Narasimhan, Srinivasa, Zhu, Jun-Yan

arXiv.org Artificial IntelligenceMar-18-2024

In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, for unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods for various scene translation tasks, such as day-to-night conversion and adding/removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works like Control-Net for Sketch2Photo and Edge2Image, but with a single-step inference. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives.

artificial intelligence, diffusion model, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2403.12036

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Holistic Evaluation of Text-To-Image Models

Lee, Tony, Yasunaga, Michihiro, Meng, Chenlin, Mai, Yifan, Park, Joon Sung, Gupta, Agrim, Zhang, Yunzhi, Narayanan, Deepak, Teufel, Hannah Benita, Bellagente, Marco, Kang, Minguk, Park, Taesung, Leskovec, Jure, Zhu, Jun-Yan, Fei-Fei, Li, Wu, Jiajun, Ermon, Stefano, Liang, Percy

arXiv.org Artificial IntelligenceNov-7-2023

The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at https://crfm.stanford.edu/heim/v1.1.0 and the code at https://github.com/stanford-crfm/helm, which is integrated with the HELM codebase.

holistic evaluation, text-to-image model

arXiv.org Artificial Intelligence

2311.04287

Country: North America > United States > California > Santa Clara County > Palo Alto (0.24)

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

Expressive Text-to-Image Generation with Rich Text

Ge, Songwei, Park, Taesung, Zhu, Jun-Yan, Huang, Jia-Bin

arXiv.org Artificial IntelligenceAug-30-2023

Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on attention maps of a diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance, and maintain its fidelity against plain-text generation through region-based injections. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.

artificial intelligence, expressive text-to-image generation, rich text

arXiv.org Artificial Intelligence

2304.0672

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Add feedback

Scaling up GANs for Text-to-Image Synthesis

Kang, Minguk, Zhu, Jun-Yan, Zhang, Richard, Park, Jaesik, Shechtman, Eli, Paris, Sylvain, Park, Taesung

arXiv.org Artificial IntelligenceJun-19-2023

The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that na\"Ively increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel pixels in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2303.05511

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.73)

Add feedback

Domain Expansion of Image Generators

Nitzan, Yotam, Gharbi, Michaël, Zhang, Richard, Park, Taesung, Zhu, Jun-Yan, Cohen-Or, Daniel, Shechtman, Eli

arXiv.org Artificial IntelligenceApr-17-2023

Can one inject new concepts into an already trained generative model, while respecting its existing structure and knowledge? We propose a new task - domain expansion - to address this. Given a pretrained generator and novel (but related) domains, we expand the generator to jointly model all domains, old and new, harmoniously. First, we note the generator contains a meaningful, pretrained latent space. Is it possible to minimally perturb this hard-earned representation, while maximally representing the new domains? Interestingly, we find that the latent space offers unused, "dormant" directions, which do not affect the output. This provides an opportunity: By "repurposing" these directions, we can represent new domains without perturbing the original representation. In fact, we find that pretrained generators have the capacity to add several - even hundreds - of new domains! Using our expansion method, one "expanded" model can supersede numerous domain-specific models, without expanding the model size. Additionally, a single expanded generator natively supports smooth transitions between domains, as well as composition of domains. Code and project page available at https://yotamnitzan.github.io/domain-expansion/.

artificial intelligence, generator, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2301.05225

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

A Customizable Dynamic Scenario Modeling and Data Generation Platform for Autonomous Driving

Shenoy, Jay, Kim, Edward, Yue, Xiangyu, Park, Taesung, Fremont, Daniel, Sangiovanni-Vincentelli, Alberto, Seshia, Sanjit

arXiv.org Artificial IntelligenceNov-30-2020

Safely interacting with humans is a significant challenge for autonomous driving. The performance of this interaction depends on machine learning-based modules of an autopilot, such as perception, behavior prediction, and planning. These modules require training datasets with high-quality labels and a diverse range of realistic dynamic behaviors. Consequently, training such modules to handle rare scenarios is difficult because they are, by definition, rarely represented in real-world datasets. Hence, there is a practical need to augment datasets with synthetic data covering these rare scenarios. In this paper, we present a platform to model dynamic and interactive scenarios, generate the scenarios in simulation with different modalities of labeled sensor data, and collect this information for data augmentation. To our knowledge, this is the first integrated platform for these tasks specialized to the autonomous driving domain.

artificial intelligence, ground transportation, scenario, (15 more...)

arXiv.org Artificial Intelligence

2011.14551

Country: North America > United States (0.15)

Genre: Research Report (0.40)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Semantic Image Synthesis with Spatially-Adaptive Normalization

Park, Taesung, Liu, Ming-Yu, Wang, Ting-Chun, Zhu, Jun-Yan

arXiv.org Artificial IntelligenceMar-18-2019

We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the deep network, which is then processed through stacks of convolution, normalization, and nonlinearity layers. We show that this is suboptimal as the normalization layers tend to ``wash away'' semantic information. To address the issue, we propose using the input layout for modulating the activations in normalization layers through a spatially-adaptive, learned transformation. Experiments on several challenging datasets demonstrate the advantage of the proposed method over existing approaches, regarding both visual fidelity and alignment with input layouts. Finally, our model allows user control over both semantic and style as synthesizing images. Code will be available at https://github.com/NVlabs/SPADE .

deep learning, neural network, normalization, (19 more...)

arXiv.org Artificial Intelligence

1903.07291

Country: Europe (0.14)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback