Weber, Romann M.
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Sadat, Seyedmorteza, Hilliges, Otmar, Weber, Romann M.
Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.
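Based only on the decomposition described in this abstract, a minimal sketch of a projected guidance step might look as follows. The function name, the down-weighting factor eta, and the omission of the rescaling and momentum terms are assumptions for illustration, not the paper's reference implementation.

```python
import torch

def apg_update(pred_cond, pred_uncond, guidance_scale=7.5, eta=0.0):
    """Sketch of a projected guidance step: the CFG update term is split into
    components parallel and orthogonal to the conditional prediction, and the
    parallel component (associated with oversaturation) is down-weighted by eta."""
    diff = pred_cond - pred_uncond                      # standard CFG update term
    v = pred_cond.flatten(1)                            # conditional prediction, per sample
    d = diff.flatten(1)
    # Project the update term onto the conditional prediction (per sample).
    parallel = ((d * v).sum(dim=1, keepdim=True) / (v * v).sum(dim=1, keepdim=True)) * v
    orthogonal = d - parallel
    # Down-weight the parallel component; eta = 1 recovers plain CFG.
    guided = v + (guidance_scale - 1.0) * (eta * parallel + orthogonal)
    return guided.view_as(pred_cond)
```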
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
Sadat, Seyedmorteza, Kansy, Manuel, Hilliges, Otmar, Weber, Romann M.
Classifier-free guidance (CFG) has become the standard method for enhancing the quality of conditional diffusion models. However, employing CFG requires either training an unconditional model alongside the main diffusion model or modifying the training procedure by periodically inserting a null condition. There is also no clear extension of CFG to unconditional models. In this paper, we revisit the core principles of CFG and introduce a new method, independent condition guidance (ICG), which provides the benefits of CFG without the need for any special training procedures. Our approach streamlines the training process of conditional diffusion models and can also be applied during inference on any pre-trained conditional model. Additionally, by leveraging the time-step information encoded in all diffusion networks, we propose an extension of CFG, called time-step guidance (TSG), which can be applied to any diffusion model, including unconditional ones. Our guidance techniques are easy to implement and have the same sampling cost as CFG. Through extensive experiments, we demonstrate that ICG matches the performance of standard CFG across various conditional diffusion models. Moreover, we show that TSG improves generation quality in a manner similar to CFG, without relying on any conditional information.
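A rough sketch of the ICG idea follows, under the assumption that the unconditional branch of CFG is replaced by the same conditional network evaluated with an independently sampled condition (approximated here by shuffling conditions within the batch). The model signature and the way the independent condition is drawn are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def icg_guidance(model, x_t, t, cond, guidance_scale=7.5):
    """Sketch of independent condition guidance (ICG): the null-condition branch
    of CFG is replaced by the conditional model evaluated with a condition that
    is statistically independent of the current sample."""
    pred_cond = model(x_t, t, cond)
    # Assumption: a condition drawn independently of x_t stands in for the null condition.
    random_cond = cond[torch.randperm(cond.shape[0], device=cond.device)]
    pred_indep = model(x_t, t, random_cond)
    return pred_indep + guidance_scale * (pred_cond - pred_indep)
```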
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
Sadat, Seyedmorteza, Buhmann, Jakob, Bradley, Derek, Hilliges, Otmar, Weber, Romann M.
Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete wavelet transform to enhance scalability and computational efficiency over standard variational autoencoders (VAEs) with no sacrifice in output quality. We also investigate the training methodologies and the decoder architecture of LiteVAE and propose several enhancements that improve the training dynamics and reconstruction quality. Our base LiteVAE model matches the quality of the established VAEs in current LDMs with a six-fold reduction in encoder parameters, leading to faster training and lower GPU memory requirements, while our larger model outperforms VAEs of comparable complexity across all evaluated metrics (rFID, LPIPS, PSNR, and SSIM).
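To illustrate the wavelet-based processing mentioned above, the following sketch computes multi-level 2D DWT sub-bands that a lightweight encoder could consume instead of full-resolution pixels. The wavelet choice, number of levels, and the way sub-bands are combined are assumptions, not LiteVAE's actual architecture.

```python
import numpy as np
import pywt

def wavelet_features(image, wavelet="haar", levels=2):
    """Sketch: a multi-level 2D discrete wavelet transform turns an image into
    low-resolution sub-bands, which a small encoder can process more cheaply
    than the full-resolution pixels. Expects a 2D array (single channel)."""
    features = []
    approx = image
    for _ in range(levels):
        approx, (h, v, d) = pywt.dwt2(approx, wavelet)   # approximation + detail sub-bands
        features.append(np.stack([h, v, d], axis=0))      # detail coefficients at this scale
    features.append(approx[None])                          # final low-frequency approximation
    return features
```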
Controllable Inversion of Black-Box Face Recognition Models via Diffusion
Kansy, Manuel, Raël, Anton, Mignone, Graziana, Naruniec, Jacek, Schroers, Christopher, Gross, Markus, Weber, Romann M.
Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e., the black-box setting). A variety of methods have been proposed in the literature for this task, but they have serious shortcomings, such as a lack of realistic outputs and strong requirements on the data set and on access to the face recognition model. By analyzing the black-box inversion problem, we show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution even without an identity-specific loss. Our method, named identity denoising diffusion probabilistic model (ID3PM), leverages the stochastic nature of the denoising diffusion process to produce high-quality, identity-preserving face images with various backgrounds, lighting, poses, and expressions. We demonstrate state-of-the-art performance in terms of identity preservation and diversity, both qualitatively and quantitatively, and our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process.
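The conditional diffusion loss referred to above is, in its standard form, the denoising objective below, with the identity vector c produced by the (black-box) face recognition model as the condition; the exact weighting and parameterization used by ID3PM are not specified in the abstract.

```latex
% Standard conditional denoising-diffusion objective with the identity vector c
% as the condition; the weighting and parameterization are the usual DDPM choices,
% not necessarily those of ID3PM.
\mathcal{L}(\theta) =
  \mathbb{E}_{x_0,\, c,\, t,\, \epsilon \sim \mathcal{N}(0, I)}
  \left\| \epsilon - \epsilon_\theta\!\left(
      \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t,\; c
  \right) \right\|^2
```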
The Score-Difference Flow for Implicit Generative Modeling
Weber, Romann M.
Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g., score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them while also solving the Schrödinger bridge problem. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach.
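Reading the abstract literally, the score-difference flow moves source samples along the difference of the target and source scores; a sketch of the continuous flow and a simple discretization is shown below, with the smoothed proxy distributions and the step size as assumptions rather than the paper's exact construction.

```latex
% Continuous score-difference flow from source q toward target p, and one
% Euler-style discretization applied to smoothed proxy distributions \tilde{p}, \tilde{q}.
\frac{dx}{dt} = \nabla_x \log p(x) - \nabla_x \log q(x),
\qquad
x_{k+1} = x_k + \eta \left[ \nabla_x \log \tilde{p}(x_k) - \nabla_x \log \tilde{q}(x_k) \right]
```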
Disentangled Dynamic Representations from Unordered Data
Helminger, Leonhard, Djelouah, Abdelaziz, Gross, Markus, Weber, Romann M.
We present a deep generative model that learns disentangled static and dynamic representations of data from unordered input. Our approach exploits regularities in sequential data that exist regardless of the order in which the data is viewed. The result of our factorized graphical model is a well-organized and coherent latent space for data dynamics. We demonstrate our method on several synthetic dynamic datasets and real video data featuring various facial expressions and head poses.
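One way to realize the order-invariance described above is to aggregate a static code with a permutation-invariant operation while keeping per-frame dynamic codes. The sketch below is an illustrative assumption, not the paper's factorized graphical model; layer sizes and the mean aggregation are placeholders.

```python
import torch
import torch.nn as nn

class StaticDynamicEncoder(nn.Module):
    """Sketch: factorize a sequence into a static code (permutation-invariant mean
    over frames, consistent with unordered input) and per-frame dynamic codes."""
    def __init__(self, frame_dim, static_dim, dynamic_dim):
        super().__init__()
        self.frame_net = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU())
        self.to_static = nn.Linear(256, static_dim)
        self.to_dynamic = nn.Linear(256, dynamic_dim)

    def forward(self, frames):                   # frames: (batch, num_frames, frame_dim)
        h = self.frame_net(frames)
        static = self.to_static(h.mean(dim=1))   # order-invariant summary of the sequence
        dynamic = self.to_dynamic(h)             # per-frame dynamic code
        return static, dynamic
```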
InspireMe: Learning Sequence Models for Stories
Fortuin, Vincent (Disney Research Zürich, ETH Zürich, Institute for Machine Learning at ETH Zürich), Weber, Romann M. (Disney Research Zürich), Schriber, Sasha (Disney Research Zürich), Wotruba, Diana (Disney Research Zürich), Gross, Markus (Disney Research Zürich, ETH Zürich)
We present a novel approach to modeling stories using recurrent neural networks. Different story features are extracted using natural language processing techniques and used to encode the stories as sequences. These sequences can be learned by deep neural networks in order to predict the next story events. The predictions can serve as inspiration for writers experiencing writer's block. We further assist writers in their creative process by generating visualizations of the character interactions in the story. We show that suggestions from our model are rated as highly as real scenes from a set of films and that our visualizations can help people gain a deeper understanding of a story.
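As a rough illustration of the sequence-modeling setup described above, the sketch below uses an LSTM to predict the feature vector of the next story event from the events seen so far; the feature extraction and the output parameterization are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class NextEventModel(nn.Module):
    """Sketch: given feature vectors of the story events seen so far, an LSTM
    predicts the feature vector of the next event."""
    def __init__(self, feature_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, feature_dim)

    def forward(self, events):          # events: (batch, seq_len, feature_dim)
        out, _ = self.lstm(events)
        return self.head(out[:, -1])    # predicted features of the next event
```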