Towards Controllable Audio Texture Morphing

Gupta, Chitralekha, Kamath, Purnima, Wei, Yize, Li, Zhuoyao, Nanayakkara, Suranga, Wyse, Lonce

arXiv.org Artificial Intelligence

In this paper, we propose a data-driven approach to train a Generative Adversarial Network (GAN) conditioned on "soft-labels" distilled from the penultimate layer of an audio classifier trained on a target set of audio texture classes. We demonstrate that interpolation between such conditions or control vectors provides smooth morphing between the generated audio textures, and shows similar or better audio texture morphing capability compared to the state-of-the-art methods. The proposed approach results in a well-organized latent space that generates novel audio outputs while remaining consistent with the semantics of the conditioning parameters. This is a step towards a general data-driven approach to designing generative audio models.

The goal of parametric audio texture synthesis is to generate novel sounds with descriptive parameters that match those of a target texture. However, linear interpolation between parameters may not result in perceptually linear interpolation between the sounds [1]. McDermott et al. [4] developed a set of statistics based on a cochlear model to describe the perceptually relevant aspects of a given audio texture. Recent works [10, 5, 11] have adapted the seminal work on image style transfer [12] for audio texture synthesis, where hand-crafted statistics are replaced with Gram matrix statistics computed as the correlation between feature activations to represent style. Though this method of audio style transfer produces interesting combinations of the sounds, there is no control of semantic style or content features other than through the data examples provided.
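The core mechanism described above can be sketched in a few lines: take "soft-label" condition vectors from a classifier's penultimate layer and linearly interpolate between them to sweep a morph. This is a minimal illustration, not the authors' model; the tiny random-weight "penultimate layer" and the feature dimensions are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def penultimate_features(x, W):
    """Toy stand-in for a classifier's penultimate layer:
    a linear map followed by ReLU (hypothetical, random weights)."""
    return np.maximum(W @ x, 0.0)

# Stand-in audio feature vectors for two texture classes (e.g. two textures
# from the training set); dimensions are arbitrary choices for the sketch.
x_a = rng.normal(size=64)
x_b = rng.normal(size=64)
W = rng.normal(size=(16, 64))

c_a = penultimate_features(x_a, W)  # soft-label condition for texture A
c_b = penultimate_features(x_b, W)  # soft-label condition for texture B

def interpolate(c1, c2, alpha):
    """Condition vector at morph position alpha in [0, 1]."""
    return (1.0 - alpha) * c1 + alpha * c2

# A conditional generator G(z, c) would be fed each interpolated vector;
# sweeping alpha from 0 to 1 traces the morph between the two textures.
morph_conditions = [interpolate(c_a, c_b, a) for a in np.linspace(0.0, 1.0, 5)]
```

The endpoints of the sweep reproduce the two original condition vectors, and intermediate alphas give the conditions for intermediate textures; the paper's claim is that, unlike interpolating raw synthesis parameters, interpolating in this distilled condition space yields perceptually smooth morphs.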


MorphGAN: One-Shot Face Synthesis GAN for Detecting Recognition Bias

Ruiz, Nataniel, Theobald, Barry-John, Ranjan, Anurag, Abdelaziz, Ahmed Hussein, Apostoloff, Nicholas

arXiv.org Artificial Intelligence

To detect bias in face recognition networks, it can be useful to probe a network under test using samples in which only specific attributes vary in some controlled way. However, capturing a sufficiently large dataset with specific control over the attributes of interest is difficult. In this work, we describe a simulator that applies specific head pose and facial expression adjustments to images of previously unseen people. The simulator first fits a 3D morphable model to a provided image, applies the desired head pose and facial expression controls, then renders the model into an image. Next, a conditional Generative Adversarial Network (GAN), conditioned on the original image and the rendered morphable model, is used to produce an image of the original person with the new facial expression and head pose. We call this conditional GAN MorphGAN. Images generated using MorphGAN conserve the identity of the person in the original image, and the provided control over head pose and facial expression allows test sets to be created to identify robustness issues of a facial recognition deep network with respect to pose and expression. Images generated by MorphGAN can also serve as data augmentation when training data are scarce. We show that augmenting small datasets of faces with new poses and expressions improves recognition performance by up to 9%, depending on the augmentation and data scarcity.
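The pipeline above can be outlined in code: fit-adjust-render a morphable model, then condition a generator on both the original image and the rendering. Everything here is a hedged sketch, not Apple's implementation: the `render_morphable_model` stand-in returns noise in place of a real 3DMM render, the generator is a fixed linear mix, and channel-wise concatenation of the two conditioning images is an assumption borrowed from common image-conditional GAN designs rather than a detail stated in the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

def render_morphable_model(image, yaw_deg, expression):
    """Hypothetical stand-in for the 3DMM fit, control, and render step.
    A real system would return the morphable model rendered with the
    requested head pose and expression; here it is just noise of the
    same shape as the input image."""
    return rng.normal(size=image.shape)

def generator(conditioning):
    """Stand-in conditional generator: maps a 6-channel conditioning
    tensor (source image + rendered model) to a 3-channel output image."""
    return 0.5 * conditioning[:3] + 0.5 * conditioning[3:]

source = rng.normal(size=(3, 64, 64))  # original face image, (C, H, W)
rendered = render_morphable_model(source, yaw_deg=20.0, expression="smile")

# Assumed conditioning scheme: concatenate source and render channel-wise,
# so the generator sees both the identity cue and the target pose/expression.
conditioning = np.concatenate([source, rendered], axis=0)  # (6, 64, 64)
output = generator(conditioning)
```

For the data-augmentation use described in the abstract, the same loop would be run over a small training set with several pose/expression settings per face, and the generated images added to the training data.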