Benoit, Santiago
Secure & Personalized Music-to-Video Generation via CHARCHA
Agarwal, Mehul, Agarwal, Gauri, Benoit, Santiago, Lippman, Andrew, Oh, Jean
Music is a deeply personal experience and our aim is to enhance this with a fullyautomated pipeline for personalized music video generation. Our work allows listeners to not just be consumers but co-creators in the music video generation process by creating personalized, consistent and context-driven visuals based on lyrics, rhythm and emotion in the music. The pipeline combines multimodal translation and generation techniques and utilizes low-rank adaptation on listeners' images to create immersive music videos that reflect both the music and the individual. To ensure the ethical use of users' identity, we also introduce CHARCHA, a facial identity verification protocol that protects people against unauthorized use of their face while at the same time collecting authorized images from users for personalizing their videos. This paper thus provides a secure and innovative framework for creating deeply personalized music videos. Figure 1: Image stills and lyrics from generated music videos for Rick Astley's "Never Gonna Give You Up," with character reference from CHARCHA. The videos use Queratogray Sketch[1], Western Animation Diffusion[2], and Realistic Vision V5.1[3] checkpoint models .
Relay Variational Inference: A Method for Accelerated Encoderless VI
Zadeh, Amir, Benoit, Santiago, Morency, Louis-Philippe
Variational Inference (VI) offers a method for approximating intractable likelihoods. In neural VI, inference of approximate posteriors is commonly done using an encoder. Alternatively, encoderless VI offers a framework for learning generative models from data without encountering suboptimalities caused by amortization via an encoder (e.g. in presence of missing or uncertain data). However, in absence of an encoder, such methods often suffer in convergence due to the slow nature of gradient steps required to learn the approximate posterior parameters. In this paper, we introduce Relay VI (RVI), a framework that dramatically improves both the convergence and performance of encoderless VI. In our experiments over multiple datasets, we study the effectiveness of RVI in terms of convergence speed, loss, representation power and missing data imputation. We find RVI to be a unique tool, often superior in both performance and convergence speed to previously proposed encoderless as well as amortized VI models (e.g.
StarNet: Gradient-free Training of Deep Generative Models using Determined System of Linear Equations
Zadeh, Amir, Benoit, Santiago, Morency, Louis-Philippe
In this paper we present an approach for training deep generative models solely based on solving determined systems of linear equations. A network that uses this approach, called a StarNet, has the following desirable properties: 1) training requires no gradient as solution to the system of linear equations is not stochastic, 2) is highly scalable when solving the system of linear equations w.r.t the latent codes, and similarly for the parameters of the model, and 3) it gives desirable least-square bounds for the estimation of latent codes and network parameters within each layer.