Generative AI
On the Identifiability of Hybrid Deep Generative Models: Meta-Learning as a Solution
The interest in leveraging physics-based inductive bias in deep learning has resulted in recent development of hybrid deep generative models (hybrid-DGMs) that integrates known physics-based mathematical expressions in neural generative models. To identify these hybrid-DGMs requires inferring parameters of the physics-based component along with their neural component. The identifiability of these hybrid-DGMs, however, has not yet been theoretically probed or established. How does the existing theory of the un-identifiability of general DGMs apply to hybrid-DGMs? What may be an effective approach to consutrct a hybrid-DGM with theoretically-proven identifiability? This paper provides the first theoretical probe into the identifiability of hybrid-DGMs, and present meta-learning as a novel solution to construct identifiable hybrid-DGMs. On synthetic and real-data benchmarks, we provide strong empirical evidence for the un-identifiability of existing hybrid-DGMs using unconditional priors, and strong identifiability results of the presented meta-formulations of hybrid-DGMs.
OpenAI: The power and the pride
There is no question that OpenAI pulled off something historic with its release of ChatGPT 3.5 in 2022. It set in motion an AI arms race that has already changed the world in a number of ways and seems poised to have an even greater long-term effect than the short-term disruptions to things like education and employment that we are already beginning to see. How that turns out for humanity is something we are still reckoning with and may be for quite some time. But a pair of recent books both attempt to get their arms around it with accounts of what two leading technology journalists saw at the OpenAI revolution. In Empire of AI: Dreams and Nightmares in Sam Altman's OpenAI, Karen Hao tells the story of the company's rise to power and its far-reaching impact all over the world.
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion Robin San Roman, Antoine Deleforge
Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms conditioned on highly compressed representations. Although such methods produce impressive results, they are prone to generate audible artifacts when the conditioning is flawed or imperfect. An alternative modeling approach is to use diffusion models. However, these have mainly been used as speech vocoders (i.e., conditioned on mel-spectrograms) or generating relatively low sampling rate signals. In this work, we propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality (e.g., speech, music, environmental sounds) from low-bitrate discrete representations. At equal bit rate, the proposed approach outperforms state-of-the-art generative techniques in terms of perceptual quality. Training and evaluation code are available on the facebookresearch/audiocraft github project. Samples are available on the following link.
Diffusion4Audio (12)
Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms conditioned on highly compressed representations. Although such methods produce impressive results, they are prone to generate audible artifacts when the conditioning is flawed or imperfect. An alternative modeling approach is to use diffusion models. However, these have mainly been used as speech vocoders (i.e., conditioned on mel-spectrograms) or generating relatively low sampling rate signals. In this work, we propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality (e.g., speech, music, environmental sounds) from low-bitrate discrete representations. At equal bit rate, the proposed approach outperforms state-of-the-art generative techniques in terms of perceptual quality. Training and evaluation code are available on the facebookresearch/audiocraft github project. Samples are available on the following link.
Apple's triple threat: tariffs, AI troubles and a Fortnite fail
This week in tech: Apple struggles on multiple fronts, OpenAI grows increasingly ambitious, and Trump helps some of his fans lose money on cryptocurrency. Long dominant and unassailable, Apple is showing signs of weakness. The CEO, Tim Cook, can't tame Donald Trump's threats of tariffs that would spike the price of an iPhone; Apple's AI offerings pale against its competitors; and the company can't win a Fortnite match โ or a single battle in its legal war with Epic Games โ to save its life. On Friday, the president threatened to levy a 25% tariff on any iPhone not made in the US. Trump said in the post: "I have long ago informed Tim Cook of Apple that I expect their iPhones that will be sold in the United States of America will be manufactured and built in the United States, not India, or anyplace else. If that is not the case, a Tariff of at least 25% must be paid by Apple to the US."
OpenAI's 6.5B new acquisition signals Apple's biggest AI crisis yet
OpenAI CEO Sam Altman sits down with Shannon Bream to discuss the positives and potential negatives of artificial intelligence and the importance of maintaining a lead in the AI industry over China. OpenAI has just made a move that's turning heads across the tech world. The company is acquiring io, the AI device startup founded by Jony Ive, for nearly 6.5 billion. It's a collaboration between Sam Altman, who leads OpenAI, and the designer responsible for some of Apple's most iconic products, including the iPhone and Apple Watch. Together, they want to create a new generation of AI-powered devices that could completely change how we use technology.
Learning semantic similarity in a continuous space
We address the problem of learning semantic representation of questions to measure similarity between pairs as a continuous distance metric. Our work naturally extends Word Mover's Distance (WMD) [1] by representing text documents as normal distributions instead of bags of embedded words. Our learned metric measures the dissimilarity between two questions as the minimum amount of distance the intent (hidden representation) of one question needs to "travel" to match the intent of another question. We first learn to repeat, reformulate questions to infer intents as normal distributions with a deep generative model [2] (variational auto encoder). Semantic similarity between pairs is then learned discriminatively as an optimal transport distance metric (Wasserstein 2) with our novel variational siamese framework. Among known models that can read sentences individually, our proposed framework achieves competitive results on Quora duplicate questions dataset. Our work sheds light on how deep generative models can approximate distributions (semantic representations) to effectively measure semantic similarity with meaningful distance metrics from Information Theory.
Flexible and accurate inference and learning for deep generative models
We introduce a new approach to learning in hierarchical latent-variable generative models called the "distributed distributional code Helmholtz machine", which emphasises flexibility and accuracy in the inferential process. Like the original Helmholtz machine and later variational autoencoder algorithms (but unlike adversarial methods) our approach learns an explicit inference or "recognition" model to approximate the posterior distribution over the latent variables. Unlike these earlier methods, it employs a posterior representation that is not limited to a narrow tractable parametrised form (nor is it represented by samples). To train the generative and recognition models we develop an extended wake-sleep algorithm inspired by the original Helmholtz machine. This makes it possible to learn hierarchical latent models with both discrete and continuous variables, where an accurate posterior representation is essential. We demonstrate that the new algorithm outperforms current state-of-the-art methods on synthetic, natural image patch and the MNIST data sets.