reverb
Reverb: Open-Source ASR and Diarization from Rev
Bhandari, Nishchal, Chen, Danny, Fernández, Miguel Ángel del Río, Delworth, Natalie, Fox, Jennifer Drexler, Jetté, Migüel, McNamara, Quinten, Miller, Corey, Novotný, Ondřej, Profant, Ján, Qin, Nan, Ratajczak, Martin, Robichaud, Jean-Philippe
Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers and pared-down research models for experimentation. Rev hopes that these releases will spur research and innovation in the fast-moving domain of voice technology. The speech recognition models released today outperform all existing open-source speech recognition models across a variety of long-form speech recognition domains.
GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
Wang, Hanjing, Sit, Man-Kit, He, Congjie, Wen, Ying, Zhang, Weinan, Wang, Jun, Yang, Yaodong, Mai, Luo
This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR, however, optimizes memory efficiency by enabling the memory resources on GPU servers (including host memory and device memory) to manage trajectory data. Furthermore, it facilitates decentralized GPU devices to expedite various trajectory selection strategies, circumventing computational bottlenecks. GEAR is equipped with GPU kernels capable of collecting trajectories using zero-copy access to host memory, along with remote-directed-memory access over InfiniBand, improving communication efficiency. Cluster experiments have shown that GEAR can achieve performance levels up to 6x greater than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.
Unsupervised vocal dereverberation with diffusion-based generative models
Saito, Koichi, Murata, Naoki, Uesaka, Toshimitsu, Lai, Chieh-Hsin, Takida, Yuhta, Fukui, Takao, Mitsufuji, Yuki
Removing reverb from reverberant music is a necessary technique to clean up audio for downstream music manipulations. Reverberation in music falls into two categories: natural reverb and artificial reverb. Artificial reverb has a wider diversity than natural reverb due to its various parameter setups and reverberation types. However, recent supervised dereverberation methods may fail because they rely on sufficiently diverse and numerous pairs of reverberant observations and retrieved data for training in order to be generalizable to unseen observations during inference. To resolve these problems, we propose an unsupervised method that can remove a general kind of artificial reverb for music without requiring pairs of data for training. The proposed method is based on diffusion models: it initializes the unknown reverberation operator with a conventional signal processing technique and simultaneously refines the estimate with the help of diffusion models. We show through objective and perceptual evaluations that our method outperforms the current leading vocal dereverberation benchmarks.
Impulse Response -- data augmentation for audio deep learning
In recent years, deep learning for audio has come a long way, with models beating traditional signal processing techniques in many downstream tasks. However, many such solutions are trained on "homogeneous" datasets -- datasets where there is little variability in the recording conditions (noise, accent, language, etc.). Many such models do not perform very well (especially on audio conversion/synthesis tasks) when used on real-world "audio events", which can contain short bursts, environmental noise, background speakers, poor microphones, etc. While there are many techniques to address this, here we concern ourselves with data augmentation with impulse responses, which can be really powerful because it simulates different recording environments. An impulse response of a dynamic system describes how it reacts when presented with a brief input signal called the impulse.
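As a rough illustration of impulse-response augmentation, the sketch below convolves a dry signal with an impulse response and rescales the result to the dry signal's peak. It is a minimal sketch, not the article's implementation: the function names, the decaying-noise impulse response, and the peak normalization are all assumptions made for this example, and a practical pipeline would more likely use a recorded room response and FFT-based convolution.

```python
import math
import random


def synthetic_impulse_response(length, decay=0.01, seed=0):
    """Hypothetical toy room response: a unit direct path plus a decaying noise tail."""
    rng = random.Random(seed)
    ir = [rng.uniform(-1.0, 1.0) * math.exp(-decay * n) for n in range(length)]
    ir[0] = 1.0  # the direct sound arrives first, at full strength
    return ir


def convolve(signal, ir):
    """Direct-form linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # zero samples contribute nothing
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def augment(signal, ir):
    """Convolve with the impulse response, then rescale to the dry signal's peak."""
    wet = convolve(signal, ir)
    dry_peak = max(abs(x) for x in signal)
    wet_peak = max(abs(x) for x in wet)
    if wet_peak == 0.0:
        return wet  # silent input stays silent
    scale = dry_peak / wet_peak
    return [x * scale for x in wet]
```

Feeding a unit impulse through `convolve` returns the impulse response itself followed by zeros, which matches the definition given above: the impulse response is how the system reacts to a brief input.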
Reverb: A Framework For Experience Replay
Cassirer, Albin, Barth-Maron, Gabriel, Brevdo, Eugene, Ramos, Sabela, Boyd, Toby, Sottiaux, Thibault, Kroiss, Manuel
A central component of training in Reinforcement Learning (RL) is Experience: the data used for training. The mechanisms used to generate and consume this data have an important effect on the performance of RL algorithms. In this paper, we introduce Reverb: an efficient, extensible, and easy to use system designed specifically for experience replay in RL. Reverb is designed to work efficiently in distributed configurations with up to thousands of concurrent clients. The flexible API provides users with the tools to easily and accurately configure the replay buffer. It includes strategies for selecting and removing elements from the buffer, as well as options for controlling the ratio between sampled and inserted elements. This paper presents the core design of Reverb, gives examples of how it can be applied, and provides empirical results of Reverb's performance characteristics.
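The three knobs the abstract mentions (a selection strategy, a removal strategy, and a limit on the ratio of sampled to inserted elements) can be illustrated with a toy in-memory table. This is a sketch under stated assumptions and does not use Reverb's actual API: the class `ToyReplayTable`, its parameter names, and the choice of uniform sampling with first-in-first-out removal are all hypothetical.

```python
import random
from collections import OrderedDict


class ToyReplayTable:
    """Hypothetical table: uniform sampling, FIFO removal, capped sample/insert ratio."""

    def __init__(self, max_size, samples_per_insert, seed=0):
        self.max_size = max_size
        self.samples_per_insert = samples_per_insert
        self.items = OrderedDict()  # insertion order doubles as age order
        self.next_key = 0
        self.num_inserts = 0
        self.num_samples = 0
        self.rng = random.Random(seed)

    def insert(self, item):
        """Add an item, evicting the oldest one first if the table is full."""
        if len(self.items) >= self.max_size:
            self.items.popitem(last=False)  # removal strategy: oldest first
        self.items[self.next_key] = item
        self.next_key += 1
        self.num_inserts += 1

    def can_sample(self):
        """True while the sample budget earned by past inserts is not used up."""
        budget = self.num_inserts * self.samples_per_insert
        return bool(self.items) and self.num_samples < budget

    def sample(self):
        """Return one uniformly chosen item, or raise if the ratio cap is reached."""
        if not self.can_sample():
            raise RuntimeError("sample budget exhausted; insert more data first")
        key = self.rng.choice(list(self.items))  # selection strategy: uniform
        self.num_samples += 1
        return self.items[key]
```

With `max_size=2` and `samples_per_insert=2`, inserting three items leaves the two newest in the table and allows six samples in total. Raising an error when the budget runs out is a simplification for this sketch; a system serving many concurrent clients might make the caller wait instead.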
Cooper FX Arcades review: Plumbing the depths of lo-fi guitar effects
Let's get one thing out of the way right up front: Yes, the main conceit of the $329 Cooper FX Arcades is a little gimmicky. It's a guitar pedal into which you stick cards to apply different effects, kinda like a game console. But while the somewhat novel approach to building a multi-effects unit may have helped Arcades garner attention, this pedal is no mere gimmick. Tom Majeski of Cooper FX is not the first person to take this approach. Line 6 had its ToneCore line of pedals in the mid-aughts, Elta had the Console, and TipTop Audio sells the Z-DSP. But Z-DSP is a eurorack module, not a guitar pedal.
Real Time Speech Enhancement in the Waveform Domain
Defossez, Alexandre, Synnaeve, Gabriel, Adi, Yossi
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip-connections. It is optimized on both time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noises, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly on the raw waveform which further improve model performance and its generalization abilities. We perform evaluations on several standard benchmarks, using both objective metrics and human judgements. The proposed model matches state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.
sonible's AI Powered smart:reverb Delivers A Custom Reverb For Every Input Signal
With AI features, you often have to ask whether the label is only a marketing phrase to sell the plugin better or whether it's really useful for the musician. There is a synthesizer plugin on the market where the former has been confirmed. sonible, by contrast, already showed the latter with its intelligent EQ plugins. With the new reverb "smart:reverb", the company continues this idea and again uses its AI. In this case, the technology is used to create custom-tailored reverb by adjusting its processing to the individual characteristics of the input material.
Reverb: a framework for experience replay
The use of experience plays a key role in reinforcement learning (RL). How best to use this data is one of the central problems of this field. As RL agents have advanced over recent years, taking on bigger and more complex problems (Atari, Go, StarCraft, Dota), the generated data has grown in both size and complexity. To cope with this complexity, many RL systems split the learning problem into two distinct parts, experience producers (actors) and experience consumers (learners), allowing these different parts to run in parallel. Often a data storage system lies at the intersection between these two components. The question of how to efficiently store and transport the data is itself a challenging engineering problem.
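The actor/learner split described above can be sketched with threads and a bounded queue standing in for the data storage system. This is only an illustrative toy, not how Reverb is built: the function names, the `(actor_id, step)` tuples used as "experience", and the use of Python's standard `queue.Queue` are assumptions made for the example.

```python
import queue
import threading


def actor(actor_id, num_steps, store):
    """Producer: push one (actor_id, step) record per simulated environment step."""
    for step in range(num_steps):
        store.put((actor_id, step))  # blocks while the store is full


def learner(store, total_items, batch_size, batches):
    """Consumer: pull records from the store and group them into training batches."""
    batch = []
    for _ in range(total_items):
        batch.append(store.get())  # blocks while the store is empty
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)  # keep the final partial batch


def run(num_actors=3, steps_per_actor=4, batch_size=5, capacity=8):
    """Run the actors and one learner in parallel; return the learner's batches."""
    store = queue.Queue(maxsize=capacity)
    batches = []
    total = num_actors * steps_per_actor
    threads = [
        threading.Thread(target=actor, args=(i, steps_per_actor, store))
        for i in range(num_actors)
    ]
    threads.append(
        threading.Thread(target=learner, args=(store, total, batch_size, batches))
    )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return batches
```

The bounded queue means fast actors stall when the learner falls behind, which is one simple answer to the storage and transport question raised above. The order in which records reach the learner depends on thread scheduling, so only the set of records is deterministic.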