wavenet

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

Andrew Gibiansky, Sercan Arik, Gregory Diamos, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

Neural Information Processing Systems

We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron.
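The core idea of the abstract — one shared network conditioned on a trainable per-speaker embedding — can be sketched as follows. This is a minimal illustration, not the Deep Voice 2 architecture; all sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not the paper's hyperparameters.
NUM_SPEAKERS = 4
EMBED_DIM = 16
HIDDEN_DIM = 32
TEXT_DIM = 8

# One trainable low-dimensional embedding per speaker.
speaker_embeddings = rng.normal(size=(NUM_SPEAKERS, EMBED_DIM))

# Shared weights: a single linear layer stands in for the TTS network.
W_text = rng.normal(size=(HIDDEN_DIM, TEXT_DIM))
W_spk = rng.normal(size=(HIDDEN_DIM, EMBED_DIM))

def synthesize(text_features, speaker_id):
    """Condition the shared network on the chosen speaker's embedding."""
    e = speaker_embeddings[speaker_id]
    # Same text input + a different embedding => a different "voice".
    return np.tanh(W_text @ text_features + W_spk @ e)

text = rng.normal(size=TEXT_DIM)
out_a = synthesize(text, speaker_id=0)
out_b = synthesize(text, speaker_id=1)
```

The point of the design is that only the small embedding table grows with the number of speakers; the synthesis network itself is shared.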


SING: Symbol-to-Instrument Neural Generator

Alexandre Defossez, Neil Zeghidour, Nicolas Usunier, Leon Bottou, Francis Bach

Neural Information Processing Systems

These embeddings are decoded by a single four-layer convolutional network to generate notes from nearly 1000 instruments, 65 pitches per instrument on average and 5 velocities.
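The decoding step described here — a small stack of convolutions turning a note embedding into a signal — can be sketched in miniature. This is a toy single-channel stand-in under assumed sizes, not SING's actual decoder.

```python
import numpy as np

rng = np.random.default_rng(2)

EMBED_DIM = 8   # illustrative sizes, not SING's hyperparameters
KERNEL = 5
OUT_LEN = 64

# Four stacked 1-D convolutions (single channel, 'same' padding) stand in
# for the paper's four-layer convolutional decoder.
kernels = [rng.normal(size=KERNEL) * 0.1 for _ in range(4)]

def decode(embedding, length=OUT_LEN):
    """Decode a note embedding into a fixed-length signal."""
    # Upsample the embedding to the target length by repetition.
    x = np.repeat(embedding, length // len(embedding))
    for k in kernels:
        x = np.tanh(np.convolve(x, k, mode="same"))
    return x

wave = decode(rng.normal(size=EMBED_DIM))
```

Because the same decoder is reused for every (instrument, pitch, velocity) combination, the per-note cost is just one embedding lookup plus the convolutions.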


Parametric Neural Amp Modeling with Active Learning

Grötschla, Florian, Jiao, Longxiang, Lanzendörfer, Luca A., Wattenhofer, Roger

arXiv.org Artificial Intelligence

ABSTRACT We introduce PANAMA, an active learning framework for training parametric guitar amp models end-to-end using a combination of an LSTM model and a WaveNet-like architecture. With PANAMA, one can create a virtual amp by recording samples that are determined through an ensemble-based active learning strategy, minimizing the number of datapoints (i.e., amp knob settings) needed. Our strategy uses gradient-based optimization to maximize the disagreement among ensemble models in order to identify the most informative datapoints. MUSHRA listening tests reveal that, with 75 datapoints, our models match the perceptual quality of NAM, the leading open-source non-parametric amp modeler. Index Terms -- neural amp modeling, active learning. 1. INTRODUCTION In recent years, data-driven guitar amp modeling has become increasingly popular.
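The query-selection step described in the abstract — gradient-based search for the knob setting where ensemble members disagree most — can be sketched with linear surrogate models. This is an assumed toy setup, not PANAMA's implementation; the real ensemble members are neural amp models.

```python
import numpy as np

rng = np.random.default_rng(1)

N_MODELS = 5   # illustrative ensemble size, not the paper's
N_KNOBS = 3

# Each ensemble member is a simple linear surrogate: pred = w @ knobs + b.
W = rng.normal(size=(N_MODELS, N_KNOBS))
b = rng.normal(size=N_MODELS)

def disagreement(knobs):
    """Variance of the ensemble's predictions at a knob setting."""
    preds = W @ knobs + b
    return preds.var()

def disagreement_grad(knobs):
    # Closed-form gradient of the prediction variance for linear members:
    # d/dk (1/M) sum_i (p_i - pbar)^2 = (2/M) sum_i (p_i - pbar)(w_i - wbar).
    preds = W @ knobs + b
    centered = preds - preds.mean()
    Wc = W - W.mean(axis=0)
    return 2.0 / N_MODELS * (centered @ Wc)

# Gradient ascent: move the candidate knob setting toward maximal
# disagreement, clipping to the valid (normalized) knob range [0, 1].
knobs = np.full(N_KNOBS, 0.5)
for _ in range(100):
    knobs = np.clip(knobs + 0.1 * disagreement_grad(knobs), 0.0, 1.0)
```

The resulting setting is the one the user would be asked to record next; retraining the ensemble on the new sample shrinks the disagreement there.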



We thank all the reviewers for their valuable comments

Neural Information Processing Systems

We thank all the reviewers for their valuable comments. We would like to clarify that 'when the model was trained without the mel-spectrogram loss, the training process' We also think that applying the L1/L2 loss gives no disadvantage in a one-to-one mapping such as ours. We will clarify the details of the experiments in Section 3. MOS evaluation results are shown in Table 1 (Mean Opinion Scores); all models were trained up to 500k steps.


Reviews: Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time

Neural Information Processing Systems

Also, perfect shuffles are well defined for non-powers-of-two, but the provided description only makes sense for powers of two. Perhaps this would be a good place to mention that you are restricting your attention to sequences of length 2^k, and padding inputs to allow this.