 Braun, Stefan


Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

arXiv.org Artificial Intelligence

This paper presents an extension to train end-to-end Context-Aware Transformer Transducer (CATT) models using a simple yet efficient method of mining hard negative phrases from the latent space of the context encoder. During training, given a reference query, we mine a number of similar phrases using approximate nearest neighbour search. These sampled phrases are then used as negative examples in the context list alongside random and ground truth contextual information. By including approximate nearest neighbour phrases (ANN-P) in the context list, we encourage the learned representation to disambiguate between similar, but not identical, biasing phrases. This improves biasing accuracy when there are several similar phrases in the biasing inventory. We carry out experiments in a large-scale data regime, obtaining up to 7% relative word error rate reductions for the contextual portion of test data. We also extend and evaluate the CATT approach in streaming applications.
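
The mining step can be pictured in a few lines of code. Below is a minimal sketch, assuming phrase embeddings have already been produced by the context encoder; a brute-force cosine search stands in for a real approximate nearest neighbour index (e.g. FAISS), and all names are illustrative rather than taken from the paper.

```python
# Sketch of approximate nearest neighbour phrase (ANN-P) mining.
import numpy as np

def mine_hard_negatives(query_vec, phrase_vecs, query_idx, k=5):
    """Return indices of the k phrases closest to the query in the
    context-encoder latent space, excluding the query phrase itself."""
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    p = phrase_vecs / np.linalg.norm(phrase_vecs, axis=1, keepdims=True)
    sims = p @ q
    sims[query_idx] = -np.inf      # never sample the ground-truth phrase
    return np.argsort(-sims)[:k]   # hardest (most similar) negatives

# Hypothetical usage: the mined phrases would join random distractors and
# the ground-truth phrase in the context list for one training utterance.
rng = np.random.default_rng(0)
phrase_vecs = rng.standard_normal((10_000, 256)).astype(np.float32)
negatives = mine_hard_negatives(phrase_vecs[42], phrase_vecs, query_idx=42, k=5)
```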


Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

arXiv.org Artificial Intelligence

This work studies the use of attention masking in transformer-transducer-based speech recognition to build a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries, in terms of recognition accuracy and latency. We then explore the use of variable masking, where the attention masks are sampled from a target distribution at training time, to build models that can work in different configurations. Finally, we investigate how a single configurable model can be used to perform both first-pass streaming recognition and second-pass acoustic rescoring. Experiments show that chunked masking achieves a better accuracy vs. latency trade-off compared to fixed masking, both with and without FastEmit. We also show that variable masking improves accuracy by up to 8% relative in the acoustic rescoring scenario.
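
As an illustration of chunked masking, the sketch below builds a boolean attention mask in which each frame attends to every frame in its own chunk plus a fixed number of chunks to the left. The function and its parameters are illustrative, not the paper's implementation; variable masking would correspond to sampling the chunk size from a target distribution at training time.

```python
import torch

def chunked_attention_mask(num_frames, chunk_size, num_left_chunks=1):
    """Boolean mask of shape (num_frames, num_frames); True = may attend."""
    chunk_idx = torch.arange(num_frames) // chunk_size  # chunk of each frame
    q = chunk_idx.unsqueeze(1)                          # query frame's chunk
    k = chunk_idx.unsqueeze(0)                          # key frame's chunk
    # Attend to the current chunk (including its future frames) and to
    # num_left_chunks chunks of left context; later chunks are masked out.
    return (k <= q) & (k >= q - num_left_chunks)

mask = chunked_attention_mask(num_frames=12, chunk_size=4, num_left_chunks=1)
```

Setting chunk_size to the full sequence length recovers unrestricted (full-context) attention, while a small chunk size yields a low-latency streaming configuration.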


Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

arXiv.org Artificial Intelligence

The neural transducer is an end-to-end model for automatic speech recognition (ASR). While the model is well-suited for streaming ASR, the training process remains challenging. During training, the memory requirements may quickly exceed the capacity of state-of-the-art GPUs, limiting batch size and sequence lengths. In this work, we analyze the time and space complexity of a typical transducer training setup. We propose a memory-efficient training method that computes the transducer loss and gradients sample by sample. We present optimizations to increase the efficiency and parallelism of the sample-wise method. In a set of thorough benchmarks, we show that our sample-wise method significantly reduces memory usage and performs at competitive speed compared to the default batched computation. As a highlight, we manage to compute the transducer loss and gradients for a batch size of 1024 and an audio length of 40 seconds using only 6 GB of memory.
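
The following sketch illustrates the sample-wise idea, assuming a hypothetical `joiner` module that produces joint logits of shape (1, T, U + 1, V) for one sample. It is a simplified rendition of the approach, not the paper's optimized implementation: only one sample's joint tensor and loss lattice are alive at a time, instead of the full (B, T_max, U_max + 1, V) batch tensor.

```python
import torch
import torchaudio

def sample_wise_rnnt_loss(joiner, enc_out, pred_out, targets,
                          enc_lens, target_lens, blank=0):
    """enc_out: (B, T_max, D_enc), pred_out: (B, U_max + 1, D_pred),
    targets: (B, U_max) label indices. Accumulates gradients per sample."""
    batch_size = enc_out.size(0)
    total_loss = 0.0
    for b in range(batch_size):
        T, U = int(enc_lens[b]), int(target_lens[b])
        # Joint logits for a single sample: (1, T, U + 1, V).
        logits = joiner(enc_out[b : b + 1, :T], pred_out[b : b + 1, : U + 1])
        loss = torchaudio.functional.rnnt_loss(
            logits,
            targets[b : b + 1, :U].int(),
            torch.tensor([T], dtype=torch.int32, device=logits.device),
            torch.tensor([U], dtype=torch.int32, device=logits.device),
            blank=blank,
        )
        # Backward per sample frees this sample's joint tensor before the
        # next iteration; retain_graph keeps the shared encoder/predictor
        # graph alive so gradients accumulate correctly across samples.
        (loss / batch_size).backward(retain_graph=True)
        total_loss += loss.item() / batch_size
    return total_loss
```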


SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

arXiv.org Machine Learning

Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a normal distribution with a fixed standard deviation for all samples. We hypothesize that a hard sample with high training loss already provides a strong training signal to update the model parameters and should be perturbed with mild or no augmentation. Perturbing a hard sample with a strong augmentation may also make it too hard to learn from. Conversely, a sample with low training loss should be perturbed by a stronger augmentation to provide more robustness to a variety of conditions. To formalize these intuitions, we propose a novel method to learn a Sample-Adaptive Policy for Augmentation -- SapAugment. Our policy adapts the augmentation parameters based on the training loss of the data samples. In the example of Gaussian noise, a hard sample will be perturbed with a low-variance noise and an easy sample with a high-variance noise. Furthermore, the proposed method combines multiple augmentation methods into a methodical policy learning framework and obviates hand-crafting augmentation parameters by trial-and-error. We apply our method on an automatic speech recognition (ASR) task, and combine existing and novel augmentations using the proposed framework. We show substantial improvement, up to 21% relative reduction in word error rate on the LibriSpeech dataset, over the state-of-the-art speech augmentation method.
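
A toy version of this intuition for Gaussian noise: rank the samples in a batch by their training loss, then scale the noise standard deviation down as the loss rank goes up. The linear schedule below is illustrative only; it is not the learned policy from the paper.

```python
import torch

def adaptive_noise(batch, losses, min_std=0.0, max_std=0.1):
    """batch: (B, ...) tensor; losses: per-sample training losses, shape (B,)."""
    ranks = losses.argsort().argsort().float()  # 0 = easiest ... B-1 = hardest
    frac = ranks / max(len(losses) - 1, 1)      # normalized difficulty in [0, 1]
    stds = max_std - frac * (max_std - min_std) # easy -> strong, hard -> mild
    shape = (len(losses),) + (1,) * (batch.dim() - 1)
    return batch + torch.randn_like(batch) * stds.view(shape)
```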


LSTM Benchmarks for Deep Learning Frameworks

arXiv.org Machine Learning

This study provides benchmarks for different implementations of LSTM units across the deep learning frameworks PyTorch, TensorFlow, Lasagne, and Keras. The comparison includes cuDNN LSTMs, fused LSTM variants, and less optimized but more flexible LSTM implementations. The benchmarks reflect two typical scenarios for automatic speech recognition, namely continuous speech recognition and isolated digit recognition. These scenarios cover input sequences of fixed and variable length as well as the loss functions CTC and cross entropy. Additionally, a comparison between four different PyTorch versions is included. The code is available online at https://github.com/stefbraun/rnn_benchmarks.
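
In the same spirit, a minimal PyTorch timing harness might compare the fused nn.LSTM (cuDNN-backed on GPU) against a flexible per-step nn.LSTMCell loop. The sizes below are arbitrary, and this is a simplified stand-in for the linked benchmark code.

```python
import time
import torch

def bench(fn, x, warmup=3, iters=10):
    """Average wall-clock seconds per call, with GPU synchronization."""
    for _ in range(warmup):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

device = "cuda" if torch.cuda.is_available() else "cpu"
seq_len, batch, feat, hidden = 100, 32, 123, 320
x = torch.randn(seq_len, batch, feat, device=device)

lstm = torch.nn.LSTM(feat, hidden).to(device)      # fused / cuDNN path
cell = torch.nn.LSTMCell(feat, hidden).to(device)  # flexible per-step loop

def run_cell(inp):
    h = c = torch.zeros(batch, hidden, device=device)
    for t in range(inp.size(0)):
        h, c = cell(inp[t], (h, c))
    return h

print("nn.LSTM:     %.4f s/iter" % bench(lambda inp: lstm(inp)[0], x))
print("nn.LSTMCell: %.4f s/iter" % bench(run_cell, x))
```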