DeepSpeech


Automatic Speech Recognition for Biomedical Data in Bengali Language

Kabir, Shariar, Nahar, Nazmun, Saha, Shyamasree, Rashid, Mamunur

arXiv.org Artificial Intelligence

Recent advancements in domain-specific Automatic Speech Recognition (ASR) and Large Language Models (LLMs) have significantly boosted the adoption of AI in digital services across many industries, such as financial services and healthcare. In the healthcare industry in particular, the integration of AI-driven solutions such as conversational chatbots and voice-interactive guidance is opening new avenues to engage patients and healthcare providers ([1], [2]). Many healthcare systems in the developed world have adopted these systems to increase patient satisfaction. One key shortcoming is that the majority of developments in this domain are focused on patients of European descent and their medical vocabularies. Many non-European languages, though spoken by millions, have seen very limited advancements. Bengali, despite being the seventh most spoken language with 270 million speakers worldwide, has seen very limited progress in NLP and ASR research. This has hindered the integration of these technologies into digital health services for Bengali speakers, which in turn has slowed the adoption of digital health solutions. While speakers of many European languages benefit from AI-driven, chatbot-assisted services such as digital appointment booking, pre-appointment symptom reporting, and mental health support, Bengali speakers are not able to benefit from these advancements. Bengali ASR research has nevertheless seen a significant surge in recent years, fueled by the release of large public speech corpora such as Google's "Large Bengali ASR training data" (LB-ASRTD).


A Novel Scheme to classify Read and Spontaneous Speech

Kopparapu, Sunil Kumar

arXiv.org Artificial Intelligence

The COVID-19 pandemic has led to an increased use of remote telephonic interviews, making it important to distinguish between scripted and spontaneous speech in audio recordings. In this paper, we propose a novel scheme for identifying read and spontaneous speech. Our approach uses a pre-trained DeepSpeech audio-to-alphabet recognition engine to generate a sequence of alphabets from the audio. From these alphabets, we derive features that allow us to discriminate between read and spontaneous speech. Our experimental results show that even a small set of self-explanatory features can classify the two types of speech very effectively.
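The abstract does not list the paper's actual features, but the general idea of deriving statistics from a decoded character sequence can be sketched as follows. The two features below (repetition fraction and character-distribution entropy) are purely hypothetical illustrations, not the paper's feature set:

```python
from collections import Counter
import math

def char_features(text):
    """Hypothetical features over a decoded character sequence:
    fraction of immediately repeated characters, and the Shannon
    entropy of the character distribution (in bits)."""
    if not text:
        return {"repeat_frac": 0.0, "entropy": 0.0}
    repeats = sum(1 for a, b in zip(text, text[1:]) if a == b)
    counts = Counter(text)
    n = len(text)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"repeat_frac": repeats / max(1, len(text) - 1),
            "entropy": entropy}
```

A downstream classifier would then be trained on vectors of such features rather than on the raw audio.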


DeepSpeech for Dummies - A Tutorial and Overview

#artificialintelligence

DeepSpeech is a neural network architecture first published by a research team at Baidu. In 2017, Mozilla created an open source implementation of this paper, dubbed "Mozilla DeepSpeech". The original DeepSpeech paper from Baidu popularized the concept of "end-to-end" speech recognition models. "End-to-end" means that the model takes in audio and directly outputs characters or words. This contrasts with traditional speech recognition models, like those built with popular open source libraries such as Kaldi or CMU Sphinx, that predict phonemes and then convert those phonemes to words in a later, downstream process. The goal of "end-to-end" models, like DeepSpeech, was to simplify the speech recognition pipeline into a single model. In addition, the theory introduced by the Baidu research paper was that training large deep learning models, on large amounts of data, would yield better performance than classical speech recognition models.
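The character-level output described above is typically produced by CTC decoding: the network emits a symbol (or a special blank) per audio frame, and the decoder collapses repeats and drops blanks. A minimal sketch of greedy CTC decoding in Python, with an illustrative alphabet that is not DeepSpeech's actual configuration:

```python
def ctc_greedy_decode(frame_indices, alphabet, blank=0):
    """Greedy CTC decode: given the per-frame argmax symbol indices,
    collapse consecutive repeats and drop the blank symbol."""
    decoded = []
    prev = None
    for idx in frame_indices:
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)

# Illustrative alphabet: index 0 is the CTC blank.
alphabet = ["_", "c", "a", "t", " "]
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]  # per-frame argmax indices
print(ctc_greedy_decode(frames, alphabet))  # -> "cat"
```

Full systems replace the greedy argmax with a beam search, often scored by a language model, but the collapse-and-drop rule is the same.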


Effects of Layer Freezing on Transferring a Speech Recognition System to Under-resourced Languages

Eberhard, Onno, Zesch, Torsten

arXiv.org Artificial Intelligence

In this paper, we investigate the effect of layer freezing on the effectiveness of model transfer in the area of automatic speech recognition. We experiment with Mozilla's DeepSpeech architecture on German and Swiss German speech datasets and compare the results of training from scratch with those of transferring a pre-trained model. We compare different layer freezing schemes and find that freezing even a single layer already significantly improves results.


The real cost of cloud computing - VentureBeat - UrIoTNews

#artificialintelligence

The public cloud is growing rapidly, and the market for the technology is expected to reach $1.3 trillion by 2025. The cloud has revolutionized the computing industry and enabled many applications, business models and enterprises, which otherwise wouldn't have been possible. Immediate availability, scalability, minimal capital expenditure and a streamlined developer experience are its main advantages -- but they come at a cost. Due to a lack of in-house infrastructure optimization capabilities, most enterprises stick with the cloud even after reaching a certain maturity. To keep cloud spending under control, enterprises have built or acquired tools and services.


Visualizing Automatic Speech Recognition -- Means for a Better Understanding?

Markert, Karla, Parracone, Romain, Kulakov, Mykhailo, Sperl, Philip, Kao, Ching-Yu, Böttinger, Konstantin

arXiv.org Artificial Intelligence

Automatic speech recognition (ASR) is improving ever more at mimicking human speech processing. The functioning of ASR systems, however, remains to a large extent obfuscated by the complex structure of the deep neural networks (DNNs) they are based on. In this paper, we show how so-called attribution methods, which we import from image recognition and suitably adapt to handle audio data, can help to clarify the working of ASR. Taking DeepSpeech, an end-to-end model for ASR, as a case study, we show how these techniques help to visualize which features of the input are the most influential in determining the output. We focus on three visualization techniques: Layer-wise Relevance Propagation (LRP), Saliency Maps, and Shapley Additive Explanations (SHAP). We compare these methods and discuss potential further applications, such as the detection of adversarial examples.
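Of the three techniques, saliency maps are the simplest: importance is attributed via the gradient of the model's output score with respect to each input feature. A generic numerical-gradient sketch on a toy scoring function (not DeepSpeech itself, where the gradient would come from backpropagation):

```python
def saliency(score_fn, x, eps=1e-5):
    """Approximate |d score / d x_i| for each input dimension
    by central finite differences."""
    sal = []
    for i in range(len(x)):
        hi = list(x); hi[i] += eps
        lo = list(x); lo[i] -= eps
        sal.append(abs(score_fn(hi) - score_fn(lo)) / (2 * eps))
    return sal

# Toy "model": the score depends strongly on x[0], weakly on x[1],
# so the saliency map should highlight x[0].
score = lambda x: 5.0 * x[0] + 0.1 * x[1]
print(saliency(score, [0.2, 0.7]))  # roughly [5.0, 0.1]
```

For audio, `x` would be the waveform samples or spectrogram bins, and the resulting map can be plotted over time to show which segments drove a transcription.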


Automatic Speaker Independent Dysarthric Speech Intelligibility Assessment System

Tripathi, Ayush, Bhosale, Swapnil, Kopparapu, Sunil Kumar

arXiv.org Artificial Intelligence

Dysarthria is a condition which hampers the ability of an individual to control the muscles that play a major role in speech delivery. The loss of fine control over the muscles that assist the movement of the lips, vocal cords, tongue and diaphragm results in abnormal speech delivery. One can assess the severity level of dysarthria by analyzing the intelligibility of speech spoken by an individual. Continuous intelligibility assessment not only helps speech language pathologists study the impact of medication but also allows them to plan personalized therapy. It helps clinicians immensely if the intelligibility assessment system is reliable, automatic, and simple for (a) patients to undergo and (b) clinicians to interpret. The lack of available dysarthric data has resulted in the development of speaker-dependent automatic intelligibility assessment systems, which require patients to speak a large number of utterances. In this paper, we propose (a) a cost minimization procedure to select an optimal (small) number of utterances that need to be spoken by the dysarthric patient, (b) four different speaker-independent intelligibility assessment systems which require the patient to speak a small number of words, and (c) an assessment score that is close to the perceptual score the Speech Language Pathologist (SLP) can relate to. The small number of utterances required of the patient, and a score the SLP can relate to, benefit both the dysarthric patient and the clinician from a usability perspective.


Audio Adversarial Examples: Attacks Using Vocal Masks

Tay, Kai Yuan, Ng, Lynnette, Chua, Wei Han, Loke, Lucerne, Ye, Danqi, Chua, Melissa

arXiv.org Artificial Intelligence

We construct audio adversarial examples on automatic Speech-To-Text systems. Given any audio waveform, we produce another by overlaying an audio vocal mask generated from the original audio. We apply our audio adversarial attack to five SOTA STT systems: DeepSpeech, Julius, Kaldi, wav2letter@anywhere and CMUSphinx. In addition, we engaged human annotators to transcribe the adversarial audio. Our experiments show that these adversarial examples fool State-Of-The-Art Speech-To-Text systems, yet humans are able to consistently pick out the speech. The feasibility of this attack introduces a new domain to study machine and human perception of speech.
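The overlay step itself is plain signal addition; a minimal sketch, with the understanding that generating the vocal mask is the paper's actual contribution and is not reproduced here, and that the mixing weight `alpha` is an illustrative placeholder:

```python
def overlay(audio, mask, alpha=0.1):
    """Add a scaled perturbation (mask) to a waveform whose samples
    are normalized to [-1, 1], clipping the result to stay in range."""
    return [max(-1.0, min(1.0, a + alpha * m))
            for a, m in zip(audio, mask)]

clean = [0.0, 0.5, -0.5, 0.99]
mask = [1.0, 1.0, -1.0, 1.0]   # placeholder; the paper derives this
adversarial = overlay(clean, mask)
```

In practice both signals would be NumPy arrays read from WAV files, and the perturbed waveform would be written back out and fed to each STT system.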


DeepSpeech 0.6: Mozilla's Speech-to-Text Engine Gets Fast, Lean, and Ubiquitous – Mozilla Hacks - the Web developer blog

#artificialintelligence

The Machine Learning team at Mozilla continues work on DeepSpeech, an automatic speech recognition (ASR) engine which aims to make speech recognition technology and trained models openly available to developers. DeepSpeech is a deep learning-based ASR engine with a simple API. We also provide pre-trained English models. Our latest release, version v0.6, offers the highest quality, most feature-packed model so far. In this overview, we'll show how DeepSpeech can transform your applications by enabling client-side, low-latency, and privacy-preserving speech recognition capabilities.
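The API is a thin wrapper: load a model, feed it 16 kHz 16-bit PCM samples, get text back. A minimal sketch of the `deepspeech` Python package's inference path; constructor arguments and scorer handling changed between releases, so treat the details as indicative rather than exact:

```python
def transcribe(model_path, audio_path):
    """Transcribe a 16 kHz mono 16-bit WAV file with the `deepspeech`
    package. Imports are done lazily so the sketch can be read without
    the package installed."""
    import wave
    import numpy as np
    import deepspeech  # pip install deepspeech

    model = deepspeech.Model(model_path)
    with wave.open(audio_path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    audio = np.frombuffer(frames, dtype=np.int16)
    return model.stt(audio)
```

Because inference runs locally, the audio never leaves the device, which is what enables the client-side, privacy-preserving use cases mentioned above.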


Mozilla updates DeepSpeech with an English language model that runs 'faster than real time'

#artificialintelligence

DeepSpeech, a speech-to-text engine maintained by Mozilla's Machine Learning Group, this morning received an update (to version 0.6) that incorporates one of the fastest open source speech recognition models to date. In a blog post, senior research engineer Reuben Morais lays out what's new and enhanced, as well as other spotlight features coming down the pipeline. The latest version of DeepSpeech adds support for TensorFlow Lite, a version of Google's TensorFlow machine learning framework that's optimized for compute-constrained mobile and embedded devices. This has reduced DeepSpeech's package size from 98MB to 3.7MB, and its built-in English model size -- which has a 7.5% word error rate on a popular benchmark and which was trained on 5,516 hours of transcribed audio from WAMU (NPR), LibriSpeech, Fisher, Switchboard, and Mozilla's Common Voice English data sets -- from 188MB to 47MB. Plus, it has cut DeepSpeech's memory consumption by a factor of 22 and made startup over 500 times faster.