Addressing Pitfalls in Auditing Practices of Automatic Speech Recognition Technologies: A Case Study of People with Aphasia

Mei, Katelyn Xiaoying, Choi, Anna Seo Gyeong, Schellmann, Hilke, Sloane, Mona, Koenecke, Allison

arXiv.org Artificial Intelligence

Automatic Speech Recognition (ASR) has transformed daily tasks from video transcription to workplace hiring. ASR systems' growing use warrants robust and standardized auditing approaches to ensure automated transcriptions of high and equitable quality. This is especially critical for people with speech and language disorders (such as aphasia) who may disproportionately depend on ASR systems to navigate everyday life. In this work, we identify three pitfalls in existing standard ASR auditing procedures, and demonstrate how addressing them impacts audit results via a case study of six popular ASR systems' performance for aphasia speakers. First, audits often adhere to a single method of text standardization during data pre-processing, which (a) masks variability in ASR performance from applying different standardization methods, and (b) may not be consistent with how users - especially those from marginalized speech communities - would want their transcriptions to be standardized. Second, audits often display high-level demographic findings without further considering performance disparities among (a) more nuanced demographic subgroups, and (b) relevant covariates capturing acoustic information from the input audio. Third, audits often rely on a single gold-standard metric -- the Word Error Rate -- which does not fully capture the extent of errors arising from generative AI models, such as transcription hallucinations. We propose a more holistic auditing framework that accounts for these three pitfalls, and exemplify its results in our case study, finding consistently worse ASR performance for aphasia speakers relative to a control group. We call on practitioners to implement these robust ASR auditing practices that remain flexible to the rapidly changing ASR landscape.
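The Word Error Rate the abstract calls the "single gold-standard metric" is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch of that computation (the function name and the whitespace tokenization are illustrative choices, not taken from the paper; real audits would first apply the text standardization the authors discuss):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words),
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

Note that because WER counts only word edits, it can understate failure modes such as hallucinated fluent text, which is precisely the gap the authors' third pitfall addresses.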


You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish

Cumbal, Ronald, Moell, Birger, Lopes, Jose, Engwall, Olof

arXiv.org Artificial Intelligence

The performance of state-of-the-art Automatic Speech Recognition (ASR) systems has steadily improved. However, performance tends to decrease considerably in more challenging conditions (e.g., background noise, multi-speaker social conversations) and with more atypical speakers (e.g., children, non-native speakers, or people with speech disorders), which means that general improvements do not necessarily transfer to applications that rely on ASR, e.g., educational software for younger students or language learners. In this study, we focus on the gap in performance between recognition results for native and non-native, read and spontaneous, Swedish utterances transcribed by different ASR services. We compare the recognition results using Word Error Rate and analyze the linguistic factors that may generate the observed transcription errors.


Global Performance Disparities Between English-Language Accents in Automatic Speech Recognition

DiChristofano, Alex, Shuster, Henry, Chandra, Shefali, Patwari, Neal

arXiv.org Artificial Intelligence

However, many users are familiar with the frustrating experience of repeatedly not being understood by their voice assistant [16], so much so that frustration with ASR has become a culturally-shared source of comedy [4, 32]. Bias auditing of ASR services has quantified these experiences. English language ASR has higher error rates: for Black Americans compared to white Americans [24, 45], for stigmatised British accents compared to favored British accents [28], for Scottish speakers compared to speakers from California and New Zealand [44], for speakers whose first language is a tone language compared to those whose first language is not [2], for speakers with Indian accents compared to speakers with "American" accents [31], and for speakers whose first language is not English compared to those for whom it is [28]. It should go without saying, but everyone has an accent - there is no "unaccented" version of English [26]. Due to colonization and globalization, different Englishes are spoken around the world. While some English accents may be favored by those with class, race, and national origin privilege [28], there is no technical barrier to building an ASR system that works well on any particular accent. So we are left with the question: why does ASR performance vary as it does as a function of the global English accent spoken?


SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale

Tang, Raphael, Kumar, Karun, Yang, Gefei, Pandey, Akshat, Mao, Yajie, Belyaev, Vladislav, Emmadi, Madhuri, Murray, Craig, Ture, Ferhan, Lin, Jimmy

arXiv.org Artificial Intelligence

End-to-end automatic speech recognition systems represent the state of the art, but they rely on thousands of hours of manually annotated speech for training, as well as heavyweight computation for inference. Of course, this impedes commercialization since most companies lack vast human and computational resources. In this paper, we explore training and deploying an ASR system in the label-scarce, compute-limited setting. To reduce human labor, we use a third-party ASR system as a weak supervision source, supplemented with labeling functions derived from implicit user feedback. To accelerate inference, we propose to route production-time queries across a pool of CUDA graphs of varying input lengths, the distribution of which best matches the traffic's. Compared to our third-party ASR, we achieve a relative improvement in word-error rate of 8% and a speedup of 600%. Our system, called SpeechNet, currently serves 12 million queries per day on our voice-enabled smart television. To our knowledge, this is the first time a large-scale, Wav2vec-based deployment has been described in the academic literature.
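The routing idea above (pre-capturing CUDA graphs for a fixed set of input lengths, then dispatching each query to a compatible one) can be illustrated with a simple bucket lookup. The bucket sizes, frame units, and fallback behavior below are hypothetical assumptions for illustration; the paper states only that the pool's length distribution is chosen to match production traffic:

```python
import bisect

# Hypothetical pre-captured bucket lengths (in audio frames). A real
# deployment would pick these to match the observed traffic distribution.
BUCKETS = [400, 800, 1600, 3200]

def route_to_bucket(num_frames: int) -> int:
    """Return the smallest pre-captured bucket length that fits the input.

    The input would then be zero-padded up to the bucket length so the
    fixed-shape CUDA graph captured for that length can be replayed.
    """
    idx = bisect.bisect_left(BUCKETS, num_frames)
    if idx == len(BUCKETS):
        # Input exceeds every captured shape; assume the largest bucket
        # handles it (e.g., via chunking) in this sketch.
        return BUCKETS[-1]
    return BUCKETS[idx]
```

The trade-off is classic padding waste versus capture cost: more buckets mean less zero-padding per query but more graphs to capture and hold in memory.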


Why AI startups have different economics from classic SaaS startups

#artificialintelligence

Let's rewind the clock a bit. Back in the day, software vendors would write code, package it, and often distribute it physically (through those nifty things called CDs). In this old world, buyers shouldered most of the operational costs, such as running the applications they bought in their own local data and compute centers (or on laptops and desktops). Then came the advent of faster Internet speeds and cloud computing, which really opened up software development and deployment to a whole new world. With that, we started to see a dramatic shift of infrastructure costs back to the software vendor. That is, in the SaaS world, vendors host and manage web apps in their own data centers or cloud environments, allowing buyers to gradually decrease the investment and expenses associated with managing infrastructure.