AITopics | partial hypothesis

partial hypothesis

Efficient Trie-based Biasing using K-step Prediction for Rare Word Recognition

Kwok, Chin Yuen, Yip, Jia Qi

arXiv.org Artificial Intelligence, Sep-12-2025

Contextual biasing improves rare word recognition of ASR models by prioritizing the output of rare words during decoding. A common approach is Trie-based biasing, which gives "bonus scores" to partial hypotheses (e.g. "Bon") that may lead to the generation of the rare word (e.g. "Bonham"). If the full word ("Bonham") isn't ultimately recognized, the system revokes those earlier bonuses. This revocation is limited to beam search and is computationally expensive, particularly for models with large decoders. To overcome these limitations, we propose adapting ASR models to look ahead and predict multiple steps at once. This avoids the revocation step entirely by better estimating whether a partial hypothesis will lead to the generation of the full rare word. By fine-tuning Whisper with only 10 hours of synthetic data, our method reduces the word error rate on the NSC Part 2 test set from 30.86% to 12.19%.
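
The bonus-and-revocation baseline that this abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: it works on characters rather than subword tokens, and the names `BiasTrie`, `bias_score`, and the flat `bonus` value are assumptions introduced here. It also resets the match whenever any listed word is completed, which is a simplification.

```python
_END = object()  # end-of-word marker; cannot collide with any character key


class BiasTrie:
    """Character-level trie over a list of rare words (illustrative only)."""

    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[_END] = True

    def walk(self, prefix):
        """Return the node reached by `prefix`, or None if it leaves the trie."""
        node = self.root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return None
        return node


def bias_score(trie, text, bonus=1.0):
    """Bonus held by a hypothesis after reading `text` one character at a time."""
    total = 0.0    # bonus made permanent by completed rare words
    pending = 0.0  # bonus tied to the current, still partial, match
    prefix = ""
    for ch in text:
        node = trie.walk(prefix + ch)
        if node is None:
            # The partial match broke: revoke its bonus and retry from `ch`.
            pending = 0.0
            prefix = ""
            node = trie.walk(ch)
            if node is None:
                continue
        prefix += ch
        pending += bonus
        if _END in node:
            # Full rare word recognized: the pending bonus is kept for good.
            total += pending
            pending = 0.0
            prefix = ""
    return total + pending


trie = BiasTrie(["Bonham"])
print(bias_score(trie, "Bon"))     # 3.0, bonus held by the partial hypothesis
print(bias_score(trie, "Bonham"))  # 6.0, full word completed
print(bias_score(trie, "Bonus"))   # 0.0, bonus revoked once the match breaks
```

The `"Bonus"` case shows the revocation step that the proposed look-ahead method is meant to make unnecessary.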

artificial intelligence, machine learning, speech recognition, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.21437/Interspeech.2025-1290

2509.09196

Country: Asia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.36)

Iterative Shallow Fusion of Backward Language Model for End-to-End Speech Recognition

Ogawa, Atsunori, Moriya, Takafumi, Kamo, Naoyuki, Tawara, Naohiro, Delcroix, Marc

arXiv.org Artificial Intelligence, Oct-17-2023

We propose a new shallow fusion (SF) method to exploit an external backward language model (BLM) for end-to-end automatic speech recognition (ASR). The BLM has complementary characteristics with a forward language model (FLM), and the effectiveness of their combination has been confirmed by rescoring ASR hypotheses as post-processing. In the proposed SF, we iteratively apply the BLM to partial ASR hypotheses in the backward direction (i.e., from the possible next token to the start symbol) during decoding, substituting the newly calculated BLM scores for the scores calculated at the last iteration. To enhance the effectiveness of this iterative SF (ISF), we train a partial sentence-aware BLM (PBLM) using reversed text data including partial sentences, considering the framework of ISF. In experiments using an attention-based encoder-decoder ASR system, we confirmed that ISF using the PBLM shows comparable performance with SF using the FLM. By performing ISF, early pruning of prospective hypotheses can be prevented during decoding, and we can obtain a performance improvement compared to applying the PBLM as post-processing. Finally, we confirmed that, by combining SF and ISF, further performance improvement can be obtained thanks to the complementarity of the FLM and PBLM.
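
The score substitution described in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the backward model is a hand-written bigram table, and the names `blm_logprob`, `isf_step`, and the `weight` parameter are hypothetical rather than taken from the paper.

```python
import math


def blm_logprob(tokens, bigram_logp, unk=math.log(1e-3)):
    """Score a partial hypothesis right to left with a toy backward bigram model.

    Each token is predicted from the token that follows it, ending at the
    start symbol "<s>". Unseen pairs fall back to the `unk` log-probability.
    """
    rev = list(reversed(tokens)) + ["<s>"]
    return sum(bigram_logp.get((rev[i], rev[i + 1]), unk)
               for i in range(len(rev) - 1))


def isf_step(total, prev_blm, asr_step_logp, tokens, bigram_logp, weight=0.5):
    """Extend a hypothesis by one token and refresh its backward-LM score.

    The BLM score from the previous iteration is removed and the score of
    the new, longer partial hypothesis is substituted for it.
    """
    new_blm = blm_logprob(tokens, bigram_logp)
    total = total + asr_step_logp - weight * prev_blm + weight * new_blm
    return total, new_blm


# Toy backward bigram log-probabilities: (following token, preceding token).
table = {("b", "a"): -1.0, ("a", "<s>"): -0.5}

total, blm = isf_step(0.0, 0.0, -0.2, ["a"], table)
total, blm = isf_step(total, blm, -0.3, ["a", "b"], table)
# total now equals the summed ASR log-probabilities plus weight times the
# BLM score of the whole partial hypothesis ["a", "b"].
```

Because the old BLM term is subtracted before the new one is added, the fused score never counts the backward model twice for the same prefix.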

hypothesis, isf, speech recognition, (16 more...)

arXiv.org Artificial Intelligence

2310.1101

Country: Asia > Japan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Implementing contextual biasing in GPU decoder for online ASR

Nigmatulina, Iuliia, Madikeri, Srikanth, Villatoro-Tello, Esaú, Motliček, Petr, Zuluaga-Gomez, Juan, Pandia, Karthik, Ganapathiraju, Aravind

arXiv.org Artificial Intelligence, Jun-23-2023

GPU decoding significantly accelerates the output of ASR predictions. While GPUs are already being used for online ASR decoding, post-processing and rescoring on GPUs have not been properly investigated yet. Rescoring with available contextual information can considerably improve ASR predictions. Previous studies have proven the viability of lattice rescoring in decoding and biasing language model (LM) weights in offline and online CPU scenarios. In real-time GPU decoding, partial recognition hypotheses are produced without lattice generation, which makes the implementation of biasing more complex. The paper proposes and describes an approach to integrate contextual biasing in real-time GPU decoding while exploiting the standard Kaldi GPU decoder. Besides the biasing of partial ASR predictions, our approach also permits dynamic context switching allowing a flexible rescoring per each speech segment directly on GPU. The code is publicly released and tested with open-sourced test sets.

artificial intelligence, natural language, sequence, (18 more...)

arXiv.org Artificial Intelligence

2306.15685

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)
Asia > India (0.04)

Genre: Research Report (0.40)

Industry: Transportation > Air (0.69)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.72)

Personalized Predictive ASR for Latency Reduction in Voice Assistants

Schwarz, Andreas, He, Di, Van Segbroeck, Maarten, Hethnawi, Mohammed, Rastrow, Ariya

arXiv.org Artificial Intelligence, May-23-2023

Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation. Prefetching involves passing a preliminary ASR hypothesis to downstream systems in order to prefetch and cache a response. If the final ASR hypothesis after endpoint detection matches the preliminary one, the cached response can be delivered to the user, thus saving latency. In this paper, we extend this idea by introducing predictive automatic speech recognition, where we predict the full utterance from a partially observed utterance, and prefetch the response based on the predicted utterance. We introduce two personalization approaches and investigate the tradeoff between potential latency gains from successful predictions and the cost increase from failed predictions. We evaluate our methods on an internal voice assistant dataset as well as the public SLURP dataset.
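
The basic prefetching idea in this abstract (cache a response for a preliminary hypothesis, then reuse it only if the final hypothesis matches) can be sketched as below. The class name `PrefetchingResponder` and the `backend` callable are hypothetical stand-ins for the downstream response system; the paper's predictive and personalized variants are not modeled.

```python
class PrefetchingResponder:
    """Cache one response for a preliminary ASR hypothesis (illustrative only)."""

    def __init__(self, backend):
        self.backend = backend       # callable: hypothesis text -> response
        self.cached_for = None
        self.cached_response = None

    def prefetch(self, preliminary_hypothesis):
        """Compute and cache a response before the utterance has ended."""
        self.cached_for = preliminary_hypothesis
        self.cached_response = self.backend(preliminary_hypothesis)

    def respond(self, final_hypothesis):
        """Return (response, cache_hit) once the final hypothesis is known."""
        if final_hypothesis == self.cached_for:
            return self.cached_response, True
        # Failed prediction: the prefetch was wasted and the backend runs again.
        return self.backend(final_hypothesis), False


calls = []

def backend(text):
    calls.append(text)
    return "reply to " + text

responder = PrefetchingResponder(backend)
responder.prefetch("play jazz")
print(responder.respond("play jazz"))  # ('reply to play jazz', True)
```

A cache miss costs one extra backend call, which is the cost side of the tradeoff the abstract mentions.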

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Artificial Intelligence

2305.13794

Country:

North America > United States (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Common-Frame Model for Object Recognition

Neural Information Processing Systems, Apr-6-2023, 15:51:24 GMT

A generative probabilistic model for objects in images is presented. An object consists of a constellation of features. Feature appearance and pose are modeled probabilistically. Scene images are generated by drawing a set of objects from a given database, with random clutter sprinkled on the remaining image surface. We study the case where features from the same object share a common reference frame. Moreover, parameters for shape and appearance densities are shared across features.

hypothesis, partial hypothesis, test image, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.34)

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Tian, Jinchuan, Yu, Jianwei, Weng, Chao, Zhang, Shi-Xiong, Su, Dan, Yu, Dong, Zou, Yuexian

arXiv.org Artificial Intelligence, Dec-29-2021

Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks. However, Lattice-Free Maximum Mutual Information (LF-MMI), as one of the discriminative training criteria that show superior performance in hybrid ASR systems, is rarely adopted in E2E ASR frameworks. In this work, we propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. The proposed approach shows its effectiveness on two of the most widely used E2E frameworks including Attention-Based Encoder-Decoders (AEDs) and Neural Transducers (NTs). Experiments suggest that the introduction of the LF-MMI criterion consistently leads to significant performance improvements on various datasets and different E2E ASR frameworks. The best of our models achieves competitive CER of 4.1% / 4.4% on Aishell-1 dev/test set; we also achieve significant error reduction on Aishell-2 and Librispeech datasets over strong baselines.

criterion, hypothesis, lf-mmi criterion, (12 more...)

arXiv.org Artificial Intelligence

2112.02498

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)

Search-based Methods to Bound Diagnostic Probabilities in Very Large Belief Nets

Henrion, Max

arXiv.org Artificial Intelligence, Mar-20-2013

Since exact probabilistic inference is intractable in general for large multiply connected belief nets, approximate methods are required. A promising approach is to use heuristic search among hypotheses (instantiations of the network) to find the most probable ones, as in the TopN algorithm. Search is based on the relative probabilities of hypotheses which are efficient to compute. Given upper and lower bounds on the relative probability of partial hypotheses, it is possible to obtain bounds on the absolute probabilities of hypotheses. Best-first search aimed at reducing the maximum error progressively narrows the bounds as more hypotheses are examined. Here, qualitative probabilistic analysis is employed to obtain bounds on the relative probability of partial hypotheses for the BN20 class of networks and a generalization replacing the noisy OR assumption by negative synergy. The approach is illustrated by application to a very large belief network, QMR-BN, which is a reformulation of the Internist-1 system for diagnosis in internal medicine.
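
One simple way to turn relative probabilities into bounds on absolute probabilities is sketched below. This is a generic normalization argument under stated assumptions, not necessarily the paper's procedure: `joint` holds the unnormalized probability of each fully examined hypothesis together with the evidence, and `unexplored_upper` is an assumed upper bound on the total unnormalized mass of all hypotheses not yet examined. The function name `posterior_bounds` is hypothetical.

```python
def posterior_bounds(joint, unexplored_upper):
    """Bound each examined hypothesis's posterior probability.

    The normalizing constant lies between the explored mass S and
    S + unexplored_upper, so a hypothesis with unnormalized mass p has a
    posterior between p / (S + unexplored_upper) and p / S.
    """
    explored = sum(joint.values())
    if explored <= 0:
        raise ValueError("at least one examined hypothesis needs positive mass")
    return {
        h: (p / (explored + unexplored_upper), p / explored)
        for h, p in joint.items()
    }


bounds = posterior_bounds({"h1": 0.03, "h2": 0.01}, unexplored_upper=0.01)
# bounds["h1"] is approximately (0.6, 0.75); bounds["h2"] is approximately (0.2, 0.25)
```

As more hypotheses are examined, `unexplored_upper` shrinks and the two ends of each interval move together, which matches the narrowing behavior the abstract describes.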

artificial intelligence, expert system, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1303.5721

Country: North America > United States > California (0.46)

Genre: Research Report (0.84)

Industry: Health & Medicine > Therapeutic Area > Internal Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)

Evaluating and Improving Real-Time Tracking of Children’s Oral Reading

Li, Yuanpeng (Carnegie Mellon University) | Mostow, Jack (Carnegie Mellon University)

AAAI Conferences, May-20-2012

The accuracy of an automated reading tutor in tracking the reader’s position is affected by phenomena at the frontier of the speech recognizer’s output as it evolves in real time. We define metrics of real-time tracking accuracy computed from the recognizer’s successive partial hypotheses, in contrast to previous metrics computed from the final hypothesis. We analyze the resulting considerable loss in real-time accuracy, and propose and evaluate a method to address it. Our method raises real-time accuracy from 58% to 70%, which should improve the quality of the tutor’s feedback.

accuracy, hypothesis, speech, (16 more...)

AAAI Conferences

Twenty-Fifth International FLAIRS Conference

Country: