Goto

Collaborating Authors: Pusateri, Ernest


Contextualization of ASR with LLM using phonetic retrieval-based augmentation

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown a superb capability for modeling multimodal signals, including audio and text, allowing the model to generate a spoken or textual response given a speech input. However, it remains a challenge for the model to recognize personal named entities, such as contacts in a phone book, when the input modality is speech. In this work, we start with a speech recognition task and propose a retrieval-based solution to contextualize the LLM: we first let the LLM detect named entities in the speech without any context, then use these detected entities as queries to retrieve phonetically similar named entities from a personal database and feed them to the LLM, and finally run context-aware LLM decoding. On a voice assistant task, our solution achieved up to a 30.2% relative word error rate reduction and a 73.6% relative named entity error rate reduction compared to a baseline system without contextualization. Notably, our solution by design avoids prompting the LLM with the full named entity database, making it highly efficient and applicable to large named entity databases.
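The abstract describes a three-step pipeline: context-free named entity detection, phonetic retrieval of candidate entities, and context-aware re-decoding. The sketch below illustrates that flow under strong simplifying assumptions: `phonetic_key`, `retrieve_candidates`, and `build_context_prompt` are hypothetical helpers, plain string similarity stands in for a real phonetic distance, and the prompt format is invented, since the abstract does not specify the LLM interface.

```python
# Illustrative sketch of phonetic retrieval-based contextualization.
# The phonetic representation and prompt below are assumptions, not the paper's method.
from difflib import SequenceMatcher


def phonetic_key(name: str) -> str:
    """Crude stand-in for a phoneme sequence: lowercase letters only."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two phonetic keys."""
    return SequenceMatcher(None, phonetic_key(a), phonetic_key(b)).ratio()


def retrieve_candidates(entity: str, contacts: list[str], top_k: int = 5) -> list[str]:
    """Step 2: return the top-k contacts phonetically closest to the detected entity."""
    return sorted(contacts, key=lambda c: similarity(entity, c), reverse=True)[:top_k]


def build_context_prompt(first_pass: str, candidates: list[str]) -> str:
    """Step 3: prompt with only the retrieved shortlist, never the full database."""
    return ("Possible contact names: " + ", ".join(candidates) + "\n"
            "First-pass transcript: " + first_pass + "\n"
            "Rewrite the transcript using the correct contact name.")


if __name__ == "__main__":
    contacts = ["Aoife Murphy", "Eva Murray", "Ava Moore", "Ethan Murdock"]
    first_pass = "call eva murry"   # step 1: context-free detection of "eva murry"
    shortlist = retrieve_candidates("eva murry", contacts, top_k=3)
    print(build_context_prompt(first_pass, shortlist))
```

Restricting the prompt to the retrieved shortlist, rather than the full contact list, is what makes the approach scale to large named entity databases.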


Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

arXiv.org Artificial Intelligence

Recent advances in deep learning and automatic speech recognition have improved the accuracy of end-to-end speech recognition systems, but recognition of personal content such as contact names remains a challenge. In this work, we describe our personalization solution for an end-to-end speech recognition system based on connectionist temporal classification. Building on previous work, we present a novel method for generating additional subword tokenizations for personal entities from their pronunciations. We show that, using this technique in combination with two established techniques, contextual biasing and wordpiece prior normalization, we are able to achieve personal named entity accuracy on par with that of a competitive hybrid system.
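As a rough illustration of deriving an extra subword tokenization from a pronunciation, the sketch below maps a toy phoneme sequence to an alternative spelling and tokenizes it with a greedy wordpiece matcher. The phoneme-to-grapheme rules, the wordpiece vocabulary, and the tokenizer are all illustrative placeholders; the paper's actual pronunciation model and tokenizer are not described in the abstract.

```python
# Illustrative sketch: one extra subword tokenization for a personal entity,
# generated from its pronunciation. All rules and vocabularies are toy assumptions.

# Hypothetical phoneme-to-grapheme rules (a real system would use a trained model).
P2G = {"K": "c", "AE": "a", "T": "t", "R": "r", "IY": "ee", "N": "n"}

# Toy wordpiece vocabulary; "_" is a toy word-initial marker.
WORDPIECES = {"_cat", "_ca", "t", "r", "een", "a", "e", "c", "_"}


def pronunciation_to_spelling(phonemes: list[str]) -> str:
    """Map a phoneme sequence to a surface spelling via the toy rules."""
    return "".join(P2G.get(p, "") for p in phonemes)


def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match wordpiece tokenization of a single word."""
    pieces, text = [], "_" + word            # prepend the word-initial marker
    while text:
        for length in range(len(text), 0, -1):
            if text[:length] in vocab:
                pieces.append(text[:length])
                text = text[length:]
                break
        else:
            pieces.append(text[0])           # fall back to a single character
            text = text[1:]
    return pieces


if __name__ == "__main__":
    # A contact pronounced K AE T R IY N yields the alternative spelling "catreen",
    # whose wordpiece sequence can then be added alongside the canonical tokenization.
    phonemes = ["K", "AE", "T", "R", "IY", "N"]
    alt_spelling = pronunciation_to_spelling(phonemes)
    print(alt_spelling, greedy_tokenize(alt_spelling, WORDPIECES))
```

The additional tokenizations would then feed the established techniques the abstract mentions, such as contextual biasing of the personal entities during decoding.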


Acoustic Model Fusion for End-to-end Speech Recognition

arXiv.org Artificial Intelligence

Recent advances in deep learning and automatic speech recognition (ASR) have enabled end-to-end (E2E) ASR systems and boosted their accuracy to a new level. E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, LM fusion has certain drawbacks, such as its inability to address the domain mismatch inherent to the internal AM. Drawing inspiration from LM fusion, we propose integrating an external AM into the E2E system to better address this domain mismatch. With this novel approach, we have achieved a significant reduction in word error rate, with a drop of up to 14.3% across varied test sets. We also find that AM fusion is particularly beneficial for named entity recognition.
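By analogy with shallow LM fusion, one way to picture AM fusion is as a log-linear combination of the E2E score with an external score during beam search. The abstract does not state how the external AM is integrated, so the interpolation weight and the `fused_score`/`rescore_beam` helpers below are assumptions for illustration only.

```python
# Illustrative sketch of score-level fusion, by analogy with shallow LM fusion.
# The combination rule and weight are assumptions, not the paper's exact method.
import math


def fused_score(e2e_logprob: float, ext_am_logprob: float, lam: float = 0.3) -> float:
    """Log-linear combination of the E2E score and the external AM score."""
    return e2e_logprob + lam * ext_am_logprob


def rescore_beam(hypotheses):
    """hypotheses: list of (tokens, e2e_logprob, ext_am_logprob) tuples."""
    return max(hypotheses, key=lambda h: fused_score(h[1], h[2]))


if __name__ == "__main__":
    beam = [
        (["call", "anna"], math.log(0.40), math.log(0.20)),
        (["call", "hannah"], math.log(0.35), math.log(0.45)),
    ]
    best = rescore_beam(beam)
    print(" ".join(best[0]))   # the external AM flips the decision to "call hannah"
```

The intent is that the external AM, trained on in-domain audio, can correct hypotheses where the implicit internal AM suffers from domain mismatch.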


Neural Language Model Pruning for Automatic Speech Recognition

arXiv.org Artificial Intelligence

We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely the criterion, method, and scheduler, analyzing their contributions in terms of accuracy and inference speed. To the best of our knowledge, such in-depth analyses of large-scale recognition systems have not been reported in the literature. In addition, we propose a variant of low-rank approximation suitable for incrementally compressing models and delivering multiple models with varied target sizes. Among other results, we show that a) data-driven pruning outperforms magnitude-driven pruning in several scenarios; b) incremental pruning achieves higher accuracy than one-shot pruning, especially when targeting smaller sizes; and c) low-rank approximation presents the best trade-off between size reduction and inference speed-up for moderate compression.
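Two of the compression operations mentioned, magnitude-driven pruning and low-rank approximation, can be sketched in a few lines of NumPy. The sparsity and rank choices below are arbitrary, and the data-driven criteria, incremental schedules, and model sizes studied in the paper are not reproduced here.

```python
# Illustrative sketch of magnitude pruning and truncated-SVD low-rank approximation
# applied to a single weight matrix. Sparsity and rank values are arbitrary.
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries to reach roughly the target sparsity."""
    k = int(weights.size * sparsity)
    threshold = np.partition(np.abs(weights).ravel(), k)[k]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)


def low_rank_approx(weights: np.ndarray, rank: int):
    """Factor W ~= A @ B with A of shape (m, rank) and B of shape (rank, n)."""
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    a = u[:, :rank] * s[:rank]      # absorb singular values into the left factor
    b = vt[:rank, :]
    return a, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((512, 512))
    pruned = magnitude_prune(w, sparsity=0.8)
    a, b = low_rank_approx(w, rank=64)
    print("nonzeros kept:", np.count_nonzero(pruned) / w.size)
    print("low-rank params:", a.size + b.size, "vs dense:", w.size)
```

An incremental scheme, as discussed in the abstract, would repeat such a compression step in stages (with fine-tuning in between) rather than applying it once, which is the distinction drawn between incremental and one-shot pruning.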