AITopics | Hetz, Gill

Hetz, Gill

Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR

Segal-Feldman, Yael, Shamsian, Aviv, Navon, Aviv, Hetz, Gill, Keshet, Joseph

arXiv.org Artificial Intelligence, Sep-24-2024

Large transformer-based models have significant potential for speech transcription and translation. Their self-attention mechanisms and parallel processing enable them to capture complex patterns and dependencies in audio sequences. However, this potential comes with challenges, as these large and computationally intensive models lead to slow inference speeds. Various optimization strategies have been proposed to improve performance, including efficient hardware utilization and algorithmic enhancements. In this paper, we introduce Whisper-Medusa, a novel approach designed to enhance processing speed with minimal impact on Word Error Rate (WER). The proposed model extends OpenAI's Whisper architecture by predicting multiple tokens per iteration, resulting in a 50% reduction in latency. We showcase the effectiveness of Whisper-Medusa across different learning setups and datasets.
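The idea of predicting multiple tokens per iteration can be illustrated with a toy decoding loop. The sketch below is not the authors' implementation: `toy_next_token` stands in for the base decoder, `toy_draft` stands in for extra prediction heads, and all names are hypothetical. Drafted tokens are kept only when the base decoder would have produced the same token, so the output matches ordinary greedy decoding. In a real model this check would be a batched forward pass; here it is simulated with direct calls.

```python
from typing import Callable, List, Sequence, Tuple

NextToken = Callable[[Sequence[int]], int]   # base decoder: prefix -> next token
Draft = Callable[[Sequence[int]], List[int]]  # extra heads: prefix -> guessed tokens


def greedy_decode(next_token: NextToken, eos: int,
                  max_len: int = 64) -> Tuple[List[int], int]:
    """Baseline: exactly one token is emitted per decoder iteration."""
    seq: List[int] = []
    iterations = 0
    while len(seq) < max_len:
        iterations += 1
        seq.append(next_token(seq))
        if seq[-1] == eos:
            break
    return seq, iterations


def multi_token_decode(next_token: NextToken, draft: Draft, eos: int,
                       max_len: int = 64) -> Tuple[List[int], int]:
    """Each iteration emits the base token plus every drafted token
    that agrees with what the base decoder would have produced."""
    seq: List[int] = []
    iterations = 0
    while len(seq) < max_len:
        iterations += 1
        seq.append(next_token(seq))
        if seq[-1] == eos:
            break
        for guess in draft(seq):
            # Reject the first disagreeing guess and everything after it.
            if len(seq) >= max_len or guess != next_token(seq):
                break
            seq.append(guess)
            if guess == eos:
                return seq, iterations
    return seq, iterations


# Toy setup (assumption): a fixed token sequence ending in EOS.
TARGET = [5, 3, 8, 1, 9, 0]
EOS = 0


def toy_next_token(prefix: Sequence[int]) -> int:
    return TARGET[len(prefix)]


def toy_draft(prefix: Sequence[int]) -> List[int]:
    # One extra head that, in this toy, always guesses correctly.
    return [TARGET[len(prefix)]]
```

With a single always-correct extra head, the toy loop needs half as many iterations as the baseline; with a head that always guesses wrong, it falls back to one token per iteration. Real heads fall somewhere in between, so the toy count says nothing about the latency figure reported in the abstract.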

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv:2409.15869

Country: North America > United States (0.97)

Genre: Research Report > Promising Solution (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)


Keyword-Guided Adaptation of Automatic Speech Recognition

Shamsian, Aviv, Navon, Aviv, Glazer, Neta, Hetz, Gill, Keshet, Joseph

arXiv.org Artificial Intelligence, Jun-4-2024

Automatic Speech Recognition (ASR) technology has made significant progress in recent years, providing accurate transcription across various domains. However, some challenges remain, especially in noisy environments and with specialized jargon. In this paper, we propose a novel approach for improved jargon word recognition through contextual biasing of Whisper-based models. We employ a keyword spotting model that leverages the Whisper encoder representation to dynamically generate prompts for guiding the decoder during the transcription process. We introduce two approaches to effectively steer the decoder towards these prompts: KG-Whisper, which fine-tunes the Whisper decoder, and KG-Whisper-PT, which learns a prompt prefix. Our results show a significant improvement in the recognition accuracy of specified keywords and a reduction in overall word error rate (WER). Specifically, in unseen-language generalization, we demonstrate an average WER improvement of 5.1% over Whisper.
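For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name is hypothetical, text normalization (casing, punctuation) is left out, and the paper's own evaluation code may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" gives one substitution and one deletion over six reference words, a WER of 2/6.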

artificial intelligence, keyword, machine learning, (15 more...)

arXiv:2406.02649

Genre:

Research Report > New Finding (0.54)
Research Report > Promising Solution (0.34)

Industry:

Transportation (0.95)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Open-vocabulary Keyword-spotting with Adaptive Instance Normalization

Navon, Aviv, Shamsian, Aviv, Glazer, Neta, Hetz, Gill, Keshet, Joseph

arXiv.org Artificial Intelligence, Sep-13-2023

Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain an affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters. These parameters are used to process the auditory input. We provide an extensive evaluation using challenging and diverse multi-lingual benchmarks and show significant improvements over recent keyword spotting and ASR baselines. Furthermore, we study the effectiveness of our approach on low-resource languages that were unseen during training. The results demonstrate a substantial performance improvement compared to baseline methods.
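Adaptive instance normalization, as generally defined, normalizes each feature channel to zero mean and unit variance and then applies a scale and shift supplied by a conditioning signal. The sketch below shows only that operation in plain Python. It is an illustration under assumptions, not the AdaKWS architecture: the text encoder is omitted, so `gamma` and `beta` are passed in directly as if they were its keyword-conditioned outputs, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def adaptive_instance_norm(
    features: Sequence[Sequence[float]],  # one list of time steps per channel
    gamma: Sequence[float],               # per-channel scale (assumed encoder output)
    beta: Sequence[float],                # per-channel shift (assumed encoder output)
    eps: float = 1e-5,
) -> List[List[float]]:
    """Normalize each channel over time, then rescale with gamma and beta."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("features, gamma and beta need one entry per channel")

    out: List[List[float]] = []
    for channel, g, b in zip(features, gamma, beta):
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        inv_std = 1.0 / math.sqrt(var + eps)  # eps guards against zero variance
        out.append([g * (x - mean) * inv_std + b for x in channel])
    return out
```

After the call, each output channel has mean `beta` and standard deviation close to `gamma`, so a different keyword (that is, a different `gamma` and `beta`) changes how the same audio features are presented to later layers.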

artificial intelligence, open-vocabulary keyword-spotting, speech recognition, (1 more...)

arXiv:2309.08561

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.53)
