AITopics | kw system


kw system

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Zero-Shot KWS for Children's Speech using Layer-Wise Features from SSL Models

Kutum, Subham, Sinha, Abhijit, Kathania, Hemant Kumar, Kadiri, Sudarsana Reddy, Govil, Mahesh Chandra

arXiv.org Artificial Intelligence, Sep-1-2025

Numerous methods have been proposed to enhance Keyword Spotting (KWS) in adult speech, but children's speech presents unique challenges for KWS systems due to its distinct acoustic and linguistic characteristics. This paper introduces a zero-shot KWS approach that leverages state-of-the-art self-supervised learning (SSL) models, including Wav2Vec2, HuBERT and Data2Vec. Features are extracted layer-wise from these SSL models and used to train a Kaldi-based DNN KWS system. The WSJCAM0 adult speech dataset was used for training, while the PFSTAR children's speech dataset was used for testing, demonstrating the zero-shot capability of our method. Our approach achieved state-of-the-art results across all keyword sets for children's speech. Notably, the Wav2Vec2 model, particularly layer 22, performed best, delivering an ATWV score of 0.691, an MTWV score of 0.7003, and a probability of false alarm and probability of miss of 0.0164 and 0.0547, respectively, for a set of 30 keywords. Furthermore, age-specific performance evaluation confirmed the system's effectiveness across different age groups of children. To assess the system's robustness against noise, additional experiments were conducted using the best-performing layer of Wav2Vec2, the best-performing model. The results demonstrated a significant improvement over the traditional MFCC-based baseline, emphasizing the potential of SSL embeddings even in noisy conditions. To further generalize the KWS framework, the experiments were repeated on an additional CMU dataset. Overall, the results highlight the significant contribution of SSL features to zero-shot KWS performance for children's speech, effectively addressing the challenges associated with the distinct characteristics of child speakers.
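The ATWV and MTWV figures above are term-weighted values. As a hedged illustration, the sketch below assumes the NIST 2006 Spoken Term Detection definition, TWV(θ) = 1 − (1/K) Σ_k [P_miss(k, θ) + β · P_FA(k, θ)], with β = 999.9 and P_FA(k) = N_FA(k) / (T − N_true(k)) for T seconds of speech; the Kaldi scoring used in the paper may differ in details. ATWV is the TWV at the system's chosen threshold, and MTWV is the best TWV over all thresholds. `KeywordCounts`, `term_weighted_value`, and `maximum_twv` are hypothetical names, and any counts passed to them are illustrative, not results from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

# Beta from the NIST 2006 Spoken Term Detection evaluation
# (assumption: the paper's Kaldi scoring uses the same weighting).
BETA = 999.9


@dataclass
class KeywordCounts:
    n_true: int     # reference occurrences of the keyword
    n_correct: int  # detections matching a reference occurrence
    n_false: int    # detections with no matching reference occurrence


def term_weighted_value(counts: List[KeywordCounts], speech_seconds: float,
                        beta: float = BETA) -> float:
    """TWV at a single detection threshold, averaged over keywords present in the reference."""
    losses = []
    for c in counts:
        if c.n_true == 0:
            continue  # keywords with no reference occurrences are left out of the average
        p_miss = 1.0 - c.n_correct / c.n_true
        # non-target trials are approximated as one per second of speech
        p_fa = c.n_false / (speech_seconds - c.n_true)
        losses.append(p_miss + beta * p_fa)
    return 1.0 - sum(losses) / len(losses)


def maximum_twv(counts_by_threshold: Dict[float, List[KeywordCounts]],
                speech_seconds: float, beta: float = BETA) -> float:
    """MTWV: the best TWV over a sweep of detection thresholds."""
    return max(term_weighted_value(counts, speech_seconds, beta)
               for counts in counts_by_threshold.values())
```

The large β means a single false alarm per hour costs far more than a single miss, which is why TWV-optimal thresholds tend to be conservative.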

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

doi: 10.1016/j.patrec.2025.08.010

2508.21248

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio

AbdulKader, Ahmad, Nassar, Kareem, El-Geish, Mohamed, Galvez, Daniel, Patil, Chetan

arXiv.org Artificial Intelligence, Apr-28-2025

We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8 kHz audio acquired in non-IID environments, a more challenging task than most state-of-the-art KWS systems face. We present a model that incorporates Deep Neural Networks (DNNs), cascading, multiple-feature representations, and multiple-instance learning. The cascaded classifiers handle the task's class imbalance and reduce power consumption on computationally constrained devices via early termination. The KWS system achieves a false negative rate of 6% at an hourly false positive rate of 0.75.
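A cascade saves compute by letting a cheap classifier reject most audio before costlier stages run. The following is a minimal sketch, not the authors' model: each stage is a hypothetical (scoring function, threshold) pair, and the multiple-instance part assumes the standard rule that a bag of instances (for example, overlapping windows around a candidate detection) scores as high as its best instance.

```python
from typing import Callable, List, Sequence, Tuple

Score = Callable[[Sequence[float]], float]


def bag_score(instances: List[Sequence[float]], score_fn: Score) -> float:
    """Multiple-instance scoring: a bag is as positive as its most positive instance."""
    return max(score_fn(x) for x in instances)


def run_cascade(instances: List[Sequence[float]],
                stages: List[Tuple[Score, float]]) -> Tuple[bool, int]:
    """Return (accepted, stages_run); stop at the first stage whose bag score is below its threshold."""
    stages_run = 0
    for score_fn, threshold in stages:
        stages_run += 1
        if bag_score(instances, score_fn) < threshold:
            return False, stages_run  # early termination: later, costlier stages never run
    return True, stages_run
```

Ordering the stages from cheapest to most expensive is what makes early termination pay off on constrained devices, since most non-keyword audio exits after the first stage.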

artificial intelligence, classifier, machine learning

arXiv.org Artificial Intelligence

1711.08058

Country:

North America > United States (0.29)
Europe (0.29)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

A Literature Review of Keyword Spotting Technologies for Urdu

Rizvi, Syed Muhammad Aqdas

arXiv.org Artificial Intelligence, Sep-16-2024

This literature review surveys advances in keyword spotting (KWS) technologies, focusing specifically on Urdu, a low-resource language (LRL) of Pakistan with complex phonetics. Despite global strides in speech technology, Urdu presents unique challenges that require more tailored solutions. The review traces the evolution from foundational Gaussian Mixture Models to sophisticated neural architectures, including deep neural networks and transformers, highlighting significant milestones such as the integration of multi-task learning and self-supervised approaches that leverage unlabeled data. It examines the role of emerging technologies in improving KWS performance in multilingual and resource-constrained settings, emphasizing the need for innovations that cater to languages like Urdu. The review thus underscores the need for context-specific research that addresses the inherent complexities of Urdu and similar LRLs, and of the regions that communicate in such languages, for a more inclusive approach to speech technology.

keyword, speech technology, urdu

arXiv.org Artificial Intelligence

2409.16317

Country:

Asia > Pakistan (0.25)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting

Wang, Zhenyu, Wan, Li, Zhang, Biqiao, Huang, Yiteng, Li, Shang-Wen, Sun, Ming, Lei, Xin, Yang, Zhaojun

arXiv.org Artificial Intelligence, Aug-23-2024

A keyword spotting (KWS) engine that runs continuously on a device is exposed to a wide variety of speech signals, many of which were never seen during training. Building a small-footprint, high-performing KWS model that remains robust across different acoustic environments is therefore challenging. In this paper, we explore how to effectively apply adversarial examples to improve KWS robustness. We propose datasource-aware disentangled learning with adversarial examples to reduce the mismatch between the original and adversarial data, as well as the mismatch across the original training datasources. The KWS model architecture is based on depth-wise separable convolution and a simple attention module. Experimental results demonstrate that the proposed learning strategy improves the false reject rate by 40.31% at a 1% false accept rate on the internal dataset, compared to the strongest baseline without adversarial examples. Our best-performing system achieves 98.06% accuracy on the Google Speech Commands V1 dataset.
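The abstract does not say how the adversarial examples are generated. As one common option, the sketch below applies the fast gradient sign method (FGSM) to a toy logistic-regression keyword classifier. The model, `epsilon`, and the function names are assumptions, and the datasource-aware disentangled training itself is not shown.

```python
import numpy as np


def logistic_loss(w: np.ndarray, b: float, x: np.ndarray, y: int) -> float:
    """Binary cross-entropy of a linear model for label y in {0, 1}, computed stably."""
    z = w @ x + b
    # log(1 + exp(-z)) when y == 1, log(1 + exp(z)) when y == 0
    return float(np.logaddexp(0.0, -z) if y == 1 else np.logaddexp(0.0, z))


def fgsm(w: np.ndarray, b: float, x: np.ndarray, y: int, epsilon: float) -> np.ndarray:
    """Fast gradient sign method: move each input feature by epsilon in the direction that raises the loss."""
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))  # predicted probability of the keyword class
    grad_x = (p - y) * w          # gradient of the cross-entropy loss with respect to x
    return x + epsilon * np.sign(grad_x)
```

In adversarial training, perturbed copies like these are added to the training batches so the model learns to keep its decision stable under small input changes.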

adversarial example, dataset, keyword

arXiv.org Artificial Intelligence

2408.13355

Country:

South America (0.04)
Oceania > New Zealand (0.04)
North America > United States > Texas (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Written Term Detection Improves Spoken Term Detection

Yusuf, Bolaji, Saraçlar, Murat

arXiv.org Artificial Intelligence, Jul-5-2024

End-to-end (E2E) approaches to keyword search (KWS) are considerably simpler in terms of training and indexing complexity than approaches that use the output of automatic speech recognition (ASR) systems. This simplification, however, comes at the cost of modularity. In particular, whereas ASR-based KWS systems can benefit from external unpaired text via a language model, current formulations of E2E KWS systems have no such mechanism. In this paper, we therefore propose a multitask training objective that allows unpaired text to be integrated into E2E KWS without complicating indexing and search. In addition to training an E2E KWS model to retrieve text queries from spoken documents, we jointly train it to retrieve text queries from masked written documents. We show empirically that this approach effectively leverages unpaired text for KWS, with significant improvements in search performance across a wide variety of languages. Our analysis indicates that these improvements arise because the proposed method improves document representations for words in the unpaired text. Finally, we show that the proposed method can be used for domain adaptation in settings where in-domain paired data is scarce or nonexistent.
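The joint objective can be pictured as two retrieval losses summed during training, one on spoken documents and one on masked written documents built from unpaired text. The sketch below is a hypothetical illustration only: the `<mask>` symbol, `mask_prob`, the loss `weight`, and labelling the target from the unmasked document are assumptions, not details taken from the paper.

```python
import random
from typing import List, Tuple

MASK = "<mask>"  # hypothetical mask symbol


def make_written_example(words: List[str], query: str, mask_prob: float,
                         rng: random.Random) -> Tuple[List[str], int]:
    """Mask a random fraction of tokens in an unpaired text document and label
    whether the (single-word) query occurs in the original, unmasked document."""
    masked = [MASK if rng.random() < mask_prob else w for w in words]
    target = int(query in words)
    return masked, target


def multitask_loss(spoken_loss: float, written_loss: float, weight: float = 1.0) -> float:
    """Joint objective: spoken-document retrieval loss plus a weighted written-document loss."""
    return spoken_loss + weight * written_loss
```

Because written examples only change the training objective, indexing and search at test time still run on the speech path alone.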

encoder, query, unpaired text

arXiv.org Artificial Intelligence

doi: 10.48550/arXiv.2308.08027

2407.04601

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Multilingual acoustic word embeddings for zero-resource languages

Jacobs, Christiaan

arXiv.org Artificial Intelligence, Jan-23-2024

This research addresses the challenge of developing speech applications for zero-resource languages that lack labelled data. It focuses on acoustic word embeddings (AWEs), fixed-dimensional representations of variable-duration speech segments, and employs multilingual transfer, where labelled data from several well-resourced languages are used for pre-training. The study introduces a new neural network that outperforms existing AWE models on zero-resource languages, and it explores how the choice of well-resourced languages affects performance. AWEs are applied to a keyword-spotting system for hate speech detection in Swahili radio broadcasts, demonstrating robustness in real-world scenarios. Additionally, novel semantic AWE models improve semantic query-by-example search.
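To make "fixed-dimensional representations of variable-duration speech segments" concrete, the sketch below uses the classic downsampling baseline (keep k uniformly spaced feature frames and concatenate them) rather than the neural AWE model the study introduces. Query-by-example search then reduces to comparing vectors, here with cosine similarity. The function names, `k`, and the feature dimension are illustrative assumptions.

```python
import numpy as np


def downsample_awe(frames: np.ndarray, k: int = 10) -> np.ndarray:
    """Map a (num_frames, feat_dim) segment to a fixed k * feat_dim vector
    by keeping k uniformly spaced frames (frames repeat if the segment is short)."""
    idx = np.linspace(0, len(frames) - 1, k).round().astype(int)
    return frames[idx].reshape(-1)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A 37-frame and an 80-frame segment of 13-dimensional features both map to 130-dimensional vectors, so they can be compared directly; a learned AWE model replaces the downsampling step with a network trained so that same-word segments land close together.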

awe model, multilingual model, zero-resource language

arXiv.org Artificial Intelligence

2401.10543

Country:

Africa > South Africa (0.14)
North America > United States (0.04)
Africa > Sub-Saharan Africa (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (0.92)

Industry:

Leisure & Entertainment > Sports (1.00)
Media > Radio (0.87)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications (1.00)

Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili

Jacobs, Christiaan, Rakotonirina, Nathanaël Carraz, Chimoto, Everlyn Asiko, Bassett, Bruce A., Kamper, Herman

arXiv.org Artificial Intelligence, Jun-1-2023

We consider hate speech detection through keyword spotting on radio broadcasts. One approach is to build an automatic speech recognition (ASR) system for the target low-resource language. We compare this to using acoustic word embedding (AWE) models that map speech segments to a space where matching words have similar vectors. Specifically, we use a multilingual AWE model trained on labelled data from well-resourced languages to spot keywords in the unseen target language. In contrast to ASR, the AWE approach requires only a few keyword exemplars. In controlled experiments on Wolof and Swahili, where training and test data are from the same domain, an ASR model trained on just five minutes of data outperforms the AWE approach. However, in an in-the-wild test on Swahili radio broadcasts with actual hate speech keywords, the AWE model (using one minute of template data) is more robust, giving performance similar to that of an ASR system trained on 30 hours of labelled data.
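Exemplar-based keyword spotting with AWEs can be sketched as comparing embeddings of windows cut from a broadcast against embeddings of a few spoken examples of each keyword. In the sketch below the embeddings are assumed to come from some AWE model (not shown), and the function names, keyword labels, and `threshold` are hypothetical.

```python
from typing import Dict, List

import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def keyword_score(window_embeddings, exemplar_embeddings) -> float:
    """Highest cosine similarity between any window of the utterance and any keyword exemplar."""
    windows = normalize(np.asarray(window_embeddings, dtype=float))
    exemplars = normalize(np.asarray(exemplar_embeddings, dtype=float))
    sims = windows @ exemplars.T  # shape: (num_windows, num_exemplars)
    return float(sims.max())


def spot(window_embeddings, exemplars_by_keyword: Dict[str, list], threshold: float) -> List[str]:
    """Return, in sorted order, the keywords whose best match reaches the threshold."""
    return sorted(k for k, ex in exemplars_by_keyword.items()
                  if keyword_score(window_embeddings, ex) >= threshold)
```

Adding a new keyword only requires recording a few exemplars and embedding them, which is the practical advantage over retraining an ASR system.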

artificial intelligence, keyword, machine learning

arXiv.org Artificial Intelligence

2306.0041

Country:

Africa > Kenya (0.05)
Africa > South Africa > Western Cape > Cape Town (0.05)
North America > Canada > Quebec > Montreal (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.56)

Industry:

Media > Radio (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting

Labrador, Beltrán, Zhao, Guanlong, Moreno, Ignacio López, Scarpati, Angelo Scorza, Fowl, Liam, Wang, Quan

arXiv.org Artificial Intelligence, Nov-11-2022

In this paper, we present a novel approach for adapting a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. We achieve this by replacing the keyword in the text transcription with a special token and training the system to detect that token in an audio stream. At inference time, we use a decision function inspired by conventional KWS approaches to make our approach better suited to the KWS task. Furthermore, we introduce a KWS-specific loss by adapting the sequence-discriminative Minimum Bayes-Risk training technique. We find that our approach significantly outperforms ASR-based KWS systems. Compared with a conventional keyword spotting system, our proposal achieves similar performance while bringing the advantages and flexibility of sequence-to-sequence training. Additionally, when combined with the conventional KWS system, our approach can improve performance at any operating point.
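The abstract does not specify the decision function. A widely used conventional design smooths per-frame keyword posteriors over a short window and fires when the smoothed value crosses a threshold. The sketch below shows that design, plus a simple weighted score fusion as one hypothetical way of combining two systems; `window`, `threshold`, `weight`, and all function names are illustrative assumptions.

```python
from typing import Optional, Sequence

import numpy as np


def smoothed_posteriors(posteriors: np.ndarray, window: int) -> np.ndarray:
    """Trailing moving average of per-frame keyword posteriors (shorter window at the start)."""
    cumsum = np.cumsum(np.insert(posteriors, 0, 0.0))
    out = np.empty(len(posteriors))
    for t in range(len(posteriors)):
        start = max(0, t - window + 1)
        out[t] = (cumsum[t + 1] - cumsum[start]) / (t + 1 - start)
    return out


def detect(posteriors: Sequence[float], window: int, threshold: float) -> Optional[int]:
    """Index of the first frame whose smoothed posterior reaches the threshold, else None."""
    smooth = smoothed_posteriors(np.asarray(posteriors, dtype=float), window)
    hits = np.flatnonzero(smooth >= threshold)
    return int(hits[0]) if hits.size else None


def fuse(score_a: float, score_b: float, weight: float = 0.5) -> float:
    """Hypothetical linear score fusion of two KWS systems."""
    return weight * score_a + (1.0 - weight) * score_b
```

Smoothing is what keeps an isolated one-frame spike, such as `[0, 0, 0, 1, 0, 0]` with a window of 3, from triggering a detection at a threshold of 0.5, while a sustained run of high posteriors still fires.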

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2211.06478

Country:

North America > Canada > Newfoundland and Labrador > Labrador (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia (0.04)

Genre: Research Report (1.00)

Industry: Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.51)

An In-Vehicle KWS System with Multi-Source Fusion for Vehicle Applications

Tan, Yue, Zheng, Kan, Lei, Lei

arXiv.org Machine Learning, Feb-16-2019

To maximize both the precision and the recall of detection, this paper proposes an in-vehicle multi-source fusion scheme for Keyword Spotting (KWS) systems in vehicle applications. Vehicle information, as a new source for the original system, is collected by an in-vehicle data acquisition platform while the user is driving. A Deep Neural Network (DNN) is trained to extract acoustic features and classify speech. Based on the posterior probabilities obtained from the DNN, vehicle information, including the speed and direction of the vehicle, is used to choose a suitable parameter from a pair of sensitivity values for the KWS system. The experimental results show that the KWS system with the proposed multi-source fusion scheme achieves better performance in terms of precision, recall, and mean square error than the system without it.

I. INTRODUCTION

Keyword Spotting (KWS), also known as wake-word detection, refers to the task of detecting specified keywords in a continuous stream of audio provided by users [1]. Keyword spotting has been an active research area in speech recognition for decades and is widely used in numerous applications.
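The fusion step chooses between two sensitivity values using vehicle data. The sketch below treats each sensitivity value as a detection threshold on the DNN posterior and applies a hypothetical rule (high speed or turning implies a noisier cabin, so the more permissive threshold is used). The paper's actual mapping, the 60 km/h cut-off, the threshold pair, and the `VehicleState` fields are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VehicleState:
    speed_kmh: float
    turning: bool  # hypothetical summary of the vehicle "direction" signal


def choose_threshold(state: VehicleState,
                     thresholds: Tuple[float, float] = (0.6, 0.8),
                     speed_limit_kmh: float = 60.0) -> float:
    """Pick one of a pair of sensitivity values (used here as posterior thresholds).
    Hypothetical rule: fast or turning -> assumed noisier cabin -> permissive threshold."""
    permissive, strict = thresholds
    if state.speed_kmh > speed_limit_kmh or state.turning:
        return permissive
    return strict


def keyword_detected(posterior: float, state: VehicleState, **kwargs) -> bool:
    """Fire when the DNN keyword posterior reaches the vehicle-dependent threshold."""
    return posterior >= choose_threshold(state, **kwargs)
```

The same posterior of 0.7 triggers a detection at highway speed but not in slow, straight driving, which is the kind of context-dependent trade-off between precision and recall the scheme aims for.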

information, kw system, sensitivity value

arXiv.org Machine Learning

1902.04326

Country:

Asia > China > Beijing > Beijing (0.05)
Oceania > Australia (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Data Augmentation for Robust Keyword Spotting under Playback Interference

Raju, Anirudh, Panchapagesan, Sankaran, Liu, Xing, Mandal, Arindam, Strom, Nikko

arXiv.org Machine Learning, Aug-1-2018

Accurate on-device keyword spotting (KWS) with low false accept and false reject rates is crucial to the customer experience for far-field voice control of conversational agents. It is particularly challenging to maintain a low false reject rate in real-world conditions where there is (a) ambient noise from external sources such as TVs, household appliances, or other speech that is not directed at the device, and (b) imperfect cancellation of the device's own audio playback, leaving residual echo after processing by the Acoustic Echo Cancellation (AEC) system. In this paper, we propose a data augmentation strategy to improve keyword spotting performance under these challenging conditions. The training set audio is artificially corrupted by mixing in music and TV/movie audio at different signal-to-interference ratios. Our results show a relative reduction of around 30-45% in false reject rates, across a range of false alarm rates, under audio playback from such devices.
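Mixing interference at a target signal-to-interference ratio (SIR) means scaling the interference so that 10·log10(P_speech / P_interference) equals the chosen value in dB. The sketch below shows this for one clip. The function name is hypothetical, and which SIR values to sample is left to the caller because the abstract does not list the ones used.

```python
import numpy as np


def mix_at_sir(speech: np.ndarray, interference: np.ndarray, sir_db: float) -> np.ndarray:
    """Add interference (e.g. music or TV audio) to speech at a target SIR in dB."""
    if len(interference) < len(speech):
        raise ValueError("interference clip must be at least as long as the speech")
    interference = interference[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_interf = np.mean(interference ** 2)
    # gain g such that 10 * log10(p_speech / (g**2 * p_interf)) == sir_db
    gain = np.sqrt(p_speech / (p_interf * 10 ** (sir_db / 10)))
    return speech + gain * interference
```

Calling this with several SIR values per utterance yields training copies ranging from mildly to heavily corrupted, which is the property the augmentation strategy relies on.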

artificial intelligence, machine learning, natural language

arXiv.org Machine Learning

1808.00563

Country:

North America > United States (0.05)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.49)
Media > Television (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.30)
