AITopics | Krishna, Gautam

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Palaskar, Shruti, Rudovic, Oggi, Dharur, Sameer, Pesce, Florian, Krishna, Gautam, Sivaraman, Aswin, Berkowitz, Jack, Abdelaziz, Ahmed Hussen, Adya, Saurabh, Tewfik, Ahmed

arXiv.org Artificial Intelligence, Jun-13-2024

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves a 22% relative reduction in equal error rate (EER) over the text-only approach and attains performance parity with its full fine-tuning (FFT) counterpart while needing to tune only a fraction of its parameters. Furthermore, with the newly introduced adapter dropout, FLoRA is robust to missing data, improving over FFT with a 20% lower EER and a 56% lower false accept rate. The proposed approach scales well for model sizes from 16M to 3B parameters.
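The abstract above builds on low rank adaptation. As a rough illustration of that general idea only (not the FLoRA method itself, whose fusion details are not given here), the sketch below adds a trainable low-rank update B·A to a frozen weight matrix W. The function names, the `alpha` scaling factor, and the toy shapes are assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    would be trained. The names and the alpha scaling are illustrative.
    """
    rank = len(a)
    base = matvec(w, x)               # frozen pre-trained path
    update = matvec(b, matvec(a, x))  # low-rank trainable path
    scale = alpha / rank
    return [y + scale * u for y, u in zip(base, update)]


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter parameters (A and B) relative to the size of the full matrix W."""
    return rank * (d_in + d_out) / (d_out * d_in)
```

For a 1024 x 1024 weight matrix and rank 8, `trainable_fraction(1024, 1024, 8)` evaluates to 0.015625, which illustrates how an adapter can tune only a small fraction of the parameters.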

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2406.09617

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

Krishna, Gautam, Dharur, Sameer, Rudovic, Oggi, Dighe, Pranay, Adya, Saurabh, Abdelaziz, Ahmed Hussen, Tewfik, Ahmed H

arXiv.org Artificial Intelligence, Oct-23-2023

Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g., acoustic, text and/or automatic speech recognition (ASR) system features, to classify speech as device-directed or otherwise, and often have to contend with one or more of these modalities being unavailable when deployed in real-world settings. In this paper, we investigate fusion schemes for DDSD systems that can be made more robust to missing modalities. Concurrently, we study the use of non-verbal cues, specifically prosody features, in addition to verbal cues for DDSD. We present different approaches to combine scores and embeddings from prosody with the corresponding verbal cues, finding that prosody improves DDSD performance by up to 8.5% in terms of false acceptance rate (FA) at a given fixed operating point via non-linear intermediate fusion, while our use of modality dropout techniques improves the performance of these models by 7.4% in terms of FA when evaluated with missing modalities at inference time.
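Modality dropout, as named in the abstract above, can be pictured as randomly blanking whole modalities during training so the fused model learns to cope when one is missing. The sketch below is a generic illustration under assumptions: the function names, the zero-filling strategy, the keep-at-least-one rule, and the concatenation fusion are hypothetical choices for this example, not details taken from the paper.

```python
import random
from typing import Dict, List


def modality_dropout(
    embeddings: Dict[str, List[float]],
    drop_prob: float,
    rng: random.Random,
) -> Dict[str, List[float]]:
    """Zero out each modality independently with probability drop_prob.

    At least one modality is always kept (an assumption of this sketch),
    so the fused input is never entirely zeroed by dropout.
    """
    names = list(embeddings)
    kept = [name for name in names if rng.random() >= drop_prob]
    if not kept:
        kept = [rng.choice(names)]
    return {
        name: list(vec) if name in kept else [0.0] * len(vec)
        for name, vec in embeddings.items()
    }


def fuse(embeddings: Dict[str, List[float]]) -> List[float]:
    """Concatenate modality embeddings in a fixed (sorted-by-name) order."""
    fused: List[float] = []
    for name in sorted(embeddings):
        fused.extend(embeddings[name])
    return fused
```

During training, one would call `fuse(modality_dropout(batch_item, drop_prob, rng))` for each example; at inference, a genuinely missing modality would be replaced by the same zero vector so the input shape stays fixed.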

artificial intelligence, multimodal device directed speech detection, speech recognition, (2 more...)

arXiv.org Artificial Intelligence

2310.15261

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Speech Synthesis using EEG

Krishna, Gautam, Tran, Co, Han, Yan, Carnahan, Mason

arXiv.org Machine Learning, Feb-21-2020

In this paper we demonstrate speech synthesis using different electroencephalography (EEG) feature sets recently introduced in [1]. We make use of a recurrent neural network (RNN) regression model to predict acoustic features directly from EEG features. We demonstrate our results using EEG features recorded in parallel with spoken speech as well as using EEG recorded in parallel with listening utterances. We provide EEG based speech synthesis results for four subjects in this paper and our results demonstrate the feasibility of synthesizing speech directly from EEG features.

deep learning, eeg feature, neural network, (20 more...)

arXiv.org Machine Learning

2002.12756

Country: North America > United States > Texas (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Spoken Speech Enhancement using EEG

Krishna, Gautam, Han, Yan, Tran, Co, Carnahan, Mason, Tewfik, Ahmed H

arXiv.org Machine Learning, Sep-13-2019

Brain Machine Interface Lab, The University of Texas at Austin. Abstract: In this paper we demonstrate spoken speech enhancement using electroencephalography (EEG) signals with a generative adversarial network (GAN) based model and a long short-term memory (LSTM) regression based model. Our results demonstrate that EEG features can be used to clean speech recorded in the presence of background noise. Index Terms: electroencephalography (EEG), speech enhancement, deep learning. 1. Introduction: Speech enhancement is the process of improving the quality of speech that has been degraded by additive noise. Speech enhancement is a critical preprocessing method used to improve the performance of automatic speech recognition (ASR) systems operating in the presence of background noise. Noisy speech is first fed into a speech enhancement system to produce enhanced speech, which is then fed into the ASR model.

deep learning, speech enhancement, vascular disease, (22 more...)

arXiv.org Machine Learning

1909.09132

Country: North America > United States > Texas > Travis County > Austin (0.34)

Genre: Research Report > New Finding (0.69)

Industry:

Health & Medicine > Diagnostic Medicine (0.67)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Advancing Speech Recognition With No Speech Or With Noisy Speech

Krishna, Gautam, Tran, Co, Carnahan, Mason, Tewfik, Ahmed H

arXiv.org Machine Learning, Jul-16-2019

In this paper we demonstrate end-to-end continuous speech recognition (CSR) using electroencephalography (EEG) signals with no speech signal as input. Attention model based and connectionist temporal classification (CTC) based automatic speech recognition (ASR) systems were implemented for performing recognition. We further demonstrate CSR for noisy speech by fusing with EEG features.
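For readers unfamiliar with CTC, the output of a CTC-based recognizer is a per-frame distribution over labels plus a blank symbol, and the simplest decoding rule takes the best label per frame, merges consecutive repeats, and drops blanks. The sketch below shows that standard greedy rule on toy data; it says nothing about the specific models or decoders used in the paper, and the function name and blank index are assumptions.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_probs: Sequence[Sequence[float]], blank: int = 0
) -> List[int]:
    """Greedy CTC decoding: argmax per frame, merge repeats, remove blanks."""
    # Pick the most probable label index for every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    decoded: List[int] = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

Note that a blank between two identical labels keeps both of them, which is how CTC represents genuinely repeated symbols.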

eeg feature, speech recognition, vascular disease, (19 more...)

arXiv.org Machine Learning

1906.08871

Country: North America > United States > Texas > Travis County > Austin (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Diagnostic Medicine (0.66)
Health & Medicine > Health Care Technology (0.48)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Robust End to End Speaker Verification Using EEG

Han, Yan, Krishna, Gautam, Tran, Co, Carnahan, Mason, Tewfik, Ahmed H

arXiv.org Machine Learning, Jun-17-2019

In this paper we demonstrate that the performance of a speaker verification system can be improved by concatenating electroencephalography (EEG) signal features with speech signal features. We use a state-of-the-art end-to-end deep learning model for performing speaker verification and we demonstrate our results for noisy speech. Our results indicate that EEG signals can improve the robustness of speaker verification systems.

acoustic processing, utterance, vascular disease, (25 more...)

arXiv.org Machine Learning

1906.08044

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report > New Finding (0.54)

Industry:

Information Technology > Security & Privacy (0.71)
Health & Medicine > Diagnostic Medicine (0.67)
Health & Medicine > Health Care Technology (0.49)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.35)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Speech Recognition With No Speech Or With Noisy Speech Beyond English

Krishna, Gautam, Tran, Co, Han, Yan, Carnahan, Mason, Tewfik, Ahmed H

arXiv.org Machine Learning, Jun-17-2019

In this paper we demonstrate continuous noisy speech recognition using a connectionist temporal classification (CTC) model on a limited Chinese vocabulary, using electroencephalography (EEG) features with no speech signal as input. We further demonstrate continuous noisy speech recognition with a single CTC model on a limited joint English and Chinese vocabulary, again using EEG features with no speech signal as input.

cardiology, speech recognition, vascular disease, (18 more...)

arXiv.org Machine Learning

1906.08045

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Diagnostic Medicine (0.67)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.35)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Speech Recognition with no speech or with noisy speech

Krishna, Gautam, Tran, Co, Yu, Jianguo, Tewfik, Ahmed H

arXiv.org Machine Learning, Mar-2-2019

The performance of automatic speech recognition (ASR) systems degrades in the presence of noisy speech. This paper demonstrates that using electroencephalography (EEG) can help ASR systems overcome performance loss in the presence of noise. The paper also shows that distillation training of ASR systems using EEG features increases their performance. Finally, we demonstrate the ability to recognize words from EEG with no speech signal on a limited English vocabulary with high accuracy.

deep learning, speech recognition, vascular disease, (20 more...)

arXiv.org Machine Learning

1903.00739

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Health Care Technology (0.67)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.35)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
