Dhamyal, Hira
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
Arora, Siddhant, Pasad, Ankita, Chien, Chung-Ming, Han, Jionghao, Sharma, Roshan, Jung, Jee-weon, Dhamyal, Hira, Chen, William, Shon, Suwon, Lee, Hung-yi, Livescu, Karen, Watanabe, Shinji
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFMs) for these SLU tasks. However, the community still lacks a fine-grained understanding of the comparative utility of different SFMs. Motivated by this, we ask: which SFMs offer the most benefits for these complex SLU tasks, and what is the most effective approach for incorporating them? To answer this, we perform an extensive evaluation of multiple supervised and self-supervised SFMs using several evaluation protocols: (i) frozen SFMs with a lightweight prediction head, (ii) frozen SFMs with a complex prediction head, and (iii) fine-tuned SFMs with a lightweight prediction head. Although the supervised SFMs are pre-trained on much more labeled speech recognition data, they do not always outperform self-supervised SFMs; the latter tend to perform at least as well as, and sometimes better than, supervised SFMs, especially on the sequence generation tasks in SLUE. While there is no universally optimal way of incorporating SFMs, the complex prediction head gives the best performance for most tasks, although it increases the inference time. We also introduce an open-source toolkit and performance leaderboard, SLUE-PERB, for these tasks and modeling strategies.
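The difference between the three evaluation protocols comes down to which parameter groups the optimizer is allowed to update. A minimal illustration (not the SLUE-PERB toolkit API; all names here are hypothetical):

```python
# Sketch of the three SFM evaluation protocols: the head is always trained,
# and only protocol (iii) also updates the speech foundation model itself.
from dataclasses import dataclass

@dataclass
class Protocol:
    name: str
    train_encoder: bool   # fine-tune the SFM?
    head: str             # "lightweight" (e.g. linear) or "complex" (e.g. a decoder)

PROTOCOLS = [
    Protocol("frozen SFM + lightweight head",     train_encoder=False, head="lightweight"),
    Protocol("frozen SFM + complex head",         train_encoder=False, head="complex"),
    Protocol("fine-tuned SFM + lightweight head", train_encoder=True,  head="lightweight"),
]

def trainable_parameters(protocol, encoder_params, head_params):
    """Return the parameter groups the optimizer should update."""
    params = list(head_params)          # the prediction head is always trained
    if protocol.train_encoder:
        params += list(encoder_params)  # protocol (iii) also unfreezes the SFM
    return params
```

In a real framework the same distinction is usually expressed by toggling gradient tracking on the encoder's parameters before building the optimizer.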
Prompting Audios Using Acoustic Properties For Emotion Representation
Dhamyal, Hira, Elizalde, Benjamin, Deshmukh, Soham, Wang, Huaming, Raj, Bhiksha, Singh, Rita
Emotions lie on a continuum, but current models treat emotion as a finite-valued discrete variable. This representation does not capture the diversity in the expression of emotion. To represent emotions better, we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to learn better emotion representations from audio-prompt pairs. We use acoustic properties that are correlated with emotion, such as pitch, intensity, speech rate, and articulation rate, to automatically generate prompts, i.e., 'acoustic prompts'. We use a contrastive learning objective to map speech samples to their respective acoustic prompts. We evaluate our model on Emotion Audio Retrieval (EAR) and Speech Emotion Recognition (SER). Our results show that the acoustic prompts significantly improve the model's performance on EAR across various Precision@K metrics. On SER, we observe a 3.8% relative accuracy improvement on the RAVDESS dataset.
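A toy sketch of how an acoustic prompt might be generated from measured acoustic properties. The thresholds and template wording below are illustrative placeholders, not the paper's actual mapping:

```python
# Map measured acoustic properties to a natural-language "acoustic prompt".
# Thresholds are hypothetical; a real system would calibrate them per corpus.
def level(value, low, high):
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "medium"

def acoustic_prompt(pitch_hz, intensity_db, speech_rate_sps):
    pitch = level(pitch_hz, 120, 220)        # fundamental frequency in Hz
    loud = level(intensity_db, 55, 70)       # intensity in dB
    rate = level(speech_rate_sps, 3.0, 5.0)  # syllables per second
    return (f"a person speaking with {pitch} pitch, "
            f"{loud} intensity and {rate} speech rate")
```

Each audio clip would then be paired with its generated prompt, and the contrastive objective pulls matched audio-prompt pairs together in a shared embedding space.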
LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model
Shah, Muhammad Ahmed, Sharma, Roshan, Dhamyal, Hira, Olivier, Raphael, Shah, Ankit, Konan, Joseph, Alharthi, Dareen, Bukhari, Hazim T, Baali, Massa, Deshmukh, Soham, Kuhlmann, Michael, Raj, Bhiksha, Singh, Rita
It has been shown that Large Language Model (LLM) alignment can be circumvented by appending specially crafted attack suffixes to harmful queries to elicit harmful responses. To conduct attacks against private target models whose characterization is unknown, public models can be used as proxies to fashion the attack, with successful attacks being transferred from public proxies to private target models. The success rate of the attack depends on how closely the proxy model approximates the private model. We hypothesize that for attacks to be transferable, it is sufficient if the proxy can approximate the target model in the neighborhood of the harmful query. Therefore, in this paper, we propose \emph{Local Fine-Tuning (LoFT)}, \textit{i.e.}, fine-tuning proxy models on similar queries that lie in the lexico-semantic neighborhood of harmful queries, to decrease the divergence between the proxy and target models. First, we demonstrate three approaches to prompting private target models to obtain similar queries given harmful queries. Next, we obtain data for local fine-tuning by eliciting responses from target models for the generated similar queries. Then, we optimize attack suffixes to generate attack prompts and evaluate the impact of our local fine-tuning on the attack's success rate. Experiments show that local fine-tuning of proxy models improves attack transferability and increases the attack success rate by $39\%$, $7\%$, and $0.5\%$ (absolute) on the target models ChatGPT, GPT-4, and Claude, respectively.
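The lexico-semantic neighborhood idea can be illustrated with a toy generator. In the paper, similar queries are obtained by prompting the target models themselves; the word-substitution scheme below is only a stand-in for that step:

```python
# Toy illustration of generating a query's lexico-semantic neighborhood by
# single-word substitution. Not the paper's method, which prompts the target
# model itself to produce similar queries.
def similar_queries(query, substitutions):
    """Return variants of `query` with one word swapped for a near-synonym."""
    neighbors = []
    words = query.split()
    for word, alternatives in substitutions.items():
        if word in words:
            for alt in alternatives:
                neighbors.append(query.replace(word, alt))
    return neighbors
```

In the LoFT pipeline, each generated neighbor would be sent to the target model, and the (query, response) pairs collected this way become the local fine-tuning data for the proxy.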
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Alharthi, Dareen, Sharma, Roshan, Dhamyal, Hira, Maiti, Soumi, Raj, Bhiksha, Singh, Rita
Modern speech synthesis systems have improved significantly, with synthetic speech often being indistinguishable from real speech. However, efficient and holistic evaluation of synthetic speech remains a significant challenge. Human evaluation using the Mean Opinion Score (MOS) is ideal but inefficient due to high costs. Therefore, researchers have developed auxiliary automatic metrics like Word Error Rate (WER) to measure intelligibility. Prior works focus on evaluating synthetic speech with pre-trained speech recognition models; however, this can be limiting, since the approach primarily measures speech intelligibility. In this paper, we propose an evaluation technique that involves training an automatic speech recognition (ASR) model on synthetic speech and assessing its performance on real speech. Our main assumption is that, by training the ASR model on synthetic speech, the WER on real speech reflects the similarity between the two distributions, yielding a broader assessment of synthetic speech quality beyond intelligibility. Our proposed metric demonstrates a stronger correlation with both MOS naturalness and MOS intelligibility than SpeechLMScore and MOSNet on three recent Text-to-Speech (TTS) systems: MQTTS, StyleTTS, and YourTTS.
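Since the proposed metric is the WER of a synthetic-speech-trained ASR model measured on real speech, the core quantity is an ordinary word-level error rate. A minimal implementation via Levenshtein distance over words (a standard formulation, not tied to any particular toolkit):

```python
# Word error rate: (substitutions + insertions + deletions) / reference length,
# computed with dynamic-programming edit distance over word sequences.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

Under the paper's scheme, this WER is computed on real test speech for an ASR model trained only on a TTS system's output, so a lower value suggests the synthetic distribution is closer to real speech.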
On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice
Shah, Ankit, Dhamyal, Hira, Gao, Yang, Arancibia, Daniel, Arancibia, Mario, Raj, Bhiksha, Singh, Rita
Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice. Different researchers use different kinds of information from the voice signal to achieve this. Various types of phonated sounds and the sounds of cough and breath have all been used, with varying degrees of success, in automated voice-based COVID-19 detection apps. In this paper, we show that detecting COVID-19 from voice does not require custom-made nonstandard features or complicated neural network classifiers; rather, it can be done successfully with just standard features and simple binary classifiers.
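A minimal sketch of the "standard features plus simple binary classifier" recipe, here plain logistic regression trained by gradient descent. The feature vectors are synthetic placeholders standing in for standard acoustic features (e.g. mean pitch, mean energy), not real voice data:

```python
import math

# Train a logistic-regression binary classifier on small feature vectors.
def train_logreg(X, y, lr=0.5, epochs=200):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - yi                       # gradient of the log-loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return int(z > 0)

# Toy, linearly separable stand-ins for standard acoustic feature vectors.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
```

In practice one would use an off-the-shelf implementation (e.g. scikit-learn's `LogisticRegression`); the point of the sketch is that nothing more elaborate than a linear decision boundary over standard features is assumed.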
An Overview of Techniques for Biomarker Discovery in Voice Signal
Singh, Rita, Shah, Ankit, Dhamyal, Hira
This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal. It presents three categories of techniques that can potentially uncover such elusive biomarkers and allow them to be measured and used for predictive and diagnostic purposes. These approaches include proxy techniques, model-based analytical techniques and data-driven AI techniques.