Jyothi, Preethi
Translation Errors Significantly Impact Low-Resource Languages in Cross-Lingual Learning
Agrawal, Ashish Sunil, Fazili, Barah, Jyothi, Preethi
Popular benchmarks (e.g., XNLI) used to evaluate cross-lingual language understanding consist of parallel versions of English evaluation sets in multiple target languages, created with the help of professional translators. When creating such parallel data, it is critical to ensure high-quality translations for all target languages for an accurate characterization of cross-lingual transfer. In this work, we find that translation inconsistencies do exist and, interestingly, that they disproportionately impact low-resource languages in XNLI. To identify such inconsistencies, we propose measuring the gap in performance between zero-shot evaluations on the human-translated and machine-translated target text across multiple target languages; relatively large gaps are indicative of translation errors. We also corroborate that translation errors exist for two target languages, namely Hindi and Urdu, by manually reannotating human-translated test instances in these two languages and finding poor agreement with the original English labels these instances were supposed to inherit.
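To make the proposed diagnostic concrete, here is a minimal Python sketch of the gap measurement, assuming a hypothetical model.predict wrapper around a fine-tuned multilingual model evaluated zero-shot on human-translated (HT) and machine-translated (MT) versions of the same test set; it illustrates the idea and is not the authors' evaluation code.

# Minimal sketch of the translation-error diagnostic described above.
# `model.predict` is a hypothetical stand-in for zero-shot inference with a
# fine-tuned multilingual model (e.g., on an XNLI target-language test set).

def accuracy(preds, golds):
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def translation_gap(model, ht_examples, mt_examples, gold_labels):
    """Accuracy on machine-translated minus human-translated text; a relatively
    large positive gap flags possible errors in the human translations."""
    return (accuracy(model.predict(mt_examples), gold_labels)
            - accuracy(model.predict(ht_examples), gold_labels))

# Hypothetical usage: rank target languages by gap and inspect the largest ones.
# gaps = {lang: translation_gap(model, ht[lang], mt[lang], labels[lang]) for lang in langs}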
Accented Speech Recognition With Accent-specific Codebooks
Prabhu, Darshan, Jyothi, Preethi, Ganapathy, Sriram, Unni, Vinit
Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers. The model is trained on accented English speech, while the test data also contains accents that were not seen during training. On the Mozilla Common Voice multi-accented dataset, we show that our proposed approach yields significant performance gains not only on the seen English accents (up to $37\%$ relative improvement in word error rate) but also on the unseen accents (up to $5\%$ relative improvement in WER). Further, we illustrate benefits for a zero-shot transfer setup on the L2-ARCTIC dataset. We also compare the performance with other approaches based on accent adversarial training.
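The following PyTorch sketch shows one plausible reading of the described mechanism, with a trainable codebook attended to via cross-attention and folded back into an encoder layer through a residual connection; the module name, sizes and wiring are assumptions for illustration, not the paper's exact implementation.

import torch
import torch.nn as nn

class AccentCodebookAttention(nn.Module):
    """Sketch of cross-attention over a learnable accent codebook (our reading
    of the abstract, not the authors' exact implementation). Each encoder frame
    attends over trainable codebook vectors and the attended summary is added
    back to the frame representation."""

    def __init__(self, d_model=256, num_codes=32, num_heads=4):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, encoder_states):                    # (batch, time, d_model)
        batch = encoder_states.size(0)
        codes = self.codebook.unsqueeze(0).expand(batch, -1, -1)
        attended, _ = self.cross_attn(query=encoder_states, key=codes, value=codes)
        return encoder_states + attended                  # residual integration into the encoder layer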
DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages
Bhat, Vineet, Jyothi, Preethi, Bhattacharyya, Pushpak
Disfluency correction (DC) is the process of removing disfluent elements like fillers, repetitions and corrections from spoken utterances to create readable and interpretable text. DC is a vital post-processing step applied to Automatic Speech Recognition (ASR) outputs, before subsequent processing by downstream language understanding tasks. Existing DC research has primarily focused on English due to the unavailability of large-scale open-source datasets in other languages. Towards the goal of multilingual disfluency correction, we present a high-quality human-annotated DC corpus covering four important Indo-European languages: English, Hindi, German and French. We provide an extensive analysis of state-of-the-art DC models across all four languages, obtaining F1 scores of 97.55 (English), 94.29 (Hindi), 95.89 (German) and 92.97 (French). To demonstrate the benefits of DC on downstream tasks, we show that DC leads to an average increase of 5.65 BLEU points when used in conjunction with a state-of-the-art Machine Translation (MT) system. We release code to run our experiments along with our annotated dataset here.
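As a toy illustration of the DC task defined above (not an example drawn from DISCO), token-level disfluency labels are used to drop fillers, repetitions and corrections:

# Hand-made example of disfluency correction as token removal; the labels are
# illustrative only and are not taken from the DISCO corpus.
tokens       = ["um", "I", "want", "uh", "I", "need", "a", "ticket"]
is_disfluent = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = remove

fluent = [t for t, d in zip(tokens, is_disfluent) if not d]
print(" ".join(fluent))                   # "I need a ticket"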
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration
Pasi, Piyush Singh, Battepati, Karthikeya, Jyothi, Preethi, Ramakrishnan, Ganesh, Mahapatra, Tanmay, Singh, Manoj
The problem of audio-to-text alignment has seen a significant amount of research using complete supervision during training. However, this research is typically not in the context of long audio recordings in which the text being queried does not appear verbatim within the audio file. This work is a collaboration with a non-governmental organization called CARE India that collects long audio health surveys from young mothers residing in rural parts of Bihar, India. Given a question drawn from a questionnaire that is used to guide these surveys, we aim to locate where the question is asked within a long audio recording. This is of great value to African and Asian organizations that would otherwise have to painstakingly go through long and noisy audio recordings to locate questions (and answers) of interest. Our proposed framework, INDENT, uses a cross-attention-based model and prior information on the temporal ordering of sentences to learn speech embeddings that capture the semantics of the underlying spoken text. These learnt embeddings are used to retrieve the corresponding audio segment based on text queries at inference time. We empirically demonstrate the significant effectiveness (an improvement in R-avg of about 3%) of our model over results obtained using text-based heuristics. We also show that noisy ASR output, generated using state-of-the-art ASR models for Indian languages, yields better results when used in place of speech. INDENT, trained only on Hindi data, is able to cater to all languages supported by the (semantically) shared text space. We illustrate this empirically on 11 Indic languages.
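A minimal sketch of the inference step described above: given a text-query embedding and candidate audio-segment embeddings in the shared semantic space, the best-matching segment is retrieved by cosine similarity. The encoders that produce these embeddings are not shown, and all names and dimensions are illustrative assumptions.

import torch
import torch.nn.functional as F

def retrieve_segment(question_emb, segment_embs):
    """Return the index of the audio segment whose embedding is closest to the
    text-query embedding in the shared semantic space."""
    sims = F.cosine_similarity(question_emb.unsqueeze(0), segment_embs, dim=-1)
    return int(torch.argmax(sims))

# Hypothetical usage: one question embedding against 120 candidate windows
# cut from a long recording, all of dimension 512.
question_emb = torch.randn(512)
segment_embs = torch.randn(120, 512)
best = retrieve_segment(question_emb, segment_embs)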
Improving RNN-Transducers with Acoustic LookAhead
Unni, Vinit S., Mittal, Ashish, Jyothi, Preethi, Sarawagi, Sunita
RNN-Transducers (RNN-Ts) have gained widespread acceptance as an end-to-end model for speech-to-text conversion because of their high accuracy and streaming capabilities. A typical RNN-T independently encodes the input audio and the text context, and combines the two encodings through a thin joint network. While this architecture provides SOTA streaming accuracy, it also makes the model vulnerable to strong LM biasing, which manifests as multi-step hallucination of text without acoustic evidence. In this paper, we propose LookAhead, a technique that makes text representations more acoustically grounded by looking ahead into future frames of the audio input. This technique yields a significant 5%-20% relative reduction in word error rate on both in-domain and out-of-domain evaluation sets.
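The abstract does not spell out the mechanism, so the sketch below is only one way to realize the stated idea: the text (prediction-network) representation is grounded with a pooled summary of future audio frames before the joint network combines the two streams. All module names, the pooling choice and the window size are assumptions, not the paper's formulation.

import torch
import torch.nn as nn

class AcousticLookAheadJoint(nn.Module):
    """Illustrative sketch: ground text states with a mean-pooled lookahead over
    the next few acoustic frames, then apply a standard RNN-T joint network."""

    def __init__(self, d_audio=256, d_text=256, d_joint=256, vocab_size=1000, window=4):
        super().__init__()
        self.window = window
        self.ground = nn.Linear(d_text + d_audio, d_text)
        self.joint = nn.Sequential(nn.Linear(d_audio + d_text, d_joint), nn.Tanh(),
                                   nn.Linear(d_joint, vocab_size))

    def forward(self, audio, text):           # audio: (B, T, d_audio), text: (B, U, d_text)
        B, T, D = audio.shape
        U = text.size(1)
        # Lookahead: mean of the next `window` audio frames at each time step.
        padded = torch.cat([audio, audio.new_zeros(B, self.window, D)], dim=1)
        future = padded.unfold(1, self.window, 1)[:, 1:T + 1].mean(dim=-1)   # (B, T, D)
        # Ground each text state with the lookahead summary at each time step.
        text_tu = text.unsqueeze(1).expand(B, T, U, -1)
        future_tu = future.unsqueeze(2).expand(B, T, U, -1)
        grounded_text = torch.tanh(self.ground(torch.cat([text_tu, future_tu], dim=-1)))
        audio_tu = audio.unsqueeze(2).expand(B, T, U, -1)
        return self.joint(torch.cat([audio_tu, grounded_text], dim=-1))      # (B, T, U, vocab)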
Adversarial Training For Low-Resource Disfluency Correction
Bhat, Vineet, Jyothi, Preethi, Bhattacharyya, Pushpak
Disfluencies commonly occur in conversational speech. Speech with disfluencies can result in noisy Automatic Speech Recognition (ASR) transcripts, which affect downstream tasks like machine translation. In this paper, we propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC) that utilizes a small amount of labeled real disfluent data in conjunction with a large amount of unlabeled data. We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages: Bengali, Hindi, and Marathi (all from the Indo-Aryan family). Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments. We achieve an average improvement of 6.15 points in F1 score over competitive baselines across all three languages. To the best of our knowledge, we are the first to utilize adversarial training for DC and to use it to correct stuttering disfluencies in English, establishing a new benchmark for this task.
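One standard way to set up such adversarial training is with a gradient reversal layer and a domain discriminator that tries to tell real from synthetic disfluent utterances, pushing the tagger's encoder towards domain-invariant features; the sketch below illustrates that setup under our own assumptions and is not necessarily the paper's exact recipe.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer, a standard building block in adversarial domain
    adaptation; shown only as one plausible way to combine real and synthetic
    disfluent data adversarially."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class AdversarialDCTagger(nn.Module):
    def __init__(self, d_model=256, num_tags=2):
        super().__init__()
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True, bidirectional=True)
        self.tagger = nn.Linear(2 * d_model, num_tags)     # fluent vs. disfluent token
        self.domain_clf = nn.Linear(2 * d_model, 2)        # real vs. synthetic utterance

    def forward(self, token_embs, lam=1.0):                # token_embs: (B, T, d_model)
        states, _ = self.encoder(token_embs)               # (B, T, 2*d_model)
        tag_logits = self.tagger(states)
        # The domain head sees gradient-reversed features, so minimizing its loss
        # pushes the encoder towards representations that do not separate real
        # from synthetic utterances.
        domain_logits = self.domain_clf(GradReverse.apply(states.mean(dim=1), lam))
        return tag_logits, domain_logits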
DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction
Bhat, Vineet, Jyothi, Preethi, Bhattacharyya, Pushpak
Conversational speech often contains deviations from the speech plan, producing disfluent utterances that affect downstream NLP tasks. Removing these disfluencies is necessary to create fluent and coherent speech. This paper presents DisfluencyFixer, a tool that performs speech-to-speech disfluency correction in English and Hindi using a pipeline of Automatic Speech Recognition (ASR), Disfluency Correction (DC) and Text-To-Speech (TTS) models. Our proposed system removes disfluencies from input speech and returns fluent speech as output, along with its transcript, the disfluency type and the total disfluency count in the source utterance, providing a one-stop destination for language learners to improve the fluency of their speech. We evaluate our tool subjectively and receive scores of 4.26, 4.29 and 4.42 out of 5 for ASR performance, DC performance and ease of use of the system, respectively. Our tool can be accessed openly at the following link.
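The wiring of the pipeline, as described, can be summarized in a short sketch; asr, correct_disfluency and tts are hypothetical callables standing in for the ASR, DC and TTS models used by the tool.

# Sketch of the described speech-to-speech pipeline. `asr`, `correct_disfluency`
# and `tts` are hypothetical stand-ins for the tool's actual models; only the
# wiring shown here is taken from the abstract.

def disfluency_fixer(input_audio, asr, correct_disfluency, tts):
    transcript = asr(input_audio)                               # disfluent transcript
    fluent_text, disfluency_type, count = correct_disfluency(transcript)
    fluent_audio = tts(fluent_text)                             # re-synthesized fluent speech
    return {
        "fluent_audio": fluent_audio,
        "transcript": fluent_text,
        "disfluency_type": disfluency_type,
        "disfluency_count": count,
    }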
DICTDIS: Dictionary Constrained Disambiguation for Improved NMT
Maheshwari, Ayush, Sharma, Piyush, Jyothi, Preethi, Ramakrishnan, Ganesh
Domain-specific neural machine translation (NMT) systems (e.g., in educational applications) are socially significant, with the potential to help make information accessible to a diverse set of users in multilingual societies. It is desirable that such NMT systems be lexically constrained and draw from domain-specific dictionaries. Dictionaries can present multiple candidate translations for a source word/phrase due to the polysemous nature of words. The onus is then on the NMT model to choose the contextually most appropriate candidate. Prior work has largely ignored this problem and focused on the single-candidate constraint setting, wherein the target word or phrase is replaced by a single constraint. In this work, we present DICTDIS, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries. We achieve this by augmenting the training data with multiple dictionary candidates, actively encouraging disambiguation during training by implicitly aligning multiple candidate constraints. We demonstrate the utility of DICTDIS via extensive experiments on English-Hindi and English-German sentences in a variety of domains, including regulatory, finance and engineering. We also present comparisons on standard benchmark test datasets. In comparison with existing approaches for lexically constrained and unconstrained NMT, we demonstrate superior performance with respect to constraint-copy and disambiguation-related measures on all domains, while also obtaining improved fluency of up to 2-3 BLEU points on some domains.
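A minimal sketch of the data-augmentation idea: every dictionary candidate for a source word is offered to the model inline, and the model learns to pick the contextually appropriate one. The inline tagging format and the example dictionary below are assumptions for illustration; the exact input format used by DICTDIS may differ.

# Augment a source sentence with all candidate translations from a dictionary,
# leaving disambiguation to the NMT model during training and inference.

def augment_with_candidates(source_tokens, dictionary):
    """`dictionary` maps a source word to a list of candidate target translations."""
    augmented = []
    for tok in source_tokens:
        augmented.append(tok)
        for cand in dictionary.get(tok, []):
            augmented.extend(["<sep>", cand])    # all candidates are offered inline
    return augmented

print(augment_with_candidates(
    ["current", "flows", "through", "the", "wire"],
    {"current": ["धारा", "वर्तमान"]},            # polysemous: electric current vs. "present"
))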
Collaborative Learning to Generate Audio-Video Jointly
Kurmi, Vinod K, Bajaj, Vipul, Patro, Badri N, Venkatesh, K S, Namboodiri, Vinay P, Jyothi, Preethi
A number of techniques have demonstrated the generation of multimedia data for one modality at a time using GANs, such as images, videos, and audio. However, the task of multi-modal generation, specifically the joint generation of both audio and video, has not been sufficiently well explored. Towards this, we propose a method that generates naturalistic samples of video and audio through the joint, correlated generation of the two modalities. The proposed method uses multiple discriminators to ensure that the audio, the video, and their joint output are indistinguishable from real-world samples. We present a dataset for this task and show that we are able to generate realistic samples. The method is validated using standard metrics such as the Inception Score and Fréchet Inception Distance (FID), as well as through human evaluation.
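A sketch of the multi-discriminator objective suggested by the abstract, assuming three separate discriminators for audio, video and the joint (audio, video) pair; the discriminator architectures and the loss form here are illustrative placeholders rather than the paper's exact objective.

import torch
import torch.nn.functional as F

def multi_discriminator_loss(d_audio, d_video, d_joint, real_a, real_v, fake_a, fake_v):
    """Combined discriminator loss: each discriminator is trained to score real
    samples as 1 and generated samples as 0. `d_audio`, `d_video` and `d_joint`
    are hypothetical discriminator modules."""
    def bce(logits_real, logits_fake):
        return (F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real)) +
                F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake)))
    loss = bce(d_audio(real_a), d_audio(fake_a.detach()))
    loss += bce(d_video(real_v), d_video(fake_v.detach()))
    loss += bce(d_joint(real_a, real_v), d_joint(fake_a.detach(), fake_v.detach()))
    return loss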
Reduce and Reconstruct: Improving Low-resource End-to-end ASR Via Reconstruction Using Reduced Vocabularies
Diwan, Anuj, Jyothi, Preethi
End-to-end automatic speech recognition (ASR) systems are increasingly being favoured due to their direct treatment of the problem of speech-to-text conversion. However, these systems are known to be data-hungry and hence underperform in low-resource settings. In this work, we propose a seemingly simple but effective technique to improve low-resource end-to-end ASR performance. We compress the output vocabulary of the end-to-end ASR system using linguistically meaningful reductions and then reconstruct the original vocabulary using a standalone module. Our objective is two-fold: to lessen the burden on the low-resource end-to-end ASR system by reducing the output vocabulary space, and to design a powerful reconstruction module that recovers sequences in the original vocabulary from sequences in the reduced vocabulary. We present two reconstruction modules, an encoder-decoder-based architecture and a finite state transducer-based model. We demonstrate the efficacy of our proposed techniques using ASR systems for two Indian languages, Gujarati and Telugu.
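A toy sketch of the reduce step: a many-to-one mapping collapses related output symbols into a smaller vocabulary that the end-to-end ASR model predicts over, while a standalone module later learns the inverse, context-dependent reconstruction. The specific merges below are illustrative and are not the language-specific reductions used for Gujarati and Telugu in the paper.

# Lossy forward mapping into a reduced vocabulary; the merges are toy examples.
REDUCTION = {"aa": "a", "ii": "i", "uu": "u"}

def reduce_transcript(symbols):
    """Map each output symbol to its reduced-vocabulary equivalent."""
    return [REDUCTION.get(s, s) for s in symbols]

# The reconstruction module (encoder-decoder or FST) learns to invert this
# mapping using context; only the forward direction is shown here.
print(reduce_transcript(["k", "aa", "m"]))   # ['k', 'a', 'm']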