
Collaborating Authors

 Umesh, S


SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras

arXiv.org Artificial Intelligence

India is home to a multitude of languages, of which 22 are recognised by the Indian Constitution as official. Building speech-based applications for the Indian population is a difficult problem owing to limited data and the number of languages and accents to accommodate. To encourage the language technology community to build speech-based applications in Indian languages, we are open sourcing SPRING-INX data, which has about 2000 hours of legally sourced and manually transcribed speech data for ASR system building in Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi and Tamil. This endeavour is by SPRING Lab, Indian Institute of Technology Madras, and is a part of the National Language Translation Mission (NLTM), funded by the Indian Ministry of Electronics and Information Technology (MeitY), Government of India. We describe the data collection and data cleaning process along with the data statistics in this paper.

To increase the internet content of Indian languages in different domains, and as part of the Speech Consortium of the NLTM-R&D led by Indian Institute of Technology Madras (IITM), SPRING Lab of IITM has collected and is collecting legally sourced and manually transcribed speech corpora in various Indian languages such as Tamil, Hindi, Indian English, Marathi, Bengali, Malayalam, Telugu, Assamese, Kannada, Gujarati, Odia, Punjabi, Bodo and Manipuri through speech data collection agencies identified using a tendering process. The data collected has been carefully evaluated by the Speech Quality Control (SQC) team led by KL University. We are releasing the first set of valuable data, amounting to 2000 hours (both audio and the corresponding manual transcriptions), which was collected, cleaned and prepared for ASR system building in 10 Indian languages.


SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

arXiv.org Artificial Intelligence

While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration as conditional inputs, it still leaves scope for richer representations. In this work, we leverage representations from various Self-Supervised Learning (SSL) models to enhance the quality of the synthesized speech. In particular, we pass the FastSpeech2 encoder's length-regulated outputs through a series of encoder layers with the objective of reconstructing the SSL representations. In the SALTTS-parallel implementation, the representations from this second encoder are used only for an auxiliary reconstruction loss against the SSL features. The SALTTS-cascade implementation, however, also passes these representations through the decoder in addition to computing the reconstruction loss. The richness of speech characteristics from the SSL features carries over into the output speech quality, with both objective and subjective evaluation measures for the proposed approach outperforming the baseline FastSpeech2.
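
The following is a minimal PyTorch-style sketch of the SALTTS-parallel idea as described in the abstract: a second encoder stack that consumes the length-regulated outputs and is trained to reconstruct SSL features via an auxiliary loss. All specifics here (Transformer layers, dimensions, the choice of L1 loss, and every name) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSLReconstructionBranch(nn.Module):
    """Hypothetical second encoder for SALTTS-parallel: reconstructs SSL
    features from FastSpeech2's length-regulated encoder outputs."""

    def __init__(self, d_model=256, ssl_dim=768, n_layers=4, n_heads=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.proj = nn.Linear(d_model, ssl_dim)

    def forward(self, length_regulated, ssl_target):
        # length_regulated: (batch, frames, d_model) from FastSpeech2's
        # length regulator; ssl_target: (batch, frames, ssl_dim) features
        # from a pretrained SSL model (e.g. wav2vec 2.0 or HuBERT).
        recon = self.proj(self.encoder(length_regulated))
        # Auxiliary reconstruction loss, added to FastSpeech2's usual
        # training objectives (L1 here is an assumption).
        aux_loss = F.l1_loss(recon, ssl_target)
        return recon, aux_loss

# Shape-level usage with random stand-in tensors:
branch = SSLReconstructionBranch()
lr_out = torch.randn(2, 120, 256)     # stand-in length-regulated outputs
ssl_feats = torch.randn(2, 120, 768)  # stand-in SSL target features
recon, aux_loss = branch(lr_out, ssl_feats)
```

Per the abstract, SALTTS-cascade would differ by also feeding `recon` into the decoder, rather than using this branch solely for the auxiliary loss during training.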


Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

arXiv.org Artificial Intelligence

Cross-lingual dubbing of lecture videos requires transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of the text using the target language's rhythm, and text-to-speech synthesis, followed by isochronous lip-syncing to the original video. The task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically. A prototype is developed for dubbing lectures into 9 Indian languages. A mean opinion score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation, with scores of 4.09 and 3.74, respectively. Human effort is also reduced by 75%.
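
To make the stage ordering concrete, here is a schematic Python orchestrator for the pipeline the abstract enumerates. The stage functions are injected as parameters because the paper's actual components (ASR, MT, TTS, lip-sync models) are not specified in this summary; every name and signature below is a hypothetical placeholder, not the authors' API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Chunk:
    text: str     # translated text for this chunk
    start: float  # start time in the source video (seconds)
    end: float    # end time in the source video (seconds)

def dub_lecture(
    video_path: str,
    target_lang: str,
    transcribe: Callable[[str], str],                        # ASR on source audio
    clean: Callable[[str], str],                             # disfluency removal
    translate: Callable[[str, str], str],                    # text-to-text MT
    chunk: Callable[[str], List[Tuple[str, float, float]]],  # target-rhythm chunking
    synthesize: Callable[[str, float], bytes],               # duration-aware TTS
    lipsync: Callable[[str, List[bytes], List[Chunk]], str], # isochronous lip-sync
) -> str:
    """Run the dubbing stages in the order the abstract describes."""
    text = clean(transcribe(video_path))       # transcription + disfluency cleanup
    translated = translate(text, target_lang)  # into the target language
    chunks = [Chunk(t, s, e) for t, s, e in chunk(translated)]
    # Synthesize each chunk to approximately match its source duration, so the
    # dubbed audio stays isochronous with the original video despite the
    # cross-language duration mismatch the abstract highlights.
    speech = [synthesize(c.text, c.end - c.start) for c in chunks]
    return lipsync(video_path, speech, chunks)  # path to the dubbed video
```

The dependency-injected design is only a convenience for the sketch: it keeps the stage ordering and the duration bookkeeping (the part the abstract emphasizes) separate from any particular choice of ASR, translation, or TTS system.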