Sharma, Neeraj Kumar
Boli: A dataset for understanding stuttering experience and analyzing stuttered speech
Batra, Ashita, Narang, Mannas, Sharma, Neeraj Kumar, Das, Pradip K
There is a growing need for diverse, high-quality stuttered speech data, especially for Indian languages. This paper introduces Project Boli, a multilingual stuttered speech dataset designed to advance scientific understanding and technology development for individuals who stutter, particularly in India. The dataset comprises (a) anonymized metadata (gender, age, country, mother tongue) and participants' responses to a questionnaire about how stuttering affects their daily lives, (b) both read speech (the Rainbow Passage) and spontaneous speech (image description tasks) for each participant, and (c) detailed annotations of five stutter types: blocks, prolongations, interjections, sound repetitions, and word repetitions. We present a comprehensive analysis of the dataset, including the data collection procedure, a summary of the experiences of people who stutter, a severity assessment of stuttering events, and a technical validation of the collected data. The dataset is released as open access to further speech technology development.
Deciphering Assamese Vowel Harmony with Featural InfoWaveGAN
Barman, Sneha Ray, Mahanta, Shakuntala, Sharma, Neeraj Kumar
Traditional approaches to understanding phonological learning have predominantly relied on curated text data. Although insightful, such approaches are limited to the knowledge captured in textual representations of spoken language. To overcome this limitation, we investigate the potential of the Featural InfoWaveGAN model to learn iterative long-distance vowel harmony from raw speech data. We focus on Assamese, a language known for its phonologically regressive and word-bound vowel harmony. We show that the model captures the intricacies of Assamese phonotactics, particularly iterative long-distance harmony with regressive directionality. It also produces non-iterative illicit forms resembling the speech errors observed during human language acquisition. Our statistical analysis reveals a preference for a specific [+high,+ATR] vowel as a trigger across novel items, indicative of feature learning. More data and control could improve model proficiency, contrasting the universality of learning.
Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection
Bhattacharya, Debarpan, Sharma, Neeraj Kumar, Dutta, Debottam, Chetupalli, Srikanth Raj, Mote, Pravin, Ganapathy, Sriram, C, Chandrakiran, Nori, Sahiti, K, Suhail K, Gonuguntla, Sadhana, Alagesan, Murali
This paper presents the Coswara dataset, which contains a diverse set of respiratory sounds and rich metadata recorded between April 2020 and February 2022 from 2635 individuals (1819 SARS-CoV-2 negative, 674 positive, and 142 recovered subjects). The respiratory sounds span nine categories covering variants of breathing, cough, and speech. The metadata include demographic information (age, gender, and geographic location) as well as health information (symptoms, pre-existing respiratory ailments, comorbidities, and SARS-CoV-2 test status). Our study is the first of its kind to annotate the audio quality of the entire dataset (65 hours) through manual listening. The paper summarizes the data collection procedure and the demographic, symptom, and audio data information. A COVID-19 classifier based on a bidirectional long short-term memory (BLSTM) architecture is trained and evaluated on the different population subgroups in the dataset to understand the bias and fairness of the model. This enables an analysis of the impact of gender, geographic location, date of recording, and language proficiency on COVID-19 detection performance.
A Dynamic Framework of Reputation Systems for an Agent Mediated e-market
Gaur, Vibha, Sharma, Neeraj Kumar
The success of an agent-mediated e-market system lies in the underlying reputation management system, which improves the quality of services in an information-asymmetric e-market. Reputation provides an operational metric for establishing trustworthiness between mutually unknown online entities. Reputation systems encourage honest behaviour and discourage malicious behaviour by participating agents in the e-market. A dynamic reputation model would provide virtually instantaneous knowledge of the changing e-market environment and would utilise the Internet's capacity for continuous interactivity in computing reputation. This paper proposes a dynamic reputation framework using reinforcement learning and fuzzy set theory that ensures judicious use of information sharing for inter-agent cooperation. The framework is sensitive to changing e-market parameters, such as transaction value and the varying experience of agents, with the aim of strengthening the reputation system's inbuilt defence mechanisms against various attacks, so that the e-market reaches an equilibrium state and dishonest agents are weeded out of the market.