Lakshadweep
FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes
Nawale, Janki Atul, Khan, Mohammed Safi Ur Rahman, D, Janani, Gupta, Mansi, Pruthi, Danish, Khapra, Mitesh M.
Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. To address this gap, we introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 identity groups encompassing diverse castes, religions, regions, and tribes. We first consult domain experts to curate over 1,800 socio-cultural topics spanning behaviors and situations, where biases and stereotypes are likely to emerge. Grounded in these topics, we generate and manually validate 20,000 real-world scenario templates to probe LLMs for fairness. We structure these templates into three evaluation tasks: plausibility, judgment, and generation. Our evaluation of 14 popular LLMs on these tasks reveals strong negative biases against marginalized identities, with models frequently reinforcing common stereotypes. Additionally, we find that models struggle to mitigate bias even when explicitly asked to rationalize their decision. Our evaluation provides evidence of both allocative and representational harms that current LLMs could cause towards Indian identities, calling for a more cautious usage in practical applications. We release INDIC-BIAS as an open-source benchmark to advance research on benchmarking and mitigating biases and stereotypes in the Indian context.
Eliminating Position Bias of Language Models: A Mechanistic Approach
Wang, Ziqi, Zhang, Hanlin, Li, Xiner, Huang, Kuan-Hao, Han, Chi, Ji, Shuiwang, Kakade, Sham M., Peng, Hao, Ji, Heng
Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. Specifically, we find that causal attention generally causes models to favor distant content, while relative positional encodings like RoPE Su et al. (2024) prefer nearby ones based on the analysis of retrieval-augmented question answering (QA). Further, our empirical study on object detection reveals that position bias is also present in vision-language models (VLMs). Based on the above analyses, we propose to eliminate position bias caused by different input segment orders (e.g., options in LM-as-a-judge, retrieved documents in QA) in a training-free zero-shot manner. Our method changes the causal attention to bidirectional attention between segments and utilizes model attention values to decide the relative orders of segments instead of using the order provided in input prompts, therefore enabling Position-INvariant inferencE (PINE) at the segment level. By eliminating position bias, models achieve better performance and reliability in downstream tasks where position bias widely exists, such as LM-as-a-judge and retrieval-augmented QA. Notably, PINE is especially useful when adapting LMs for evaluating reasoning pairs: it consistently provides 8 to 10 percentage points performance gains in most cases, and makes Llama-3-70B-Instruct perform even better than GPT-4-0125-preview on the RewardBench reasoning subset.
Multilingual Text Style Transfer: Datasets & Models for Indian Languages
Mukherjee, Sourabrata, Ojha, Atul Kr., Bansal, Akanksha, Alok, Deepak, McCrae, John P., Dušek, Ondřej
Text style transfer (TST) involves altering the linguistic style of a text while preserving its core content. This paper focuses on sentiment transfer, a vital TST subtask (Mukherjee et al., 2022a), across a spectrum of Indian languages: Hindi, Magahi, Malayalam, Marathi, Punjabi, Odia, Telugu, and Urdu, expanding upon previous work on English-Bangla sentiment transfer (Mukherjee et al., 2023). We introduce dedicated datasets of 1,000 positive and 1,000 negative style-parallel sentences for each of these eight languages. We then evaluate the performance of various benchmark models categorized into parallel, non-parallel, cross-lingual, and shared learning approaches, including the Llama2 and GPT-3.5 large language models (LLMs). Our experiments highlight the significance of parallel data in TST and demonstrate the effectiveness of the Masked Style Filling (MSF) approach (Mukherjee et al., 2023) in non-parallel techniques. Moreover, cross-lingual and joint multilingual learning methods show promise, offering insights into selecting optimal models tailored to the specific language and task requirements. To the best of our knowledge, this work represents the first comprehensive exploration of the TST task as sentiment transfer across a diverse set of languages.
Real Time Monitoring and Forecasting of COVID 19 Cases using an Adjusted Holt based Hybrid Model embedded with Wavelet based ANN
Das, Agniva, Muralidharan, Kunnummal
Since the inception of the SARS - CoV - 2 (COVID - 19) novel coronavirus, a lot of time and effort is being allocated to estimate the trajectory and possibly, forecast with a reasonable degree of accuracy, the number of cases, recoveries, and deaths due to the same. The model proposed in this paper is a mindful step in the same direction. The primary model in question is a Hybrid Holt's Model embedded with a Wavelet-based ANN. To test its forecasting ability, we have compared three separate models, the first, being a simple ARIMA model, the second, also an ARIMA model with a wavelet-based function, and the third, being the proposed model. We have also compared the forecast accuracy of this model with that of a modern day Vanilla LSTM recurrent neural network model. We have tested the proposed model on the number of confirmed cases (daily) for the entire country as well as 6 hotspot states. We have also proposed a simple adjustment algorithm in addition to the hybrid model so that daily and/or weekly forecasts can be meted out, with respect to the entirety of the country, as well as a moving window performance metric based on out-of-sample forecasts. In order to have a more rounded approach to the analysis of COVID-19 dynamics, focus has also been given to the estimation of the Basic Reproduction Number, $R_0$ using a compartmental epidemiological model (SIR). Lastly, we have also given substantial attention to estimating the shelf-life of the proposed model. It is obvious yet noteworthy how an accurate model, in this regard, can ensure better allocation of healthcare resources, as well as, enable the government to take necessary measures ahead of time.
Neural Machine Translation for Malayalam Paraphrase Generation
Varghese, Christeena, Koshelev, Sergey, Yamshchikov, Ivan P.
This study explores four methods of generating paraphrases in Malayalam, utilizing resources available for English paraphrasing and pre-trained Neural Machine Translation (NMT) models. We evaluate the resulting paraphrases using both automated metrics, such as BLEU, METEOR, and cosine similarity, as well as human annotation. Our findings suggest that automated evaluation measures may not be fully appropriate for Malayalam, as they do not consistently align with human judgment. This discrepancy underscores the need for more nuanced paraphrase evaluation approaches especially for highly agglutinative languages.
Bayesian Learning of Coupled Biogeochemical-Physical Models
Gupta, Abhinav, Lermusiaux, Pierre F. J.
Predictive dynamical models for marine ecosystems are used for a variety of needs. Due to sparse measurements and limited understanding of the myriad of ocean processes, there is however significant uncertainty. There is model uncertainty in the parameter values, functional forms with diverse parameterizations, level of complexity needed, and thus in the state fields. We develop a Bayesian model learning methodology that allows interpolation in the space of candidate models and discovery of new models from noisy, sparse, and indirect observations, all while estimating state fields and parameter values, as well as the joint PDFs of all learned quantities. We address the challenges of high-dimensional and multidisciplinary dynamics governed by PDEs by using state augmentation and the computationally efficient GMM-DO filter. Our innovations include stochastic formulation and complexity parameters to unify candidate models into a single general model as well as stochastic expansion parameters within piecewise function approximations to generate dense candidate model spaces. These innovations allow handling many compatible and embedded candidate models, possibly none of which are accurate, and learning elusive unknown functional forms. Our new methodology is generalizable, interpretable, and extrapolates out of the space of models to discover new ones. We perform a series of twin experiments based on flows past a ridge coupled with three-to-five component ecosystem models, including flows with chaotic advection. The probabilities of known, uncertain, and unknown model formulations, and of state fields and parameters, are updated jointly using Bayes' law. Non-Gaussian statistics, ambiguity, and biases are captured. The parameter values and model formulations that best explain the data are identified. When observations are sufficiently informative, model complexity and functions are discovered.
IMaSC -- ICFOSS Malayalam Speech Corpus
Gopinath, Deepa P, K, Thennal D, Nair, Vrinda V, S, Swaraj K, G, Sachin
Modern text-to-speech (TTS) systems use deep learning to synthesize speech increasingly approaching human quality, but they require a database of high quality audio-text sentence pairs for training. Malayalam, the official language of the Indian state of Kerala and spoken by 35+ million people, is a low resource language in terms of available corpora for TTS systems. In this paper, we present IMaSC, a Malayalam text and speech corpora containing approximately 50 hours of recorded speech. With 8 speakers and a total of 34,473 text-audio pairs, IMaSC is larger than every other publicly available alternative. We evaluated the database by using it to train TTS models for each speaker based on a modern deep learning architecture. Via subjective evaluation, we show that our models perform significantly better in terms of naturalness compared to previous studies and publicly available models, with an average mean opinion score of 4.50, indicating that the synthesized speech is close to human quality.
Development of fully intuitionistic fuzzy data envelopment analysis model with missing data: an application to Indian police sector
Sonkariya, Anjali, Singh, Awadh Pratap, Yadav, Shiv Prasad
Data Envelopment Analysis (DEA) is a technique used to measure the efficiency of decision-making units (DMUs). In order to measure the efficiency of DMUs, the essential requirement is input-output data. Data is usually collected by humans, machines, or both. Due to human/machine errors, there are chances of having some missing values or inaccuracy, such as vagueness/uncertainty/hesitation in the collected data. In this situation, it will be difficult to measure the efficiencies of DMUs accurately. To overcome these shortcomings, a method is presented that can deal with missing values and inaccuracy in the data. To measure the performance efficiencies of DMUs, an input minimization BCC (IMBCC) model in a fully intuitionistic fuzzy (IF) environment is proposed. To validate the efficacy of the proposed fully intuitionistic fuzzy input minimization BCC (FIFIMBCC) model and the technique to deal with missing values in the data, a real-life application to measure the performance efficiencies of Indian police stations is presented.
Over a Decade of Social Opinion Mining
Social media popularity and importance is on the increase, due to people using it for various types of social interaction across multiple channels. This social interaction by online users includes submission of feedback, opinions and recommendations about various individuals, entities, topics, and events. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Therefore, through Social Opinion Mining, natural language can be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research which totals 485 studies and spans a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, natural language processing tasks and other aspects derived from the published studies. Such multi-source information fusion plays a fundamental role in mining of people's social opinions from social media platforms. These can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management, and in multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. Future research directions are presented, whereas further research and development has the potential of leaving a wider academic and societal impact.
COVID-19: Strategies for Allocation of Test Kits
Biswas, Arpita, Bannur, Shruthi, Jain, Prateek, Merugu, Srujana
South Korea, a country of 50 million people, has set an example of successfully flattening the curve of new COVID-19 infections by conducting over 400,000 tests [13] (Figure 2). This was achieved by setting up drive-through testing, allowing at least 10,000 people to be tested per day. South Korea's foreign minister Kang Kyung-wha, in an interview with BBC News [2], said that "Testing is central because that leads to early detection, minimizes further spread, and quickly treats those found with the virus". Several countries are suffering from severe community spread because of their delays in testing [12], two of the prime examples being the United States and Italy. In the United States, among a population of 330 million, the number of confirmed cases is more than 230,000 with over 10,000 deaths and these numbers are growing exponentially (Figure 3), whereas in South Korea there are around 9976 confirmed cases and 169 deaths (as of April 2, 2020). Thus, early testing and repeated testing at regular intervals are two of the key strategies to ensure a low fatality rate. However, for countries with a large population (more than 100 million), it is difficult to adopt exhaustive testing schemes because of the limited number of available testing-kits and facilities. Testing a lot of people with mild or no symptoms would occupy the limited testing resources, which could otherwise be used for highrisk patients. However, it is also important to test individuals with mild or no symptoms to detect asymptomatic cases [10] and implement a method that systematically tests individuals for COVID-19.