Goto

Collaborating Authors

 Mizoram


Health Sentinel: An AI Pipeline For Real-time Disease Outbreak Detection

Pant, Devesh, Grandhe, Rishi Raj, Samaria, Vipin, Paul, Mukul, Kumar, Sudhir, Khanna, Saransh, Agrawal, Jatin, Kalra, Jushaan Singh, VSSG, Akhil, Khalikar, Satish V, Garg, Vipin, Chauhan, Himanshu, Verma, Pranay, Khandelwal, Neha, Dhavala, Soma S, Mathew, Minesh

arXiv.org Artificial Intelligence

Early detection of disease outbreaks is crucial to ensure timely intervention by the health authorities. Due to the challenges associated with traditional indicator-based surveillance, monitoring informal sources such as online media has become increasingly popular. However, owing to the number of online articles getting published everyday, manual screening of the articles is impractical. To address this, we propose Health Sentinel. It is a multi-stage information extraction pipeline that uses a combination of ML and non-ML methods to extract events-structured information concerning disease outbreaks or other unusual health events-from online articles. The extracted events are made available to the Media Scanning and Verification Cell (MSVC) at the National Centre for Disease Control (NCDC), Delhi for analysis, interpretation and further dissemination to local agencies for timely intervention. From April 2022 till date, Health Sentinel has processed over 300 million news articles and identified over 95,000 unique health events across India of which over 3,500 events were shortlisted by the public health experts at NCDC as potential outbreaks.


IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs

Faraz, Ali, Akash, null, Khan, Shaharukh, Kolla, Raja, Patidar, Akshat, Goswami, Suranjan, Ravi, Abhinav, Khatri, Chandra, Agarwal, Shubham

arXiv.org Artificial Intelligence

Vision-language models (VLMs) have demonstrated impressive generalization across multimodal tasks, yet most evaluation benchmarks remain Western-centric, leaving open questions about their performance in culturally diverse and multilingual settings. To address this gap, we introduce IndicVisionBench, the first large-scale benchmark centered on the Indian subcontinent. Our final benchmark consists of a total of 5K images and 37K+ QA pairs across 13 culturally grounded topics. In addition, we release a paired parallel corpus of annotations across 10 Indic languages, creating a unique resource for analyzing cultural and linguistic biases in VLMs. We evaluate a broad spectrum of 8 models, from proprietary closed-source systems to open-weights medium and large-scale models. Our experiments reveal substantial performance gaps, underscoring the limitations of current VLMs in culturally diverse contexts. By centering cultural diversity and multilinguality, IndicVisionBench establishes a reproducible evaluation framework that paves the way for more inclusive multimodal research. Vision-language models (VLMs) (Bai et al., 2023; Chen et al., 2024; Lu et al., 2024; Wang et al., 2024b; Laurenc on et al., 2024; Tong et al., 2024; Xue et al., 2024) have demonstrated strong performance across a variety of multimodal tasks. However, existing benchmarks (Antol et al., 2015; Fu et al., 2023; Goyal et al., 2017) remain heavily Western-centric, limiting our understanding of how these models generalize to culturally diverse and multilingual settings. While some recent efforts partially cover this diversity (Romero et al., 2024; Nayak et al., 2024; V ayani et al., 2025), a systematic, large-scale benchmark capturing India-specific cultural concepts across multiple languages is still lacking. To address this gap, we introduce IndicVisionBench, a culturally grounded evaluation benchmark tailored for the Indian subcontinent. To the best of our knowledge, this is the first large-scale benchmark explicitly designed to assess VLMs in the context of Indian culture and languages. We use states as a proxy for cultural groups following prior works (Adilazuarda et al., 2024; Nayak et al., 2024).


DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture

Maji, Arijit, Kumar, Raghvendra, Ghosh, Akash, Anushka, null, Shah, Nemil, Borah, Abhilekh, Shah, Vanshika, Mishra, Nishant, Saha, Sriparna

arXiv.org Artificial Intelligence

We introduce DRISHTIKON, a first-of-its-kind multimodal and multilingual benchmark centered exclusively on Indian culture, designed to evaluate the cultural understanding of generative AI systems. Unlike existing benchmarks with a generic or global scope, DRISHTIKON offers deep, fine-grained coverage across India's diverse regions, spanning 15 languages, covering all states and union territories, and incorporating over 64,000 aligned text-image pairs. The dataset captures rich cultural themes including festivals, attire, cuisines, art forms, and historical heritage amongst many more. We evaluate a wide range of vision-language models (VLMs), including open-source small and large models, proprietary systems, reasoning-specialized VLMs, and Indic-focused models, across zero-shot and chain-of-thought settings. Our results expose key limitations in current models' ability to reason over culturally grounded, multimodal inputs, particularly for low-resource languages and less-documented traditions. DRISHTIKON fills a vital gap in inclusive AI research, offering a robust testbed to advance culturally aware, multimodally competent language technologies.


Better To Ask in English? Evaluating Factual Accuracy of Multilingual LLMs in English and Low-Resource Languages

Rohera, Pritika, Ginimav, Chaitrali, Sawant, Gayatri, Joshi, Raviraj

arXiv.org Artificial Intelligence

Multilingual Large Language Models (LLMs) have demonstrated significant effectiveness across various languages, particularly in high-resource languages such as English. However, their performance in terms of factual accuracy across other low-resource languages, especially Indic languages, remains an area of investigation. In this study, we assess the factual accuracy of LLMs - GPT-4o, Gemma-2-9B, Gemma-2-2B, and Llama-3.1-8B - by comparing their performance in English and Indic languages using the IndicQuest dataset, which contains question-answer pairs in English and 19 Indic languages. By asking the same questions in English and their respective Indic translations, we analyze whether the models are more reliable for regional context questions in Indic languages or when operating in English. Our findings reveal that LLMs often perform better in English, even for questions rooted in Indic contexts. Notably, we observe a higher tendency for hallucination in responses generated in low-resource Indic languages, highlighting challenges in the multilingual understanding capabilities of current LLMs.


Wavelet-SARIMA-Transformer: A Hybrid Model for Rainfall Forecasting

Saikia, Junmoni, Goswami, Kuldeep, Kakaty, Sarat C.

arXiv.org Artificial Intelligence

This study develops and evaluates a novel hybridWavelet SARIMA Transformer, WST framework to forecast using monthly rainfall across five meteorological subdivisions of Northeast India over the 1971 to 2023 period. The approach employs the Maximal Overlap Discrete Wavelet Transform, MODWT with four wavelet families such as, Haar, Daubechies, Symlet, Coiflet etc. to achieve shift invariant, multiresolution decomposition of the rainfall series. Linear and seasonal components are modeled using Seasonal ARIMA, SARIMA, while nonlinear components are modeled by a Transformer network, and forecasts are reconstructed via inverse MODWT. Comprehensive validation using an 80 is to 20 train test split and multiple performance indices such as, RMSE, MAE, SMAPE, Willmotts d, Skill Score, Percent Bias, Explained Variance, and Legates McCabes E1 demonstrates the superiority of the Haar-based hybrid model, WHST. Across all subdivisions, WHST consistently achieved lower forecast errors, stronger agreement with observed rainfall, and unbiased predictions compared with stand alone SARIMA, stand-alone Transformer, and two-stage wavelet hybrids. Residual adequacy was confirmed through the Ljung Box test, while Taylor diagrams provided an integrated assessment of correlation, variance fidelity, and RMSE, further reinforcing the robustness of the proposed approach. The results highlight the effectiveness of integrating multiresolution signal decomposition with complementary linear and deep learning models for hydroclimatic forecasting. Beyond rainfall, the proposed WST framework offers a scalable methodology for forecasting complex environmental time series, with direct implications for flood risk management, water resources planning, and climate adaptation strategies in data-sparse and climate-sensitive regions.


Farm-Level, In-Season Crop Identification for India

Deshpande, Ishan, Reehal, Amandeep Kaur, Nath, Chandan, Singh, Renu, Patel, Aayush, Jayagopal, Aishwarya, Singh, Gaurav, Aggarwal, Gaurav, Agarwal, Amit, Bele, Prathmesh, Reddy, Sridhar, Warrier, Tanya, Singh, Kinjal, Tendulkar, Ashish, Outon, Luis Pazos, Saxena, Nikita, Dondzik, Agata, Tewari, Dinesh, Garg, Shruti, Singh, Avneet, Dhand, Harsh, Rajan, Vaibhav, Talekar, Alok

arXiv.org Artificial Intelligence

Accurate, timely, and farm-level crop type information is paramount for national food security, agricultural policy formulation, and economic planning, particularly in agriculturally significant nations like India. While remote sensing and machine learning have become vital tools for crop monitoring, existing approaches often grapple with challenges such as limited geographical scalability, restricted crop type coverage, the complexities of mixed-pixel and heterogeneous landscapes, and crucially, the robust in-season identification essential for proactive decision-making. We present a framework designed to address the critical data gaps for targeted data driven decision making which generates farm-level, in-season, multi-crop identification at national scale (India) using deep learning. Our methodology leverages the strengths of Sentinel-1 and Sentinel-2 satellite imagery, integrated with national-scale farm boundary data. The model successfully identifies 12 major crops (which collectively account for nearly 90% of India's total cultivated area showing an agreement with national crop census 2023-24 of 94% in winter, and 75% in monsoon season). Our approach incorporates an automated season detection algorithm, which estimates crop sowing and harvest periods. This allows for reliable crop identification as early as two months into the growing season and facilitates rigorous in-season performance evaluation. Furthermore, we have engineered a highly scalable inference pipeline, culminating in what is, to our knowledge, the first pan-India, in-season, farm-level crop type data product. The system's effectiveness and scalability are demonstrated through robust validation against national agricultural statistics, showcasing its potential to deliver actionable, data-driven insights for transformative agricultural monitoring and management across India.


FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes

Nawale, Janki Atul, Khan, Mohammed Safi Ur Rahman, D, Janani, Gupta, Mansi, Pruthi, Danish, Khapra, Mitesh M.

arXiv.org Artificial Intelligence

Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. To address this gap, we introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 identity groups encompassing diverse castes, religions, regions, and tribes. We first consult domain experts to curate over 1,800 socio-cultural topics spanning behaviors and situations, where biases and stereotypes are likely to emerge. Grounded in these topics, we generate and manually validate 20,000 real-world scenario templates to probe LLMs for fairness. We structure these templates into three evaluation tasks: plausibility, judgment, and generation. Our evaluation of 14 popular LLMs on these tasks reveals strong negative biases against marginalized identities, with models frequently reinforcing common stereotypes. Additionally, we find that models struggle to mitigate bias even when explicitly asked to rationalize their decision. Our evaluation provides evidence of both allocative and representational harms that current LLMs could cause towards Indian identities, calling for a more cautious usage in practical applications. We release INDIC-BIAS as an open-source benchmark to advance research on benchmarking and mitigating biases and stereotypes in the Indian context.


Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models

Gogoi, Parismita, Kalita, Sishir, Lalhminghlui, Wendy, Terhiija, Viyazonuo, Tzudir, Moakala, Sarmah, Priyankoo, Prasanna, S. R. M.

arXiv.org Artificial Intelligence

This study explores the use of self-supervised learning (SSL) models for tone recognition in three low-resource languages from North Eastern India: Angami, Ao, and Mizo. We evaluate four Wav2vec2.0 base models that were pre-trained on both tonal and non-tonal languages. We analyze tone-wise performance across the layers for all three languages and compare the different models. Our results show that tone recognition works best for Mizo and worst for Angami. The middle layers of the SSL models are the most important for tone recognition, regardless of the pre-training language, i.e. tonal or non-tonal. We have also found that the tone inventory, tone types, and dialectal variations affect tone recognition. These findings provide useful insights into the strengths and weaknesses of SSL-based embeddings for tonal languages and highlight the potential for improving tone recognition in low-resource settings. The source code is available at GitHub 1 .


Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization

Tang, Zilu, Chatterjee, Rajen, Garg, Sarthak

arXiv.org Artificial Intelligence

Machine Translation (MT) is undergoing a paradigm shift, with systems based on fine-tuned large language models (LLM) becoming increasingly competitive with traditional encoder-decoder models trained specifically for translation tasks. However, LLM-based systems are at a higher risk of generating hallucinations, which can severely undermine user's trust and safety. Most prior research on hallucination mitigation focuses on traditional MT models, with solutions that involve post-hoc mitigation - detecting hallucinated translations and re-translating them. While effective, this approach introduces additional complexity in deploying extra tools in production and also increases latency. To address these limitations, we propose a method that intrinsically learns to mitigate hallucinations during the model training phase. Specifically, we introduce a data creation framework to generate hallucination focused preference datasets. Fine-tuning LLMs on these preference datasets reduces the hallucination rate by an average of 96% across five language pairs, while preserving overall translation quality. In a zero-shot setting our approach reduces hallucinations by 89% on an average across three unseen target languages.


SPRING Lab IITM's submission to Low Resource Indic Language Translation Shared Task

Sayed, Hamees, Joglekar, Advait, Umesh, Srinivasan

arXiv.org Artificial Intelligence

We develop a robust translation model for four low-resource Indic languages: Khasi, Mizo, Manipuri, and Assamese. Our approach includes a comprehensive pipeline from data collection and preprocessing to training and evaluation, leveraging data from WMT task datasets, BPCC, PMIndia, and OpenLanguageData. To address the scarcity of bilingual data, we use back-translation techniques on monolingual datasets for Mizo and Khasi, significantly expanding our training corpus. We fine-tune the pre-trained NLLB 3.3B model for Assamese, Mizo, and Manipuri, achieving improved performance over the baseline. For Khasi, which is not supported by the NLLB model, we introduce special tokens and train the model on our Khasi corpus. Our training involves masked language modelling, followed by fine-tuning for English-to-Indic and Indic-to-English translations.