AITopics | Manipur

Manipur

New eye scan detects diseases years before symptoms appear

Al Jazeera | May-13-2026, 19:37:25 GMT

A Qatar-based professor has pioneered a non-invasive eye scan to detect neurodegenerative diseases years before symptoms appear. The technology uses AI to analyse the eye and can identify early signs of dementia, Parkinson's disease, and other diseases within minutes.

artificial intelligence, live navigation menu news show, video duration 00, (4 more...)

Country:

North America > United States (0.71)
Asia > Middle East > Qatar (0.27)
Asia > India > Manipur (0.27)

Industry: Health & Medicine > Therapeutic Area > Neurology > Dementia (0.59)

Technology: Information Technology > Artificial Intelligence (0.59)

FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes

Nawale, Janki Atul, Khan, Mohammed Safi Ur Rahman, D, Janani, Gupta, Mansi, Pruthi, Danish, Khapra, Mitesh M.

arXiv.org Artificial Intelligence | Jul-1-2025

Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. To address this gap, we introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 identity groups encompassing diverse castes, religions, regions, and tribes. We first consult domain experts to curate over 1,800 socio-cultural topics spanning behaviors and situations, where biases and stereotypes are likely to emerge. Grounded in these topics, we generate and manually validate 20,000 real-world scenario templates to probe LLMs for fairness. We structure these templates into three evaluation tasks: plausibility, judgment, and generation. Our evaluation of 14 popular LLMs on these tasks reveals strong negative biases against marginalized identities, with models frequently reinforcing common stereotypes. Additionally, we find that models struggle to mitigate bias even when explicitly asked to rationalize their decision. Our evaluation provides evidence of both allocative and representational harms that current LLMs could cause towards Indian identities, calling for a more cautious usage in practical applications. We release INDIC-BIAS as an open-source benchmark to advance research on benchmarking and mitigating biases and stereotypes in the Indian context.

large language model, machine learning, natural language, (18 more...)

2506.23111

Country:

Asia > India > Bihar (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Asia > India > Uttar Pradesh (0.04)
(37 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Education > Educational Setting (0.92)
Government > Regional Government > Asia Government > India Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content

Kasu, Sai Kartheek Reddy, Biradar, Shankar, Saumya, Sunil

arXiv.org Artificial Intelligence | Mar-20-2025

This paper presents the Deceptive Humor Dataset (DHD), a novel resource for studying humor derived from fabricated claims and misinformation. In an era of rampant misinformation, understanding how humor intertwines with deception is essential. DHD consists of humor-infused comments generated from false narratives, incorporating fabricated claims and manipulated information, using the ChatGPT-4o model. Each instance is labeled with a Satire Level, ranging from 1 for subtle satire to 3 for high-level satire, and classified into one of five Humor Categories: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans multiple languages, including English, Telugu, Hindi, Kannada, Tamil, and their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En), making it a valuable multilingual benchmark. By introducing DHD, we establish a structured foundation for analyzing humor in deceptive contexts, paving the way for a new research direction that explores how humor not only interacts with misinformation but also influences its perception and spread. We establish strong baselines for the proposed dataset, providing a foundation for future research to benchmark and advance deceptive humor detection models.

large language model, machine learning, natural language, (19 more...)

2503.16031

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China (0.05)
Asia > India > Andhra Pradesh (0.04)
(6 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.95)
Media > News (0.91)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Parallel Corpora for Machine Translation in Low-resource Indic Languages: A Comprehensive Review

Raja, Rahul, Vats, Arpita

arXiv.org Artificial Intelligence | Mar-2-2025

Parallel corpora play an important role in training machine translation (MT) models, particularly for low-resource languages where high-quality bilingual data is scarce. This review provides a comprehensive overview of available parallel corpora for Indic languages, which span diverse linguistic families, scripts, and regional variations. We categorize these corpora into text-to-text, code-switched, and various categories of multimodal datasets, highlighting their significance in the development of robust multilingual MT systems. Beyond resource enumeration, we critically examine the challenges faced in corpus creation, including linguistic diversity, script variation, data scarcity, and the prevalence of informal textual content. We also discuss and evaluate these corpora on criteria such as alignment quality and domain representativeness. Furthermore, we address open challenges such as data imbalance across Indic languages, the trade-off between quality and quantity, and the impact of noisy, informal, and dialectal data on MT performance. Finally, we outline future directions, including leveraging cross-lingual transfer learning, expanding multilingual datasets, and integrating multimodal resources to enhance translation quality. To the best of our knowledge, this paper presents the first comprehensive review of parallel corpora specifically tailored for low-resource Indic languages in the context of machine translation.

corpora, dataset, translation, (14 more...)

2503.04797

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Indonesia > Bali (0.04)
(30 more...)

Genre: Overview (1.00)

Industry:

Education (0.67)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

A Breadth-First Catalog of Text Processing, Speech Processing and Multimodal Research in South Asian Languages

Gupta, Pranav

arXiv.org Artificial Intelligence | Dec-20-2024

We review the recent literature (January 2022 to October 2024) in South Asian languages on text-based language processing, multimodal models, and speech processing, and provide a spotlight analysis focused on 21 low-resource South Asian languages, namely Saraiki, Assamese, Balochi, Bhojpuri, Bodo, Burmese, Chhattisgarhi, Dhivehi, Gujarati, Kannada, Kashmiri, Konkani, Khasi, Malayalam, Meitei, Nepali, Odia, Pashto, Rajasthani, Sindhi, and Telugu. We identify trends, challenges, and future research directions, using a step-wise approach that incorporates relevance classification and clustering based on large language models (LLMs). Our goal is to provide a breadth-first overview of the recent developments in South Asian language technologies to NLP researchers interested in working with South Asian languages.

proceedings, recognition, speech recognition, (14 more...)

2501.00029

Country:

Asia > Nepal (0.14)
Asia > Sri Lanka (0.14)
Asia > India > Goa (0.14)
(32 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(2 more...)

EzSQL: An SQL intermediate representation for improving SQL-to-text Generation

Bhardwaj, Meher, Ethari, Hrishikesh, Moirangthem, Dennis Singh

arXiv.org Artificial Intelligence | Nov-28-2024

The SQL-to-text generation task traditionally uses template-based, Seq2Seq, tree-to-sequence, and graph-to-sequence models. Recent models take advantage of pre-trained generative language models for this task in the Seq2Seq framework. However, treating SQL as a sequence of inputs to the pre-trained models is not optimal. In this work, we put forward a new SQL intermediate representation called EzSQL to align SQL with the natural language text sequence. EzSQL simplifies SQL queries and brings them closer to natural language text by modifying operators and keywords, which can usually be described in natural language. EzSQL also removes the need for set operators. Our proposed SQL-to-text generation model uses EzSQL as the input to a pre-trained generative language model for generating the text descriptions. We demonstrate that our model is an effective state-of-the-art method to generate text narrations from SQL queries on the WikiSQL and Spider datasets. We also show that by generating pretraining data using our SQL-to-text generation model, we can enhance the performance of Text-to-SQL parsers.

artificial intelligence, computational linguistic, natural language, (16 more...)

2411.18923

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
Europe > Germany > Berlin (0.04)
(4 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

SPRING Lab IITM's submission to Low Resource Indic Language Translation Shared Task

Sayed, Hamees, Joglekar, Advait, Umesh, Srinivasan

arXiv.org Artificial Intelligence | Nov-11-2024

We develop a robust translation model for four low-resource Indic languages: Khasi, Mizo, Manipuri, and Assamese. Our approach includes a comprehensive pipeline from data collection and preprocessing to training and evaluation, leveraging data from WMT task datasets, BPCC, PMIndia, and OpenLanguageData. To address the scarcity of bilingual data, we use back-translation techniques on monolingual datasets for Mizo and Khasi, significantly expanding our training corpus. We fine-tune the pre-trained NLLB 3.3B model for Assamese, Mizo, and Manipuri, achieving improved performance over the baseline. For Khasi, which is not supported by the NLLB model, we introduce special tokens and train the model on our Khasi corpus. Our training involves masked language modelling, followed by fine-tuning for English-to-Indic and Indic-to-English translations.

khasi, proceedings, translation, (14 more...)

2411.00727

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > India > Mizoram (0.05)
Asia > India > Meghalaya (0.05)
(8 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Whole-Graph Representation Learning For the Classification of Signed Networks

Cecillon, Noé, Labatut, Vincent, Dufour, Richard, Arınık, Nejat

arXiv.org Artificial Intelligence | Sep-30-2024

Graphs are ubiquitous for modeling complex systems involving structured data and relationships. Consequently, graph representation learning, which aims to automatically learn low-dimensional representations of graphs, has drawn a lot of attention in recent years. The overwhelming majority of existing methods handle unsigned graphs. However, signed graphs appear in an increasing number of application domains to model systems involving two types of opposed relationships. Several authors took an interest in signed graphs and proposed methods for providing vertex-level representations, but only one exists for whole-graph representations, and it can handle only fully connected graphs. In this article, we tackle this issue by proposing two approaches to learning whole-graph representations of general signed graphs. The first is SG2V, a signed generalization of the whole-graph embedding method Graph2vec that relies on a modification of the Weisfeiler-Lehman relabelling procedure. The second is WSGCN, a whole-graph generalization of the signed vertex embedding method SGCN that relies on the introduction of master nodes into the GCN. We propose several variants of both these approaches. A bottleneck in the development of whole-graph-oriented methods is the lack of data. We assemble a benchmark composed of three collections of signed graphs with corresponding ground truths. We assess our methods on this benchmark, and our results show that the signed whole-graph methods learn better representations for this task. Overall, the baseline obtains an F-measure score of 58.57, whereas SG2V and WSGCN reach 73.01 and 81.20, respectively. Our source code and benchmark dataset are both publicly available online.
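
To make the idea of a sign-aware Weisfeiler-Lehman relabelling concrete, here is a minimal sketch of one relabelling round on a signed graph. The abstract does not specify how SG2V modifies the procedure, so the rule used here (each neighbour contributes its label together with the sign of the connecting edge) is an assumption for illustration only, and the function name `signed_wl_step` is hypothetical.

```python
def signed_wl_step(labels, edges):
    """One sign-aware WL relabelling round (illustrative assumption, not SG2V itself).

    labels: dict mapping node -> current label (str)
    edges:  list of (u, v, sign) undirected edges, with sign in {+1, -1}
    """
    # Collect, for every node, the (edge sign, neighbour label) pairs around it.
    neighbourhood = {node: [] for node in labels}
    for u, v, sign in edges:
        neighbourhood[u].append((sign, labels[v]))
        neighbourhood[v].append((sign, labels[u]))

    # A node's signature is its own label plus the sorted multiset of pairs.
    signatures = {
        node: (labels[node], tuple(sorted(neighbourhood[node])))
        for node in labels
    }

    # Compress each distinct signature into a short new label, deterministically.
    distinct = sorted(set(signatures.values()))
    compressed = {sig: str(i) for i, sig in enumerate(distinct)}
    return {node: compressed[signatures[node]] for node in labels}


# Path a-b-c with one positive and one negative edge.
labels = {"a": "0", "b": "0", "c": "0"}
edges = [("a", "b", +1), ("b", "c", -1)]
new_labels = signed_wl_step(labels, edges)
```

In this example an unsigned relabelling would give the two end nodes the same label, since each has a single neighbour; keeping the edge sign in the signature separates them.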

data mining, machine learning, natural language, (21 more...)

doi: 10.1109/ACCESS.2024.3472474

2409.20073

Country:

Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Michigan (0.04)
Asia > India > Manipur (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications (0.68)
(2 more...)

Detecting Masquerade Attacks in Controller Area Networks Using Graph Machine Learning

Marfo, William, Moriano, Pablo, Tosh, Deepak K., Moore, Shirley V.

arXiv.org Artificial Intelligence | Aug-10-2024

Modern vehicles rely on a myriad of electronic control units (ECUs) interconnected via controller area networks (CANs) for critical operations. Despite their ubiquitous use and reliability, CANs are susceptible to sophisticated cyberattacks, particularly masquerade attacks, which inject false data that mimic legitimate messages at the expected frequency. These attacks pose severe risks such as unintended acceleration, brake deactivation, and rogue steering. Traditional intrusion detection systems (IDS) often struggle to detect these subtle intrusions due to their seamless integration into normal traffic. This paper introduces a novel framework for detecting masquerade attacks in the CAN bus using graph machine learning (ML). We hypothesize that the integration of shallow graph embeddings with time series features derived from CAN frames enhances the detection of masquerade attacks. We show that by representing CAN bus frames as message sequence graphs (MSGs) and enriching each node with contextual statistical attributes from time series, we can enhance detection capabilities across various attack patterns compared to using only graph-based features. Our method ensures a comprehensive and dynamic analysis of CAN frame interactions, improving robustness and efficiency. Extensive experiments on the ROAD dataset validate the effectiveness of our approach, demonstrating statistically significant improvements in the detection rates of masquerade attacks compared to a baseline that uses only graph-based features, as confirmed by Mann-Whitney U and Kolmogorov-Smirnov tests (p < 0.05).
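
As a rough illustration of the representation described above, the sketch below turns a time-ordered list of CAN frames into a message sequence graph: nodes are arbitration IDs, a directed edge counts how often one ID is immediately followed by another, and each node carries simple statistics of its signal values. The abstract does not give the exact construction or feature set, so the frame layout `(timestamp, arbitration_id, value)`, the chosen statistics, and the name `build_message_sequence_graph` are assumptions.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def build_message_sequence_graph(frames):
    """Build a message sequence graph from time-ordered CAN frames.

    frames: list of (timestamp, arbitration_id, value) tuples (assumed layout).
    Returns (edge_counts, node_attributes).
    """
    edge_counts = Counter()      # (previous_id, current_id) -> number of transitions
    values = defaultdict(list)   # arbitration_id -> observed signal values
    previous_id = None

    for _timestamp, arbitration_id, value in frames:
        values[arbitration_id].append(value)
        if previous_id is not None:
            edge_counts[(previous_id, arbitration_id)] += 1
        previous_id = arbitration_id

    # Enrich each node with basic time series statistics (illustrative choice).
    node_attributes = {
        arbitration_id: {
            "count": len(observed),
            "mean": mean(observed),
            "std": pstdev(observed),
        }
        for arbitration_id, observed in values.items()
    }
    return edge_counts, node_attributes


frames = [
    (0.0, 0x10, 1.0),
    (0.1, 0x20, 5.0),
    (0.2, 0x10, 3.0),
    (0.3, 0x20, 5.0),
]
edge_counts, node_attributes = build_message_sequence_graph(frames)
```

A detector along the lines the abstract describes would then combine graph embeddings of `edge_counts` with the per-node attributes; that downstream step is not sketched here.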

detection, graph, masquerade attack, (13 more...)

2408.05427

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
South America > Colombia (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry:

Information Technology > Security & Privacy (1.00)
Automobiles & Trucks (1.00)
Government > Military > Cyberwarfare (0.66)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Navigating Text-to-Image Generative Bias across Indic Languages

Mittal, Surbhi, Sudan, Arnav, Vatsa, Mayank, Singh, Richa, Glaser, Tamar, Hassner, Tal

arXiv.org Artificial Intelligence | Aug-1-2024

This research investigates biases in text-to-image (TTI) models for the Indic languages widely spoken across India. It evaluates and compares the generative performance and cultural relevance of leading TTI models in these languages against their performance in English. Using the proposed IndicTTI benchmark, we comprehensively assess the performance of 30 Indic languages with two open-source diffusion models and two commercial generation APIs. The primary objective of this benchmark is to evaluate the support for Indic languages in these models and identify areas needing improvement. Given the linguistic diversity of 30 languages spoken by over 1.4 billion people, this benchmark aims to provide a detailed and insightful analysis of TTI models' effectiveness within the Indic linguistic landscape.

benchmark, deva, indic language, (14 more...)

2408.00283

Country:

North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > United States > California > Alameda County > Alameda (0.04)
Asia > Indonesia > Bali (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
