AITopics | maithili

Collaborating Authors

maithili

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SentiMaithili: A Benchmark Dataset for Sentiment and Reason Generation for the Low-Resource Maithili Language

Ranjan, Rahul, Gurve, Mahendra Kumar, Anuj, Nitin, Prasad, Yamuna

arXiv.org Artificial Intelligence, Oct-28-2025

Developing benchmark datasets for low-resource languages poses significant challenges, primarily due to the limited availability of native linguistic experts and the substantial time and cost involved in annotation. Owing to these challenges, Maithili is still underrepresented in natural language processing research. It is an Indo-Aryan language spoken by more than 13 million people in the Purvanchal region of India, valued for its rich linguistic structure and cultural significance. While sentiment analysis has achieved remarkable progress in high-resource languages, resources for low-resource languages such as Maithili remain scarce, are often restricted to coarse-grained annotations, and lack interpretability mechanisms. To address this limitation, we introduce a novel dataset comprising 3,221 Maithili sentences annotated for sentiment polarity and accompanied by natural language justifications. The dataset is carefully curated and validated by linguistic experts to ensure both label reliability and contextual fidelity. Notably, the justifications are written in Maithili, thereby promoting culturally grounded interpretation and enhancing the explainability of sentiment models. Furthermore, extensive experiments using both classical machine learning and state-of-the-art transformer architectures demonstrate the dataset's effectiveness for interpretable sentiment analysis. Ultimately, this work establishes the first benchmark for explainable affective computing in Maithili, contributing a valuable resource to the broader advancement of multilingual NLP and explainable AI.
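As a rough illustration of what a classical machine learning baseline for sentence-level polarity data might look like, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on character trigrams, which avoids needing a Maithili word tokenizer. The `Example` record (sentence, polarity, justification), the label strings, and the choice of Naive Bayes are assumptions for illustration only. The abstract does not name the classical models the authors used, and this baseline ignores the justification field entirely.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    sentence: str       # raw sentence text
    polarity: str       # sentiment label; the label set is an assumption
    justification: str  # free-text rationale; unused by this baseline


def char_ngrams(text, n=3):
    """Character n-grams with a space pad so word edges become features."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNaiveBayes:
    """Multinomial Naive Bayes over character n-grams, add-one smoothing."""

    def __init__(self, n=3):
        self.n = n

    def fit(self, examples):
        self.class_counts = Counter(ex.polarity for ex in examples)
        self.feature_counts = defaultdict(Counter)
        for ex in examples:
            self.feature_counts[ex.polarity].update(char_ngrams(ex.sentence, self.n))
        # Vocabulary = every n-gram seen in any class.
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        self.total = len(examples)
        return self

    def predict(self, sentence):
        grams = char_ngrams(sentence, self.n)
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / self.total)
            for g in grams:
                score += math.log((counts[g] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

Character n-grams are a common choice when no language-specific preprocessing is available; the justification field would only matter for models that also generate or consume reasons.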

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2510.2216

Country: Asia > India (0.48)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Can maiBERT Speak for Maithili?

Yadav, Sumit, Yadav, Raju Kumar, Maskey, Utsav, Kashyap, Gautam Siddharth, Hoque, Md Azizul, Gautam, Ganesh

arXiv.org Artificial Intelligence, Sep-23-2025

Natural Language Understanding (NLU) for low-resource languages remains a major challenge in NLP due to the scarcity of high-quality data and language-specific models. Maithili, despite being spoken by millions, lacks adequate computational resources, limiting its inclusion in digital and AI-driven applications. To address this gap, we introduce maiBERT, a BERT-based language model pre-trained specifically for Maithili using the Masked Language Modeling (MLM) technique. Our model is trained on a newly constructed Maithili corpus and evaluated through a news classification task. In our experiments, maiBERT achieved an accuracy of 87.02%, outperforming existing regional models like NepBERTa and HindiBERT, with a 0.13% overall accuracy gain and a 5-7% improvement across various classes. We have open-sourced maiBERT on Hugging Face, enabling further fine-tuning for downstream tasks such as sentiment analysis and Named Entity Recognition (NER).
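The abstract names Masked Language Modeling but not its settings. The sketch below assumes the masking recipe from the original BERT paper: about 15% of non-special tokens are selected as prediction targets; of those, 80% become the mask token, 10% become a random vocabulary token, and 10% are left unchanged. The function name, its parameters, and the `-100` ignore label (a common PyTorch convention) are assumptions, not maiBERT's published configuration.

```python
import random


def mask_tokens(token_ids, vocab_size, mask_id, special_ids=frozenset(),
                mask_prob=0.15, seed=None):
    """BERT-style MLM corruption; returns (inputs, labels).

    labels holds the original id at selected positions and -100 elsewhere,
    so a cross-entropy loss can skip unselected positions.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never select special tokens; select the rest with probability mask_prob.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id                    # 80%: mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token (may equal the original)
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Passing a fixed `seed` makes the corruption reproducible, which is convenient when checking a preprocessing pipeline.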

artificial intelligence, natural language, text processing

arXiv.org Artificial Intelligence

2509.15048

Country: Asia > India (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Kinship in Speech: Leveraging Linguistic Relatedness for Zero-Shot TTS in Indian Languages

Pathak, Utkarsh, Gunda, Chandra Sai Krishna, Prakash, Anusha, Agarwal, Keshav, Murthy, Hema A.

arXiv.org Artificial Intelligence, Jun-5-2025

Text-to-speech (TTS) systems typically require high-quality studio data and accurate transcriptions for training. India has 1369 languages, including 22 official languages that use 13 scripts. Training a TTS system for all these languages, most of which have no digital resources, seems a Herculean task. Our work focuses on zero-shot synthesis, particularly for languages whose scripts and phonotactics come from different families. The novelty of our work lies in augmenting a shared phone representation and modifying the text parsing rules to match the phonotactics of the target language, thus reducing the synthesiser overhead and enabling rapid adaptation. Intelligible and natural speech was generated for Sanskrit, Maharashtrian and Canara Konkani, Maithili, and Kurukh by leveraging linguistic connections across languages with suitable synthesisers. Evaluations confirm the effectiveness of this approach, highlighting its potential to expand speech technology access for under-represented languages.

Index Terms: zero-shot synthesis, unseen Indian languages, common label set (CLS), low resource, unified parser.
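One concrete kind of phonotactic parsing rule is inherent-schwa handling in Devanagari: Hindi usually drops the inherent schwa after a word-final consonant, whereas Sanskrit pronounces it. The sketch below maps a handful of Devanagari characters to phone labels and exposes that rule as a flag, as one hypothetical way a shared parser could be adjusted per target language. The tiny character table, the CLS-like label names, and the `delete_final_schwa` flag are assumptions; a real unified parser covers the whole script and many more rules (medial schwa deletion among them).

```python
# Hypothetical, heavily reduced Devanagari table with CLS-like labels.
CONSONANTS = {"क": "k", "ख": "kh", "ग": "g", "त": "t", "न": "n",
              "म": "m", "र": "r", "ल": "l"}
VOWEL_SIGNS = {"ा": "aa", "ि": "i", "ी": "ii", "ु": "u", "ू": "uu",
               "े": "e", "ो": "o"}
INDEPENDENT_VOWELS = {"अ": "a", "आ": "aa", "इ": "i", "ई": "ii", "उ": "u",
                      "ए": "e", "ओ": "o"}
VIRAMA = "\u094d"  # suppresses the inherent vowel of the preceding consonant
SCHWA = "a"        # label for the inherent vowel


def parse_word(word, delete_final_schwa):
    """Convert one Devanagari word into a list of phone labels."""
    labels = []
    for i, ch in enumerate(word):
        if ch in CONSONANTS:
            labels.append(CONSONANTS[ch])
            nxt = word[i + 1] if i + 1 < len(word) else ""
            # A consonant carries an inherent schwa unless a vowel sign or virama follows.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                labels.append(SCHWA)
        elif ch in VOWEL_SIGNS:
            labels.append(VOWEL_SIGNS[ch])
        elif ch in INDEPENDENT_VOWELS:
            labels.append(INDEPENDENT_VOWELS[ch])
        elif ch != VIRAMA:
            raise ValueError(f"character {ch!r} is not in this toy table")
    # Language-specific rule: drop the schwa after a word-final consonant
    # (skipped for a bare single consonant).
    if delete_final_schwa and word and word[-1] in CONSONANTS and len(labels) > 2:
        labels.pop()
    return labels
```

With the flag off, `कमल` parses as a Sanskrit-style kamala; with it on, as a Hindi-style kamal. Keeping the rule as a switch, rather than baking it into the table, is what lets one label set serve several languages.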

artificial intelligence, large language model, natural language

arXiv.org Artificial Intelligence

2506.03884

Country: Asia > India (0.51)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.84)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.52)

A Finite State Transducer Based Morphological Analyzer of Maithili Language

Rahi, Raza, Pushp, Sumant, Khan, Arif, Sinha, Smriti Kumar

arXiv.org Artificial Intelligence, Feb-29-2020

Morphological analyzers are essential components of many linguistic applications, such as machine translation, word sense disambiguation, spell checkers, and search engines. Therefore, developing an effective morphological analyzer has a great impact on the computational processing of a language. In this paper, we present a finite state transducer based inflectional morphological analyzer for Maithili, a resource-poor language of India. Maithili is an eastern Indo-Aryan language spoken in the eastern and northern regions of Bihar in India and in the southeastern plains of Nepal, known as the tarai. This work can be regarded as the first step towards the computational development of Maithili, and it may encourage researchers across the country to help establish the language in the computational world.
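To make the finite state transducer idea concrete, the sketch below compiles a stem lexicon and a suffix table into a small nondeterministic transducer that reads a surface word character by character and emits an analysis of the form stem+POS+TAG. It illustrates only the general technique. The stem `ghar`, the suffix `ak`, and the tag names are placeholders chosen to exercise the code, not a description of Maithili inflection or of the authors' rules; production analyzers are often built with dedicated finite-state toolkits.

```python
from collections import defaultdict


class FST:
    """A tiny nondeterministic finite state transducer.

    Each arc is (input_symbol, output_string, next_state); an empty
    input_symbol marks an epsilon arc that consumes no input.
    """

    def __init__(self):
        self.arcs = defaultdict(list)
        self.finals = set()
        self.n_states = 1  # state 0 is the start state

    def new_state(self):
        state = self.n_states
        self.n_states += 1
        return state

    def add_arc(self, src, inp, out, dst):
        self.arcs[src].append((inp, out, dst))

    def analyze(self, word):
        """Return every output on a path that consumes all of `word`."""
        results = []
        stack = [(0, 0, "")]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word) and state in self.finals:
                results.append(out)
            for inp, o, nxt in self.arcs[state]:
                if inp == "":
                    stack.append((nxt, pos, out + o))
                elif pos < len(word) and word[pos] == inp:
                    stack.append((nxt, pos + 1, out + o))
        return sorted(results)


def build_analyzer(stems, suffixes):
    """stems: surface stem -> POS tag; suffixes: surface suffix -> tag ("" = bare stem)."""
    fst = FST()
    boundary = fst.new_state()  # reached after any complete stem
    final = fst.new_state()
    fst.finals.add(final)
    for stem, pos_tag in stems.items():
        state = 0
        for ch in stem:  # copy stem characters to the output
            nxt = fst.new_state()
            fst.add_arc(state, ch, ch, nxt)
            state = nxt
        fst.add_arc(state, "", "+" + pos_tag, boundary)
    for suffix, tag in suffixes.items():
        state = boundary
        for ch in suffix:  # consume suffix characters without output
            nxt = fst.new_state()
            fst.add_arc(state, ch, "", nxt)
            state = nxt
        fst.add_arc(state, "", "+" + tag, final)
    return fst


# Placeholder lexicon: illustrative only, not real Maithili morphology.
analyzer = build_analyzer({"ghar": "N"}, {"": "BASE", "ak": "SFX1"})
print(analyzer.analyze("gharak"))  # → ['ghar+N+SFX1']
```

Because the transducer is nondeterministic, a word that splits into stem and suffix in more than one way simply yields several analyses, which is how inflectional ambiguity is usually surfaced.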

finite state transducer, maithili, morphological analyzer

arXiv.org Artificial Intelligence

2003.00234

Country:

Asia > Nepal (0.25)
Asia > India > Bihar > Patna (0.04)
Asia > India > Assam (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)