
Collaborating Authors

 Nandi, Sukumar


Comparative Study of Zero-Shot Cross-Lingual Transfer for Bodo POS and NER Tagging Using Gemini 2.0 Flash Thinking Experimental Model

arXiv.org Artificial Intelligence

Part-of-Speech (POS) tagging and Named Entity Recognition (NER) are fundamental tasks within the field of Natural Language Processing (NLP), serving as essential prerequisites for a multitude of downstream applications. POS tagging, the process of assigning grammatical categories to individual words within a sentence (e.g., noun, verb, adjective, adverb), provides crucial syntactic information that underpins higher-level language understanding. NER, in contrast, focuses on identifying and classifying named entities - real-world objects that are designated with a proper name - into predefined semantic categories such as persons, organizations, locations, dates, times, and quantities [1, 2]. The synergy of POS and NER tagging empowers a wide spectrum of NLP applications. In information extraction, NER helps to pinpoint key entities, while POS tags help to understand the relationships between these entities and other words in the text, facilitating the extraction of structured information from unstructured text [3]. Machine translation systems benefit from POS tagging to improve syntactic analysis and word order prediction, and from NER to ensure accurate translation of named entities across languages [4]. Question-answering systems rely on both NER and POS to understand the question's intent, identify relevant entities and relationships in the knowledge base, and formulate accurate answers. Text summarization algorithms leverage NER to identify salient entities and POS tags to preserve grammatical coherence and readability in summaries.
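As a rough illustration of the zero-shot setup named in the title, the sketch below sends a single Bodo sentence to the Gemini model through the google-generativeai Python client and asks for one tag per token. The prompt wording, the tagset shown, the placeholder sentence, and the model identifier are assumptions for illustration, not the authors' actual protocol.

    # Minimal zero-shot POS-tagging sketch with the google-generativeai client.
    # The prompt, tagset, and model id are illustrative assumptions only.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder key
    model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")

    sentence = "<a Bodo sentence to be tagged>"  # placeholder input

    prompt = (
        "You are a linguistic annotator. Assign a part-of-speech tag to every "
        "token in the following Bodo sentence and return one 'token TAG' pair "
        "per line, using the Universal POS tagset "
        "(NOUN, VERB, ADJ, ADV, PRON, ADP, DET, NUM, CCONJ, SCONJ, PART, PUNCT, X).\n\n"
        f"Sentence: {sentence}"
    )

    response = model.generate_content(prompt)
    print(response.text)  # the raw output is parsed into token/tag pairs downstream

The same pattern applies to zero-shot NER by swapping the tagset for entity labels; the paper's own tagsets and prompt templates may differ.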


AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language

arXiv.org Artificial Intelligence

Neural networks have significantly advanced AI applications, yet their real-world adoption remains constrained by high computational demands, hardware limitations, and accessibility challenges. In image captioning, many state-of-the-art models achieve impressive performance but rely on resource-intensive architectures, which makes them impractical for deployment on resource-constrained devices. This limitation is particularly pronounced for applications involving low-resource languages. We demonstrate this with image captioning in the Assamese language, where the lack of effective, scalable systems can restrict the accessibility of AI-based solutions for native Assamese speakers. This work presents AC-Lite, a computationally efficient model for image captioning in the low-resource Assamese language. AC-Lite reduces computational requirements by replacing computation-heavy visual feature extractors such as Faster R-CNN with the lightweight ShuffleNetV2 x1.5. Additionally, Gated Recurrent Units (GRUs) are used as the caption decoder to further reduce computational demands and model parameters. Furthermore, the integration of bilinear attention enhances the model's overall performance. AC-Lite can operate on edge devices, thereby eliminating the need for computation on remote servers. The proposed AC-Lite model achieves a CIDEr score of 82.3 on the COCO-AC dataset with 1.098 GFLOPs and 25.65M parameters.
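The sketch below shows the encoder-decoder pattern the abstract describes in PyTorch: a ShuffleNetV2 x1.5 trunk as the visual feature extractor, a GRU caption decoder, and a bilinear attention layer over image regions. Hidden sizes, the vocabulary handling, and the exact attention formulation are assumptions, not the released AC-Lite code.

    # Minimal AC-Lite-style encoder/decoder sketch (PyTorch + torchvision).
    # Dimensions and the bilinear attention form are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torchvision

    class ShuffleNetEncoder(nn.Module):
        """Grid features from ShuffleNetV2 x1.5 (convolutional trunk only)."""
        def __init__(self):
            super().__init__()
            backbone = torchvision.models.shufflenet_v2_x1_5(weights="DEFAULT")
            # drop the global pool and classifier; keep the feature maps
            self.trunk = nn.Sequential(
                backbone.conv1, backbone.maxpool,
                backbone.stage2, backbone.stage3, backbone.stage4, backbone.conv5,
            )

        def forward(self, images):                  # images: (B, 3, 224, 224)
            fmap = self.trunk(images)               # (B, 1024, 7, 7)
            return fmap.flatten(2).transpose(1, 2)  # (B, 49, 1024) region features

    class BilinearAttention(nn.Module):
        """score_i = h^T W v_i, softmax-normalised over image regions."""
        def __init__(self, hid_dim, feat_dim):
            super().__init__()
            self.W = nn.Parameter(torch.randn(hid_dim, feat_dim) * 0.01)

        def forward(self, h, feats):                # h: (B, H), feats: (B, R, F)
            scores = torch.einsum("bh,hf,brf->br", h, self.W, feats)
            alpha = scores.softmax(dim=1)           # attention weights (B, R)
            return torch.einsum("br,brf->bf", alpha, feats)  # context (B, F)

    class GRUDecoder(nn.Module):
        def __init__(self, vocab_size, emb_dim=256, hid_dim=512, feat_dim=1024):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.attn = BilinearAttention(hid_dim, feat_dim)
            self.gru = nn.GRUCell(emb_dim + feat_dim, hid_dim)
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, feats, tokens, h):        # one decoding step
            ctx = self.attn(h, feats)               # attend over image regions
            h = self.gru(torch.cat([self.embed(tokens), ctx], dim=1), h)
            return self.out(h), h                   # logits over the vocabulary

At inference time the decoder is unrolled token by token (greedy or beam search) from a start symbol, feeding each predicted token back in as the next input.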


Part-of-Speech Tagger for Bodo Language using Deep Learning approach

arXiv.org Artificial Intelligence

Language processing systems such as part-of-speech tagging, named entity recognition, machine translation, speech recognition, and language modeling (LM) are well studied in high-resource languages. Nevertheless, research on these systems for several low-resource languages, including Bodo, Mizo, Nagamese, and others, is either yet to commence or is in its nascent stages. Language models play a vital role in the downstream tasks of modern NLP. Extensive studies have been carried out on LMs for high-resource languages, yet languages such as Bodo, Rabha, and Mising continue to lack coverage. In this study, we first present BodoBERT, a language model for the Bodo language. To the best of our knowledge, this work is the first such effort to develop a language model for Bodo. Secondly, we present an ensemble DL-based POS tagging model for Bodo. The POS tagging model combines a BiLSTM with a CRF and a stacked embedding of BodoBERT with BytePairEmbeddings. We evaluate several language models in the experiments to see how well they perform in the POS tagging task. The best-performing model achieves an F1 score of 0.8041. A comparative experiment was also conducted on Assamese POS taggers, considering that the language is spoken in the same region as Bodo.
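Since BytePairEmbeddings and stacked embeddings are concepts from the Flair sequence-labelling toolkit, a BiLSTM-CRF tagger with a BodoBERT + byte-pair stack can be sketched as below. The corpus paths, the "path/to/bodobert" checkpoint name, and the hyper-parameters are placeholders, not the authors' released configuration.

    # Stacked-embedding BiLSTM-CRF POS tagger sketch using Flair (>= 0.10 API).
    # Paths, the BodoBERT checkpoint, and hyper-parameters are assumptions.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import TransformerWordEmbeddings, BytePairEmbeddings, StackedEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # CoNLL-style corpus: column 0 = token, column 1 = POS tag
    corpus = ColumnCorpus("data/bodo_pos", {0: "text", 1: "pos"},
                          train_file="train.txt", dev_file="dev.txt", test_file="test.txt")

    embeddings = StackedEmbeddings([
        TransformerWordEmbeddings("path/to/bodobert"),  # hypothetical BodoBERT checkpoint
        BytePairEmbeddings("multi"),                    # multilingual byte-pair embeddings
    ])

    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=embeddings,
        tag_dictionary=corpus.make_label_dictionary(label_type="pos"),
        tag_type="pos",
        use_crf=True,   # BiLSTM encoder with a CRF output layer
    )

    ModelTrainer(tagger, corpus).train("models/bodo-pos", max_epochs=50)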


AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach

arXiv.org Artificial Intelligence

Part of Speech (POS) tagging is crucial to Natural Language Processing (NLP). It is a well-studied topic in several resource-rich languages. However, for many languages that are rich in history and literature, the development of computational linguistic resources is still in its infancy. Assamese, an Indian scheduled language spoken by more than 25 million people, falls into this category. In this paper, we present a Deep Learning (DL)-based POS tagger for Assamese. The development process is divided into two phases. In the first phase, several tagging models are trained with different pre-trained word embeddings, which allows us to evaluate the performance of each embedding in the POS tagging task. The top-performing model from the first phase is then used to annotate another set of new sentences. In the second phase, the model is trained further on this fresh dataset. Finally, we attain an F1 score of 86.52%. The model may serve as a baseline for further study on DL-based Assamese POS tagging.
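The two-phase procedure (train with several embeddings, annotate new sentences with the best model, then continue training) can be sketched as below, again using Flair as an assumed toolkit. The file paths, the selection of the "best" phase-1 model, and the epoch count are placeholders; the actual AsPOS pipeline may differ.

    # Two-phase training sketch: reuse the best phase-1 tagger to pre-annotate
    # fresh sentences, then continue training on the enlarged corpus.
    from flair.data import Sentence
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Phase 1: assume the best embedding/tagger combination was already trained
    # and saved; load it back for annotation.
    best_tagger = SequenceTagger.load("models/assamese-pos-phase1/best-model.pt")

    # Pre-annotate a fresh batch of raw Assamese sentences.
    new_sentences = [Sentence(line.strip())
                     for line in open("data/unlabelled.txt", encoding="utf-8")]
    best_tagger.predict(new_sentences)   # adds predicted POS labels in place

    # Phase 2: after manual correction, continue training on the enlarged corpus
    # (corpus construction as in the previous sketch).
    # ModelTrainer(best_tagger, enlarged_corpus).train("models/assamese-pos-phase2",
    #                                                  max_epochs=20)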


AsNER -- Annotated Dataset and Baseline for Assamese Named Entity recognition

arXiv.org Artificial Intelligence

We present AsNER, a named entity annotation dataset for the low-resource Assamese language, along with a baseline Assamese NER model. The dataset contains about 99k tokens comprising text from the speech of the Prime Minister of India and an Assamese play, and it covers person names, location names, and addresses. The proposed NER dataset is likely to be a significant resource for deep neural network-based Assamese language processing. We benchmark the dataset by training and evaluating NER models with state-of-the-art embeddings for supervised named entity recognition (NER), such as FastText, BERT, XLM-R, FLAIR, and MuRIL. We implement several baseline approaches with the state-of-the-art Bi-LSTM-CRF sequence tagging architecture. The best baseline achieves an F1 score of 80.69%, obtained when MuRIL is used as the word embedding method. The annotated dataset and the top-performing model are made publicly available.
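The MuRIL + Bi-LSTM-CRF baseline can be sketched with the same Flair setup used above, swapping the label column for NER tags. The corpus layout and hyper-parameters are assumptions; "google/muril-base-cased" is the public Hugging Face MuRIL checkpoint, which may or may not be the exact variant used in the paper.

    # MuRIL-embedding Bi-LSTM-CRF NER baseline sketch using Flair.
    # Corpus paths and hyper-parameters are illustrative assumptions.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    corpus = ColumnCorpus("data/asner", {0: "text", 1: "ner"},
                          train_file="train.txt", dev_file="dev.txt", test_file="test.txt")

    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=TransformerWordEmbeddings("google/muril-base-cased"),
        tag_dictionary=corpus.make_label_dictionary(label_type="ner"),
        tag_type="ner",
        use_crf=True,   # Bi-LSTM encoder with a CRF decoding layer
    )

    ModelTrainer(tagger, corpus).train("models/asner-muril", max_epochs=50)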