Goto

Collaborating Authors

 Vadodara


Integrating Linguistics and AI: Morphological Analysis and Corpus development of Endangered Toto Language of West Bengal

Guha, Ambalika, Saha, Sajal, Ballav, Debanjan, Mitra, Soumi, Chakraborty, Hritwick

arXiv.org Artificial Intelligence

Preserving linguistic diversity is necessary as every language offers a distinct perspective on the world. There have been numerous global initiatives to preserve endangered languages through documentation. This paper is a part of a project which aims to develop a trilingual (Toto-Bangla-English) language learning application to digitally archive and promote the endangered Toto language of West Bengal, India. This application, designed for both native Toto speakers and non-native learners, aims to revitalize the language by ensuring accessibility and usability through Unicode script integration and a structured language corpus. The research includes detailed linguistic documentation collected via fieldwork, followed by the creation of a morpheme-tagged, trilingual corpus used to train a Small Language Model (SLM) and a Transformer-based translation engine. The analysis covers inflectional morphology such as person-number-gender agreement, tense-aspect-mood distinctions, and case marking, alongside derivational strategies that reflect word-class changes. Script standardization and digital literacy tools were also developed to enhance script usage. The study offers a sustainable model for preserving endangered languages by incorporating traditional linguistic methodology with AI. This bridge between linguistic research with technological innovation highlights the value of interdisciplinary collaboration for community-based language revitalization.


THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering

Patel, Udita, Mulkar, Rutu, Roberts, Jay, Senthilkumar, Cibi Chakravarthy, Gandhi, Sujay, Zheng, Xiaofei, Nayyar, Naumaan, Kalra, Parul, Castrillo, Rafael

arXiv.org Artificial Intelligence

We propose THELMA (Task Based Holistic Evaluation of Large Language Model Applications), a reference free framework for RAG (Retrieval Augmented generation) based question answering (QA) applications. THELMA consist of six interdependent metrics specifically designed for holistic, fine grained evaluation of RAG QA applications. THELMA framework helps developers and application owners evaluate, monitor and improve end to end RAG QA pipelines without requiring labelled sources or reference responses.We also present our findings on the interplay of the proposed THELMA metrics, which can be interpreted to identify the specific RAG component needing improvement in QA applications.


Robot DOG makes an appearance at the Met Gala - dressed in a tuxedo and adorned with a 1,000-carat diamond leash

Daily Mail - Science & tech

At New York's Met Gala, guests are known for attention-grabbing outfits, from Katy Perry's human chandelier dress to Kim Kardashian's all-black body suit. But one attendant in particular has stolen the limelight this year – and he's not even human. Indian-American entrepreneur Mona Patel rocked up to the annual event on Monday night with an adorable robotic dachshund in tow. Vector the robo-dog, developed by scientists at MIT, has a 1,000-carat diamond-studded leash and his own cute little specially-fitted tuxedo. Powered by AI and equipped with sensors, Vector has customised movement patterns and'just the right amount of sass', Vogue India reports.


YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment

Das, Amitava, Narsupalli, Yaswanth, Singh, Gurpreet, Jain, Vinija, Sharma, Vasu, Trivedy, Suranjana, Chadha, Aman, Sheth, Amit

arXiv.org Artificial Intelligence

Precise alignment in Text-to-Image (T2I) systems is crucial to ensure that generated visuals not only accurately encapsulate user intents but also conform to stringent ethical and aesthetic benchmarks. Incidents like the Google Gemini fiasco, where misaligned outputs triggered significant public backlash, underscore the critical need for robust alignment mechanisms. In contrast, Large Language Models (LLMs) have achieved notable success in alignment. Building on these advancements, researchers are eager to apply similar alignment techniques, such as Direct Preference Optimization (DPO), to T2I systems to enhance image generation fidelity and reliability. We present YinYangAlign, an advanced benchmarking framework that systematically quantifies the alignment fidelity of T2I systems, addressing six fundamental and inherently contradictory design objectives. Each pair represents fundamental tensions in image generation, such as balancing adherence to user prompts with creative modifications or maintaining diversity alongside visual coherence. YinYangAlign includes detailed axiom datasets featuring human prompts, aligned (chosen) responses, misaligned (rejected) AI-generated outputs, and explanations of the underlying contradictions.


Real Time Monitoring and Forecasting of COVID 19 Cases using an Adjusted Holt based Hybrid Model embedded with Wavelet based ANN

Das, Agniva, Muralidharan, Kunnummal

arXiv.org Machine Learning

Since the inception of the SARS - CoV - 2 (COVID - 19) novel coronavirus, a lot of time and effort is being allocated to estimate the trajectory and possibly, forecast with a reasonable degree of accuracy, the number of cases, recoveries, and deaths due to the same. The model proposed in this paper is a mindful step in the same direction. The primary model in question is a Hybrid Holt's Model embedded with a Wavelet-based ANN. To test its forecasting ability, we have compared three separate models, the first, being a simple ARIMA model, the second, also an ARIMA model with a wavelet-based function, and the third, being the proposed model. We have also compared the forecast accuracy of this model with that of a modern day Vanilla LSTM recurrent neural network model. We have tested the proposed model on the number of confirmed cases (daily) for the entire country as well as 6 hotspot states. We have also proposed a simple adjustment algorithm in addition to the hybrid model so that daily and/or weekly forecasts can be meted out, with respect to the entirety of the country, as well as a moving window performance metric based on out-of-sample forecasts. In order to have a more rounded approach to the analysis of COVID-19 dynamics, focus has also been given to the estimation of the Basic Reproduction Number, $R_0$ using a compartmental epidemiological model (SIR). Lastly, we have also given substantial attention to estimating the shelf-life of the proposed model. It is obvious yet noteworthy how an accurate model, in this regard, can ensure better allocation of healthcare resources, as well as, enable the government to take necessary measures ahead of time.


Automatic location detection based on deep learning

Karangiya, Anjali, Sharma, Anirudh, Shah, Divax, Badgujar, Kartavya, Thacker, Dr. Chintan, Dave, Dainik

arXiv.org Artificial Intelligence

The proliferation of digital images and the advancements in deep learning have paved the way for innovative solutions in various domains, especially in the field of image classification. Our project presents an in-depth study and implementation of an image classification system specifically tailored to identify and classify images of Indian cities. Drawing from an extensive dataset, our model classifies images into five major Indian cities: Ahmedabad, Delhi, Kerala, Kolkata, and Mumbai to recognize the distinct features and characteristics of each city/state. To achieve high precision and recall rates, we adopted two approaches. The first, a vanilla Convolutional Neural Network (CNN) and then we explored the power of transfer learning by leveraging the VGG16 model. The vanilla CNN achieved commendable accuracy and the VGG16 model achieved a test accuracy of 63.6%. Evaluations highlighted the strengths and potential areas of improvement, positioning our model as not only competitive but also scalable for broader applications. With an emphasis on open-source ethos, our work aims to contribute to the community, encouraging further development and diverse applications. Our findings demonstrate the potential applications in tourism, urban planning, and even real-time location identification systems, among others.


A Strictly Bounded Deep Network for Unpaired Cyclic Translation of Medical Images

Rai, Swati, Bhatt, Jignesh S., Patra, Sarat Kumar

arXiv.org Artificial Intelligence

Medical image translation is an ill-posed problem. Unlike existing paired unbounded unidirectional translation networks, in this paper, we consider unpaired medical images and provide a strictly bounded network that yields a stable bidirectional translation. We propose a patch-level concatenated cyclic conditional generative adversarial network (pCCGAN) embedded with adaptive dictionary learning. It consists of two cyclically connected CGANs of 47 layers each; where both generators (each of 32 layers) are conditioned with concatenation of alternate unpaired patches from input and target modality images (not ground truth) of the same organ. The key idea is to exploit cross-neighborhood contextual feature information that bounds the translation space and boosts generalization. The generators are further equipped with adaptive dictionaries learned from the contextual patches to reduce possible degradation. Discriminators are 15-layer deep networks that employ minimax function to validate the translated imagery. A combined loss function is formulated with adversarial, non-adversarial, forward-backward cyclic, and identity losses that further minimize the variance of the proposed learning machine. Qualitative, quantitative, and ablation analysis show superior results on real CT and MRI.


An Annexure to the Paper "Driving the Technology Value Stream by Analyzing App Reviews"

Das, Souvick, Deb, Novarun, Cortesi, Agostino, Chaki, Nabendu

arXiv.org Artificial Intelligence

This paper presents a novel framework that utilizes Natural Language Processing (NLP) techniques to understand user feedback on mobile applications. The framework allows software companies to drive their technology value stream based on user reviews, which can highlight areas for improvement. The framework is analyzed in depth, and its modules are evaluated for their effectiveness. The proposed approach is demonstrated to be effective through an analysis of reviews for sixteen popular Android Play Store applications over a long period of time.


FedGrad: Optimisation in Decentralised Machine Learning

Patel, Mann

arXiv.org Artificial Intelligence

Federated Learning is a machine learning paradigm where we aim to train machine learning models in a distributed fashion. Many clients/edge devices collaborate with each other to train a single model on the central. Clients do not share their own datasets with each other, decoupling computation and data on the same device. In this paper, we propose yet another adaptive federated optimization method and some other ideas in the field of federated learning. We also perform experiments using these methods and showcase the improvement in the overall performance of federated learning.


Exploration of Interpretability Techniques for Deep COVID-19 Classification using Chest X-ray Images

Chatterjee, Soumick, Saad, Fatima, Sarasaen, Chompunuch, Ghosh, Suhita, Krug, Valerie, Khatun, Rupali, Mishra, Rahul, Desai, Nirja, Radeva, Petia, Rose, Georg, Stober, Sebastian, Speck, Oliver, Nürnberger, Andreas

arXiv.org Artificial Intelligence

The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging such as X-ray and Computed Tomography (CT) combined with the potential of Artificial Intelligence (AI) plays an essential role in supporting the medical staff in the diagnosis process. Thereby, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their Ensemble have been used in this paper to classify COVID-19, pneumoni{\ae} and healthy subjects using Chest X-Ray images. Multi-label classification was performed to predict multiple pathologies for each patient, if present. Foremost, the interpretability of each of the networks was thoroughly studied using local interpretability methods - occlusion, saliency, input X gradient, guided backpropagation, integrated gradients, and DeepLIFT, and using a global technique - neuron activation profiles. The mean Micro-F1 score of the models for COVID-19 classifications ranges from 0.66 to 0.875, and is 0.89 for the Ensemble of the network models. The qualitative results depicted the ResNets to be the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before making the decision regarding the best-performing model.