Goto

Collaborating Authors

 Nepal


Cost-effective Deep Learning Infrastructure with NVIDIA GPU

arXiv.org Artificial Intelligence

The growing demand for computational power is driven by advancements in deep learning, the increasing need for big data processing, and the requirements of scientific simulations for academic and research purposes. Developing countries like Nepal often struggle with the resources needed to invest in new and better hardware for these purposes. However, optimizing and building on existing technology can still meet these computing demands effectively. To address these needs, we built a cluster using four NVIDIA GeForce GTX 1650 GPUs. The cluster consists of four nodes: one master node that controls and manages the entire cluster, and three compute nodes dedicated to processing tasks. The master node is equipped with all necessary software for package management, resource scheduling, and deployment, such as Anaconda and Slurm. In addition, a Network File Storage (NFS) system was integrated to provide the additional storage required by the cluster. Given that the cluster is accessible via ssh by a public domain address, which poses significant cybersecurity risks, we implemented fail2ban to mitigate brute force attacks and enhance security. Despite the continuous challenges encountered during the design and implementation process, this project demonstrates how powerful computational clusters can be built to handle resource-intensive tasks in various demanding fields.


Do Large Language Model Benchmarks Test Reliability?

arXiv.org Artificial Intelligence

When deploying large language models (LLMs), it is important to ensure that these models are not only capable, but also reliable. Many benchmarks have been created to track LLMs' growing capabilities, however there has been no similar focus on measuring their reliability. To understand the potential ramifications of this gap, we investigate how well current benchmarks quantify model reliability. We find that pervasive label errors can compromise these evaluations, obscuring lingering model failures and hiding unreliable behavior. Motivated by this gap in the evaluation of reliability, we then propose the concept of so-called platinum benchmarks, i.e., benchmarks carefully curated to minimize label errors and ambiguity. As a first attempt at constructing such benchmarks, we revise examples from fifteen existing popular benchmarks. We evaluate a wide range of models on these platinum benchmarks and find that, indeed, frontier LLMs still exhibit failures on simple tasks such as elementary-level math word problems. Analyzing these failures further reveals previously unidentified patterns of problems on which frontier models consistently struggle. We provide code at https://github.com/MadryLab/platinum-benchmarks


A review on development of eco-friendly filters in Nepal for use in cigarettes and masks and Air Pollution Analysis with Machine Learning and SHAP Interpretability

arXiv.org Artificial Intelligence

In Nepal, air pollution is a serious public health concern, especially in cities like Kathmandu where particulate matter (PM2.5 and PM10) has a major influence on respiratory health and air quality. The Air Quality Index (AQI) is predicted in this work using a Random Forest Regressor, and the model's predictions are interpreted using SHAP (SHapley Additive exPlanations) analysis. With the lowest Testing RMSE (0.23) and flawless R2 scores (1.00), CatBoost performs better than other models, demonstrating its greater accuracy and generalization which is cross validated using a nested cross validation approach. NowCast Concentration and Raw Concentration are the most important elements influencing AQI values, according to SHAP research, which shows that the machine learning results are highly accurate. Their significance as major contributors to air pollution is highlighted by the fact that high values of these characteristics significantly raise the AQI. This study investigates the Hydrogen-Alpha (HA) biodegradable filter as a novel way to reduce the related health hazards. With removal efficiency of more than 98% for PM2.5 and 99.24% for PM10, the HA filter offers exceptional defense against dangerous airborne particles. These devices, which are biodegradable face masks and cigarette filters, address the environmental issues associated with traditional filters' non-biodegradable trash while also lowering exposure to air contaminants.


EEG-based AI-BCI Wheelchair Advancement: A Brain-Computer Interfacing Wheelchair System Using Deep Learning Approach

arXiv.org Artificial Intelligence

Abstract: This study offers a revolutionary strategy to developing wheelchairs based on the Brain-Computer Interface (BCI) that incorporates Artificial Intelligence (AI) using a The device uses electroencephalogram (EEG) data to mimic wheelchair navigation. Five different models were trained on a pre-filtered dataset that was divided into fixed-length windows using a sliding window technique. Each window contained statistical measurements, FFT coefficients for different frequency bands, and a label identifying the activity carried out during that window that was taken from an open-source Kaggle repository. The XGBoost model outperformed the other models, CatBoost, GRU, SVC, and XGBoost, with an accuracy of 60%. The CatBoost model with a major difference between training and testing accuracy shows overfitting, and similarly, the bestperforming model, with SVC, was implemented in a tkinter GUI. The wheelchair movement could be simulated in various directions, and a Raspberry Pi-powered wheelchair system for braincomputer interface is proposed here. Keywords: Brain Computer Interfacing, FFT (Fast Fourier Transform), Raspberry-pi, electroencephalogram 1. Introduction Brain-Computer Interfaces (BCIs) represent a cutting-edge technology that facilitates direct communication between the human brain and external devices. In recent years, BCIs have been widely explored for assisting individuals with mobility impairments. This paper focuses on a novel BCI-based wheelchair control system that leverages EEG signals associated with control using various movements related dataset. The system incorporates various machine learning models with various optimization techniques for hyper-parameter tuning and finally, shows an attention mechanism for enhancing the performance of Bi-directional Long Short-Term Memory (Bi-LSTM) networks, which are employed for EEG signal classification. To integrate the braincomputer interface (BCI) for the wheelchair, an analysis of brain activity is necessary-based on modern technology. The signs of brain activity can be obtained using a variety of techniques [1]. In order to help people with severe disabilities live their daily lives, new aids, gadgets, and assistive technologies are required, as demonstrated by the pandemic emergency of the coronavirus illness 2019 (COVID-19). Brain-Computer Interfaces (BCIs) that use electroencephalography (EEG) can help people who experience major health issues become more independent and participate in activities more easily. This can improve their general well-being and prevent deficits [2].


Electrocardiogram (ECG) Based Cardiac Arrhythmia Detection and Classification using Machine Learning Algorithms

arXiv.org Artificial Intelligence

The rapid advancements in Artificial Intelligence, specifically Machine Learning (ML) and Deep Learning (DL), have opened new prospects in medical sciences for improved diagnosis, prognosis, and treatment of severe health conditions. This paper focuses on the development of an ML model with high predictive accuracy to classify arrhythmic electrocardiogram (ECG) signals. The ECG signals datasets utilized in this study were sourced from the PhysioNet and MIT-BIH databases. The research commenced with binary classification, where an optimized Bidirectional Long Short-Term Memory (Bi-LSTM) model yielded excellent results in differentiating normal and atrial fibrillation signals. A pivotal aspect of this research was a survey among medical professionals, which not only validated the practicality of AI-based ECG classifiers but also identified areas for improvement, including accuracy and the inclusion of more arrhythmia types. These insights drove the development of an advanced Convolutional Neural Network (CNN) system capable of classifying five different types of ECG signals with better accuracy and precision. The CNN model's robust performance was ensured through rigorous stratified 5-fold cross validation. A web portal was also developed to demonstrate real-world utility, offering access to the trained model for real-time classification. This study highlights the potential applications of such models in remote health monitoring, predictive healthcare, assistive diagnostic tools, and simulated environments for educational training and interdisciplinary collaboration between data scientists and medical personnel.


Mero Nagarikta: Advanced Nepali Citizenship Data Extractor with Deep Learning-Powered Text Detection and OCR

arXiv.org Artificial Intelligence

Transforming text-based identity documents, such as Nepali citizenship cards, into a structured digital format poses several challenges due to the distinct characteristics of the Nepali script and minor variations in print alignment and contrast across different cards. This work proposes a robust system using YOLOv8 for accurate text object detection and an OCR algorithm based on Optimized PyTesseract. The system, implemented within the context of a mobile application, allows for the automated extraction of important textual information from both the front and the back side of Nepali citizenship cards, including names, citizenship numbers, and dates of birth. The final YOLOv8 model was accurate, with a mean average precision of 99.1% for text detection on the front and 96.1% on the back. The tested PyTesseract optimized for Nepali characters outperformed the standard OCR regarding flexibility and accuracy, extracting text from images with clean and noisy backgrounds and various contrasts. Using preprocessing steps such as converting the images into grayscale, removing noise from the images, and detecting edges further improved the system's OCR accuracy, even for low-quality photos. This work expands the current body of research in multilingual OCR and document analysis, especially for low-resource languages such as Nepali. It emphasizes the effectiveness of combining the latest object detection framework with OCR models that have been fine-tuned for practical applications.


Deep learning-based fault identification in condition monitoring

arXiv.org Artificial Intelligence

Vibration-based condition monitoring techniques are commonly used to identify faults in rolling element bearings. Accuracy and speed of fault detection procedures are critical performance measures in condition monitoring. Delay is especially important in remote condition monitoring and time-sensitive industrial applications. While most existing methods focus on accuracy, little attention has been given to the inference time in the fault identification process. In this paper, we address this gap by presenting a Convolutional Neural Network (CNN) based approach for real-time fault identification in rolling element bearings. We encode raw vibration signals into two-dimensional images using various encoding methods and use these with a CNN to classify several categories of bearing fault types and sizes. We analyse the interplay between fault identification accuracy and processing time. For training and evaluation we use a bearing failure CWRU dataset.


Design and Performance Comparison of FuzzyPID and Non-linear Model Predictive Controller for 4-Wheel Omni-drive Robot

arXiv.org Artificial Intelligence

Trajectory tracking for an Omni-drive robot presents a challenging task that demands an efficient controller design. To address the limitations of manual tuning, we introduce a self-optimizing controller named fuzzyPID, leveraging the analysis of responses from various dynamic and static systems. The rule-based controller design is implemented using Matlab/Simulink, and trajectory tracking simulations are conducted within the CoppeliaSim environment. Similarly, a non-linear model predictive controller(NMPC) is proposed to compare tracking performance with fuzzyPID. We also assess the impact of tunable parameters of NMPC on its tracking accuracy. Simulation results validate the precision and effectiveness of NMPC over fuzzyPID controller while trading computational complexity.


Prospect Personalized Recommendation on Large Language Model-based Agent Platform

arXiv.org Artificial Intelligence

The new kind of Agent-oriented information system, exemplified by GPTs, urges us to inspect the information system infrastructure to support Agent-level information processing and to adapt to the characteristics of Large Language Model (LLM)-based Agents, such as interactivity. In this work, we envisage the prospect of the recommender system on LLM-based Agent platforms and introduce a novel recommendation paradigm called Rec4Agentverse, comprised of Agent Items and Agent Recommender. Rec4Agentverse emphasizes the collaboration between Agent Items and Agent Recommender, thereby promoting personalized information services and enhancing the exchange of information beyond the traditional user-recommender feedback loop. Additionally, we prospect the evolution of Rec4Agentverse and conceptualize it into three stages based on the enhancement of the interaction and information exchange among Agent Items, Agent Recommender, and the user. A preliminary study involving several cases of Rec4Agentverse validates its significant potential for application. Lastly, we discuss potential issues and promising directions for future research.


A Comprehensive Study of the Current State-of-the-Art in Nepali Automatic Speech Recognition Systems

arXiv.org Artificial Intelligence

In this paper, we examine the research conducted in the field of Nepali Automatic Speech Recognition (ASR). The primary objective of this survey is to conduct a comprehensive review of the works on Nepali Automatic Speech Recognition Systems completed to date, explore the different datasets used, examine the technology utilized, and take account of the obstacles encountered in implementing the Nepali ASR system. In tandem with the global trends of ever-increasing research on speech recognition based research, the number of Nepalese ASR-related projects are also growing. Nevertheless, the investigation of language and acoustic models of the Nepali language has not received adequate attention compared to languages that possess ample resources. In this context, we provide a framework as well as directions for future investigations.