Goto

Collaborating Authors

 Faridabad


Enhancing Machine Learning Model Efficiency through Quantization and Bit Depth Optimization: A Performance Analysis on Healthcare Data

arXiv.org Artificial Intelligence

This research aims to optimize intricate learning models by implementing quantization and bit-depth optimization techniques. The objective is to significantly cut time complexity while preserving model efficiency, thus addressing the challenge of extended execution times in intricate models. Two medical datasets were utilized as case studies to apply a Logistic Regression (LR) machine learning model. Using efficient quantization and bit depth optimization strategies the input data is downscaled from float64 to float32 and int32. The results demonstrated a significant reduction in time complexity, with only a minimal decrease in model accuracy post-optimization, showcasing the state-of-the-art optimization approach. This comprehensive study concludes that the impact of these optimization techniques varies depending on a set of parameters.


Theory: Multidimensional Space of Events

arXiv.org Machine Learning

This paper extends Bayesian probability theory by developing a multidimensional space of events (MDSE) theory that accounts for mutual influences between events and hypotheses sets. While traditional Bayesian approaches assume conditional independence between certain variables, real-world systems often exhibit complex interdependencies that limit classical model applicability. Building on established probabilistic foundations, our approach introduces a mathematical formalism for modeling these complex relationships. We developed the MDSE theory through rigorous mathematical derivation and validated it using three complementary methodologies: analytical proofs, computational simulations, and case studies drawn from diverse domains. Results demonstrate that MDSE successfully models complex dependencies with 15-20% improved prediction accuracy compared to standard Bayesian methods when applied to datasets with high interdimensionality. This theory particularly excels in scenarios with over 50 interrelated variables, where traditional methods show exponential computational complexity growth while MDSE maintains polynomial scaling. Our findings indicate that MDSE provides a viable mathematical foundation for extending Bayesian reasoning to complex systems while maintaining computational tractability. This approach offers practical applications in engineering challenges including risk assessment, resource optimization, and forecasting problems where multiple interdependent factors must be simultaneously considered.


Co-Design of a Robot Controller Board and Indoor Positioning System for IoT-Enabled Applications

arXiv.org Artificial Intelligence

Abstract--This paper describes the development of a costeffective yet precise indoor robot navigation system composed of a custom robot controller board and an indoor positioning system. First, the proposed robot controller board has been specially designed for emerging IoT-based robot applications and is capable of driving two 6-Amp motor channels. Then, working together with the robot controller board, the proposed positioning system detects the robot's location using a down-looking webcam and uses the robot's position on the webcam images to estimate the real-world position of the robot in the environment. The positioning system can then send commands via WIFI to the robot in order to steer it to any arbitrary location in the environment. Our experiments show that the proposed system reaches a navigation error smaller or equal to 0.125 meters while being more than two orders of magnitude more cost-effective compared to off-the-shelve motion capture (MOCAP) positioning systems.


Semantic Search and Recommendation Algorithm

arXiv.org Artificial Intelligence

Abstract--This paper details the development of a novel semantic search algorithm utilizing Word2Vec and Annoy Index to efficiently process and retrieve information from large datasets. Addressing traditional search algorithms' limitations, our proposed method demonstrates significant improvements in speed, accuracy, and scalability, validated by rigorous testing on datasets up to 100GB. In the era of big data, efficiently retrieving relevant information from vast, unstructured datasets is crucial across numerous domains such as e-commerce, healthcare, research, and public administration. Traditional search engines, which rely primarily on keyword matching, often struggle with the inherent complexity and ambiguity of natural language. These systems lack the ability to understand the semantic meaning and context of queries, leading to inaccurate results and suboptimal user experiences. The evolution of semantic search technologies aims to address these limitations by focusing on understanding the in high-dimensional space.


Depression detection from Social Media Bangla Text Using Recurrent Neural Networks

arXiv.org Artificial Intelligence

Mostofa Akbar Department of CSE Bangladesh University of Engineering & T echnology Dhaka, Bangladesh mostofa@cse.buet.ac.bd Abstract --Emotion artificial intelligence is a field of study that focuses on figuring out how to recognize emotions, especially in the area of text mining. T oday is the age of social media which has opened a door for us to share our individual expressions, emotions, and perspectives on any event. We can analyze sentiment on social media posts to detect positive, negative, or emotional behavior toward society. One of the key challenges in sentiment analysis is to identify depressed text from social media text that is a root cause of mental ill-health. Furthermore, depression leads to severe impairment in day-to-day living and is a major source of suicide incidents. In this paper, we apply natural language processing techniques on Facebook texts for conducting emotion analysis focusing on depression using multiple machine learning algorithms. Preprocessing steps like stemming, stop word removal, etc. are used to clean the collected data, and feature extraction techniques like stylometric feature, TF-IDF, word embedding, etc. are applied to the collected dataset which consists of 983 texts collected from social media posts. In the process of class prediction, LSTM, GRU, support vector machine, and Naive-Bayes classifiers have been used. We have presented the results using the primary classification metrics including F1-score, and accuracy. This work focuses on depression detection from social media posts to help psychologists to analyze sentiment from shared posts which may reduce the undesirable behaviors of depressed individuals through diagnosis and treatment. I NTRODUCTION Text is the most important means of communication in today's world. Popular online social networking sites such as Facebook, Twitter, MySpace, etc. are mainly text-based. The rapid growth of Social Media has created enough opportunities to share information across time and space. Users are now comfortable contributing more to the content of social media websites and posting their own material. The emergence of internet-based media sources has resulted in the availability of substantial user data for the emotional analysis of text and images.


Comprehensive Monitoring of Air Pollution Hotspots Using Sparse Sensor Networks

arXiv.org Artificial Intelligence

Urban air pollution hotspots pose significant health risks, yet their detection and analysis remain limited by the sparsity of public sensor networks. This paper addresses this challenge by combining predictive modeling and mechanistic approaches to comprehensively monitor pollution hotspots. We enhanced New Delhi's existing sensor network with 28 low-cost sensors, collecting PM2.5 data over 30 months from May 1, 2018, to Nov 1, 2020. Applying established definitions of hotspots to this data, we found the existence of additional 189 hidden hotspots apart from confirming 660 hotspots detected by the public network. Using predictive techniques like Space-Time Kriging, we identified hidden hotspots with 95% precision and 88% recall with 50% sensor failure rate, and with 98% precision and 95% recall with 50% missing sensors. The projected results of our predictive models were further compiled into policy recommendations for public authorities. Additionally, we developed a Gaussian Plume Dispersion Model to understand the mechanistic underpinnings of hotspot formation, incorporating an emissions inventory derived from local sources. Our mechanistic model is able to explain 65% of observed transient hotspots. Our findings underscore the importance of integrating data-driven predictive models with physics-based mechanistic models for scalable and robust air pollution management in resource-constrained settings.


Navigating Process Mining: A Case study using pm4py

arXiv.org Artificial Intelligence

Process-mining techniques have emerged as powerful tools for analyzing event data to gain insights into business processes. In this paper, we present a comprehensive analysis of road traffic fine management processes using the pm4py library in Python. We start by importing an event log dataset and explore its characteristics, including the distribution of activities and process variants. Through filtering and statistical analysis, we uncover key patterns and variations in the process executions. Subsequently, we apply various process-mining algorithms, including the Alpha Miner, Inductive Miner, and Heuristic Miner, to discover process models from the event log data. We visualize the discovered models to understand the workflow structures and dependencies within the process. Additionally, we discuss the strengths and limitations of each mining approach in capturing the underlying process dynamics. Our findings shed light on the efficiency and effectiveness of road traffic fine management processes, providing valuable insights for process optimization and decision-making. This study demonstrates the utility of pm4py in facilitating process mining tasks and its potential for analyzing real-world business processes.


Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs

arXiv.org Artificial Intelligence

The consequences of a healthcare data breach can be devastating for the patients, providers, and payers. The average financial impact of a data breach in recent months has been estimated to be close to USD 10 million. This is especially significant for healthcare organizations in India that are managing rapid digitization while still establishing data governance procedures that align with the letter and spirit of the law. Computer-based systems for de-identification of personal information are vulnerable to data drift, often rendering them ineffective in cross-institution settings. Therefore, a rigorous assessment of existing de-identification against local health datasets is imperative to support the safe adoption of digital health initiatives in India. Using a small set of de-identified patient discharge summaries provided by an Indian healthcare institution, in this paper, we report the nominal performance of de-identification algorithms (based on language models) trained on publicly available non-Indian datasets, pointing towards a lack of cross-institutional generalization. Similarly, experimentation with off-the-shelf de-identification systems reveals potential risks associated with the approach. To overcome data scarcity, we explore generating synthetic clinical reports (using publicly available and Indian summaries) by performing in-context learning over Large Language Models (LLMs). Our experiments demonstrate the use of generated reports as an effective strategy for creating high-performing de-identification systems with good generalization capabilities.


Integrating supervised and unsupervised learning approaches to unveil critical process inputs

arXiv.org Artificial Intelligence

This study introduces a machine learning framework tailored to large-scale industrial processes characterized by a plethora of numerical and categorical inputs. The framework aims to (i) discern critical parameters influencing the output and (ii) generate accurate out-of-sample qualitative and quantitative predictions of production outcomes. Specifically, we address the pivotal question of the significance of each input in shaping the process outcome, using an industrial Chemical Vapor Deposition (CVD) process as an example. The initial objective involves merging subject matter expertise and clustering techniques exclusively on the process output, here, coating thickness measurements at various positions in the reactor. This approach identifies groups of production runs that share similar qualitative characteristics, such as film mean thickness and standard deviation. In particular, the differences of the outcomes represented by the different clusters can be attributed to differences in specific inputs, indicating that these inputs are critical for the production outcome. Leveraging this insight, we subsequently implement supervised classification and regression methods using the identified critical process inputs. The proposed methodology proves to be valuable in scenarios with a multitude of inputs and insufficient data for the direct application of deep learning techniques, providing meaningful insights into the underlying processes.


Predicting Lung Disease Severity via Image-Based AQI Analysis using Deep Learning Techniques

arXiv.org Artificial Intelligence

Air pollution is a significant health concern worldwide, contributing to various respiratory diseases. Advances in air quality mapping, driven by the emergence of smart cities and the proliferation of Internet-of-Things sensor devices, have led to an increase in available data, fueling momentum in air pollution forecasting. The objective of this study is to devise an integrated approach for predicting air quality using image data and subsequently assessing lung disease severity based on Air Quality Index (AQI).The aim is to implement an integrated approach by refining existing techniques to improve accuracy in predicting AQI and lung disease severity. The study aims to forecast additional atmospheric pollutants like AQI, PM10, O3, CO, SO2, NO2 in addition to PM2.5 levels. Additionally, the study aims to compare the proposed approach with existing methods to show its effectiveness. The approach used in this paper uses VGG16 model for feature extraction in images and neural network for predicting AQI.In predicting lung disease severity, Support Vector Classifier (SVC) and K-Nearest Neighbors (KNN) algorithms are utilized. The neural network model for predicting AQI achieved training accuracy of 88.54 % and testing accuracy of 87.44%,which was measured using loss function, while the KNN model used for predicting lung disease severity achieved training accuracy of 98.4% and testing accuracy of 97.5% In conclusion, the integrated approach presented in this study forecasts air quality and evaluates lung disease severity, achieving high testing accuracies of 87.44% for AQI and 97.5% for lung disease severity using neural network, KNN, and SVC models. The future scope involves implementing transfer learning and advanced deep learning modules to enhance prediction capabilities. While the current study focuses on India, the objective is to expand its scope to encompass global coverage.