Dehghani, Mohammad
Enhancing Readmission Prediction with Deep Learning: Extracting Biomedical Concepts from Clinical Texts
Samani, Rasoul, Dehghani, Mohammad, Shahrokh, Fahime
Hospital readmission, defined as patients being re-hospitalized shortly after discharge, is a critical concern because it affects patient outcomes and healthcare costs. Identifying patients at risk of readmission allows for timely interventions, reducing re-hospitalization rates and overall treatment costs. This study focuses on predicting patient readmission within 30 days using text mining techniques applied to discharge reports from electronic health records (EHR). Various machine learning and deep learning methods were employed to develop a classification model for this purpose. A novel aspect of this research is leveraging the Bio-Discharge Summary BERT (BDSS) model together with principal component analysis (PCA) feature extraction to preprocess the data fed to the deep learning model. Our analysis of the MIMIC-III dataset indicates that our approach, which combines the BDSS model with a multilayer perceptron (MLP), outperforms state-of-the-art methods, achieving a recall of 94% and an area under the curve (AUC) of 75% in predicting patient readmissions. This study contributes to the advancement of predictive modeling in healthcare by integrating text mining techniques with deep learning algorithms to improve patient outcomes and optimize resource allocation.
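A minimal sketch of the pipeline described above, under stated assumptions (this is not the authors' exact code): discharge summaries are encoded with the publicly released Bio_Discharge_Summary_BERT checkpoint, the embeddings are reduced with PCA, and an MLP classifier is trained on the reduced vectors. The checkpoint name, toy texts, labels, and layer/component sizes are illustrative placeholders.

import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

checkpoint = "emilyalsentzer/Bio_Discharge_Summary_BERT"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoder = AutoModel.from_pretrained(checkpoint)

def embed(texts):
    # [CLS] embedding of each discharge summary
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        return encoder(**batch).last_hidden_state[:, 0, :].numpy()

# toy stand-ins for MIMIC-III discharge reports and 30-day readmission labels
discharge_texts = ["Patient admitted with CHF exacerbation, discharged on diuretics.",
                   "Routine post-operative course, discharged home in stable condition."]
readmitted = [1, 0]

X = PCA(n_components=2).fit_transform(embed(discharge_texts))  # a paper-scale setup would keep more components
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300).fit(X, readmitted)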
Dental Severity Assessment through Few-shot Learning and SBERT Fine-tuning
Dehghani, Mohammad
Dental diseases have a significant impact on a considerable portion of the population, leading to various health issues that can detrimentally affect individuals' overall well-being. The integration of automated systems in oral healthcare has become increasingly crucial. Machine learning approaches offer a viable solution to challenges such as diagnostic difficulties, inefficiencies, and errors in oral disease diagnosis. These methods prove particularly useful when physicians struggle to predict or diagnose diseases at their early stages. In this study, thirteen different machine learning, deep learning, and large language models were employed to determine the severity level of oral health issues based on radiologists' reports. The results revealed that the few-shot learning model with SBERT and a multi-layer perceptron outperformed all other models across the experiments, achieving a best accuracy of 94.1%. Consequently, this model shows promise as a reliable tool for evaluating the severity of oral diseases, enabling patients to receive more effective treatment and aiding healthcare professionals in making informed decisions regarding resource allocation and the management of high-risk patients. The incidence of periodontitis and dental caries has surged in recent years, highlighting the pressing need for early detection to prevent severe complications and tooth loss [1]. Dental caries is a significant health concern affecting both children and adults in most industrialized nations [2]. Its impact is felt throughout an individual's lifetime, leading to pain, discomfort, and oral deformities.
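A minimal sketch of the best-performing configuration named above, under stated assumptions: pretrained SBERT sentence embeddings feed a small MLP that predicts a severity level. The checkpoint, toy report snippets, and label scheme are illustrative; the study's SBERT fine-tuning and few-shot sampling steps are omitted here.

from sentence_transformers import SentenceTransformer
from sklearn.neural_network import MLPClassifier

sbert = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed checkpoint, not the study's fine-tuned model
reports = ["deep caries approaching the pulp of tooth 36",
           "mild gingival inflammation, no bone loss"]                # toy radiology-report snippets
severity = [2, 0]                                                     # toy severity levels

X = sbert.encode(reports)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, severity)
print(clf.predict(sbert.encode(["extensive periapical lesion on tooth 11"])))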
A comprehensive cross-language framework for harmful content detection with the aid of sentiment analysis
Dehghani, Mohammad
In today's digital world, social media plays a significant role in facilitating communication and content sharing. However, the exponential rise in user-generated content has led to challenges in maintaining a respectful online environment. In some cases, users have taken advantage of anonymity to use harmful language, which can degrade the user experience and pose serious social problems. Recognizing the limitations of manual moderation, automatic detection systems have been developed to tackle this problem. Nevertheless, several obstacles persist, including the absence of a universal definition of harmful language, inadequate datasets across languages, the need for detailed annotation guidelines, and, most importantly, the lack of a comprehensive framework. This study aims to address these challenges by introducing, for the first time, a detailed framework adaptable to any language. The framework covers various aspects of harmful language detection, and a key component is the development of a general and detailed annotation guideline. Additionally, the integration of sentiment analysis represents a novel approach to enhancing harmful language detection, and a definition of harmful language based on a review of related concepts is presented. To demonstrate the effectiveness of the proposed framework, we implement it for a challenging low-resource language: we collected a Persian dataset and applied the annotation guideline for harmful language detection and sentiment analysis. We then present baseline experiments using machine learning and deep learning methods to set benchmarks. The results demonstrate the framework's high performance, achieving an accuracy of 99.4% in offensive language detection and 66.2% in sentiment analysis.
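As one concrete example of the kind of baseline mentioned above (not the authors' exact setup), a simple bag-of-words classifier can be trained on annotated comments; the texts, labels, and model choice below are toy placeholders for illustration only.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

comments = ["this post is garbage and so are you", "thanks for sharing, really helpful"]  # toy examples
harmful = [1, 0]   # 1 = harmful/offensive, 0 = not harmful

baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
baseline.fit(comments, harmful)
print(baseline.predict(["what a useless comment"]))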
Persian Typographical Error Type Detection Using Deep Neural Networks on Algorithmically-Generated Misspellings
Dehghani, Mohammad, Faili, Heshaam
Spelling correction is a notable challenge in natural language processing. The objective of spelling correction is to recognize and rectify spelling errors automatically. Developing applications that can effectively diagnose and correct Persian spelling and grammatical errors has become increasingly important for improving the quality of Persian text. Typographical error type detection in Persian is a relatively understudied area, so this paper presents a compelling approach for detecting typographical errors in Persian texts. Our work includes a publicly available dataset called FarsTypo, which comprises 3.4 million words arranged in chronological order and tagged with their corresponding part-of-speech. These words cover a wide range of topics and linguistic styles. We developed an algorithm that applies Persian-specific errors to a scalable portion of these words, resulting in a parallel dataset of correct and incorrect words. Leveraging FarsTypo, we establish a strong foundation and conduct a thorough comparison of methodologies employing different architectures. Additionally, we introduce a Deep Sequential Neural Network that uses both word and character embeddings, together with bidirectional LSTM layers, for token classification aimed at detecting typographical errors across 51 distinct classes. Our approach is contrasted with highly advanced industrial systems that, unlike this study, were developed using a diverse range of resources. Our final method proved highly competitive, achieving an accuracy of 97.62%, precision of 98.83%, and recall of 98.61%, while surpassing the other systems in speed.
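A minimal Keras sketch of the architecture described above, under stated assumptions: word and character embeddings are combined and passed through bidirectional LSTM layers, and one of 51 error-type labels is predicted per token. The sequence lengths, vocabulary sizes, and embedding/layer widths are illustrative, not the paper's configuration.

import tensorflow as tf
from tensorflow.keras import layers

MAX_WORDS, MAX_CHARS = 64, 16                        # sequence lengths (illustrative)
WORD_VOCAB, CHAR_VOCAB, N_CLASSES = 50000, 120, 51

word_in = layers.Input(shape=(MAX_WORDS,), dtype="int32", name="word_ids")
char_in = layers.Input(shape=(MAX_WORDS, MAX_CHARS), dtype="int32", name="char_ids")

word_emb = layers.Embedding(WORD_VOCAB, 128)(word_in)
char_emb = layers.TimeDistributed(layers.Embedding(CHAR_VOCAB, 32))(char_in)
char_emb = layers.TimeDistributed(layers.Bidirectional(layers.LSTM(32)))(char_emb)  # one vector per word built from its characters

x = layers.Concatenate()([word_emb, char_emb])
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
out = layers.TimeDistributed(layers.Dense(N_CLASSES, activation="softmax"))(x)       # one error-type label per token

model = tf.keras.Model([word_in, char_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()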
Deep Neural Decision Forest: A Novel Approach for Predicting Recovery or Decease of COVID-19 Patients with Clinical and RT-PCR
Dehghani, Mohammad, Yazdanparast, Zahra, Samani, Rasoul
COVID-19 continues to be considered an endemic disease despite the World Health Organization's declaration that the pandemic is over. The pandemic disrupted people's lives in unprecedented ways and caused widespread morbidity and mortality. As a result, it is important for emergency physicians to identify patients at higher mortality risk in order to prioritize hospital equipment, especially in areas with limited medical services. Data collected from patients are useful for predicting the outcome of COVID-19 cases, although it remains unclear which data yield the most accurate predictions. This study therefore pursues two main objectives. First, we examine whether deep learning algorithms can predict a patient's mortality. Second, we investigate the impact of clinical and RT-PCR data on prediction to determine which is more reliable. We defined four stages with different feature sets and used interpretable deep learning methods to build appropriate models. Based on the results, the deep neural decision forest performed best across all stages and proved capable of predicting patient recovery and death. Additionally, the results indicate that clinical data alone (without RT-PCR) is the most effective basis for diagnosis, with an accuracy of 80%. It is important to document and understand experiences from the COVID-19 pandemic in order to aid future medical efforts, and this study can provide guidance for medical professionals in the event of a crisis or outbreak similar to COVID-19. Keywords: Machine Learning, Deep Learning, Deep Neural Decision Forest, COVID-19, Polymerase Chain Reaction, RT-PCR. 1. Introduction. COVID-19 was first observed as a deadly illness in the Wuhan region of China in 2019. It was highly contagious and spread rapidly through direct contact with infected individuals [1].
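A minimal PyTorch sketch in the spirit of the neural decision forest named above (not the authors' implementation): a small shared feature network feeds an ensemble of soft decision trees whose routing gates and leaf class distributions are learned end-to-end. The depths, widths, toy features, and binary recovered/deceased labels are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftTree(nn.Module):
    def __init__(self, in_dim, depth=3, n_classes=2):
        super().__init__()
        self.depth = depth
        self.decisions = nn.Linear(in_dim, 2 ** depth - 1)          # one sigmoid gate per inner node
        self.leaves = nn.Parameter(torch.randn(2 ** depth, n_classes))

    def forward(self, x):
        gates = torch.sigmoid(self.decisions(x))
        mu = torch.ones(x.size(0), 1, device=x.device)               # routing probability, starts at the root
        begin = 0
        for level in range(self.depth):
            n = 2 ** level
            g = gates[:, begin:begin + n]
            mu = torch.cat([mu * g, mu * (1.0 - g)], dim=1)          # split each path into left/right children
            begin += n
        leaf_probs = F.softmax(self.leaves, dim=1)                   # class distribution at each leaf
        return mu @ leaf_probs                                       # mixture over leaves -> (batch, n_classes)

class NeuralDecisionForest(nn.Module):
    def __init__(self, in_dim, n_trees=5, depth=3, n_classes=2):
        super().__init__()
        self.feature_net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())   # shared representation
        self.trees = nn.ModuleList([SoftTree(64, depth, n_classes) for _ in range(n_trees)])

    def forward(self, x):
        h = self.feature_net(x)
        return torch.stack([t(h) for t in self.trees]).mean(dim=0)  # average the tree predictions

# toy usage: 10 clinical features, binary recovered/deceased outcome
model = NeuralDecisionForest(in_dim=10)
x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = F.nll_loss(torch.log(model(x) + 1e-8), y)
loss.backward()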
A Survey From Distributed Machine Learning to Distributed Deep Learning
Dehghani, Mohammad, Yazdanparast, Zahra
Artificial intelligence has made remarkable progress in handling complex tasks, thanks to advances in hardware acceleration and machine learning algorithms. However, to obtain more accurate results and solve more complex problems, algorithms must be trained on more data, and processing this huge amount of data can be time-consuming and computationally demanding. To address these issues, distributed machine learning has been proposed, in which the data and algorithm are distributed across several machines. Considerable effort has gone into developing distributed machine learning algorithms, and different methods have been proposed so far. We divide these algorithms into classification and clustering (traditional machine learning), deep learning, and deep reinforcement learning groups. Distributed deep learning has gained more attention in recent years, and most studies have focused on this approach; therefore, we concentrate mostly on this category. Based on our investigation of these algorithms, we highlight limitations that should be addressed in future research. Keywords: Artificial intelligence, Machine learning, Distributed machine learning, Distributed deep learning, Distributed reinforcement learning, Data-parallelism, Model-parallelism. Introduction. Artificial intelligence (AI) is a rapidly developing field that uses knowledge to simulate human behaviors (1) and trains computers to learn, make judgments, and make decisions similarly to humans (2, 3). In other words, AI involves developing techniques and algorithms that are capable of thinking, acting, and implementing tasks using protocols that are otherwise beyond human comprehension (4). Machine learning (ML) is a subset of AI that learns from historical data without being explicitly programmed (5). ML algorithms can be used to analyze data and build data-driven systems for classification, clustering, regression, association rule mining, and reinforcement learning (6, 7). Deep learning is a branch of machine learning that uses artificial neural networks to intelligently analyze large amounts of data (8, 9).
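As a toy illustration of the data-parallelism idea surveyed here (a single-process simulation, not a real distributed setup): each worker holds a shard of the data, computes the gradient of the same model on its shard, and the gradients are averaged before every update, mimicking a synchronous all-reduce. The model, data, and step size are illustrative.

import numpy as np

def grad(w, X, y):
    # gradient of mean squared error for a linear model y ~ X @ w
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
shards = np.array_split(np.arange(1000), 4)              # 4 workers, each with its own data shard
w = np.zeros(5)
for step in range(200):
    grads = [grad(w, X[idx], y[idx]) for idx in shards]  # computed independently per worker
    w -= 0.01 * np.mean(grads, axis=0)                   # "all-reduce": average gradients, then update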
Discovering the Symptom Patterns of COVID-19 from Recovered and Deceased Patients Using Apriori Association Rule Mining
Dehghani, Mohammad, Yazdanparast, Zahra
The COVID-19 pandemic has had a devastating impact globally, claiming millions of lives and causing significant social and economic disruption. In order to optimize decision-making and allocate limited resources, it is essential to identify COVID-19 symptoms and determine the severity of each case. Machine learning algorithms offer a potent tool in the medical field, particularly for mining clinical datasets for useful information and guiding scientific decisions. Association rule mining is a machine learning technique for extracting hidden patterns from data. This paper presents an application of the Apriori association rule mining algorithm to discover symptom patterns from COVID-19 patients. Using 2,875 patient records, the study identified the most common signs and symptoms as apnea (72%), cough (64%), fever (59%), weakness (18%), myalgia (14.5%), and sore throat (12%). The proposed method provides clinicians with valuable insight into the disease that can assist them in managing and treating it effectively.
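A minimal sketch of the mining step using the mlxtend implementation of Apriori; the four toy symptom records and the support threshold below are illustrative placeholders, not the study's 2,875-patient dataset.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

records = [
    ["apnea", "cough", "fever"],
    ["cough", "fever", "sore throat"],
    ["apnea", "fever", "weakness"],
    ["apnea", "cough", "myalgia"],
]
te = TransactionEncoder()
df = pd.DataFrame(te.fit(records).transform(records), columns=te.columns_)
frequent = apriori(df, min_support=0.5, use_colnames=True)       # frequent symptom combinations
print(frequent.sort_values("support", ascending=False))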
Political Sentiment Analysis of Persian Tweets Using CNN-LSTM Model
Dehghani, Mohammad, Yazdanparast, Zahra
Sentiment analysis is the process of identifying and categorizing people's emotions or opinions regarding various topics. The analysis of Twitter sentiment has become an increasingly popular topic in recent years. In this paper, we present several machine learning models and a deep learning model to analyze the sentiment of Persian political tweets. Our analysis was conducted using Bag of Words and ParsBERT for word representation. We applied Gaussian Naive Bayes, Gradient Boosting, Logistic Regression, Decision Trees, and Random Forests, as well as a combination of CNN and LSTM, to classify the polarities of tweets. The results indicate that deep learning with ParsBERT embeddings performs better than machine learning. The CNN-LSTM model achieved the highest classification accuracy, with 89 percent on the first dataset and 71 percent on the second dataset. Given the complexity of Persian, reaching this level of efficiency was a difficult task. A main objective of our research was to reduce training time while maintaining the model's performance; several adjustments were therefore made to the model architecture and parameters, and in addition to achieving this objective, the performance was slightly improved as well.
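A minimal Keras sketch of a CNN-LSTM classifier of the kind described above, under stated assumptions: the input is a sequence of token embeddings (for example, produced by ParsBERT), a 1D convolution extracts local n-gram features, and an LSTM captures sequential context. All sizes are illustrative, not the paper's configuration.

import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, EMB_DIM, N_CLASSES = 64, 768, 2          # 768 matches ParsBERT's hidden size

inputs = layers.Input(shape=(SEQ_LEN, EMB_DIM))
x = layers.Conv1D(128, kernel_size=3, activation="relu")(inputs)   # local n-gram features
x = layers.MaxPooling1D(pool_size=2)(x)
x = layers.LSTM(64)(x)                                              # sequential context
outputs = layers.Dense(N_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()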
BioBERT Based SNP-traits Associations Extraction from Biomedical Literature
Dehghani, Mohammad, Bokharaeian, Behrouz, Yazdanparast, Zahra
Scientific literature contains a considerable amount of information that provides an excellent opportunity for developing text mining methods to extract biomedical relationships. An important type of information is the relationship between single nucleotide polymorphisms (SNPs) and traits. In this paper, we present a BioBERT-GRU method to identify SNP-trait associations. Based on the evaluation of our method on the SNPPhenA dataset, we conclude that this new method performs better than previous machine learning and deep learning based methods, achieving a precision of 0.883, a recall of 0.882, and an F1-score of 0.881.
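A minimal PyTorch sketch in the spirit of the BioBERT-GRU model described above (not the authors' implementation): a BioBERT encoder produces contextual token representations, a bidirectional GRU summarizes them, and a linear head classifies the SNP-trait association. The checkpoint name, hidden sizes, class count, and example sentence are assumptions.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class BioBertGRU(nn.Module):
    def __init__(self, checkpoint="dmis-lab/biobert-base-cased-v1.1", n_classes=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.gru = nn.GRU(self.encoder.config.hidden_size, 128, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 128, n_classes)   # association classes are an assumed label scheme

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        _, h = self.gru(hidden)                            # h: (2, batch, 128) for a bidirectional GRU
        return self.classifier(torch.cat([h[0], h[1]], dim=1))

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
batch = tokenizer(["The rs123 variant was associated with increased height."], return_tensors="pt")
model = BioBertGRU()
logits = model(batch["input_ids"], batch["attention_mask"])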
Artificial intelligence for Sustainable Energy: A Contextual Topic Modeling and Content Analysis
Saheb, Tahereh, Dehghani, Mohammad
Parallel to the rising debates over sustainable energy and artificial intelligence solutions, the world is currently discussing the ethics of artificial intelligence and its possible negative effects on society and the environment. In these arguments, sustainable AI is proposed, which aims to advance the pathway toward sustainability in areas such as sustainable energy. In this paper, we offer a novel contextual topic modeling approach combining LDA, BERT, and clustering. We then combine these computational analyses with content analysis of related scientific publications to identify the main scholarly topics, sub-themes, and cross-topic themes within scientific research on sustainable AI in energy. Our research identified eight dominant topics: sustainable buildings, AI-based DSSs for urban water management, climate artificial intelligence, Agriculture 4.0, the convergence of AI with IoT, AI-based evaluation of renewable technologies, smart campus and engineering education, and AI-based optimization. We then recommend 14 potential future research strands based on the observed theoretical gaps. Theoretically, this analysis contributes to the existing literature on sustainable AI and sustainable energy; practically, it is intended to act as a general guide for energy engineers and scientists, AI scientists, and social scientists to widen their knowledge of sustainability in AI-and-energy convergence research.
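A minimal sketch of the contextual topic modeling idea, under stated assumptions: LDA topic proportions are concatenated with BERT-style sentence embeddings and the combined vectors are clustered. The toy abstracts, checkpoint, weighting factor, and cluster count are illustrative, not the authors' configuration.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

abstracts = ["AI-based control of building energy systems reduces consumption.",
             "Deep learning for forecasting wind power output.",
             "Machine learning models for urban water demand management."]

counts = CountVectorizer(stop_words="english").fit_transform(abstracts)
lda_vecs = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
bert_vecs = SentenceTransformer("all-MiniLM-L6-v2").encode(abstracts)   # assumed sentence-embedding checkpoint

combined = np.hstack([10 * lda_vecs, bert_vecs])      # scale LDA proportions so both views contribute
topics = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(combined)
print(topics)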