Goto

Collaborating Authors

 Accuracy


Automatic pain recognition from Blood Volume Pulse (BVP) signal using machine learning techniques

arXiv.org Artificial Intelligence

Physiological responses to pain have received increasing attention among researchers for developing an automated pain recognition sensing system. Though less explored, Blood Volume Pulse (BVP) is one of the candidate physiological measures that could help objective pain assessment. In this study, we applied machine learning techniques on BVP signals to device a non-invasive modality for pain sensing. Thirty-two healthy subjects participated in this study. First, we investigated a novel set of time-domain, frequency-domain and nonlinear dynamics features that could potentially be sensitive to pain. These include 24 features from BVP signals and 20 additional features from Inter-beat Intervals (IBIs) derived from the same BVP signals. Utilizing these features, we built machine learning models for detecting the presence of pain and its intensity. We explored different machine learning models, including Logistic Regression, Random Forest, Support Vector Machines, Adaptive Boosting (AdaBoost) and Extreme Gradient Boosting (XGBoost). Among them, we found that the XGBoost offered the best model performance for both pain classification and pain intensity estimation tasks. The ROC-AUC of the XGBoost model to detect low pain, medium pain and high pain with no pain as the baseline were 80.06 %, 85.81 %, and 90.05 % respectively. Moreover, the XGboost classifier distinguished medium pain from high pain with ROC-AUC of 91%. For the multi-class classification among three pain levels, the XGBoost offered the best performance with an average F1-score of 80.03%. Our results suggest that BVP signal together with machine learning algorithms is a promising physiological measurement for automated pain assessment. This work will have a national impact on accurate pain assessment, effective pain management, reducing drug-seeking behavior among patients, and addressing national opioid crisis.


A hybrid CNN-RNN approach for survival analysis in a Lung Cancer Screening study

arXiv.org Artificial Intelligence

In this study, we present a hybrid CNN-RNN approach to investigate long-term survival of subjects in a lung cancer screening study. Subjects who died of cardiovascular and respiratory causes were identified whereby the CNN model was used to capture imaging features in the CT scans and the RNN model was used to investigate time series and thus global information. The models were trained on subjects who underwent cardiovascular and respiratory deaths and a control cohort matched to participant age, gender, and smoking history. The combined model can achieve an AUC of 0.76 which outperforms humans at cardiovascular mortality prediction. The corresponding F1 and Matthews Correlation Coefficient are 0.63 and 0.42 respectively. The generalisability of the model is further validated on an 'external' cohort. The same models were applied to survival analysis with the Cox Proportional Hazard model. It was demonstrated that incorporating the follow-up history can lead to improvement in survival prediction. The Cox neural network can achieve an IPCW C-index of 0.75 on the internal dataset and 0.69 on an external dataset. Delineating imaging features associated with long-term survival can help focus preventative interventions appropriately, particularly for under-recognised pathologies thereby potentially reducing patient morbidity.


Extracting Incidents, Effects, and Requested Advice from MeToo Posts

arXiv.org Artificial Intelligence

Survivors of sexual harassment frequently share their experiences on social media, revealing their feelings and emotions and seeking advice. We observed that on Reddit, survivors regularly share long posts that describe a combination of (i) a sexual harassment incident, (ii) its effect on the survivor, including their feelings and emotions, and (iii) the advice being sought. We term such posts MeToo posts, even though they may not be so tagged and may appear in diverse subreddits. A prospective helper (such as a counselor or even a casual reader) must understand a survivor's needs from such posts. But long posts can be time-consuming to read and respond to. Accordingly, we address the problem of extracting key information from a long MeToo post. We develop a natural language-based model to identify sentences from a post that describe any of the above three categories. On ten-fold cross-validation of a dataset, our model achieves a macro F1 score of 0.82. In addition, we contribute MeThree, a dataset comprising 8,947 labeled sentences extracted from Reddit posts. We apply the LIWC-22 toolkit on MeThree to understand how different language patterns in sentences of the three categories can reveal differences in emotional tone, authenticity, and other aspects.


Deep learning for drug response prediction in cancer

#artificialintelligence

Predicting the sensitivity of tumors to specific anti-cancer treatments is a challenge of paramount importance for precision medicine. Machine learning(ML) algorithms can be trained on high-throughput screening data to develop models that are able to predict the response of cancer cell lines and patients to novel drugs or drug combinations. Deep learning (DL) refers to a distinct class of ML algorithms that have achieved top-level performance in a variety of fields, including drug discovery. These types of models have unique characteristics that may make them more suitable for the complex task of modeling drug response based on both biological and chemical data, but the application of DL to drug response prediction has been unexplored until very recently. The few studies that have been published have shown promising results, and the use of DL for drug response prediction is beginning to attract greater interest from researchers in the field. In this article, we critically review ...


Detecting Anomalous Cryptocurrency Transactions: an AML/CFT Application of Machine Learning-based Forensics

arXiv.org Artificial Intelligence

In shaping the Internet of Money, the application of blockchain and distributed ledger technologies (DLTs) to the financial sector triggered regulatory concerns. Notably, while the user anonymity enabled in this field may safeguard privacy and data protection, the lack of identifiability hinders accountability and challenges the fight against money laundering and the financing of terrorism and proliferation (AML/CFT). As law enforcement agencies and the private sector apply forensics to track crypto transfers across ecosystems that are socio-technical in nature, this paper focuses on the growing relevance of these techniques in a domain where their deployment impacts the traits and evolution of the sphere. In particular, this work offers contextualized insights into the application of methods of machine learning and transaction graph analysis. Namely, it analyzes a real-world dataset of Bitcoin transactions represented as a directed graph network through various techniques. The modeling of blockchain transactions as a complex network suggests that the use of graph-based data analysis methods can help classify transactions and identify illicit ones. Indeed, this work shows that the neural network types known as Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT) are a promising AML/CFT solution. Notably, in this scenario GCN outperform other classic approaches and GAT are applied for the first time to detect anomalies in Bitcoin. Ultimately, the paper upholds the value of public-private synergies to devise forensic strategies conscious of the spirit of explainability and data openness.


Smart ROI Detection for Alzheimer's disease prediction using explainable AI

arXiv.org Artificial Intelligence

Purpose Predicting the progression of MCI to Alzheimer's disease is an important step in reducing the progression of the disease. Therefore, many methods have been introduced for this task based on deep learning. Among these approaches, the methods based on ROIs are in a good position in terms of accuracy and complexity. In these techniques, some specific parts of the brain are extracted as ROI manually for all of the patients. Extracting ROI manually is time-consuming and its results depend on human expertness and precision. Method To overcome these limitations, we propose a novel smart method for detecting ROIs automatically based on Explainable AI using Grad-Cam and a 3DCNN model that extracts ROIs per patient. After extracting the ROIs automatically, Alzheimer's disease is predicted using extracted ROI-based 3D CNN. Results We implement our method on 176 MCI patients of the famous ADNI dataset and obtain remarkable results compared to the state-of-the-art methods. The accuracy acquired using 5-fold cross-validation is 98.6 and the AUC is 1. We also compare the results of the ROI-based method with the whole brain-based method. The results show that the performance is impressively increased. Conclusion The experimental results show that the proposed smart ROI extraction, which extracts the ROIs automatically, performs well for Alzheimer's disease prediction. The proposed method can also be used for Alzheimer's disease classification and diagnosis.


Representation Bias in Data: A Survey on Identification and Resolution Techniques

arXiv.org Artificial Intelligence

Data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often fail to represent minorities adequately. Representation Bias in data can happen due to various reasons ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods. Given that "bias in, bias out", one cannot expect AI-based solutions to have equitable outcomes for societal applications, without addressing issues such as representation bias. While there has been extensive study of fairness in machine learning models, including several review papers, bias in the data has been less studied. This paper reviews the literature on identifying and resolving representation bias as a feature of a data set, independent of how consumed later. The scope of this survey is bounded to structured (tabular) and unstructured (e.g., image, text, graph) data. It presents taxonomies to categorize the studied techniques based on multiple design dimensions and provides a side-by-side comparison of their properties. There is still a long way to fully address representation bias issues in data. The authors hope that this survey motivates researchers to approach these challenges in the future by observing existing work within their respective domains.


Enhancing COVID-19 Severity Analysis through Ensemble Methods

arXiv.org Artificial Intelligence

Computed Tomography (CT) scans provide a detailed image of the lungs, allowing clinicians to observe the extent of damage caused by COVID-19. The CT severity score (CTSS) based scoring method is used to identify the extent of lung involvement observed on a CT scan. This paper presents a domain knowledge-based pipeline for extracting regions of infection in COVID-19 patients using a combination of image-processing algorithms and a pre-trained UNET model. The severity of the infection is then classified into different categories using an ensemble of three machine-learning models: Extreme Gradient Boosting, Extremely Randomized Trees, and Support Vector Machine. The proposed system was evaluated on a validation dataset in the AI-Enabled Medical Image Analysis Workshop and COVID-19 Diagnosis Competition (AI-MIA-COV19D) and achieved a macro F1 score of 64%. These results demonstrate the potential of combining domain knowledge with machine learning techniques for accurate COVID-19 diagnosis using CT scans. The implementation of the proposed system for severity analysis is available at \textit{https://github.com/aanandt/Enhancing-COVID-19-Severity-Analysis-through-Ensemble-Methods.git }


Semi-supervised Bladder Tissue Classification in Multi-Domain Endoscopic Images

arXiv.org Artificial Intelligence

Objective: Accurate visual classification of bladder tissue during Trans-Urethral Resection of Bladder Tumor (TURBT) procedures is essential to improve early cancer diagnosis and treatment. During TURBT interventions, White Light Imaging (WLI) and Narrow Band Imaging (NBI) techniques are used for lesion detection. Each imaging technique provides diverse visual information that allows clinicians to identify and classify cancerous lesions. Computer vision methods that use both imaging techniques could improve endoscopic diagnosis. We address the challenge of tissue classification when annotations are available only in one domain, in our case WLI, and the endoscopic images correspond to an unpaired dataset, i.e. there is no exact equivalent for every image in both NBI and WLI domains. Method: We propose a semi-surprised Generative Adversarial Network (GAN)-based method composed of three main components: a teacher network trained on the labeled WLI data; a cycle-consistency GAN to perform unpaired image-to-image translation, and a multi-input student network. To ensure the quality of the synthetic images generated by the proposed GAN we perform a detailed quantitative, and qualitative analysis with the help of specialists. Conclusion: The overall average classification accuracy, precision, and recall obtained with the proposed method for tissue classification are 0.90, 0.88, and 0.89 respectively, while the same metrics obtained in the unlabeled domain (NBI) are 0.92, 0.64, and 0.94 respectively. The quality of the generated images is reliable enough to deceive specialists. Significance: This study shows the potential of using semi-supervised GAN-based bladder tissue classification when annotations are limited in multi-domain data. The dataset is available at https://zenodo.org/record/7741476#.ZBQUK7TMJ6k


Neighborhood Averaging for Improving Outlier Detectors

arXiv.org Artificial Intelligence

-- We hypothesize that similar objects should have similar outlier scores. To our knowledge, all existing outlier detectors calculate the outlier score for each object independently regardless of the outlier scores of the other objects. Therefore, they do not guarantee that similar objects have similar outlier scores. To verify our proposed hypothesis, we propose an outlier score post-processing technique for outlier detectors, called neighborhood averaging (NA), which pays attention to objects and their neighbors and guarantees them to have more similar outlier scores than their original scores. Given an object and its outlier score from any outlier detector, NA modifies its outlier score by combining it with its k nearest neighbors' scores. We demonstrate the effectivity of NA by using the well-known k-nearest neighbors (k-NN). Experimental results show that NA improves all 10 tested baseline detectors by 13% (from 0.70 to 0.79 AUC) on average evaluated on nine real-world datasets. Moreover, even outlier detectors that are already based on k-NN are also improved. The experiments also show that in some applications, the choice of detector is no more significant when detectors are jointly used with NA, which may pose a challenge to the generally considered idea that the data model is the most important factor. Outliers are objects that significantly deviate from other objects. Outliers can indicate useful information, which can be applied in applications such as fraud detection [1, 2], abnormal time series [3, 4], and traffic patterns [5, 6]. Outliers can also be harmful because they are generally unwanted, can be considered errors, and may have biased statistical analysis for applications like clustering [7, 8]. Recently, outlier detection has also been applied to manufacturing data [9] and industrial applications [10]. For these reasons, outliers need to be detected. Most outlier detectors calculate the so-called outlier score for every object independently and then calculate the threshold scores that deviate significantly from the others and label them as outliers [11].