Accuracy
Brain tumour segmentation using cascaded 3D densely-connected U-net
Ghaffari, Mina, Sowmya, Arcot, Oliver, Ruth
Accurate brain tumour segmentation is a crucial step towards improving disease diagnosis and proper treatment planning. In this paper, we propose a deep-learning based method to segment a brain tumour into its subregions: whole tumour, tumour core and enhancing tumour. The proposed architecture is a 3D convolutional neural network based on a variant of the U-Net architecture of Ronneberger et al. [17] with three main modifications: (i) a heavy encoder, light decoder structure using residual blocks, (ii) employment of dense blocks instead of skip connections, and (iii) utilization of self-ensembling in the decoder part of the network. The network was trained and tested using two different approaches: a multitask framework to segment all tumour subregions at the same time, and a three-stage cascaded framework to segment one subregion at a time. An ensemble of the results from both frameworks was also computed. To address the class imbalance issue, appropriate patch extraction was employed in a pre-processing step. Connected component analysis was utilized in the post-processing step to reduce false positive predictions. Experimental results on the BraTS20 validation dataset demonstrate that the proposed model achieved average Dice scores of 0.90, 0.82, and 0.78 for whole tumour, tumour core and enhancing tumour, respectively. Keywords: Brain tumour segmentation · Multimodal MRI · Cascaded network · Densely connected CNN.
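The Dice scores reported above can be illustrated with a minimal sketch. It assumes the label convention commonly used for BraTS data (1 = necrotic/non-enhancing core, 2 = oedema, 4 = enhancing tumour), which the abstract itself does not state; the function names and the toy label lists are hypothetical and are not taken from the paper.

```python
def dice_score(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * intersection / total


# Assumed BraTS-style grouping of voxel labels into the three subregions.
SUBREGIONS = {
    "whole_tumour": {1, 2, 4},
    "tumour_core": {1, 4},
    "enhancing_tumour": {4},
}


def subregion_dice(pred_labels, true_labels):
    """Dice score per subregion, computed from flat lists of voxel labels."""
    scores = {}
    for name, labels in SUBREGIONS.items():
        pred_mask = [int(v in labels) for v in pred_labels]
        true_mask = [int(v in labels) for v in true_labels]
        scores[name] = dice_score(pred_mask, true_mask)
    return scores


# Toy example with eight voxels (hypothetical values).
pred_labels = [0, 1, 2, 4, 4, 0, 2, 1]
true_labels = [0, 1, 2, 4, 0, 0, 2, 2]
scores = subregion_dice(pred_labels, true_labels)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In a real evaluation the masks would be 3D volumes; flattening them to one dimension does not change the score.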
PANDA: Predicting the change in proteins binding affinity upon mutations using sequence information
Abbasi, Wajid Arshad, Abbas, Syed Ali, Andleeb, Saiqa
Accurately determining a change in protein binding affinity upon mutations is important for the discovery and design of novel therapeutics and to assist mutagenesis studies. Determining the change in binding affinity upon mutations requires sophisticated, expensive, and time-consuming wet-lab experiments that can be aided with computational methods. Most computational prediction techniques require protein structures, which limits their applicability to protein complexes with known structures. In this work, we explore the sequence-based prediction of change in protein binding affinity upon mutation. We have used protein sequence information instead of protein structures, along with machine learning techniques, to accurately predict the change in protein binding affinity upon mutation. Our proposed novel sequence-based predictor of change in protein binding affinity, called PANDA, gives better accuracy than existing methods over the same validation set as well as on an external independent test dataset. On an external test dataset, our proposed method gives a maximum Pearson correlation coefficient of 0.52, in comparison to the state-of-the-art protein structure-based method MutaBind, which gives a maximum Pearson correlation coefficient of 0.59. Our proposed protein sequence-based method for predicting a change in binding affinity upon mutations has wide applicability and comparable performance in comparison to existing protein structure-based methods.
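The Pearson correlation coefficient used above to compare predictors can be computed as in the following sketch. The function name and the example values are hypothetical; they only illustrate how predicted and measured affinity changes would be compared and are not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured vs. predicted affinity changes (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(measured, predicted)
print(f"Pearson r = {r:.2f}")
```

A coefficient of 1 means the predictions rank and scale with the measurements perfectly, whereas 0 means no linear relationship.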
Optimal Sepsis Patient Treatment using Human-in-the-loop Artificial Intelligence
Gupta, Akash, Lash, Michael T., Nachimuthu, Senthil K.
Sepsis is one of the leading causes of death in Intensive Care Units (ICUs). The strategy for treating sepsis involves the infusion of intravenous (IV) fluids and administration of antibiotics. Determining the optimal quantity of IV fluids is a challenging problem due to the complexity of a patient's physiology. In this study, we develop a data-driven optimization solution that derives the optimal quantity of IV fluids for individual patients. The proposed method minimizes the probability of severe outcomes by controlling the prescribed quantity of IV fluids and utilizes human-in-the-loop artificial intelligence. We demonstrate the performance of our model on 1122 ICU patients with a sepsis diagnosis extracted from the MIMIC-III dataset. The results show that, on average, our model can reduce mortality by 22%. This study has the potential to help physicians synthesize optimal, patient-specific treatment strategies.
Anomaly and Fraud Detection in Credit Card Transactions Using the ARIMA Model
Moschini, Giulia, Houssou, Régis, Bovay, Jérôme, Robert-Nicoud, Stephan
This paper addresses unsupervised credit card fraud detection on unbalanced datasets using the ARIMA model. The ARIMA model is fitted on the regular spending behaviour of the customer and is used to detect fraud when deviations or discrepancies appear. Our model is applied to credit card datasets and is compared with four anomaly detection approaches: K-Means, Box-Plot, Local Outlier Factor and Isolation Forest. The results show that the ARIMA model has better detection power than the benchmark models.
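The idea of fitting a model on regular spending and flagging deviations can be sketched as follows. This is a simplified stand-in, not the authors' model: it fits only an AR(1) process (an ARIMA(1,0,0) with a constant) by ordinary least squares, and the threshold of three residual standard deviations, the function names and the spending figures are all assumptions made for illustration.

```python
import math


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t].

    Returns (c, phi, sigma), where sigma is the residual standard deviation.
    """
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    if n < 3:
        raise ValueError("need at least four observations")
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    if sxx == 0:
        raise ValueError("history is constant; AR(1) fit is undefined")
    sxy = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    phi = sxy / sxx
    c = mean_next - phi * mean_prev
    residuals = [b - (c + phi * a) for a, b in zip(x_prev, x_next)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return c, phi, sigma


def flag_anomalies(history, new_values, k=3.0):
    """Flag new values whose one-step prediction error exceeds k * sigma."""
    c, phi, sigma = fit_ar1(history)
    flags = []
    prev = history[-1]
    for value in new_values:
        predicted = c + phi * prev
        is_anomaly = abs(value - predicted) > k * sigma
        flags.append(is_anomaly)
        # A flagged value is replaced by its prediction so that one outlier
        # does not distort the forecast for the following observation.
        prev = predicted if is_anomaly else value
    return flags


# Hypothetical daily spending amounts for one customer.
history = [50, 52, 49, 51, 50, 53, 48, 51, 50, 52]
new_values = [51, 49, 120, 50]
flags = flag_anomalies(history, new_values)
print(flags)
```

A full ARIMA model would add differencing and moving-average terms, and its orders would be chosen from the data.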
Arabic Opinion Mining Using a Hybrid Recommender System Approach
Harrag, Fouzi, Al-Salman, Abdulmalik Salman, Alquahtani, Alaa
One such kind of textual information is customer comments or reviews. People usually prefer to read reviews before buying a product or using a service in order to make the right decision. This behavior was also common before the Internet existed. Given the amount of available data, researchers attempt to process and use it to extract specific and useful knowledge. Sentiment analysis (SA) is the process of determining the opinion or feeling expressed in a piece of text. Sentiment means feelings, attitudes, emotions and opinions. The applications of sentiment analysis are numerous, including politics and political science, law, e-commerce, sociology and psychology. In e-commerce, sentiment analysis is very useful for gaining insight into customer opinions: once businesses understand how customers feel by analyzing their comments or reviews, they can identify what customers like and dislike and build things like recommendation systems, or enhance the product or the service.
Optimal Decision Trees for Nonlinear Metrics
Demirović, Emir, Stuckey, Peter J.
Nonlinear metrics, such as the F1-score, Matthews correlation coefficient, and Fowlkes-Mallows index, are often used to evaluate the performance of machine learning models, in particular when facing imbalanced datasets that contain more samples of one class than the other. Recent optimal decision tree algorithms have shown remarkable progress in producing trees that are optimal with respect to linear criteria, such as accuracy, but unfortunately nonlinear metrics remain a challenge. To address this gap, we propose a novel algorithm based on bi-objective optimisation, which treats misclassifications of each binary class as a separate objective. We show that, for a large class of metrics, the optimal tree lies on the Pareto frontier. Consequently, we obtain the optimal tree by using our method to generate the set of all nondominated trees. To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics. Our approach leads to a trade-off when compared to optimising linear metrics: the resulting trees may be more desirable according to the given nonlinear metric at the expense of higher runtimes. Nevertheless, the experiments illustrate that runtimes are reasonable for the majority of the tested datasets.
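The bi-objective view can be sketched on a toy scale. Each candidate classifier is summarised only by its misclassification counts per class, written here as (false positives, false negatives). The code keeps the nondominated pairs and then selects the best one under F1. The candidate list, class sizes and function names are hypothetical, and the sketch does not enumerate decision trees as the paper's algorithm does.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), taken as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pareto_front(points):
    """Return the sorted (fp, fn) pairs not dominated by any other pair."""
    unique = sorted(set(points))
    front = []
    for p in unique:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] for q in unique
        )
        if not dominated:
            front.append(p)
    return front


def best_by_f1(points, n_pos):
    """Pick the nondominated (fp, fn) pair with the highest F1-score."""
    return max(
        pareto_front(points),
        key=lambda p: f1_score(n_pos - p[1], p[0], p[1]),
    )


# Hypothetical candidates on a dataset with 20 positives and 180 negatives.
n_pos, n_neg = 20, 180
candidates = [(10, 8), (4, 12), (12, 8), (30, 2), (6, 12), (0, 20)]

front = pareto_front(candidates)
best_f1 = best_by_f1(candidates, n_pos)
best_accuracy = min(front, key=lambda p: p[0] + p[1])  # fewest total errors
print(front, best_f1, best_accuracy)
```

In this toy example the pair with the fewest total errors is not the pair with the best F1-score, which is why a method that optimises accuracy alone can miss the tree that is optimal for a nonlinear metric. Searching only the front is safe for F1 because, for fixed class sizes, F1 never improves when either error count grows.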
Light Can Hack Your Face! Black-box Backdoor Attack on Face Recognition Systems
Li, Haoliang, Wang, Yufei, Xie, Xiaofei, Liu, Yang, Wang, Shiqi, Wan, Renjie, Chau, Lap-Pui, Kot, Alex C.
Deep neural networks (DNNs) have shown great success in many computer vision applications. However, they are also known to be susceptible to backdoor attacks. When conducting backdoor attacks, most existing approaches assume that the targeted DNN is always available and that an attacker can always inject a specific pattern into the training data to further fine-tune the DNN model. However, in practice, such an attack may not be feasible, as the DNN model is encrypted and only available to the secure enclave. In this paper, we propose a novel black-box backdoor attack technique on face recognition systems, which can be conducted without knowledge of the targeted DNN model. Specifically, we propose a backdoor attack with a novel color stripe pattern trigger, which can be generated by modulating an LED with a specialized waveform. We also use an evolutionary computing strategy to optimize the waveform for the backdoor attack. Our backdoor attack can be conducted under very mild conditions: 1) the adversary cannot manipulate the input in an unnatural way (e.g., injecting adversarial noise); 2) the adversary cannot access the training database; 3) the adversary has no knowledge of the training model or of the training set used by the victim party. We show that the backdoor trigger can be quite effective: the attack success rate can be up to $88\%$ in our simulation study and up to $40\%$ in our physical-domain study, considering the task of face recognition and verification with at most three attempts during authentication. Finally, we evaluate several state-of-the-art potential defenses against backdoor attacks, and find that our attack can still be effective. We highlight that our study reveals a new physical backdoor attack, which calls for attention to the security of existing face recognition/verification techniques.
General DeepLCP model for disease prediction: Case of Lung Cancer
Kahla, Mayssa Ben, Kanzari, Dalel, Maalel, Ahmed
According to the Global Health Observatory (GHO), a large variety of highly prevalent diseases, such as ischaemic heart disease, stroke, lung cancer and lower respiratory infections, have remained the top killers during the past decade. The growth in the number of deaths caused by these diseases is due to the very late detection of symptoms: in the early stages, the symptoms are insignificant and similar to those of benign diseases (e.g. the flu), so the disease can only be detected at an advanced stage. In addition, the high frequency of improper practices that are harmful to health, hereditary factors, and stressful living conditions can increase death rates. Many studies have dealt with these fatal diseases, and most of them applied advanced machine learning models to image-based diagnosis. However, the drawback is that imagery only permits detecting the disease at a very late stage, when the patient can hardly be saved. In this paper we present our new approach, "DeepLCP", to predict fatal diseases that threaten people's lives. It is mainly based on raw and heterogeneous data of the concerned (or under-tested) person. "DeepLCP" results from a combination of Natural Language Processing (NLP) and the deep learning paradigm. The experimental results of the proposed model in the case of lung cancer prediction have shown high accuracy and a low loss rate during the validation of the disease prediction.
Social network analytics for supervised fraud detection in insurance
Óskarsdóttir, María, Ahmed, Waqas, Antonio, Katrien, Baesens, Bart, Dendievel, Rémi, Donas, Tom, Reynkens, Tom
Insurance fraud occurs when policyholders file claims that are exaggerated or based on intentional damages. This contribution develops a fraud detection strategy by extracting insightful information from the social network of a claim. First, we construct a network by linking claims with all their involved parties, including the policyholders, brokers, experts, and garages. Next, we establish fraud as a social phenomenon in the network and use the BiRank algorithm with a fraud-specific query vector to compute a fraud score for each claim. From the network, we extract features related to the fraud scores as well as the claims' neighborhood structure. Finally, we combine these network features with the claim-specific features and build a supervised model with fraud in motor insurance as the target variable. Although we build a model for only motor insurance, the network includes claims from all available lines of business. Our results show that models with features derived from the network perform well when detecting fraud and even outperform the models using only the classical claim-specific features. Combining network and claim-specific features further improves the performance of supervised learning models to detect fraud. The resulting model flags highly suspicious claims that need to be further investigated. Our approach provides a guided and intelligent selection of claims and contributes to a more effective fraud investigation process.
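The propagation of fraud scores through the claim-party network can be sketched as a BiRank-style iteration on a small bipartite graph. The sketch makes several assumptions that the abstract does not confirm: edges are unweighted, weights use symmetric degree normalisation, the query vector is placed on claims only, and the damping factor is 0.85. The graph, the function name and all parameter values are hypothetical.

```python
import math


def birank(edges, n_claims, n_parties, query, alpha=0.85, tol=1e-10, max_iter=1000):
    """BiRank-style score propagation on a claim-party bipartite graph.

    edges: list of (claim_index, party_index) pairs.
    query: per-claim prior, e.g. 1 for known fraudulent claims, else 0.
    Returns (claim_scores, party_scores).
    """
    deg_claim = [0.0] * n_claims
    deg_party = [0.0] * n_parties
    for c, p in edges:
        deg_claim[c] += 1.0
        deg_party[p] += 1.0
    # Symmetric normalisation: w = 1 / sqrt(deg(claim) * deg(party)).
    weights = [(c, p, 1.0 / math.sqrt(deg_claim[c] * deg_party[p])) for c, p in edges]

    total = sum(query)
    if total <= 0:
        raise ValueError("query vector must contain a positive entry")
    q = [v / total for v in query]

    claim_scores = q[:]
    party_scores = [0.0] * n_parties
    for _ in range(max_iter):
        # Parties collect scores from the claims they are involved in.
        party_scores = [0.0] * n_parties
        for c, p, w in weights:
            party_scores[p] += w * claim_scores[c]
        # Claims mix the scores of their parties with the query vector.
        new_scores = [(1.0 - alpha) * q[c] for c in range(n_claims)]
        for c, p, w in weights:
            new_scores[c] += alpha * w * party_scores[p]
        delta = max(abs(a - b) for a, b in zip(new_scores, claim_scores))
        claim_scores = new_scores
        if delta < tol:
            break
    return claim_scores, party_scores


# Hypothetical network: 4 claims and 3 parties (garage A, broker B, garage C).
edges = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 2)]
known_fraud = [1, 0, 0, 0]  # claim 0 is a confirmed fraud case
claim_scores, party_scores = birank(edges, 4, 3, known_fraud)
print([round(s, 3) for s in claim_scores])
```

In this toy network claim 1 shares a garage with the known fraudulent claim and therefore scores higher than claim 2, which is only linked to it through a broker, while claim 3 has no path to the fraudulent claim and keeps a score of zero. Scores of this kind could then serve as network features in a supervised model.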
An Extensive Experimental Evaluation of Automated Machine Learning Methods for Recommending Classification Algorithms (Extended Version)
Basgalupp, Márcio P., Barros, Rodrigo C., de Sá, Alex G. C., Pappa, Gisele L., Mantovani, Rafael G., de Carvalho, André C. P. L. F., Freitas, Alex A.
This paper presents an experimental comparison among four Automated Machine Learning (AutoML) methods for recommending the best classification algorithm for a given input dataset. Three of these methods are based on Evolutionary Algorithms (EAs), and the other is Auto-WEKA, a well-known AutoML method based on the Combined Algorithm Selection and Hyper-parameter optimisation (CASH) approach. The EA-based methods build classification algorithms from a single machine learning paradigm: either decision-tree induction, rule induction, or Bayesian network classification. Auto-WEKA combines algorithm selection and hyper-parameter optimisation to recommend classification algorithms from multiple paradigms. We performed controlled experiments where these four AutoML methods were given the same runtime limit for different values of this limit. In general, the difference in predictive accuracy of the three best AutoML methods was not statistically significant. However, the EA evolving decision-tree induction algorithms has the advantage of producing algorithms that generate interpretable classification models and that are more scalable to large datasets, by comparison with many algorithms from other learning paradigms that can be recommended by Auto-WEKA. We also observed that Auto-WEKA has shown meta-overfitting, a form of overfitting at the meta-learning level, rather than at the base-learning level.