

Differential Privacy for Sequential Algorithms Machine Learning

We study the differential privacy of sequential statistical inference and learning algorithms that are characterized by a random termination time. Using two examples, the sequential probability ratio test and sequential empirical risk minimization, we show that the number of steps such algorithms execute before termination can jeopardize the differential privacy of the input data in much the same way as their outputs, and that the usual Laplace mechanism cannot achieve standard differential privacy in these examples. To remedy this, we propose a notion of weak differential privacy and demonstrate its equivalence to the standard case for large i.i.d. samples. We show that, using the Laplace mechanism, weak differential privacy can be achieved for both the sequential probability ratio test and sequential empirical risk minimization with proper performance guarantees. Finally, we provide preliminary experimental results on the Breast Cancer Wisconsin (Diagnostic) and Landsat Satellite Data Sets from the UCI repository.
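As a point of reference for the abstract above, the standard Laplace mechanism for a single numeric query can be sketched as follows. This is only an illustration of the textbook mechanism, not the paper's weak-differential-privacy construction and not its sequential setting; the function name `laplace_mechanism` and its parameters are assumptions introduced here.

```python
import random


def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Return `value` plus Laplace noise with scale sensitivity / epsilon.

    Hypothetical helper: `sensitivity` is the largest change in the query
    when one record changes, `epsilon` is the privacy budget.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


if __name__ == "__main__":
    # Example: privatize a count query (sensitivity 1) with epsilon = 0.5.
    print(laplace_mechanism(120, sensitivity=1.0, epsilon=0.5))
```

A smaller `epsilon` gives a larger noise scale and hence stronger privacy at the cost of accuracy.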

Breast Cancer Diagnosis by Higher-Order Probabilistic Perceptrons Machine Learning

A two-layer neural network model that systematically includes correlations among input variables to arbitrary order and is designed to implement Bayes inference has been adapted to classify breast cancer tumors as malignant or benign, assigning a probability for either outcome. The inputs to the network represent measured characteristics of cell nuclei imaged in Fine Needle Aspiration biopsies. The present machine-learning approach to diagnosis (known as HOPP, for higher-order probabilistic perceptron) is tested on the much-studied, open-access Breast Cancer Wisconsin (Diagnosis) Data Set of Wolberg et al. This set lists, for each tumor, measured physical parameters of the cell nuclei of each sample. The HOPP model can identify the key factors (input features and their combinations) most relevant for reliable diagnosis. HOPP networks were trained on 90% of the examples in the Wisconsin database and tested on the remaining 10%. Across ensembles of 300 networks, selected randomly for cross-validation, classification accuracy of up to 97% on the test sets was readily achieved, with a standard deviation of around 2%, together with average Matthews correlation coefficients reaching 0.94, indicating excellent predictive performance. Demonstrably, the HOPP is capable of matching the predictive power attained by other advanced machine-learning algorithms applied to this much-studied database over several decades. Analysis shows that in this special problem, which is almost linearly separable, the effects of irreducible correlations among the measured features of the Wisconsin database are of relatively minor importance, as the Naive Bayes approximation can itself yield predictive accuracy approaching 95%. The advantages of the HOPP algorithm will be more clearly revealed in application to more challenging machine-learning problems.
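The Matthews correlation coefficient quoted above can be computed from a binary confusion matrix. The sketch below uses the standard formula; the function name and the counts in the usage example are hypothetical and are not taken from the HOPP experiments.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention assumed here: return 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(round(matthews_corrcoef(tp=50, tn=40, fp=5, fn=5), 3))
```

The coefficient ranges from -1 to 1, with 1 for perfect agreement and 0 for chance-level prediction, which makes it less sensitive to class imbalance than plain accuracy.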

Hybrid Machine Learning Model of Extreme Learning Machine Radial basis function for Breast Cancer Detection and Diagnosis; a Multilayer Fuzzy Expert System Machine Learning

Mammography is the most common laboratory method for the detection of breast cancer, yet it is associated with high cost and many side effects. Machine learning prediction as an alternative method has shown promising results. This paper presents a method based on a multilayer fuzzy expert system for the detection of breast cancer using an extreme learning machine (ELM) classification model integrated with a radial basis function (RBF) kernel, called ELM-RBF, considering the Wisconsin dataset. The performance of the proposed model is further compared with a linear-SVM model. Furthermore, both models are studied in terms of accuracy, precision, sensitivity, specificity, validation, true positive rate (TPR), and false-negative rate (FNR). The ELM-RBF model performs better on these criteria than the SVM model. Breast cancer is among the most common diseases of young women around the world [1-3]. Approximately 29.9% of cancer mortality in women is due to breast cancer.
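Most of the evaluation criteria listed above follow directly from confusion-matrix counts. The following is a minimal sketch with hypothetical names and hypothetical counts; it does not reproduce the paper's ELM-RBF or SVM results.

```python
def _ratio(numerator, denominator):
    """Divide safely; return 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute common binary-classification metrics from raw counts."""
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),  # same quantity as TPR / recall
        "specificity": _ratio(tn, tn + fp),
        "fnr": _ratio(fn, fn + tp),          # equals 1 - sensitivity
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=90, tn=80, fp=20, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Note that sensitivity and TPR are the same quantity, and FNR is its complement, so reporting all three adds no independent information.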

Welcome! You are invited to join a webinar: Production ML with the Autonomous Data Warehouse. After registering, you will receive a confirmation email about joining the webinar.


We use data from a popular Kaggle competition, the Wisconsin Breast Cancer data, to build a binary classification model for the likelihood of a tumor being benign or malignant. We see how OAC's Data Visualization can be used to profile and explore the data, and to rapidly prototype a Machine Learning model with DVML. See how ADW can be used to easily drop a Machine Learning model into production and expose it as a REST API for custom applications and websites. By registering for this TechCast you give permission for your name and email address to be shared with the presenter and the BIWA User Community so we can inform you of future TechCasts and conferences of interest.

Interpretable Counterfactual Explanations Guided by Prototypes Machine Learning

We propose a fast, model-agnostic method for finding interpretable counterfactual explanations of classifier predictions by using class prototypes. We show that class prototypes, obtained using either an encoder or class-specific k-d trees, significantly speed up the search for counterfactual instances and result in more interpretable explanations. We introduce two novel metrics to quantitatively evaluate local interpretability at the instance level. We use these metrics to illustrate the effectiveness of our method on an image and a tabular dataset, respectively MNIST and Breast Cancer Wisconsin (Diagnostic). The method also eliminates the computational bottleneck that arises because of numerical gradient evaluation for black box models.

Machine Learning: A Dark Side of Cancer Computing Machine Learning

Cancer analysis and prediction is a research field of the utmost importance for the well-being of humankind. Cancer data are analyzed and predicted using machine learning algorithms. Most researchers claim accuracy of the predicted results within 99%. However, we show that machine learning algorithms can easily predict with an accuracy of 100% on the Wisconsin Diagnostic Breast Cancer dataset. We show that this method of gaining accuracy is an unethical approach with which the algorithms can easily be misled. In this paper, we exploit the weakness of machine learning algorithms. We perform extensive experiments to verify the correctness of our results, and the methods are rigorously evaluated to validate our claim. In addition, this paper focuses on the correctness of accuracy. This paper reports three key outcomes of the experiments, namely, correctness of accuracies, significance of minimum accuracy, and correctness of machine learning algorithms.

Prediction of Malignant & Benign Breast Cancer: A Data Mining Approach in Healthcare Applications Machine Learning

As data science plays a pivotal role everywhere, it also finds prominent application in healthcare. Breast cancer is the top-rated type of cancer amongst women, and it alone took away 627,000 lives. This high mortality rate due to breast cancer demands attention, so that early detection allows prevention in time. As a potential contributor to state-of-the-art technology development, data mining finds a multi-fold application in predicting breast cancer. This work focuses on the implementation of different classification techniques for data mining in predicting malignant and benign breast cancer. The Breast Cancer Wisconsin data set from the UCI repository has been used as the experimental dataset, with the attribute clump thickness used as an evaluation class. The performances of these twelve algorithms: Ada Boost M 1, Decision Table, J Rip, Lazy IBK, Logistics Regression, Multiclass Classifier, Multilayer Perceptron, Naive Bayes, Random forest and Random Tree are analyzed on this data set. Keywords- Data Mining, Classification Techniques, UCI repository, Breast Cancer, Classification Algorithms

Toward Efficient Breast Cancer Diagnosis and Survival Prediction Using L-Perceptron Artificial Intelligence

Breast cancer is the most frequently reported cancer type among women around the globe, and it has the second highest female fatality rate among all cancer types. Despite all the progress made in prevention and early intervention, early prognosis and survival prediction rates are still unsatisfactory. In this paper, we propose a novel type of perceptron called L-Perceptron, which outperforms all previous supervised learning methods by reaching 97.42% and 98.73% in terms of accuracy and sensitivity, respectively, on the Wisconsin Breast Cancer dataset. Experimental results on Haberman's Breast Cancer Survival dataset show the superiority of the proposed method by reaching 75.18% and 83.86% in terms of accuracy and F1 score, respectively. The results are the best reported ones obtained in 10-fold cross validation in the absence of any preprocessing or feature selection.

Breast Cancer Diagnosis via Classification Algorithms Machine Learning

In this paper, we analyze the Wisconsin Diagnostic Breast Cancer Data using Machine Learning classification techniques, such as the SVM, Bayesian Logistic Regression (Variational Approximation), and K-Nearest-Neighbors. We describe each model and compare their performance through different measures. We conclude that SVM has the best performance among the classifiers, while competing closely with Bayesian Logistic Regression, which ranks as the second-best method for this dataset.

On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset Machine Learning

This paper presents a comparison of six machine learning (ML) algorithms: GRU-SVM (Agarap, 2017), Linear Regression, Multilayer Perceptron (MLP), Nearest Neighbor (NN) search, Softmax Regression, and Support Vector Machine (SVM) on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset (Wolberg, Street, & Mangasarian, 1992) by measuring their classification test accuracy and their sensitivity and specificity values. The said dataset consists of features which were computed from digitized images of FNA tests on a breast mass (Wolberg, Street, & Mangasarian, 1992). For the implementation of the ML algorithms, the dataset was partitioned in the following fashion: 70% for training phase, and 30% for the testing phase. The hyper-parameters used for all the classifiers were manually assigned. Results show that all the presented ML algorithms performed well (all exceeded 90% test accuracy) on the classification task. The MLP algorithm stands out among the implemented algorithms with a test accuracy of ~99.04%.
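The 70%/30% partition described above can be sketched with a seeded shuffle of sample indices. This is an assumption-laden illustration: the abstract does not say how the split was drawn, the function name and seed are hypothetical, and the sample count of 569 is the size commonly documented for the WDBC dataset.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into train and test lists."""
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


if __name__ == "__main__":
    train_idx, test_idx = train_test_split_indices(569)
    print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the held-out set identical across all six classifiers, so differences in test accuracy reflect the models rather than the partition.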