
Predicting Breast Cancer Using Apache Spark Machine Learning Logistic Regression

In this blog post, I'll help you get started using Apache Spark's spark.ml logistic regression to predict whether a breast tumor is malignant.

Classification is a family of supervised machine learning algorithms that identify which category an item belongs to (for example, whether a cancer tissue observation is malignant or not), based on labeled examples of known items (for example, observations known to be malignant or not). Classification takes a set of data with known labels and predetermined features and learns how to label new records based on that information. Features are the "if questions" that you ask. The label is the answer to those questions.
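To make the distinction between features and labels concrete, here is a minimal Python sketch. The `Observation` class, its field names, the 1.0/0.0 label encoding, and the numeric values are all hypothetical illustrations; none of them come from the data set or from the spark.ml API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    """One labeled example (hypothetical structure, for illustration only)."""
    features: List[float]  # the answers to the "if questions"
    label: float           # assumed encoding: 1.0 = malignant, 0.0 = not malignant


# Made-up labeled examples; the values are not real measurements.
training = [
    Observation(features=[5.0, 1.0, 1.0], label=0.0),
    Observation(features=[8.0, 10.0, 10.0], label=1.0),
]


def split_features_labels(
    observations: List[Observation],
) -> Tuple[List[List[float]], List[float]]:
    """Separate the inputs (features) from the known answers (labels)."""
    X = [o.features for o in observations]
    y = [o.label for o in observations]
    return X, y


X, y = split_features_labels(training)
```

A classifier is trained on `X` together with `y`, and is later asked to produce a label for a new feature list whose answer is not yet known.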


Let's go through an example using cancer tissue observations.

Logistic regression is a popular method for predicting a binary response. It is a special case of generalized linear models that predicts the probability of the outcome. Logistic regression measures the relationship between the label Y and the features X by estimating probabilities using a logistic function. The model predicts a probability, which is then used to predict the label class. Our data is from the Wisconsin Diagnostic Breast Cancer (WDBC) Data Set, which categorizes breast tumor cases as either benign or malignant; 9 features are used to predict the diagnosis.
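The step from features to a probability to a label class can be sketched in plain Python. This is a minimal illustration of the logistic function, not Spark's implementation: the function names, the weights and intercept passed in, and the 0.5 threshold are assumptions chosen for the example.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the range [0, 1]."""
    # Branch on the sign of z so math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(
    weights: Sequence[float], intercept: float, features: Sequence[float]
) -> float:
    """Probability of the positive class for one observation."""
    # Linear combination of the features, then squashed by the logistic function.
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(probability: float, threshold: float = 0.5) -> float:
    """Turn a probability into a label class (threshold is an assumed value)."""
    return 1.0 if probability > threshold else 0.0
```

With all weights and the intercept set to zero, `predict_probability` returns 0.5 for any input, meaning the model has no evidence either way. Training amounts to finding weights that push the probability toward 1 for malignant observations and toward 0 for benign ones.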