Machine Learning with InsightEdge: Part II - DZone Big Data
Now that we have the training and test datasets sampled, initially preprocessed, and available in the data grid, we can close the Web Notebook and start experimenting with different techniques and algorithms by submitting Spark applications. For our first baseline approach, let's take a single feature, device_conn_type, and the logistic regression algorithm. Here is what the application does. First, we load the training dataset from the data grid, which we prepared and saved earlier with the Web Notebook. Then we use StringIndexer and OneHotEncoder to map a column of categories to a column of binary vectors. For example, with 4 categories of device_conn_type, an input value of the second category would map to an output vector of [0.0, 1.0, 0.0, 0.0]. (By default, Spark's OneHotEncoder drops the last category, so the vector it actually produces has one fewer element.)
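To make the category-to-vector mapping concrete, here is a minimal plain-Python sketch of the idea behind the two steps. It is not the Spark API and not the article's code: the helper names fit_string_indexer and one_hot are hypothetical, the sample values are made up for illustration, and breaking frequency ties alphabetically is an assumption of this sketch.

```python
from collections import Counter


def fit_string_indexer(values):
    """Assign each category an index, most frequent first.

    Ties are broken alphabetically (an assumption of this sketch).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return {category: float(i) for i, category in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=False):
    """Turn a category index into a binary vector.

    With drop_last=True the last category is encoded as all zeros,
    so the vector has num_categories - 1 elements.
    """
    size = num_categories - 1 if drop_last else num_categories
    vector = [0.0] * size
    if int(index) < size:
        vector[int(index)] = 1.0
    return vector


# Made-up sample of device_conn_type values with 4 distinct categories.
sample = ["0", "0", "0", "0", "2", "2", "2", "5", "5", "3"]

index_of = fit_string_indexer(sample)
num_categories = len(index_of)

# "2" is the second most frequent value, so it gets index 1.0.
full_vector = one_hot(index_of["2"], num_categories)
short_vector = one_hot(index_of["2"], num_categories, drop_last=True)

print(index_of)
print(full_vector)
print(short_vector)
```

The drop_last flag mirrors the idea of leaving out one category so that the encoded columns are not linearly dependent, which is usually what a linear model such as logistic regression wants.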
Oct-10-2016, 01:26:03 GMT