apache spark machine learning
Predicting Flight Delays with Apache Spark Machine Learning
This article was originally published on ETFTrends.com. Watch a complimentary webinar below to learn more about Apache Spark's MLlib, which makes machine learning scalable and easier with ML pipelines built on top of DataFrames. In this MapR Technologies video, you'll learn how to: review machine learning classification and random forests; use Spark SQL and DataFrames to explore real historical flight data; use [...]
Churn Prediction With Apache Spark Machine Learning - DZone AI
Churn prediction is big business. It aims to minimize customer defection by predicting which customers are likely to cancel a subscription to a service. Though originally used within the telecommunications industry, it has become common practice across banks, ISPs, insurance firms, and other verticals. The prediction process is heavily data-driven and often uses advanced machine learning techniques. In this post, we'll look at what types of customer data are typically used, do some preliminary analysis of the data, and generate churn prediction models, all with Spark and its machine learning frameworks.
Churn Prediction with Apache Spark Machine Learning
We would like to determine which parameter values of the decision tree produce the best model. A common technique for model selection is k-fold cross-validation, where the data is randomly split into k partitions. Each partition is used once as the testing data set, while the remaining k-1 partitions are used for training. A model is then trained on each training set and evaluated on the corresponding testing set, resulting in k model performance measurements. The average of these performance scores is often taken to be the overall score of the model, given its build parameters.
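The procedure described above can be sketched in plain Python, without Spark, to make the mechanics concrete. This is a minimal illustration under stated assumptions, not the article's code: `k_fold_splits`, `cross_val_score`, `fit_majority`, and `accuracy` are hypothetical names, and the majority-label "model" is a toy stand-in for a decision tree.

```python
import random
from statistics import mean


def k_fold_splits(rows, k, seed=0):
    """Shuffle the rows, cut them into k partitions, and yield (train, test) pairs."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal partitions
    for i in range(k):
        test = folds[i]  # each partition is the test set exactly once
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test


def cross_val_score(rows, k, fit, score, seed=0):
    """Average the k per-fold scores for one choice of build parameters."""
    return mean(score(fit(train), test) for train, test in k_fold_splits(rows, k, seed))


# Toy stand-in for a real learner (assumption): always predict the majority label.
def fit_majority(train):
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def accuracy(model, test):
    """Fraction of test rows whose label equals the constant prediction."""
    return mean(1.0 if label == model else 0.0 for _, label in test)


if __name__ == "__main__":
    # Hypothetical data: (feature, label) pairs.
    data = [(i, i % 3 == 0) for i in range(20)]
    print(cross_val_score(data, k=4, fit=fit_majority, score=accuracy))
```

To select parameters, one would call `cross_val_score` once per candidate parameter setting and keep the setting with the best average score. In Spark ML, the `CrossValidator` and `ParamGridBuilder` classes serve this purpose, though how the article configures them is not shown in this excerpt.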
Using Apache Spark Machine Learning for Pattern Detection - RTInsights
As Oracle recounts, Apache Spark excels at running machine learning queries on massive data sets. Predicting consumer behavior is considered the holy grail of marketing, but a classic problem is separating the customers who are ready to buy from the noise. Web activity such as search and browsing may generate petabytes of data. Then there is past-purchase history and offline behavior such as in-store purchases, and there are reams of demographic data to analyze (age, income, affinities) in search of the coveted target market. Alexander Sadovsky, director of data science at Oracle, runs a team responsible for crunching data on audience behavior and advertising across different channels, whether Google, Facebook, the wider internet, or in stores.