Claims Severity Prediction with Apache Spark 2.0 and Scala - Experfy Insights


Allstate Corporation, the second-largest insurance company in the United States, founded in 1931, recently launched a machine learning recruitment challenge in partnership with Kaggle. Allstate's objective was to predict the cost, and hence the severity, of claims. The competition organizers provided competitors with more than 300,000 examples of masked and anonymized data consisting of more than 100 categorical and numerical attributes, thereby complying with confidentiality constraints.

The Spark/Scala script explained in this post obtains the training and test input datasets from the local filesystem or Amazon AWS S3 and trains a Random Forest model on them. The objective is to demonstrate the use of Spark 2.0 Machine Learning pipelines with the Scala language, AWS S3 integration, and some general good practices for building machine learning models. To keep this focus, more sophisticated techniques (such as a thorough exploratory data analysis and feature engineering) are intentionally omitted. Since almost all personal computers nowadays have many gigabytes of RAM (a figure that keeps growing rapidly) as well as powerful CPUs and GPUs, many real-world machine learning problems can be solved on a single computer with frameworks such as scikit-learn, with no need for a distributed system.
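The workflow described above can be sketched with a Spark 2.0 ML Pipeline in Scala. This is a minimal, hedged illustration, not the author's actual script: the file paths, the label column name ("loss"), and the column names ("cat1"…, "cont1"…) are assumptions chosen to resemble the competition data, and only a small subset of columns is shown.

```scala
// Sketch: read a CSV (local or S3) and train a Random Forest with a Spark ML Pipeline.
// Column names and paths below are illustrative assumptions, not the exact schema.
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.RandomForestRegressor

object ClaimsSeverity {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClaimsSeverity")
      .getOrCreate()

    // Read from local disk or S3 (e.g. "s3a://bucket/train.csv"), inferring the schema.
    val train = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("train.csv")

    // Index a few categorical columns into numeric indices (illustrative subset).
    val catCols = Seq("cat1", "cat2", "cat3")
    val indexers = catCols.map { c =>
      new StringIndexer()
        .setInputCol(c)
        .setOutputCol(s"${c}_idx")
        .setHandleInvalid("skip")
    }

    // Assemble indexed categoricals and numeric columns into one feature vector.
    val contCols = Seq("cont1", "cont2", "cont3")
    val assembler = new VectorAssembler()
      .setInputCols((catCols.map(_ + "_idx") ++ contCols).toArray)
      .setOutputCol("features")

    // Random Forest regressor predicting the claim severity ("loss") column.
    val rf = new RandomForestRegressor()
      .setLabelCol("loss")
      .setFeaturesCol("features")
      .setNumTrees(50)

    // Chain indexers, assembler, and model into a single reusable pipeline.
    val pipeline = new Pipeline().setStages((indexers :+ assembler :+ rf).toArray)
    val model = pipeline.fit(train)

    spark.stop()
  }
}
```

Reading from S3 only requires swapping the path for an `s3a://` URI (with the appropriate Hadoop AWS credentials configured); the rest of the pipeline is unchanged.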
