Use H2O and data.table to build models on large data sets in R

Last week, I wrote an introductory article on the data.table package. It was intended to give you a head start and help you become familiar with its unique and concise syntax. The next obvious step is modeling, which is the focus of this post. I used to think of myself as a handicapped R user when faced with large data sets; data.table changed that, and I would like to thank Matt Dowle, its creator, again for this accomplishment.

Still, algorithms such as random forest (with ntrees = 1000) take forever to run on my data set of 800,000 rows, and I'm sure many R users are trapped in a similar situation. To overcome this painful hurdle, I decided to write this post, which demonstrates how to use the two most powerful packages for the job, i.e. H2O and data.table. For practical understanding, I've taken the data set from a previously held competition and tried to improve the score using four different machine learning algorithms (with H2O) and feature engineering (with data.table).