R Addict Blog

#artificialintelligence 

Machine and statistical learning wizards are becoming more eager to perform analysis with Spark ML library if this is only possible. It's trendy, posh, spicy and gives the feeling of doing state of the art machine learning and being up to date with the newest computational trends. It is even more sexy and powerful when computations can be performed on the extraordinarily enormous computation cluster - let's say 100 machines on YARN hadoop cluster makes you the real data cruncher! In this post I present sparklyr package (by RStudio), the connector that will transform you from a regular R user, to the supa! Moreover, I present how I have extended the interface to K-means procedure, so that now it is also possible to compute cost for that model, which might be beneficial in determining the number of clusters in segmentation problems.