Operator Based Machine Learning Pipeline Construction · rstats-gsoc/gsoc2017 Wiki · GitHub

#artificialintelligence 

Many machine learning applications require extensive preprocessing on the data for state-of-the-art performance. A large number of preprocessing procedures have to be fused with a learner to create a wrapped learner, which assures that the same preprocessing is used in training and prediction. This creates a preprocessing chain which can be added to any learner with a single command to construct a configurable pipeline. The R programming language already offers a general purpose package for piping function output to new functions, magrittr. This is heavily utilized by the package dplyr for data manipulation, but mostly consists of basic low level functionality, e.g., mutation, aggregation, selection, filtering etc.