4 Steps to Machine Learning with Pentaho

#artificialintelligence 

The power of Pentaho Data Integration (PDI) for data access, blending and governance has been demonstrated and documented numerous times. Perhaps less well known is how PDI as a platform, with all its data munging[1] power, is ideally suited to orchestrate and automate up to three stages of the CRISP-DM[2] life-cycle for the data science practitioner: generic data preparation/feature engineering, predictive modeling, and model deployment. By "generic data preparation" we mean the process of connecting to (potentially) multiple heterogeneous data sources and then joining, blending, cleaning, filtering, deriving and denormalizing data so that it is ready for consumption by machine learning (ML) algorithms. Further ML-specific data transformations, such as supervised discretization and one-hot encoding, can then be applied as needed in an ML tool. For the data scientist, PDI can remove the drudgery of manually repeating similar data preparation processes from one dataset to the next.
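To make the distinction between generic data preparation and ML-specific transformation concrete, here is a minimal sketch in plain Python. It is not PDI code (PDI transformations are typically built graphically), and the records, field names and the `prepare` and `one_hot` helpers are hypothetical, chosen only for illustration. The first function joins, cleans, derives and denormalizes; the second applies a one-hot encoding of the kind that would normally be left to an ML tool.

```python
# Hypothetical records standing in for two heterogeneous sources.
customers = [
    {"id": 1, "name": "Ada", "region": "EU"},
    {"id": 2, "name": "Bob", "region": None},  # missing value to clean out
    {"id": 3, "name": "Cy", "region": "US"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 3, "amount": 75.5},
]


def prepare(customers, orders):
    """Generic preparation: join, clean/filter, derive, denormalize."""
    # Join + denormalize: collapse many orders into one total per customer.
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]

    rows = []
    for customer in customers:
        # Clean/filter: drop records with a missing region.
        if customer["region"] is None:
            continue
        total = totals.get(customer["id"], 0.0)
        rows.append({
            "id": customer["id"],
            "region": customer["region"],
            "total_spend": total,
            # Derive: a new field computed from existing data.
            "big_spender": total > 100,
        })
    return rows


def one_hot(rows, column):
    """ML-specific step: replace a categorical column with 0/1 indicators."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != column}
        for category in categories:
            out[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(out)
    return encoded


prepared = prepare(customers, orders)
features = one_hot(prepared, "region")
```

Here `prepared` holds one flat row per customer, the shape an ML algorithm expects, and `features` swaps the `region` column for one indicator column per observed category.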
