Build pipelines with Pandas using "pdpipe"
Pandas is an amazing library in the Python ecosystem for data analytics and machine learning. It forms the perfect bridge between the data world, where Excel/CSV files and SQL tables live, and the modeling world, where Scikit-learn or TensorFlow perform their magic. A data science flow is most often a sequence of steps: datasets must be cleaned, scaled, and validated before a machine learning algorithm can use them. These tasks can, of course, be done with the many single-step functions and methods offered by packages like Pandas, but a more elegant way is to use a pipeline. A pipeline usually reduces the chance of error and saves time by automating repetitive tasks.
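To make the clean, scale, validate sequence concrete, here is a minimal sketch that chains three steps with Pandas' built-in DataFrame.pipe method. It does not use pdpipe's own API; the helper functions (drop_missing, scale_column, check_unit_range) and the sample data are hypothetical and exist only for illustration.

```python
import pandas as pd


def drop_missing(df):
    # Cleaning step: drop every row that has a missing value.
    return df.dropna()


def scale_column(df, column):
    # Scaling step: min-max scale one column to the range [0, 1].
    out = df.copy()
    col = out[column]
    out[column] = (col - col.min()) / (col.max() - col.min())
    return out


def check_unit_range(df, column):
    # Validation step: fail loudly if a value falls outside [0, 1].
    if not df[column].between(0, 1).all():
        raise ValueError(f"{column} has values outside [0, 1]")
    return df


# Hypothetical sample data with one missing price.
raw = pd.DataFrame(
    {"price": [10.0, 20.0, None, 40.0], "rooms": [1, 2, 3, 4]}
)

# Each step takes a DataFrame and returns a DataFrame, so the
# steps compose into one readable chain.
clean = (
    raw.pipe(drop_missing)
    .pipe(scale_column, column="price")
    .pipe(check_unit_range, column="price")
)
```

Because every step has the same shape (DataFrame in, DataFrame out), steps can be reordered, reused, or tested one at a time, which is the property a dedicated pipeline library builds on.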
Dec-2-2019, 15:28:22 GMT