Goto

Collaborating Authors

Dask and Pandas and XGBoost: Playing nicely between distributed systems

@machinelearnbot

Editor's note: For an introduction to Dask, consider reading Introducing Dask for Parallel Programming: An Interview with Project Lead Developer. To read more about the most recent release, see Dask Release 0.14.1. This post talks about distributing Pandas Dataframes with Dask and then handing them over to distributed XGBoost for training. More generally it discusses the value of launching multiple distributed systems in the same shared-memory processes and smoothly handing data back and forth between them. XGBoost is a well-loved library for a popular class of machine learning algorithms, gradient boosted trees.


datas-frame – Scalable Machine Learning (Part 2): Partial Fit

#artificialintelligence

This work is supported by Anaconda, Inc. and the Data Driven Discovery Initiative from the Moore Foundation. This is part two of my series on scalable machine learning. The basic idea is that, for certain estimators, learning can be done in batches. The estimator will see a batch, and then incrementally update whatever it's learning (the coefficients, for example). Setting that aside, it wouldn't be great for a user, since working with generators of datasets is awkward compared to the expressivity we get from pandas and NumPy.


Python machine learning libraries

#artificialintelligence

This blog is a part of the learn machine learning coding basics in a weekend . The Python built-in list type does not allow for efficient array manipulation. The NumPy package is concerned with manipulation of multi-dimensional arrays. NumPy is at the foundation of almost all the other packages covering the Data Science aspects of Python. From a Data Science perspective, collections of Data types like Documents, Images, Sound etc can be represented as an array of numbers.


Python machine learning libraries

#artificialintelligence

This blog is a part of the learn machine learning coding basics in a weekend . The Python built-in list type does not allow for efficient array manipulation. The NumPy package is concerned with manipulation of multi-dimensional arrays. NumPy is at the foundation of almost all the other packages covering the Data Science aspects of Python. From a Data Science perspective, collections of Data types like Documents, Images, Sound etc can be represented as an array of numbers.


Python machine learning libraries

#artificialintelligence

This blog is a part of the learn machine learning coding basics in a weekend . The Python built-in list type does not allow for efficient array manipulation. The NumPy package is concerned with manipulation of multi-dimensional arrays. NumPy is at the foundation of almost all the other packages covering the Data Science aspects of Python. From a Data Science perspective, collections of Data types like Documents, Images, Sound etc can be represented as an array of numbers.