Out of the box, XGBoost cannot be trained on a dataset larger than your machine's memory; attempting to do so in Python raises a MemoryError. By default, XGBoost also trains on a single machine. This is fine for basic projects, but as the size of your dataset and/or ML model grows, you may want to consider running XGBoost in distributed mode with Dask to speed up computation and reduce the burden on your local machine. This tutorial will show you how to go beyond your local machine's limitations by leveraging distributed XGBoost with Dask, with only minor changes to your existing code. Here is the code we will use if you want to jump right in.
Dec-20-2021, 16:50:41 GMT