Elastic Deep Learning With Horovod On Ray - AI Summary
Since its inception, the Ray ecosystem has grown to include a variety of features and tools useful for training ML models on the cloud, including Ray Tune for distributed hyperparameter tuning, the Ray Cluster Launcher for cluster provisioning, and load-based autoscaling . Because Ray is a general distributed compute platform, users of Ray are free to choose among a growing number of distributed data processing frameworks, including Spark, running on the same resources provisioned by Ray for the deep learning workflow. Now in the upcoming Ludwig 0.4 release, we're integrating Dask on Ray for distributed out-of-memory data preprocessing, Horovod on Ray for distributed training, and Ray Tune for hyperparameter optimization. Ludwig running in local mode (pre v0.4): all data needs to fit in memory on a single machine.Ludwig running on a Ray cluster (post v0.4): Ray scales out preprocessing and distributed training to process large datasets without needing to write any infrastructure code in Ludwig.By leveraging Dask, Ludwig's existing Pandas preprocessing can be scaled to handle large datasets with minimal code changes, and by leveraging Ray, we can combine the preprocessing, distributed training, and hyperparameter search all within a single job running a single training script.
Dec-24-2022, 19:05:40 GMT
- Technology: