Design Patterns for Machine Learning Pipelines - KDnuggets
Design patterns for ML pipelines have evolved several times in the past decade. These changes are usually driven by imbalances between memory and CPU performance. They are also distinct from traditional data processing pipelines (something like map reduce) as they need to support the execution of long-running, stateful tasks associated with deep learning. As growth in dataset sizes outpace memory availability, we have seen more ETL pipelines designed with distributed training and distributed storage as first-class principles. Not only can these pipelines train models in a parallel fashion using multiple accelerators, but they can also replace traditional distributed file systems with cloud object stores.
Nov-5-2021
- Technology: