Bag of Tricks for Optimizing Machine Learning Training Pipelines - MLOps Community

#artificialintelligence 

Finally, one more interesting aspect of our training infrastructure is that we use a multi-cloud setup in practice. As it was told earlier, GCP is our main vendor for training instances for cost and powerful machines availability-related reasons, while our default production infrastructure is AWS. It means that sometimes we need to combine the two: e.g. We use Flyte to orchestrate this process. Flyte is a workflow management system that allows us to define a pipeline as a DAG of tasks. It is useful for us because it allows us to define a pipeline once and run its steps on different machines with different computational resources allocation, and it also provides a nice UI for monitoring the progress of the pipeline.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found