Machine learning PhD students are in a unique position: they often need to run large-scale experiments to conduct state-of-the-art research, but they don't have the support of the platform teams that industrial ML engineers can rely on. As former PhD students ourselves, we recount our hands-on experience with these challenges and explain how open-source tools like Determined would have made grad school a lot less painful. When we started as PhD students at Carnegie Mellon University (CMU), we thought the challenge lay in having novel ideas, testing hypotheses, and presenting research. Instead, the most difficult part was building out the tooling and infrastructure needed to run deep learning experiments. While industry labs like Google Brain and FAIR have teams of engineers to provide this kind of support, independent researchers and graduate students are left to manage on their own.
Azure Machine Learning (AML) is a cloud-based machine learning service for data scientists and ML engineers. You can use AML to manage the machine learning lifecycle: develop, train, and test models, and run MLOps processes with speed, efficiency, and quality. For organizations that want to scale ML operations and unlock the potential of AI, tools like AML are important, because they make it much easier to create machine learning solutions that drive business growth. But what if you don't need a comprehensive MLOps solution like AML? Maybe you want to build your own stack and need specific tools for tasks like experiment tracking, deployment, or managing other key parts of MLOps. Experiment tracking documents every piece of information that you care about during your ML experiments. Machine learning is an iterative process, so this record is essential for comparing runs. Azure ML provides experiment tracking for all metrics in the machine learning environment.
This article provides an overview of the Collective Knowledge technology (CK or cKnowledge). CK attempts to make it easier to reproduce ML and systems research, deploy ML models in production, and adapt them to continuously changing data sets, models, research techniques, software, and hardware. The CK concept is to decompose complex systems and ad hoc research projects into reusable sub-components with unified APIs, a CLI, and JSON meta descriptions. Such components can be connected into portable workflows using DevOps principles combined with reusable automation actions, software detection plugins, meta packages, and exposed optimization parameters. CK workflows can automatically plug in different models, data, and tools from different vendors while building, running, and benchmarking research code in a unified way across diverse platforms and environments. Such workflows also help to perform whole-system optimization, reproduce results, and compare them using public or private scoreboards on the CK platform (https://cKnowledge.io). For example, the modular CK approach was successfully validated with industrial partners to automatically co-design and optimize software, hardware, and machine learning models for reproducible and efficient object detection in terms of speed, accuracy, energy, size, and other characteristics. The long-term goal is to simplify and accelerate the development and deployment of ML models and systems by helping researchers and practitioners share and reuse their knowledge, experience, best practices, artifacts, and techniques using open CK APIs.
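The decomposition idea can be illustrated with a minimal sketch. This is not the CK API: the `Component` class, the action names (`load`, `predict`), and the JSON keys are all hypothetical, chosen only to show how components described by JSON meta and exposing actions through one call signature might be chained into a workflow.

```python
import json

# Hypothetical JSON meta descriptions; the keys are illustrative, not CK's schema.
MODEL_META = json.loads(
    '{"name": "toy-model", "type": "model", "params": {"scale": 2}}'
)
DATASET_META = json.loads(
    '{"name": "toy-dataset", "type": "dataset", "params": {"values": [1, 2, 3]}}'
)


class Component:
    """A reusable unit exposing named actions behind one call signature."""

    def __init__(self, meta, actions):
        self.meta = meta
        # Maps an action name to a callable(meta, inputs) that returns a dict.
        self.actions = actions

    def run(self, action, inputs=None):
        if action not in self.actions:
            raise KeyError(f"{self.meta['name']} has no action {action!r}")
        return self.actions[action](self.meta, inputs or {})


def load_values(meta, inputs):
    """Toy 'load' action: return the values listed in the dataset meta."""
    return {"values": list(meta["params"]["values"])}


def predict(meta, inputs):
    """Toy 'predict' action: multiply each input value by the model's scale."""
    scale = meta["params"]["scale"]
    return {"predictions": [scale * v for v in inputs["values"]]}


def run_workflow(dataset, model):
    """Chain two components through the shared interface."""
    data = dataset.run("load")
    return model.run("predict", data)


dataset = Component(DATASET_META, {"load": load_values})
model = Component(MODEL_META, {"predict": predict})
print(run_workflow(dataset, model))
```

Because the workflow only depends on action names and dictionary inputs and outputs, a different model or data set with the same actions could be substituted without changing `run_workflow`.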
Tuning machine learning models at scale, especially finding the right hyperparameter values, can be difficult and time-consuming. In addition to the computational effort required, this process involves ancillary work, including engineering tasks (e.g., job scheduling) and more mundane tasks (e.g., keeping track of the various parameters and associated results). We present Auptimizer, a general Hyperparameter Optimization (HPO) framework to help data scientists speed up model tuning and bookkeeping. With Auptimizer, users can use all available computing resources in distributed settings for model training. The design also allows researchers to integrate new HPO algorithms. To demonstrate its flexibility, we show how Auptimizer integrates a few major HPO techniques (from random search to neural architecture search).
Designing a Machine Learning (ML) framework for production faces challenges similar to those faced with Big Data. There is a large volume of models with a variety of configurations, and training them efficiently at scale with reproducibility is critical to realizing their business value. In this paper, we address one design aspect of the ML framework, namely the HPO process, via a framework called Auptimizer.
A. Hyperparameter Optimization
ML models are typically sensitive to the values of hyperparameters. Unlike model parameters, hyperparameters are values that control the model configuration or the training setup and thus need to be set before training the model. Because gradient information is not available for these hyperparameters, tuning them is often treated as a black-box optimization problem. As an alternative to manual selection (which is usually based on the modeler's expertise), researchers have proposed different methods to accelerate the tuning process, including Bayesian approaches, evolutionary algorithms, multi-armed bandits, and architecture search by learning.
Tuning hyperparameters is often time-consuming, especially when model training is computationally intensive. Therefore, in practice, an automated HPO solution is critically important for machine learning.
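To make the black-box view concrete, here is a minimal sketch of random search, the simplest of the HPO techniques mentioned above. It does not use Auptimizer's API. The objective `validation_loss`, the two hyperparameters, and their ranges are hypothetical stand-ins for training and evaluating a real model. The `history` list illustrates the bookkeeping of configurations and results.

```python
import math
import random


def validation_loss(learning_rate, num_layers):
    """Hypothetical stand-in for training a model and returning its validation loss.

    This toy function is smallest at learning_rate=1e-2 and num_layers=3.
    """
    return (math.log10(learning_rate) + 2.0) ** 2 + 0.1 * (num_layers - 3) ** 2


def random_search(objective, n_trials, seed=0):
    """Black-box random search: sample configurations and keep the best one."""
    rng = random.Random(seed)
    history = []  # bookkeeping: every (loss, configuration) pair tried
    for _ in range(n_trials):
        config = {
            # Sample the learning rate log-uniformly between 1e-5 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-5, -1),
            # Sample the layer count uniformly from 1 to 6 inclusive.
            "num_layers": rng.randint(1, 6),
        }
        history.append((objective(**config), config))
    # Select by loss only, so ties never compare the config dictionaries.
    best_loss, best_config = min(history, key=lambda item: item[0])
    return best_loss, best_config, history


best_loss, best_config, history = random_search(validation_loss, n_trials=50)
print(f"best loss {best_loss:.4f} with {best_config}")
```

The search never inspects the objective's internals; it only calls it and records the result. This is why the same loop structure can host other strategies, such as Bayesian or bandit-based methods, by changing how `config` is proposed.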
To answer the basic question of "What is MLOps?" we first need to understand what DevOps is. DevOps is a set of practices that combines software development and IT operations. It aims to shorten the systems development life cycle and provide continuous delivery with high software quality. DevOps is complementary to Agile software development, and several DevOps aspects came from Agile methodology. In that sense, DevOps is the offspring of Agile software development, born from the need to keep up with the increased software velocity and throughput that Agile methods have achieved.