When we approach modern Machine Learning problems in an AWS environment, there is more to consider than traditional data preparation, model training, and final inference. Raw computing power is not the only concern in creating an ML solution, either. There is a substantial difference between creating and testing a Machine Learning model locally inside a Jupyter Notebook and releasing it on a production infrastructure capable of generating business value. The complexity of going live with a Machine Learning workflow in the Cloud is called the deployment gap, and in this article we will see how to tackle it by combining speed and agility in modeling and training with the solidity, scalability, and resilience that production environments require. The procedure we'll dive into is similar to what happened with the DevOps model for "traditional" software development. The MLOps paradigm, as it is called, is commonly described as "an end-to-end process to design, create and manage Machine Learning applications in a reproducible, testable and evolutionary way". In the following paragraphs, we will dive deep into the reasons and principles behind the MLOps paradigm and how it relates to the AWS ecosystem and the best practices of the AWS Well-Architected Framework. Machine Learning workloads are essentially complex pieces of software, so we can still apply "traditional" software practices.
As machine learning and AI spread through software products and services, we need to establish best practices and tools to test, deploy, manage, and monitor ML models in real-world production. In short, with MLOps we strive to avoid "technical debt" in machine learning applications. SIG MLOps defines "an optimal MLOps experience [as] one where Machine Learning assets are treated consistently with all other software assets within a CI/CD environment. Machine Learning models can be deployed alongside the services that wrap them and the services that consume them as part of a unified release process." By codifying these practices, we hope to accelerate both the adoption of ML/AI in software systems and the fast delivery of intelligent software.
In this article, we'll explore some architectural design patterns that support the machine learning model life cycle. The standard data product pipeline is an iterative process consisting of two phases (build and deploy) that mirror the machine learning pipeline. During the build phase, data is ingested and wrangled into a form that allows models to be fit and experimented on. During the deploy phase, models are selected and then used to make estimations or predictions that directly engage a user. Users respond to the output of models, creating feedback, which is in turn reingested and used to adapt models.
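The build, deploy, and feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `Pipeline` and `MeanModel` classes, their method names, and the toy "model" (which simply predicts the mean of the values it has seen) are all assumptions introduced here for the example.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class MeanModel:
    """Toy model (hypothetical): predicts the mean of the values it was fit on."""
    value: float = 0.0

    def fit(self, targets: list[float]) -> "MeanModel":
        self.value = mean(targets)
        return self

    def predict(self) -> float:
        return self.value


@dataclass
class Pipeline:
    """Hypothetical two-phase data product pipeline with a feedback loop."""
    raw: list = field(default_factory=list)
    model: MeanModel = field(default_factory=MeanModel)

    def ingest(self, records: list) -> None:
        # Build phase, step 1: collect raw records as they arrive.
        self.raw.extend(records)

    def wrangle(self) -> list[float]:
        # Build phase, step 2: drop missing values and coerce to float.
        return [float(r) for r in self.raw if r is not None]

    def build(self) -> None:
        # Build phase, step 3: fit a fresh model on the wrangled data.
        self.model = MeanModel().fit(self.wrangle())

    def deploy(self) -> float:
        # Deploy phase: serve a prediction from the selected model.
        return self.model.predict()

    def feedback(self, observed: float) -> None:
        # Feedback: re-ingest the user's response and adapt the model.
        self.ingest([observed])
        self.build()


pipeline = Pipeline()
pipeline.ingest([1.0, None, 3.0])
pipeline.build()
first = pipeline.deploy()

pipeline.feedback(8.0)
second = pipeline.deploy()
print(first, second)
```

The second prediction differs from the first only because the feedback value was re-ingested and the model rebuilt, which is the iterative behaviour the paragraph describes. In a real system each method would be a separate, independently deployable stage.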
Development of Cyber-Physical Systems (CPSs) requires close interaction between developers with expertise in many domains to achieve ever-increasing demands for improved performance, reduced cost, and more system autonomy. Each engineering discipline commonly relies on domain-specific modeling languages, and analysis and execution of these models are often automated with appropriate tooling. However, integration between these heterogeneous models and tools is often lacking, and most of the burden for inter-operation of these tools is placed on system developers. To address this problem, we introduce a workflow modeling language for the automation of complex CPS development processes and implement a platform for execution of these models in the Assurance-based Learning-enabled CPS (ALC) Toolchain. Several illustrative examples are provided which show how these workflow models are able to automate many time-consuming integration tasks previously performed manually by system developers.
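To make the idea of an executable workflow model concrete, here is a minimal Python sketch that runs a set of tasks in dependency order. It is not the ALC Toolchain's workflow language, whose syntax is not shown here; the `run_workflow` function and the task names are hypothetical, and the only library call used is `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task once, after all of its dependencies have finished.

    tasks:        maps a task name to a callable taking a dict of upstream results.
    dependencies: maps a task name to the set of task names it depends on.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Hand each task only the results of its declared dependencies.
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](inputs)
    return order, results


# Hypothetical three-step workflow: collect data, train, then evaluate.
tasks = {
    "collect_data": lambda inputs: [0.5, 1.5, 2.5],
    "train_model": lambda inputs: sum(inputs["collect_data"]) / len(inputs["collect_data"]),
    "evaluate": lambda inputs: abs(inputs["train_model"] - 1.5) < 1e-9,
}
dependencies = {
    "train_model": {"collect_data"},
    "evaluate": {"train_model"},
}

order, results = run_workflow(tasks, dependencies)
print(order, results["evaluate"])
```

Declaring dependencies as data rather than hard-coding the call sequence is what lets a platform take over the integration work: the developer describes which step needs which output, and the executor decides the order.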
To enable developers and citizen data scientists to take advantage of machine learning, the industry is moving towards AutoML, a simplified approach to building and training ML models. Training and deploying a sophisticated machine learning model involves multiple phases, with each phase demanding a unique skill set. An enterprise data science team consists of data engineers, data scientists, business analysts, researchers, ML developers, and DevOps professionals who manage the workflow involved in operationalizing AI for businesses. Each stage is handled by an individual or a team specializing in that task. Data engineers deal with the ingestion and acquisition of data from disparate sources.