Amazon Web Services claims to have the broadest and most complete set of machine learning capabilities. I honestly don't know how the company can claim those superlatives with a straight face: Yes, the AWS machine learning offerings are broad and fairly complete and rather impressive, but so are those of Google Cloud and Microsoft Azure. Amazon SageMaker Clarify is the new add-on to the Amazon SageMaker machine learning ecosystem for Responsible AI. SageMaker Clarify integrates with SageMaker at three points: in the new Data Wrangler to detect data biases at import time, such as imbalanced classes in the training set, in the Experiments tab of SageMaker Studio to detect biases in the model after training and to explain the importance of features, and in the SageMaker Model Monitor, to detect bias shifts in a deployed model over time. Historically, AWS has presented its services as cloud-only.
As enterprises move from running ad hoc machine learning (ML) models to using AI/ML to transform their business at scale, the adoption of ML Operations (MLOps) becomes inevitable. As shown in the following figure, the ML lifecycle begins with framing a business problem as an ML use case followed by a series of phases, including data preparation, feature engineering, model building, deployment, continuous monitoring, and retraining. For many enterprises, a lot of these steps are still manual and loosely integrated with each other. Therefore, it's important to automate the end-to-end ML lifecycle, which enables frequent experiments to drive better business outcomes. Data preparation is one of the crucial steps in this lifecycle, because the ML model's accuracy depends on the quality of the training dataset.
Machine Learning and data science life cycle involved several phases. Each phase requires complex tasks executed by different teams, as explained by Microsoft in this article. To solve the complexity of these tasks, cloud providers like Amazon, Microsoft, and Google services automate these tasks that speed up end to end the machine learning lifecycle. This article explains Amazon Web Services (AWS) cloud services used in different tasks in a machine learning life cycle. To better understand each service, I will write a brief description, a use case, and a link to the documentation. In this article, machine learning lifecycle can be replaced with data science lifecycle.
Nearly three years after it was first launched, Amazon Web Services' SageMaker platform has gotten a significant upgrade in the form of new features making it easier for developers to automate and scale each step of the process to build new automation and machine learning capabilities, the company said. As machine learning moves into the mainstream, business units across organizations will find applications for automation, and AWS is trying to make the development of those bespoke applications easier for its customers. "One of the best parts of having such a widely-adopted service like SageMaker is that we get lots of customer suggestions which fuel our next set of deliverables," said AWS vice president of machine learning, Swami Sivasubramanian. "Today, we are announcing a set of tools for Amazon SageMaker that makes it much easier for developers to build end-to-end machine learning pipelines to prepare, build, train, explain, inspect, monitor, debug and run custom machine learning models with greater visibility, explainability, and automation at scale." Already companies like 3M, ADP, AstraZeneca, Avis, Bayer, Capital One, Cerner, Domino's Pizza, Fidelity Investments, Lenovo, Lyft, T-Mobile, and Thomson Reuters are using SageMaker tools in their own operations, according to AWS.
When we approach modern Machine Learning problems in an AWS environment, there is more than traditional data preparation, model training, and final inferences to consider. Also, pure computing power is not the only concern we must deal with in creating an ML solution. There is a substantial difference between creating and testing a Machine Learning model inside a Jupyter Notebook locally and releasing it on a production infrastructure capable of generating business value. The complexities of going live with a Machine Learning workflow in the Cloud are called a deployment gap and we will see together through this article how to tackle it by combining speed and agility in modeling and training with criteria of solidity, scalability, and resilience required by production environments. The procedure we'll dive into is similar to what happened with the DevOps model for "traditional" software development, and the MLOps paradigm, this is how we call it, is commonly proposed as "an end-to-end process to design, create and manage Machine Learning applications in a reproducible, testable and evolutionary way". So as we will guide you through the following paragraphs, we will dive deep into the reasons and principles behind the MLOps paradigm and how it easily relates to the AWS ecosystem and the best practices of the AWS Well-Architected Framework. As said before, Machine Learning workloads can be essentially seen as complex pieces of software, so we can still apply "traditional" software practices.