Unpacking the Complexity of Machine Learning Deployments


Deploying and maintaining Machine Learning models at scale is one of the most pressing challenges organizations face today. The Machine Learning workflow, which includes training, building, and deploying models, can be a long process with many roadblocks along the way. Many data science projects never make it to production because of challenges that slow down or halt the entire process. To overcome the challenges of model deployment, we need to identify the problems and learn what causes them. End-to-end ML applications often comprise components written in different programming languages.

DLHub: Model and Data Serving for Science

Abstract--While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the "learning systems" needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications. First, its self-service model repository allows users to share, publish, verify, reproduce, and reuse models, and addresses concerns related to model reproducibility by packaging and distributing models and all constituent components. Second, it implements scalable and low-latency serving capabilities that can leverage parallel and distributed computing resources to democratize access to published models through a simple web interface. Unlike other model serving frameworks, DLHub can store and serve any Python 3-compatible model or processing function, plus multiple-function pipelines. We show that relative to other model serving systems including TensorFlow Serving, SageMaker, and Clipper, DLHub provides greater capabilities, comparable performance without memoization and batching, and significantly better performance when the latter two techniques can be employed. We also describe early uses of DLHub for scientific applications.

I. INTRODUCTION

Machine Learning (ML) is disrupting nearly every aspect of computing. Researchers now turn to ML methods to uncover patterns in vast data collections and to make decisions with little or no human input. As ML becomes increasingly pervasive, new systems are required to support the development, adoption, and application of ML. We refer to the broad class of systems designed to support ML as "learning systems."
Learning systems need to support the entire ML lifecycle (see Figure 1), including model development [1, 2]; scalable training across potentially tens of thousands of cores and GPUs [3]; model publication and sharing [4]; and low-latency and high-throughput inference [5]; all while encouraging best-practice software engineering when developing models [6].
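
The abstract above attributes DLHub's largest performance gains to memoization and batching. As a rough illustration of those two techniques only, and not of DLHub's actual implementation, the following Python sketch wraps a model in a small serving front end. The names `toy_model`, `ServingFrontend`, and `max_batch_size` are hypothetical and chosen for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Input = Tuple[float, ...]


def toy_model(batch: Sequence[Input]) -> List[float]:
    """Hypothetical stand-in for a served model: sums each input vector."""
    return [sum(x) for x in batch]


class ServingFrontend:
    """Illustrative front end that adds memoization and batching to a model."""

    def __init__(self, model: Callable[[Sequence[Input]], List[float]],
                 max_batch_size: int = 4) -> None:
        self.model = model
        self.max_batch_size = max_batch_size
        self.cache: Dict[Input, float] = {}
        self.model_calls = 0  # counts how often the model is actually invoked

    def predict_many(self, inputs: Sequence[Input]) -> List[float]:
        # Memoization: only distinct inputs that are not cached reach the model.
        pending = [x for x in dict.fromkeys(inputs) if x not in self.cache]

        # Batching: send the remaining inputs to the model in groups,
        # so per-call overhead is paid once per batch, not once per input.
        for start in range(0, len(pending), self.max_batch_size):
            batch = pending[start:start + self.max_batch_size]
            outputs = self.model(batch)
            self.model_calls += 1
            self.cache.update(zip(batch, outputs))

        return [self.cache[x] for x in inputs]


frontend = ServingFrontend(toy_model, max_batch_size=2)
first = frontend.predict_many([(1.0, 2.0), (3.0, 4.0), (1.0, 2.0)])
second = frontend.predict_many([(3.0, 4.0), (5.0,)])
```

In this sketch the first request triggers a single model call for two distinct inputs, and the second request only sends the one input that has not been seen before. A real serving system would also need cache eviction and a policy for how long to wait while filling a batch, both of which are omitted here.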

Machine Learning model deployment


"Enterprise Machine Learning requires looking at the big picture […] from a data engineering and a data platform perspective," lectured Justin Norman during the talk on the deployment of Machine Learning models at this year's DataWorks Summit in Barcelona. Indeed, an industrial Machine Learning system is part of a vast data infrastructure, which renders an end-to-end ML workflow particularly complex. The challenges linked to the development, deployment, and maintenance of real-world ML systems should not be overlooked as we pursue the finest ML algorithms. Machine Learning is not necessarily meant to replace human decision making; it is mainly about helping humans make complex judgment-based decisions. The talk I attended, Machine Learning Model Deployment: Strategy to Implementation, was given by Cloudera's experts, Justin Norman and Sagar Kewalramani.

ISTHMUS: Secure, Scalable, Real-time and Robust Machine Learning Platform for Healthcare

In recent times, machine learning (ML) and artificial intelligence (AI) based systems have evolved and scaled across different industries such as finance, retail, insurance, energy utilities, etc. Among other things, they have been used to predict patterns of customer behavior, to generate pricing models, and to predict the return on investments. But the successes in deploying machine learning models at scale in those industries have not translated into the healthcare setting. There are multiple reasons why integrating ML models into healthcare has not been widely successful, but from a technical perspective, general-purpose commercial machine learning platforms are not a good fit for healthcare due to complexities in handling data quality issues, mandates to demonstrate clinical relevance, and a lack of ability to monitor performance in a highly regulated environment with stringent security and privacy needs. In this paper, we describe Isthmus, a turnkey, cloud-based platform which addresses the challenges above and reduces time to market for operationalizing ML/AI in healthcare. Towards the end, we describe three case studies which shed light on Isthmus capabilities. These include (1) supporting an end-to-end lifecycle of a model which predicts trauma survivability at hospital trauma centers, (2) bringing in and harmonizing data from disparate sources to create a community data platform for inferring population as well as patient level insights for Social Determinants of Health (SDoH), and (3) ingesting live-streaming data from various IoT sensors to build models, which can leverage real-time and longitudinal information to make advanced time-sensitive predictions.

Best practices for implementing machine learning on Google Cloud


Use BigQuery to process tabular data. Use Dataflow to process unstructured data. Use managed datasets to link data to your models. The recommended approach for processing your data depends on the framework and data types you're using. This section provides high-level recommendations for common scenarios. For general recommendations on data engineering and feature engineering for ML, see Data preprocessing for machine learning: options and recommendations and Data preprocessing for machine learning using TensorFlow Transform. If you're using TensorFlow for model development, use TensorFlow Extended to prepare your data for training. TensorFlow Transform is the TensorFlow component that enables defining and executing a preprocessing function to transform your data.
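
The idea of defining a preprocessing function once and executing it over your data can be sketched in plain Python. This is not the TensorFlow Transform API; it is a minimal illustration, under the assumption that preprocessing needs a full pass over the training data to compute statistics, which are then reused unchanged at serving time. The names `ScalingStats`, `analyze`, and `transform` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass(frozen=True)
class ScalingStats:
    """Statistics learned from the training data (hypothetical container)."""
    mean: float
    std: float


def analyze(values: List[float]) -> ScalingStats:
    """Full pass over the training data to compute scaling statistics."""
    std = pstdev(values)
    # Guard against a constant column, which would give a zero divisor.
    return ScalingStats(mean=mean(values), std=std if std > 0 else 1.0)


def transform(value: float, stats: ScalingStats) -> float:
    """Apply the same scaling to one value, at training or serving time."""
    return (value - stats.mean) / stats.std


training_ages = [20.0, 30.0, 40.0]
stats = analyze(training_ages)

# The statistics are computed once and reused, so training and serving
# inputs go through exactly the same transformation.
scaled_training = [transform(v, stats) for v in training_ages]
scaled_request = transform(30.0, stats)
```

Keeping the analysis step separate from the per-value transform is one way to avoid skew between training-time and serving-time preprocessing.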