Amazon SageMaker Serverless Inference (Preview) was recently announced at re:Invent 2021 as a new model hosting feature that lets customers serve model predictions without having to explicitly provision compute instances or configure scaling policies to handle traffic variations. Serverless Inference is a new deployment capability that complements SageMaker's existing options for deployment that include: SageMaker Real-Time Inference for workloads with low latency requirements in the order of milliseconds, SageMaker Batch Transform to run predictions on batches of data, and SageMaker Asynchronous Inference for inferences with large payload sizes or requiring long processing times. Serverless Inference means that you don't need to configure and manage the underlying infrastructure hosting your models. When you host your model on a Serverless Inference endpoint, simply select the memory and max concurrent invocations. Then, SageMaker will automatically provision, scale, and terminate compute capacity based on the inference request volume.
We are excited to bring Transform 2022 back in-person July 19 and virtually July 20 - 28. Join AI and data leaders for insightful talks and exciting networking opportunities. Amazon just unveiled Serverless Inference, a new option for SageMaker, its fully managed machine learning (ML) service. The goal for Amazon SageMaker Serverless Inference is to serve use cases with intermittent or infrequent traffic patterns, lowering total cost of ownership (TCO) and making the service easier to use. VentureBeat connected with Bratin Saha, AWS VP of Machine Learning, to discuss where Amazon SageMaker Serverless fits into the big picture of Amazon's machine learning offering and how it affects ease of use and TCO, as well as Amazon's philosophy and process in developing its machine learning portfolio. Inference is the productive phase of ML-powered applications.
I have recently published a step-by-step guide to serverless model deployments with Amazon SageMaker Pipelines, Amazon API Gateway, and AWS Lambda. With AWS Lambda, you pay only for what you use. Lambda charges based on the number of requests, execution duration, and amount of memory allocated to the function. So how much memory should you allocate to your inference function? In this post, I will show how you can use SageMaker Hyperparameter Tuning (HPO) jobs and a load-testing tool to automatically optimize the price/performance ratio of your serverless inference service.
Identifying paraphrased text has business value in many use cases. For example, by identifying sentence paraphrases, a text summarization system could remove redundant information. Another application is to identify plagiarized documents. In this post, we fine-tune a Hugging Face transformer on Amazon SageMaker to identify paraphrased sentence pairs in a few steps. A truly robust model can identify paraphrased text when the language used may be completely different, and also identify differences when the language used has high lexical overlap.
The Amazon SageMaker machine learning service is a full platform that greatly simplifies the process of training and deploying your models at scale. However, there are still major gaps to enabling data scientists to do research and development without having to go through the heavy lifting of provisioning the infrastructure and developing their own continuous delivery practices to obtain quick feedback. In this talk, you will learn how to leverage AWS CodePipeline, CloudFormation, CodeBuild, and SageMaker to create continuous delivery pipelines that allow the data scientist to use a repeatable process to build, train, test and deploy their models. Below, I've included a screencast of the talk I gave at the AWS NYC Summit in July 2018 along with a transcript (generated by Amazon Transcribe – another Machine Learning service – along with lots of human editing). The last six minutes of the talk include two demos on using SageMaker, CodePipeline, and CloudFormation as part of the open source solution we created.