Goto

Collaborating Authors

 inference container


Deploy Amazon SageMaker Autopilot models to serverless inference endpoints

#artificialintelligence

Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning (ML) models based on your data, while allowing you to maintain full control and visibility. Autopilot can also deploy trained models to real-time inference endpoints automatically. If you have workloads with spiky or unpredictable traffic patterns that can tolerate cold starts, then deploying the model to a serverless inference endpoint would be more cost efficient. Amazon SageMaker Serverless Inference is a purpose-built inference option ideal for workloads with unpredictable traffic patterns and that can tolerate cold starts. Unlike a real-time inference endpoint, which is backed by a long-running compute instance, serverless endpoints provision resources on demand with built-in auto scaling.

  Genre: Press Release (0.31)
  Industry: Retail > Online (0.40)

Train and deploy deep learning models using JAX with Amazon SageMaker

#artificialintelligence

Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at any scale. Typically, you can use the pre-built and optimized training and inference containers that have been optimized for AWS hardware. Although those containers cover many deep learning workloads, you may have use cases where you want to use a different framework or otherwise customize the contents of your OS libraries within the container. To accommodate this, SageMaker provides the flexibility to train models using any framework that can run in a Docker container. This functionality enables you to use existing SageMaker training capabilities such as training jobs, hyperparameter tuning, and Managed Spot Training.


Optimizing TensorFlow model serving with Kubernetes and Amazon Elastic Inference Amazon Web Services

#artificialintelligence

The only aspect of the code that isn't straightforward is the need to enable EC2 instance termination protection while workers are processing videos, as shown in the following code example: After the job processes, a similar API call disables termination protection. This example application uses termination protection because the jobs are long-running, and you don't want an EC2 instance terminated during a scale-in event if it is still processing a video. You can easily modify the inference code and optimize it for your use case, so this post doesn't spend further time examining it. To review the Dockerfile for the inference code, see the amazon-elastic-inference-eks GitHub repo, under the /Dockerfile directory. The code itself is in the test.py