As customers continue to come up with new use cases for machine learning, data gravity is as important as ever. Where latency and network connectivity are not an issue, generating data in one location (such as a manufacturing facility) and sending it to the cloud for inference is acceptable for some use cases. For other critical use cases, such as fraud detection for financial transactions, product quality in manufacturing, or real-time analysis of video surveillance, customers face the challenges of having to move that data to the cloud first. Common obstacles to performing inference in the cloud include the inability to run inference in real time and security requirements that prevent user data from being sent to or stored in the cloud. Tens of thousands of customers use Amazon SageMaker to accelerate their machine learning (ML) journey; it helps data scientists and developers prepare, build, train, and deploy ML models quickly.
In today's world, being able to quickly bring on-premises ML models to the cloud is an integral part of any cloud migration journey. This post provides a step-by-step guide for launching a solution that facilitates the migration journey for large-scale ML workflows. The Amazon ML Solutions Lab developed this solution for customers with streaming data applications (e.g., predictive maintenance, fleet management, and autonomous driving). AWS services used in this solution include Amazon SageMaker, a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly, and Amazon Kinesis, which handles real-time data ingestion at scale. Being able to automatically refresh ML models with new data is valuable to any business, because a model's accuracy can degrade, or drift, as production data diverges from the data it was trained on.
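The post does not say how the solution decides when to refresh a drifting model, so the following is only an illustration of one common approach, not the Amazon ML Solutions Lab's method: compare a recent window of a feature (for example, values read from a Kinesis stream) against a baseline sample from training time using the population stability index (PSI), and trigger retraining when PSI crosses a threshold. PSI itself, the function names, the equal-width binning, and the 0.2 threshold are all assumptions chosen for the sketch; in practice you would tune the threshold and binning to your data.

```python
import bisect
import math
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], edges: List[float]) -> List[float]:
    """Return the fraction of `values` that falls into each bin defined by `edges`.

    Values below the first edge or above the last edge are clamped into the
    first or last bin, so recent data outside the baseline range still counts.
    """
    bins = len(edges) - 1
    counts = [0] * bins
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def population_stability_index(
    baseline: Sequence[float],
    recent: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a baseline (training-time) sample and a recent sample of one feature."""
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread; cannot build bins")
    width = (hi - lo) / bins
    # Equal-width bins spanning the baseline range.
    edges = [lo + i * width for i in range(bins + 1)]
    expected = _bin_fractions(baseline, edges)
    actual = _bin_fractions(recent, edges)
    psi = 0.0
    for e, a in zip(expected, actual):
        # Floor empty bins at eps so the log term stays finite.
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


def should_refresh_model(
    baseline: Sequence[float], recent: Sequence[float], threshold: float = 0.2
) -> bool:
    """Hypothetical retraining trigger: refresh when PSI exceeds `threshold`."""
    return population_stability_index(baseline, recent) > threshold


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]         # stand-in for training data
    shifted = [0.9 + i / 1000 for i in range(100)]   # stand-in for recent stream data
    print(should_refresh_model(baseline, baseline))  # False: no drift
    print(should_refresh_model(baseline, shifted))   # True: strong drift
```

Each PSI term has the form (a − e)·ln(a/e), which is never negative, so identical distributions score exactly 0 and any shift raises the score. In a pipeline like the one described, a `True` result would be the signal to start a SageMaker retraining job on the newer data.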
Cloud security at AWS is the highest priority. Amazon SageMaker Studio offers various mechanisms to protect your data and code, through integration with AWS security services such as AWS Identity and Access Management (IAM) and AWS Key Management Service (AWS KMS), and through network isolation with Amazon Virtual Private Cloud (Amazon VPC). Customers in highly regulated industries, like financial services, can set up Studio in VPC only mode to enable network isolation and disable internet access from Studio notebooks. You can use IAM integration with Studio to control which users have access to resources like Studio notebooks, the Studio IDE, or Amazon SageMaker training jobs. A popular use case is restricting access to the Studio IDE to users inside a specified network CIDR range or a designated VPC.
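As a minimal sketch of the CIDR-based restriction, assuming a Studio domain that uses IAM authentication (where launching Studio calls the `sagemaker:CreatePresignedDomainUrl` API), the following Python builds an IAM identity policy that denies that call from outside an allowed set of ranges. The function name, the statement IDs, and the example range are hypothetical; you would attach the resulting JSON to the users or roles you want to restrict, and you may want to scope `Resource` to specific user profiles instead of `*`.

```python
import ipaddress
import json
from typing import Dict, Iterable


def studio_cidr_restriction_policy(allowed_cidrs: Iterable[str]) -> Dict:
    """Build an IAM identity policy that allows opening Studio only from allowed CIDR ranges.

    Assumes Studio is launched through sagemaker:CreatePresignedDomainUrl, so the
    explicit Deny blocks that call for any source IP outside the allowed list.
    """
    # Validate and normalize each range; ip_network raises ValueError on bad input.
    cidrs = [str(ipaddress.ip_network(c)) for c in allowed_cidrs]
    if not cidrs:
        raise ValueError("at least one CIDR range is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowStudioPresignedUrl",
                "Effect": "Allow",
                "Action": "sagemaker:CreatePresignedDomainUrl",
                "Resource": "*",
            },
            {
                # An explicit Deny overrides the Allow above for requests
                # whose source IP is not in any allowed range.
                "Sid": "DenyStudioOutsideAllowedCidrs",
                "Effect": "Deny",
                "Action": "sagemaker:CreatePresignedDomainUrl",
                "Resource": "*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": cidrs}},
            },
        ],
    }


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation range; replace it with your own network's CIDR.
    print(json.dumps(studio_cidr_restriction_policy(["203.0.113.0/24"]), indent=2))
```

Note that `aws:SourceIp` is not available for requests that arrive through an interface VPC endpoint, so this policy would deny them. For the "designated VPC" variant, you would condition on the `aws:SourceVpc` or `aws:SourceVpce` global condition keys instead.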
Amazon SageMaker Studio is the first fully integrated development environment (IDE) for ML. With a single click, data scientists and developers can spin up Studio notebooks to explore and prepare datasets and to build, train, and deploy ML models in a single pane of glass. We're excited to announce a new set of capabilities that enable interactive Spark-based data processing from Studio notebooks. Data scientists and data engineers can now visually browse, discover, and connect to Spark data processing environments running on Amazon EMR, right from their Studio notebooks, in a few clicks. After connecting, they can interactively query, explore, and visualize data, and run Spark jobs to prepare data using the built-in SparkMagic notebook environments for Python and Scala.
Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality ML models quickly by bringing together a broad set of capabilities purpose-built for ML. SageMaker accelerates innovation within your organization by providing tools for every step of ML development, including labeling, data preparation, feature engineering, statistical bias detection, AutoML, training, tuning, hosting, explainability, monitoring, and workflow automation. Companies are increasingly training ML models on individual user data. For example, an image sharing service designed to enable discovery of information on the internet trains custom models on each user's uploaded images and browsing history to personalize recommendations for that user. The company can also train custom models on search topics to recommend images for each topic.