On the Cost of Model-Serving Frameworks: An Experimental Evaluation
Pasquale De Rosa, Yérom-David Bromberg, Pascal Felber, Djob Mvondo, Valerio Schiavoni
In machine learning (ML), the inference phase is the process of applying pre-trained models to new, unseen data with the objective of making predictions. During the inference phase, end-users interact with ML services to gain insights, recommendations, or actions based on the input data. For this reason, serving strategies are nowadays crucial for deploying and managing models effectively in production environments. These strategies ensure that models are available, scalable, reliable, and performant for real-world applications, such as time series forecasting, image classification, natural language processing, and so on. In this paper, we evaluate the performance of five widely-used model serving frameworks (TensorFlow Serving, TorchServe, MLServer, MLflow, and BentoML) under four different scenarios (malware detection, cryptocurrency price forecasting, image classification, and sentiment analysis). We demonstrate that TensorFlow Serving outperforms all the other frameworks in serving deep learning (DL) models. Moreover, we show that DL-specific frameworks (TensorFlow Serving and TorchServe) display significantly lower latencies than the three general-purpose ML frameworks (BentoML, MLflow, and MLServer).
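Latency comparisons like the one in this paper hinge on how per-request latency is measured and summarized. A minimal sketch of that methodology, using a plain callable as a stand-in for an HTTP request to a serving endpoint (all names here are hypothetical, not taken from the paper's benchmark code):

```python
import statistics
import time

def measure_latencies(predict, inputs):
    """Time each call to `predict` and return per-request latencies in ms."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

def summarize(latencies):
    """Report median and tail (99th-percentile) latency, as serving
    evaluations typically do, since tail latency dominates user experience."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
    }
```

In a real benchmark, `predict` would issue an HTTP or gRPC request to the framework under test, and the client would be warmed up before measurement begins.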
Machine Learning Streaming with Kafka, Debezium, and BentoML
Putting a machine learning project to life is not a simple task and, just like any other software product, it requires many different kinds of knowledge: infrastructure, business, data science, etc. I must confess that, for a long time, I just neglected the infrastructure part, making my projects rest in peace inside Jupyter notebooks. But as soon as I started learning it, I realized that it is a very interesting topic. Machine learning is still a growing field and, in comparison with other IT-related areas like web development, the community still has a lot to learn. Luckily, in recent years we have seen a lot of new technologies arise to help us build ML applications, like MLflow, Apache Spark's MLlib, and BentoML, explored in this post. In this post, a machine learning architecture is explored with some of these technologies to build a real-time price recommender system. To bring this concept to life, we needed not only ML-related tools (BentoML & Scikit-learn) but also other software pieces (Postgres, Debezium, Kafka). Of course, this is a simple project that doesn't even have a user interface, but the concepts explored here could easily be extended to many real scenarios. I hope this post helped you somehow. I am not an expert in any of the subjects discussed, and I strongly recommend further reading (see some references below).
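The pipeline sketched above (database rows captured by Debezium, streamed through Kafka, and scored by a model) can be illustrated in plain Python, with a list standing in for the Kafka topic and a toy function standing in for the trained Scikit-learn recommender (all names hypothetical, not the author's actual code):

```python
def extract_features(event):
    """Turn a Debezium-style change event into model features.

    Debezium change payloads carry the new row state under the "after" key.
    """
    row = event["after"]
    return [row["demand"], row["stock"]]

def recommend_price(features):
    """Toy stand-in for a trained regression model."""
    demand, stock = features
    return round(10.0 + 0.5 * demand - 0.1 * stock, 2)

def consume(events):
    """Consume change events, score each one, and emit price recommendations.

    In the real architecture this loop would be a Kafka consumer calling
    a BentoML-served model over HTTP instead of a local function.
    """
    return [
        {"id": e["after"]["id"], "price": recommend_price(extract_features(e))}
        for e in events
    ]
```

The point of the sketch is the shape of the dataflow: change events arrive continuously, are mapped to features, and produce predictions without any batch job in between.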
BentoML: Create an ML Powered Prediction Service in Minutes
You have just built a machine learning model to predict which group a customer belongs to. The model seems to do a good job of segmenting your customers. You decide to give this model to your team members so that they can develop a web application on top of it. Wait, but how will you ship this model to your team members? Wouldn't it be nice if your team members could use your model without setting up any environment or messing with your code?
GitHub - bentoml/BentoML: Model Serving Made Easy
BentoML is a flexible, high-performance framework for serving, managing, and deploying machine learning models. By providing a standard interface for describing a prediction service, BentoML abstracts away how to run model inference efficiently and how model serving workloads can integrate with cloud infrastructures. Be sure to check out the deployment overview doc to understand which deployment option is best suited for your use case. BentoML provides APIs for defining a prediction service (a servable model, so to speak), which includes the trained ML model itself, plus its pre-processing and post-processing code, input/output specifications, and dependencies. The generated BentoML bundle is a file directory that contains all the code files, serialized models, and configs required for reproducing this prediction service for inference. BentoML automatically captures all the Python dependency information and keeps everything versioned and managed together in one place.
8 Alternatives to TensorFlow Serving
TensorFlow Serving is an easy-to-deploy, flexible, and high-performing serving system for machine learning models built for production environments. It allows easy deployment of algorithms and experiments while letting developers keep the same server architecture and APIs. TensorFlow Serving provides seamless integration with TensorFlow models, and can also be easily extended to other models and data. The open-source platform Cortex makes execution of real-time inference at scale seamless. It is designed to deploy trained machine learning models directly as a web service in production.
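As a concrete illustration of the "same server architecture and APIs" point: TensorFlow Serving exposes a REST predict endpoint of the form `/v1/models/{name}[:predict]` that accepts a JSON body with an `instances` list. A minimal client request could be built like this (the host, model name, and input values below are placeholders, not from any real deployment):

```python
import json
import urllib.request

def build_predict_request(host, model_name, instances, version=None):
    """Build an HTTP request for TensorFlow Serving's REST predict endpoint.

    `instances` is a list of input rows; each row's shape must match
    the served model's input signature.
    """
    model_path = f"/v1/models/{model_name}"
    if version is not None:
        model_path += f"/versions/{version}"
    url = f"http://{host}{model_path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

# Sending the request (requires a running TensorFlow Serving instance):
# req = build_predict_request("localhost:8501", "my_model", [[1.0, 2.0]])
# with urllib.request.urlopen(req) as resp:
#     predictions = json.loads(resp.read())["predictions"]
```

Because every served model is reachable through this same URL scheme, swapping in a new model or version is a change to the request path, not to the client or server code.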
Serve Your ML Models in AWS Using Python
Automate your ML model train-deploy cycle, garbage collection, and rollbacks, all from Python with an open-source PyPI package based on Cortex. It all started with the modernization of a product categorization project. The goal was to replace complex low-level Docker commands with a very simple and user-friendly deployment utility called Cortex. The solution, in the form of a Python package, proved to be reusable, since we successfully used it as part of our recommendation engine project. We plan to deploy all ML projects like this. Since GLAMI relies heavily on open-source software, we wanted to contribute back and decided to open-source the package, calling it Cortex Serving Client.