On the Cost of Model-Serving Frameworks: An Experimental Evaluation

De Rosa, Pasquale, Bromberg, Yérom-David, Felber, Pascal, Mvondo, Djob, Schiavoni, Valerio

arXiv.org Artificial Intelligence

In machine learning (ML), the inference phase is the process of applying pre-trained models to new, unseen data with the objective of making predictions. During the inference phase, end-users interact with ML services to gain insights, recommendations, or actions based on the input data. For this reason, serving strategies are nowadays crucial for deploying and managing models effectively in production environments. These strategies ensure that models are available, scalable, reliable, and performant for real-world applications, such as time series forecasting, image classification, natural language processing, and so on. In this paper, we evaluate the performance of five widely-used model serving frameworks (TensorFlow Serving, TorchServe, MLServer, MLflow, and BentoML) under four different scenarios (malware detection, cryptocurrency price forecasting, image classification, and sentiment analysis). We demonstrate that TensorFlow Serving outperforms all the other frameworks in serving deep learning (DL) models. Moreover, we show that DL-specific frameworks (TensorFlow Serving and TorchServe) display significantly lower latencies than the three general-purpose ML frameworks (BentoML, MLflow, and MLServer).


Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN Serving

Bhardwaj, Ankit, Phanishayee, Amar, Narayanan, Deepak, Tarta, Mihail, Stutsman, Ryan

arXiv.org Artificial Intelligence

In this paper, we investigate how to push the performance limits of serving Deep Neural Network (DNN) models on CPU-based servers. Specifically, we observe that while intra-operator parallelism across multiple threads is an effective way to reduce inference latency, it provides diminishing returns. Our primary insight is that instead of running a single instance of a model with all available threads on a server, running multiple instances, each with smaller batch sizes and fewer threads for intra-op parallelism, can provide lower inference latency. However, the right configuration is hard to determine manually since it is workload-dependent (DNN model and batch size used by the serving system) and deployment-dependent (number of CPU cores on the server). We present Packrat, a new serving system for online inference that, given a model and batch size ($B$), algorithmically picks the optimal number of instances ($i$), the number of threads each should be allocated ($t$), and the batch sizes each should operate on ($b$) to minimize latency. Packrat is built as an extension to TorchServe and supports online reconfigurations to avoid serving downtime. Averaged across a range of batch sizes, Packrat improves inference latency by 1.43$\times$ to 1.83$\times$ on a range of commonly used DNNs.
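The search Packrat performs can be sketched in a few lines. The cost model below is a toy stand-in (not the paper's actual algorithm): it assumes intra-op parallelism gives sub-linear speedup, which is exactly the diminishing-returns observation that makes splitting cores across several smaller instances attractive.

```python
# Hypothetical sketch of a Packrat-style (i, t, b) configuration search.
# The latency model is invented for illustration; the paper's real system
# measures and profiles rather than using a closed-form formula.
import math

def predicted_latency(b, t):
    """Toy model: cost grows with per-instance batch size b, and t threads
    give diminishing returns (sub-linear speedup t**0.7)."""
    work = 1.0 + 0.5 * b
    return work / (t ** 0.7)

def best_config(B, cores):
    """Enumerate splits of B requests and `cores` threads across i
    instances; instances run in parallel, so latency is per-instance."""
    best = None
    for i in range(1, cores + 1):      # number of model instances
        t = cores // i                 # threads per instance (intra-op)
        if t == 0:
            continue
        b = math.ceil(B / i)           # per-instance batch size
        lat = predicted_latency(b, t)
        if best is None or lat < best[0]:
            best = (lat, i, t, b)
    return best

lat, i, t, b = best_config(B=16, cores=8)
```

Under this toy model, several two-thread instances beat one eight-thread instance, mirroring the paper's insight that a single instance with all threads is rarely optimal.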


Serving ML Models with TorchServe

#artificialintelligence

This post will walk you through the process of serving your deep learning Torch model with the TorchServe framework. There are quite a few articles about this topic. However, they typically focus either on deploying TorchServe itself or on writing custom handlers and getting the end results. That was my motivation for writing this post: it covers both parts and gives an end-to-end example.


Top Tools To Do Machine Learning Serving In Production

#artificialintelligence

Creating a model is one thing, but using that model in production is quite another. The next step after a data scientist completes a model is to deploy it so that it can serve the application. Batch and online model serving are the two main categories. Batch refers to feeding a large amount of data into a model and writing the results to a table, usually as a scheduled operation. Online serving, by contrast, deploys the model behind an endpoint so that applications can send it a request and receive a response with minimal latency.
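The two categories above can be sketched without any serving framework at all; here a plain function stands in for a trained model's `predict()`:

```python
# Minimal sketch contrasting batch and online serving.
# The "model" is a stand-in function, not a real trained model.

def model(x):
    return x * 2  # placeholder for model.predict(x)

def batch_serve(dataset):
    """Batch mode: score a whole dataset in one scheduled pass and
    write the results out (a list stands in for a results table)."""
    results_table = [model(x) for x in dataset]
    return results_table

def online_endpoint(request):
    """Online mode: an endpoint-style handler that scores one request
    at a time and returns the prediction immediately."""
    return {"prediction": model(request["input"])}

table = batch_serve([1, 2, 3])
resp = online_endpoint({"input": 5})
```

Real deployments put `online_endpoint` behind an HTTP server (e.g. TorchServe's predictions API) and run `batch_serve` from a scheduler, but the data-flow difference is exactly this.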


Serving PyTorch Models Using TorchServe - Supertype

#artificialintelligence

Model serving has always been a crucial process in MLOps, as it decides whether an AI product will be accessible to the user. Upon developing a model that can perform a certain task, the next step is to serve the model so that it is accessible through an API, enabling applications to incorporate AI into the system. This process also includes model monitoring and management, which make it possible to ensure that the model functions properly and to scale it on demand. Various tools have been built to serve models. Don't worry if some of the terms do not make sense to you yet.


GitHub - pytorch/serve: Serve, optimize and scale PyTorch models in production

#artificialintelligence

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production. To learn more about how to contribute, see the contributor guide here. This repository is jointly operated and maintained by Amazon, Meta, and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to opensource@fb.com. For questions directed at Amazon, please send an email to torchserve@amazon.com.


Model Serving in PyTorch

#artificialintelligence

Deploying ML models in production and scaling ML services continue to be big challenges. TorchServe, the model serving solution for PyTorch, solves this problem and has now evolved into a multi-platform solution that can run on-prem or on any cloud, with integrations for major OSS platforms like Kubernetes, MLflow, Kubeflow Pipelines, and KServe. This talk will cover new features launched in TorchServe, such as model interpretability using Captum and best practices for responsible production deployments, along with examples of how companies like Amazon Ads, Meta AI, and the broader PyTorch community are using TorchServe.


Azure ML (AML) Alternatives for MLOps - neptune.ai

#artificialintelligence

Azure Machine Learning (AML) is a cloud-based machine learning service for data scientists and ML engineers. You can use AML to manage the machine learning lifecycle: develop, train, and test models, but also run MLOps processes with speed, efficiency, and quality. For organizations that want to scale ML operations and unlock the potential of AI, tools like AML are important. Creating machine learning solutions that drive business growth becomes much easier. But what if you don't need a comprehensive MLOps solution like AML? Maybe you want to build your own stack and need specific tools for tasks like tracking, deployment, or managing other key parts of MLOps? Experiment tracking documents every piece of information that you care about during your ML experiments. Machine learning is an iterative process, so this is really important. Azure ML provides experiment tracking for all metrics in the machine learning environment.


8 Alternatives to TensorFlow Serving

#artificialintelligence

TensorFlow Serving is an easy-to-deploy, flexible, high-performance serving system for machine learning models built for production environments. It allows easy deployment of algorithms and experiments while letting developers keep the same server architecture and APIs. TensorFlow Serving provides seamless integration with TensorFlow models, and can also be easily extended to other models and data. The open-source platform Cortex makes real-time inference at scale seamless. It is designed to deploy trained machine learning models directly as a web service in production.
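TensorFlow Serving's REST API has a documented shape — `POST /v1/models/<name>:predict` with an `"instances"` JSON payload. The sketch below only builds such a request; the host, port, and model name are placeholders, and nothing is actually sent.

```python
# Sketch of a TensorFlow Serving REST predict request (constructed only,
# not sent). "my_model" and localhost:8501 are placeholder values;
# 8501 is TF Serving's conventional REST port.
import json

model_name = "my_model"  # placeholder model name
url = f"http://localhost:8501/v1/models/{model_name}:predict"
payload = json.dumps({"instances": [[1.0, 2.0, 5.0]]})
```

A client would then POST `payload` to `url` with any HTTP library and read the `"predictions"` field from the JSON response.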


Bootstrap your own Handler: How and why to create custom handlers for PyTorch's TorchServe

#artificialintelligence

TorchServe is a great tool for deploying trained PyTorch models; there is no denying that. But, as with any relatively new project, it is still building a community around it to help with the more niche aspects of its implementation, and we can contribute to that community. So today, we will discuss how to develop advanced custom handlers with PyTorch's TorchServe. We will also review the process of saving your PyTorch model with torch-model-archiver and how to include all the new artifacts created along the way.
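The custom-handler contract this article describes follows a preprocess → inference → postprocess pipeline. A real TorchServe handler subclasses `ts.torch_handler.base_handler.BaseHandler`; the framework-free stand-in below mirrors the same shape (with a toy function in place of a loaded model) so the structure of a custom handler is clear without installing TorchServe.

```python
# Framework-free sketch of a TorchServe-style custom handler.
# A real handler subclasses ts.torch_handler.base_handler.BaseHandler;
# the model here is a stand-in lambda, not a loaded .mar artifact.

class SketchHandler:
    def initialize(self, context):
        # In TorchServe, this is where the model is loaded from the
        # archive produced by torch-model-archiver.
        self.model = lambda batch: [x * 2 for x in batch]  # stand-in

    def preprocess(self, data):
        # Turn raw request bodies into a model-ready batch.
        return [item["body"] for item in data]

    def inference(self, batch):
        return self.model(batch)

    def postprocess(self, outputs):
        # One response entry per request in the batch.
        return [{"result": o} for o in outputs]

    def handle(self, data, context):
        batch = self.preprocess(data)
        return self.postprocess(self.inference(batch))

handler = SketchHandler()
handler.initialize(context=None)
responses = handler.handle([{"body": 3}, {"body": 4}], context=None)
```

Custom handlers override exactly these hooks — for example, `preprocess` to decode images or tokenize text, and `postprocess` to map logits to labels — while `handle` keeps the pipeline wiring.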