Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions

Jan-30-2025–arXiv.org Artificial Intelligence

As data-intensive applications grow, batch processing in resource-constrained environments faces scalability and resource-management challenges. Serverless computing offers a flexible alternative, enabling dynamic resource allocation and automatic scaling. This paper explores how serverless architectures can make large-scale ML inference tasks faster and more cost-effective by decomposing monolithic processes into parallel functions. Through a case study on sentiment analysis using the DistilBERT model and the IMDb dataset, we demonstrate that serverless parallel processing can reduce execution time by over 95% compared to monolithic approaches, at the same cost.
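The decomposition described in the abstract can be illustrated with a minimal local sketch; it is not the paper's implementation. It assumes the batch is split into contiguous chunks, each chunk is handled by an independent worker, and the results are merged in input order. A thread pool stands in for the serverless invocations, and a toy keyword rule stands in for DistilBERT. The names split_into_chunks, classify_chunk, and parallel_inference are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_chunks):
    """Split items into at most num_chunks contiguous, near-equal chunks."""
    size, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer items than workers
            chunks.append(items[start:end])
        start = end
    return chunks


def classify_chunk(reviews):
    """Stand-in for one function invocation (assumption: a toy rule, NOT DistilBERT)."""
    return ["positive" if "good" in r.lower() else "negative" for r in reviews]


def parallel_inference(reviews, num_functions):
    """Fan chunks out to parallel workers and merge the labels in input order."""
    chunks = split_into_chunks(reviews, num_functions)
    with ThreadPoolExecutor(max_workers=num_functions) as pool:
        results = pool.map(classify_chunk, chunks)  # map preserves chunk order
        return [label for chunk_labels in results for label in chunk_labels]


reviews = ["Good movie", "Terrible plot", "Really good cast", "Boring", "So good"]
labels = parallel_inference(reviews, num_functions=2)
print(labels)
```

In a real deployment each call to classify_chunk would be a separate function invocation that loads the model and scores its chunk. If billing is proportional to total compute time, which is an assumption about the platform's pricing model, spreading the same work across more functions shortens wall-clock time without necessarily changing the total billed.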

artificial intelligence, batch processing, natural language

Country:
- North America > United States > Michigan (0.04)

Genre:
- Research Report (0.83)

Technology:
- Information Technology > Artificial Intelligence > Natural Language
  - Information Extraction (0.35)
  - Discourse & Dialogue (0.35)
