Prompt-Aware Scheduling for Efficient Text-to-Image Inferencing System

Agarwal, Shubham, Iqbal, Saud, Mitra, Subrata

Jan-28-2025–arXiv.org Artificial Intelligence

Traditional ML serving systems cope with high load through controlled approximation: they switch to faster but less accurate models, a process called accuracy scaling. This approach works poorly for generative text-to-image models, because their output quality is sensitive to the input prompt and because the overhead of loading large models degrades performance. This work introduces a novel text-to-image inference system that optimally matches prompts across multiple instances of the same model operating at various approximation levels, delivering high-quality images under high loads and fixed budgets.
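To make prompt-aware matching concrete, here is a minimal sketch in Python. It is not the authors' scheduler: the `Instance` class, the `schedule` function, the per-prompt quality predictions, and the greedy rule (send the prompts that lose the most quality under approximation to the least-approximated instances) are hypothetical assumptions for illustration only. Every instance is assumed to stay resident, in line with the abstract's point that loading large models is costly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One resident replica of the same text-to-image model at a fixed approximation level."""
    name: str
    level: int      # 0 = exact model; higher = more aggressive approximation (hypothetical)
    capacity: int   # requests it can serve in the current scheduling window (assumed)


def schedule(prompts, quality, instances):
    """Greedy prompt-to-instance matching (illustrative sketch, not the paper's method).

    prompts:   list of prompt ids
    quality:   quality[(prompt, level)] -> predicted image quality (hypothetical predictor)
    instances: list of Instance; their total capacity must cover all prompts
    Returns a dict mapping each prompt to an instance name.
    """
    if sum(i.capacity for i in instances) < len(prompts):
        raise ValueError("not enough capacity in this scheduling window")
    best = min(i.level for i in instances)
    worst = max(i.level for i in instances)

    def sensitivity(p):
        # Predicted quality lost if this prompt runs on the most approximated instance.
        return quality[(p, best)] - quality[(p, worst)]

    # Most prompt-sensitive requests go first, so they land on the least-approximated slots.
    order = sorted(prompts, key=sensitivity, reverse=True)
    slots = []
    for inst in sorted(instances, key=lambda i: i.level):
        slots.extend([inst] * inst.capacity)
    return {p: inst.name for p, inst in zip(order, slots)}


if __name__ == "__main__":
    q = {("castle", 0): 0.90, ("castle", 2): 0.50,
         ("cat", 0): 0.80, ("cat", 2): 0.75,
         ("sky", 0): 0.70, ("sky", 2): 0.68}
    pool = [Instance("full", 0, 1), Instance("fast", 2, 2)]
    print(schedule(["castle", "cat", "sky"], q, pool))
    # prints {'castle': 'full', 'cat': 'fast', 'sky': 'fast'}
```

Here the prompt whose predicted quality drops most under approximation ("castle") takes the single exact-model slot, and the two less sensitive prompts share the approximate instance. The greedy ordering keeps the example short; a real scheduler might instead solve an assignment or integer program and account for latency and cost, which this sketch ignores.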

artificial intelligence, machine learning, violation, ...

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.71)
  - Vision (1.00)