TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms
