Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
