Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity