Scaling On-Device GPU Inference for Large Generative Models