LLMCad: Fast and Scalable On-device Large Language Model Inference
