DP-LLM: Runtime Model Adaptation with Dynamic Layer-wise Precision Assignment
Neural Information Processing Systems
How can we effectively handle queries for on-device large language models (LLMs) with varying runtime constraints, such as latency and accuracy?
Jun-22-2026, 21:48:25 GMT