Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance
