Memory- and Latency-Constrained Inference of Large Language Models via Adaptive Split Computing
