Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash
