Efficient Deployment of Large Language Models on Resource-constrained Devices