Inference Acceleration for Large Language Models on CPUs