Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures