PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
