Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
