Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
