Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference

Neural Information Processing Systems 

Large Transformer networks are increasingly used in settings where low inference latency is necessary to enable new applications and improve the end-user experience.
