Blockwise Parallel Transformers for Large Context Models

Neural Information Processing Systems 

Model sizes range from 1B to 70B parameters.
