96671501524948bc3937b4b30d0e57b9-Paper.pdf

Neural Information Processing Systems 

BERT is incapable of processing long texts due to its quadratically increasing memory and time consumption. The most natural ways to address this problem, such as slicing the text by a sliding window or simplifying transformers, suffer from insufficient long-range attentions or need customized CUDA kernels.
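To make the sliding-window baseline concrete, the following is a minimal sketch and not the paper's method. The function names `sliding_windows` and `attention_entries`, the window size of 512, and the stride of 256 are all illustrative assumptions. It slices a token sequence into overlapping windows and compares the number of pairwise attention scores computed per window against full attention over the whole sequence.

```python
def sliding_windows(tokens, window=512, stride=256):
    """Split a token list into overlapping windows of at most `window` tokens.

    `window` and `stride` are illustrative defaults, not values from the paper.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(tokens):
            break
        start += stride
    return chunks


def attention_entries(n):
    """Pairwise attention scores for a sequence of length n (quadratic in n)."""
    return n * n


if __name__ == "__main__":
    tokens = list(range(2048))  # stand-in for token ids of a long text
    chunks = sliding_windows(tokens)
    full_cost = attention_entries(len(tokens))
    windowed_cost = sum(attention_entries(len(c)) for c in chunks)
    print(len(chunks), full_cost, windowed_cost)
```

Each window is processed independently in this sketch, so tokens in different windows never attend to each other. This is the insufficient long-range attention the abstract refers to.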
