CogLTX: Applying BERT to Long Texts
Chang Zhou, Tsinghua University

Neural Information Processing Systems 

BERT is incapable of processing long texts due to its quadratically increasing memory and time consumption. The most natural ways to address this problem, such as slicing the text by a sliding window or simplifying transformers, suffer from insufficient long-range attention or need customized CUDA kernels. The maximum length limit in BERT reminds us of the limited capacity (5~9 chunks) of the working memory of humans --- then how do human beings Cognize Long TeXts?
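The quadratic cost mentioned above comes from self-attention: every token attends to every other token, so the score matrix has one entry per token pair. A minimal NumPy sketch (the function name and dimensions here are illustrative, not from the paper) makes the scaling concrete:

```python
import numpy as np

def attention_scores(n, d=64):
    """Toy single-head self-attention scores for a sequence of length n.

    The score matrix is n x n, so its memory footprint grows
    quadratically with sequence length -- the bottleneck that keeps
    vanilla BERT from handling long texts.
    """
    rng = np.random.default_rng(0)
    q = rng.standard_normal((n, d))  # query vectors
    k = rng.standard_normal((n, d))  # key vectors
    return q @ k.T / np.sqrt(d)      # shape (n, n)

# Doubling the sequence length quadruples the score matrix:
print(attention_scores(512).size)   # 262144 entries
print(attention_scores(1024).size)  # 1048576 entries
```

For a 512-token limit this is manageable, but a 10,000-token document would need a 10,000 x 10,000 score matrix per head per layer, which motivates the sliding-window and sparse-attention workarounds the abstract criticizes.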
