Incorporating BERT into Parallel Sequence Decoding with Adapters
Neural Information Processing Systems
While large-scale pre-trained language models such as BERT [5] have achieved great success on various natural language understanding tasks, how to efficiently and effectively incorporate them into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem.
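A common parameter-efficient route to such incorporation is to insert small adapter modules into the otherwise frozen pre-trained model, so that only the adapters are trained on the downstream task. The sketch below shows a minimal bottleneck adapter in PyTorch in the style of Houlsby et al.; the module name, dimensions, and placement are illustrative assumptions, not the exact architecture proposed in this paper.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.

    Hypothetical sketch; sizes and placement are illustrative only.
    """
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the frozen backbone's representation;
        # only the small down/up projections receive gradient updates.
        return x + self.up(self.act(self.down(x)))

# Usage sketch: freeze a pre-trained layer, train only the adapter.
# `encoder` stands in for any BERT-like module; the name is hypothetical.
encoder = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
adapter = Adapter(hidden_dim=768)
for p in encoder.parameters():
    p.requires_grad = False            # pre-trained weights stay fixed
hidden = torch.randn(2, 16, 768)       # (batch, sequence, hidden)
output = adapter(encoder(hidden))      # adapter refines frozen features
```

Because the pre-trained weights stay fixed, the number of trainable parameters is a small fraction of the full model, which is what makes adapter-based incorporation attractive for sequence-to-sequence generation.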