AD-DROP for Token-Level Tasks
–Neural Information Processing Systems
For token-level tasks (e.g., NER and text generation), the model produces several logit outputs, each yielding its own attribution matrix for every attention map, so applying AD-DROP raises the challenge of how to fuse these attribution matrices. The results on the test sets are reported in Table 1 and Table 2. Moreover, to verify that AD-DROP can be adapted to other pre-trained models, we choose ELECTRA as the base model for CoNLL-2003 NER and OPUS-MT for WMT 2016. We discuss potential limitations of AD-DROP as follows.
Feb-8-2026, 22:12:50 GMT