South America
AnEfficientAsynchronousMethodforIntegrating EvolutionaryandGradient-basedPolicySearch
These have the opposite properties, with DRL having good sample efficiencyandpoor stability, while ESbeing vice versa. Recently,there havebeen attempts tocombine these algorithms, butthesemethods fullyrelyonsynchronous updatescheme, making it not ideal to maximize the benefits of the parallelism in ES.
AD-DROP: Attribution-DrivenDropoutforRobust LanguageModelFine-Tuning
Pre-training large language models (PrLMs) on massive unlabeled corpora and fine-tuning them on downstream tasks has become a new paradigm [1-3]. Their success can be partly attributed to the self-attention mechanism [4], yet these self-attention networks are often redundant [5, 6] and tend to cause overfitting when fine-tuned on downstream tasks due to the mismatch between their overparameterization and the limited annotated data [7-13]. To address this issue, various regularization techniques such as data augmentation [14, 15], adversarial training [16, 17]), and dropout-based methods [11,13,18]have been developed.