Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking
