On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies
