Learning to Skip for Language Modeling