Learning and Transferring Sparse Contextual Bigrams with Linear Transformers