Dual-Clustering Maximum Entropy with Application to Classification and Word Embedding