A provable SVD-based algorithm for learning topics in dominant admixture corpus

Trapit Bansal, Chiranjib Bhattacharyya, Ravindran Kannan

Neural Information Processing Systems 

Topic models, such as Latent Dirichlet Allocation (LDA), posit that documents are drawn from admixtures of distributions over words, known as topics. The inference problem of recovering topics from a collection of documents drawn from such admixtures is NP-hard. Under a strong assumption called separability, [4] gave the first provable algorithm for inference. For the widely used LDA model, [6] gave a provable algorithm using clever tensor methods.
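
The admixture model referred to above can be written out as a short sketch. The notation is assumed here for illustration and is not taken from the text: $d$ is the vocabulary size, $k$ the number of topics, $M$ the topic matrix, $w_j$ the topic weights of document $j$, and $m$ the number of words per document.

```latex
% Assumed notation (illustrative, not from the source text):
%   d   = vocabulary size, k = number of topics
%   M   = d x k matrix whose columns M_{.l} are topics (distributions over words)
%   w_j = topic weights of document j (a point on the k-simplex)
%   m   = number of words in each document
\[
  p_j \;=\; M w_j \;=\; \sum_{l=1}^{k} w_{lj}\, M_{\cdot l},
  \qquad w_{lj} \ge 0, \quad \sum_{l=1}^{k} w_{lj} = 1,
\]
\[
  \mathrm{word}_{j,1}, \dots, \mathrm{word}_{j,m}
  \;\overset{\text{i.i.d.}}{\sim}\; \mathrm{Categorical}(p_j).
\]
```

In LDA the weights $w_j$ are drawn from a Dirichlet prior. Inference then means recovering the columns of $M$ from the observed words alone, without access to $w_j$.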