r/textdatamining - LDA in Python – How to grid search best topic models? (A Comprehensive LDA Tutorial)
Yes, but lemmatisation also groups different words that share the same base form. So biographies and texts about animals might be wrongly grouped together, introducing noise in the corpus. I suspect that, depending on the language, this can happen a lot (or not) and greatly influence the process. I know it helps in Finnish and French and doesn't help in Swedish (with the texts I've used; I compared LDA output on lemmatised and non-lemmatised versions of the same corpus). I was wondering if you had experience with other languages?
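To make the merging effect concrete, here is a minimal sketch of how two unrelated texts can end up sharing vocabulary once words are reduced to a base form. Everything in it is an assumption for illustration: `TOY_LEMMAS` is a hypothetical hand-written lookup standing in for a real lemmatiser, and the two example sentences are invented, not taken from the corpora mentioned above.

```python
# Hypothetical toy lemma lookup; a real pipeline would use a proper
# lemmatiser for the language in question.
TOY_LEMMAS = {
    "lives": "live",
    "lived": "live",
    "living": "live",
}


def lemmatise(tokens, lemmas=TOY_LEMMAS):
    """Map each token to its base form, leaving unknown tokens unchanged."""
    return [lemmas.get(tok, tok) for tok in tokens]


def shared_vocab(doc_a, doc_b):
    """Return the set of word types that occur in both documents."""
    return set(doc_a) & set(doc_b)


# Invented example documents: one biography-like, one about animals.
biography = "the lives of famous painters".split()
animals = "where the otter lived and hunted".split()

before = shared_vocab(biography, animals)
after = shared_vocab(lemmatise(biography), lemmatise(animals))

# Word types the two documents share only because of lemmatisation.
merged = after - before

print("shared before lemmatisation:", sorted(before))
print("newly shared after lemmatisation:", sorted(merged))
```

The same check could be run over a whole corpus before fitting LDA on the lemmatised and non-lemmatised versions, to get a rough idea of how much extra overlap lemmatisation introduces in a given language.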
May-21-2018, 00:26:39 GMT