Applying Data Science to the Supreme Court: Topic Modeling Over Time with NMF (and a D3.js bonus)
With natural language processing, we start with a pile of documents (Supreme Court cases, in this project) and need to get to their true essence. Most words aren't helpful in this process, so we drop them (these are called stopwords). We also know that words like "liking" are really the same as "like" in this context (shh, don't tell my college literature professors I said that), so we lemmatize, which means replacing inflected forms such as "liking" with their root word. After this, we have a few choices. We need to turn the words into "vectors" (a fancy term for lists of numbers, really) and use those vectors to inform our topic groups.
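To make those steps concrete, here is a minimal sketch in plain Python, not the project's actual pipeline. The tiny `STOPWORDS` set, the hand-written `LEMMAS` lookup table, the two example sentences, and the helper names `preprocess` and `vectorize` are all made up for illustration; a real project would likely lean on an NLP library for lemmatizing and vectorizing, and might weight the counts rather than use them raw.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "was", "in", "that", "this"}

# Hypothetical lemma lookup standing in for a real lemmatizer.
LEMMAS = {
    "liking": "like",
    "liked": "like",
    "courts": "court",
    "arguments": "argument",
}


def preprocess(text):
    """Lowercase, split into words, drop stopwords, map words to their lemmas."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


def vectorize(docs):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    tokenized = [preprocess(doc) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = [
    "The court was liking the argument",
    "Courts like arguments that the court liked",
]
vocab, vectors = vectorize(docs)
print(vocab)    # ['argument', 'court', 'like']
print(vectors)  # [[1, 1, 1], [1, 2, 2]]
```

Each document is now a row of numbers with one column per vocabulary word, which is the kind of input a topic model such as NMF works from.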
Sep-20-2016, 19:25:54 GMT