Probability Smoothing for Natural Language Processing - Lazy Programmer

#artificialintelligence 

This is a very basic technique that can be applied to most machine learning algorithms you will come across when you're doing NLP. Suppose, for example, you are creating a "bag of words" model, and you have just collected data from a set of documents with a very small vocabulary: say the words "cat", "dog", and "parrot", each appearing equally often. You would naturally assume that the probability of seeing the word "cat" is P(cat) = 1/3, and similarly P(dog) = 1/3 and P(parrot) = 1/3. Now, suppose I want to determine P(mouse). Since "mouse" does not appear in my dictionary, its count is 0, and therefore P(mouse) = 0. That is a problem: if you wanted to do something like calculate a likelihood, you'd have P(document) = P(words that are not mouse) × P(mouse) = 0, so a single unseen word zeroes out the whole document. The fix is to smooth the estimate: we simply add 1 to the numerator and the vocabulary size (V = total number of distinct words) to the denominator of our probability estimate.
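The add-one estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy corpus, the helper name `smoothed_probability`, and the choice to count "mouse" as part of the vocabulary (so V = 4) are not from the original post. With those assumptions, each estimate is (count + 1) / (N + V), where N is the total number of observed words.

```python
from collections import Counter


def smoothed_probability(word, counts, vocab_size):
    """Add-one estimate: (count(word) + 1) / (total count + V)."""
    total = sum(counts.values())
    # Counter returns 0 for words it has never seen, so unseen words
    # get a numerator of exactly 1.
    return (counts[word] + 1) / (total + vocab_size)


# Hypothetical toy corpus: three words, each observed once.
counts = Counter(["cat", "dog", "parrot"])

# Assumption: the vocabulary includes the unseen word "mouse", so V = 4.
vocab = set(counts) | {"mouse"}
V = len(vocab)

p_cat = smoothed_probability("cat", counts, V)      # (1 + 1) / (3 + 4)
p_mouse = smoothed_probability("mouse", counts, V)  # (0 + 1) / (3 + 4)

print(f"P(cat)   = {p_cat:.4f}")
print(f"P(mouse) = {p_mouse:.4f}")
```

Under these assumptions P(mouse) is small but no longer zero, and the four estimates still sum to 1, so a likelihood computed as a product of word probabilities does not collapse when it meets an unseen word.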
