Probability Smoothing for Natural Language Processing - Lazy Programmer

May-11-2016, 18:21:26 GMT–#artificialintelligence

This is a very basic technique that can be applied to most machine learning algorithms you will come across when you're doing NLP. Suppose, for example, you are creating a "bag of words" model, and you have just collected data from a set of documents with a very small vocabulary in which "cat", "dog", and "parrot" each appear equally often. You would naturally assume that the probability of seeing the word "cat" is 1/3, and similarly P(dog) = 1/3 and P(parrot) = 1/3. Now, suppose I want to determine P(mouse). Since "mouse" does not appear in my dictionary, its count is 0, therefore P(mouse) = 0. If you wanted to do something like calculate a likelihood, you'd have P(document) = P(words that are not mouse) \times P(mouse) = 0, so a single unseen word wipes out the likelihood of the whole document. Smoothing avoids this: we simply add 1 to the numerator and the vocabulary size (V = total number of distinct words) to the denominator of our probability estimate.
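To make the add-one estimate concrete, here is a minimal sketch in Python. The counts (10 occurrences each) and the helper name `smoothed_prob` are made up for illustration, and it assumes the vocabulary size V also counts the unseen word "mouse", so that the smoothed probabilities still sum to 1.

```python
from collections import Counter
from fractions import Fraction


def smoothed_prob(word, counts, vocab_size):
    """Add-one estimate: (count + 1) / (total count + V)."""
    total = sum(counts.values())
    return Fraction(counts.get(word, 0) + 1, total + vocab_size)


# Hypothetical counts: each word seen equally often, so the
# unsmoothed estimate for each one is 10/30 = 1/3.
counts = Counter({"cat": 10, "dog": 10, "parrot": 10})

# Assumption: V includes the unseen word we want to score.
vocab = set(counts) | {"mouse"}
V = len(vocab)

p_unsmoothed_mouse = Fraction(counts.get("mouse", 0), sum(counts.values()))
p_mouse = smoothed_prob("mouse", counts, V)
p_cat = smoothed_prob("cat", counts, V)

print("unsmoothed P(mouse):", p_unsmoothed_mouse)
print("smoothed P(mouse):  ", p_mouse)
print("smoothed P(cat):    ", p_cat)
```

With these numbers, "mouse" gets a small but nonzero probability, so a document containing it no longer has a likelihood of exactly 0, while the seen words give up only a little probability mass.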

artificial intelligence, machine learning, natural language

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Learning Graphical Models (0.52)