Interpolating between types and tokens by estimating power-law generators

Goldwater, Sharon, Johnson, Mark, Griffiths, Thomas L.

Dec-31-2006–Neural Information Processing Systems

Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
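
The generator-plus-adaptor idea can be illustrated with a minimal sketch, assuming the standard Chinese restaurant process representation of the Pitman-Yor process with discount a (0 <= a < 1) and concentration b (b > 0). The function names, the toy generator, and the parameter values below are hypothetical and are not taken from the paper; the sketch only shows how an adaptor can reuse generator outputs so that token frequencies become heavy-tailed.

```python
import random
from collections import Counter


def toy_generator(rng):
    """Hypothetical generator: draws a word type uniformly from a toy vocabulary."""
    return "w" + str(rng.randrange(1000))


def pitman_yor_adaptor(n_tokens, generator, a=0.5, b=1.0, seed=0):
    """Sample n_tokens word tokens via a Pitman-Yor Chinese restaurant process.

    Assumes 0 <= a < 1 and b > 0. Each table carries one label drawn from
    the generator; each customer (token) emits the label of its table.
    """
    rng = random.Random(seed)
    table_counts = []  # number of customers seated at each table
    table_labels = []  # generator output attached to each table
    tokens = []
    for _ in range(n_tokens):
        k = len(table_counts)
        # Existing table i has weight (count_i - a); a new table has weight (k * a + b).
        weights = [c - a for c in table_counts] + [k * a + b]
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            # Open a new table and ask the generator for its label.
            table_counts.append(1)
            table_labels.append(generator(rng))
        else:
            # Reuse an existing table, so frequent labels become more frequent.
            table_counts[choice] += 1
        tokens.append(table_labels[choice])
    return tokens, table_counts, table_labels


if __name__ == "__main__":
    tokens, counts, labels = pitman_yor_adaptor(2000, toy_generator, a=0.8, b=1.0)
    print("tokens:", len(tokens), "tables:", len(counts))
    print("most frequent tokens:", Counter(tokens).most_common(5))
```

In this sketch, raising a toward 1 opens new tables more often, so the generator is consulted for a larger share of tokens, while a = 0 reduces to the ordinary Chinese restaurant process in which new tables become rare as the number of tokens grows.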

artificial intelligence, frequency, text processing

Country:
- North America > United States (0.28)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language > Text Processing (0.49)
