Interpolating between types and tokens by estimating power-law generators

Goldwater, Sharon; Johnson, Mark; Griffiths, Thomas L.

Neural Information Processing Systems 

Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
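
To make the generator-plus-adaptor idea concrete, here is a minimal sketch of a two-stage sampler based on the standard Chinese restaurant construction of the Pitman-Yor process. It is an illustration under stated assumptions, not the authors' implementation: the function name pitman_yor_adaptor, the parameter names a (discount) and b (concentration), the uniform toy generator, and the lexicon are all hypothetical choices made for this example. Each token either reuses the label of an existing table, with weight proportional to the table's count minus a, or opens a new table whose label is drawn from the generator, with weight proportional to (number of tables) * a + b.

```python
import random
from collections import Counter


def pitman_yor_adaptor(n_tokens, generator, a=0.5, b=1.0, rng=None):
    """Sample tokens from a two-stage model (illustrative sketch).

    generator: callable taking an RNG and returning a word type (assumed).
    a: discount, assumed 0 <= a < 1.  b: concentration, assumed b > 0.
    Returns (tokens, table_labels, table_counts).
    """
    rng = rng or random.Random()
    table_counts = []  # number of tokens seated at each table
    table_labels = []  # word type the generator assigned to each table
    tokens = []
    for i in range(n_tokens):  # i tokens have been seated so far
        k = len(table_counts)
        # Total weight is sum(n_j - a) + (k * a + b) = i + b.
        r = rng.random() * (i + b)
        if k == 0 or r < k * a + b:
            # Open a new table and ask the generator for its label.
            table_counts.append(1)
            table_labels.append(generator(rng))
            tokens.append(table_labels[-1])
        else:
            # Reuse an existing table j with weight n_j - a.
            r -= k * a + b
            j = k - 1  # fallback guards against floating-point round-off
            for idx, n in enumerate(table_counts):
                r -= n - a
                if r < 0:
                    j = idx
                    break
            table_counts[j] += 1
            tokens.append(table_labels[j])
    return tokens, table_labels, table_counts


if __name__ == "__main__":
    lexicon = ["walk", "walked", "walking", "jump", "jumped"]  # toy data
    toks, labels, counts = pitman_yor_adaptor(
        1000, lambda r: r.choice(lexicon), a=0.5, b=1.0,
        rng=random.Random(0),
    )
    print(Counter(toks).most_common(3))
    print("tables:", len(counts))
```

In this sketch the generator is consulted only when a new table is opened, so it sees one draw per table rather than one per token, while the adaptor's rich-get-richer seating determines how often each label is repeated among the tokens.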
