How Billions of Trivial Data Points Can Lead to Understanding
Peter Norvig (Director of Research, Google) presents as part of the UBC Department of Computer Science's Distinguished Lecture Series, September 23, 2010.
In decades past, models of human language were wrought from the sweat and pencils of linguists. Today it is more common to think of language modeling as an exercise in probabilistic inference from data: we observe how words and combinations of words are used, and from those observations build computer models of what the phrases mean. This approach is hopeless with a small amount of data, but somewhere in the range of millions or billions of examples we pass a threshold: the hopeless suddenly becomes effective, and computer models sometimes meet or exceed human performance. This talk gives examples of the data available in large repositories of text, images, and videos, and shows some tasks that can be accomplished with the resulting models.
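As a rough illustration of what "probabilistic inference from data" can mean at its simplest, the sketch below counts which word follows which in a tiny corpus and turns those counts into conditional probabilities (a bigram model). It is not taken from the talk: the three-sentence corpus, the function names `train_bigram_model` and `next_word_probability`, and the choice of plain maximum-likelihood estimates are all assumptions made for illustration. Systems of the kind the abstract describes work from vastly more data and use more careful estimation.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each other word.

    Hypothetical helper for illustration; "<s>" and "</s>" are
    assumed sentence-boundary markers.
    """
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            follow_counts[prev][curr] += 1
    return follow_counts


def next_word_probability(model, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev) from raw counts.

    Returns 0.0 when `prev` was never observed, which is exactly
    why this approach needs a lot of data to be useful.
    """
    counts = model.get(prev)
    if not counts:
        return 0.0
    return counts[curr] / sum(counts.values())


# Toy corpus (an assumption; real models use millions or billions of examples).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

model = train_bigram_model(corpus)
print(next_word_probability(model, "the", "cat"))
print(next_word_probability(model, "sat", "on"))
print(next_word_probability(model, "zebra", "sat"))
```

With only three sentences, almost every word pair is unseen and gets probability zero, which mirrors the abstract's point that the approach is hopeless on small data and only becomes effective at scale.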