Maron, M. E.
The fundamental thesis is, in effect, that statistics on the kind, frequency, location, order, etc., of selected words are adequate to make reasonably good predictions about the subject matter of documents containing those words. Given this approach to automatic indexing, two problems present themselves: the selection of clue words, and the prediction techniques relating clue words to subject categories. Statistical data relating clue words to subject categories constitute hypotheses. A second, different class of documents was then obtained and, using the statistical data gathered initially, a machine was programmed to index those documents automatically.
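
The two-stage procedure described above (gather clue-word statistics from one set of documents, then use them to index a different set) can be illustrated with a minimal sketch. The abstract does not spell out the prediction technique, so the scoring rule here is an assumption: a naive-Bayes-style log-probability score with add-one smoothing. The function names `train` and `predict`, the clue words, the categories, and the sample documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs, clue_words):
    """Gather statistics relating clue words to subject categories.

    labeled_docs: iterable of (tokens, category) pairs (hypothetical data).
    Returns per-category document counts and per-category clue-word counts.
    """
    category_counts = Counter()
    word_counts = defaultdict(Counter)
    for tokens, category in labeled_docs:
        category_counts[category] += 1
        for tok in tokens:
            if tok in clue_words:  # non-clue words are ignored
                word_counts[category][tok] += 1
    return category_counts, word_counts


def predict(tokens, category_counts, word_counts, clue_words):
    """Pick the category with the highest smoothed log-probability score.

    Assumption: naive-Bayes-style scoring with add-one smoothing;
    the abstract itself does not name a specific rule.
    """
    total_docs = sum(category_counts.values())
    vocab_size = len(clue_words)
    best_category, best_score = None, -math.inf
    for category in sorted(category_counts):
        total_words = sum(word_counts[category].values())
        # Start from the category's prior, estimated from the first set.
        score = math.log(category_counts[category] / total_docs)
        for tok in tokens:
            if tok in clue_words:
                count = word_counts[category][tok]
                score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical clue words and a tiny first set of already-indexed documents.
CLUE_WORDS = {"transistor", "circuit", "program", "compiler"}
FIRST_SET = [
    (["transistor", "circuit", "the"], "hardware"),
    (["circuit", "circuit"], "hardware"),
    (["program", "compiler"], "software"),
    (["compiler", "program", "program"], "software"),
]

category_counts, word_counts = train(FIRST_SET, CLUE_WORDS)

# A document from a second, different set is indexed using only those statistics.
new_doc = ["the", "compiler", "program"]
predicted = predict(new_doc, category_counts, word_counts, CLUE_WORDS)
```

Keeping the statistics-gathering step separate from the prediction step mirrors the experimental design: the statistics act as hypotheses, and indexing unseen documents is the test of them.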