Publication
This is a chapter in the larger work Introduction to Information Retrieval.
We begin this chapter with a general introduction to the text classification problem, including a formal definition (Section 13.1); we then cover Naive Bayes, a particularly simple and effective classification method (Sections 13.2–13.4). All of the classification algorithms we study represent documents in high-dimensional spaces. To improve the efficiency of these algorithms, it is generally desirable to reduce the dimensionality of these spaces; to this end, a technique known as feature selection is commonly applied in text classification, as discussed in Section 13.5. Section 13.6 covers the evaluation of text classification. In the following two chapters (Chapters 14 and 15), we look at two other families of classification methods: vector space classifiers and support vector machines.
Source
By Manning, C.D., Raghavan, P., and Schütze, H., 2008