Document Type Classification in Online Digital Libraries
Caragea, Cornelia (University of North Texas) | Wu, Jian (Pennsylvania State University) | Gollapalli, Sujatha Das (Institute for Infocomm Research, A*STAR) | Giles, C. Lee (Pennsylvania State University)
Online digital libraries make it easier for researchers to search for scientific information. They have also proven to be powerful resources for many data mining, machine learning, and information retrieval applications that require high-quality data. The quality of these data depends heavily on the accuracy of the classifiers that assign documents crawled from the Web to types (e.g., research papers, slides, or books) so that the documents can be indexed appropriately. These classifiers, in turn, depend on the choice of feature representation. We propose novel features that yield high-accuracy classifiers for document type classification. Experimental results on several datasets show that our classifiers outperform the models employed in current systems.
Feb-10-2016
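The abstract does not list the proposed features. To make the pipeline it describes concrete (crawled text, then a feature vector, then a type label), here is a minimal sketch under loudly stated assumptions. The structural features (word count, words per line, share of short lines, and presence of cue words such as "abstract" or "chapter") and the names `extract_features`, `train_centroids`, `classify`, `SECTION_CUES`, and `TRAINING` are hypothetical illustrations, not the authors' feature set. A nearest-centroid classifier is swapped in only because it fits in a few lines; the toy training texts are invented.

```python
from __future__ import annotations

import math
import re
from collections import defaultdict

# Hypothetical cue words; an assumption for illustration, not from the paper.
SECTION_CUES = ("abstract", "introduction", "references", "chapter", "contents")


def extract_features(text: str) -> list[float]:
    """Map raw document text to a small, illustrative structural feature vector."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    words = re.findall(r"\w+", text.lower())
    n_lines = max(len(lines), 1)
    n_words = len(words)
    # Slides tend to have few words per line; papers and books have long lines.
    avg_words_per_line = n_words / n_lines
    short_line_ratio = sum(1 for ln in lines if len(ln.split()) <= 5) / n_lines
    lowered = text.lower()
    cue_flags = [1.0 if cue in lowered else 0.0 for cue in SECTION_CUES]
    # Scale the length features so they do not swamp the 0/1 cue flags.
    return [math.log1p(n_words), avg_words_per_line / 20.0, short_line_ratio, *cue_flags]


def train_centroids(samples: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Average the feature vectors of each label (nearest-centroid training)."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for text, label in samples:
        vec = extract_features(text)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(text: str, centroids: dict[str, list[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    vec = extract_features(text)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Invented toy examples, one per document type.
TRAINING = [
    ("Abstract\nWe study how to classify documents crawled from the Web into types "
     "such as papers and slides.\n1 Introduction\nDigital libraries must label each "
     "crawled file correctly so that it can be indexed.\nReferences\n"
     "[1] A. Author. An example reference entry. 2015.", "paper"),
    ("Overview\n- Motivation\n- Method\n- Results\nThank you\nQuestions?", "slides"),
    ("Contents\nChapter 1 The early years of the village\nChapter 2 The long winter "
     "and the harvest that followed it\nIt was a cold morning when the travellers "
     "finally reached the edge of the old forest.", "book"),
]

if __name__ == "__main__":
    centroids = train_centroids(TRAINING)
    print(classify("Agenda\n- Intro\n- Demo\nThanks", centroids))  # prints "slides"
```

With real data, one would train on many labeled documents per type and would likely use a stronger learner; the sketch only shows where a feature representation plugs into a document type classifier.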