Document Type Classification in Online Digital Libraries

Caragea, Cornelia (University of North Texas) | Wu, Jian (Pennsylvania State University) | Gollapalli, Sujatha Das (Institute for Infocomm Research, A*STAR) | Giles, C. Lee (Pennsylvania State University)

AAAI Conferences 

Online digital libraries make it easier for researchers to search for scientific information. They have proven to be powerful resources in many data mining, machine learning, and information retrieval applications that require high-quality data. The quality of the data depends heavily on the accuracy of the classifiers that identify the types of documents crawled from the Web (e.g., research papers, slides, or books) so that they can be indexed appropriately. These classifiers, in turn, depend on the choice of feature representation. We propose novel features that result in high-accuracy classifiers for document type classification. Experimental results on several datasets show that our classifiers outperform models that are employed in current systems.
