Web Page Classification Based on Uncorrelated Semi-Supervised Intra-View and Inter-View Manifold Discriminant Feature Extraction
Jing, Xiao-Yuan (Wuhan University) | Liu, Qian (Wuhan University and Nanjing University of Posts and Telecommunications) | Wu, Fei (Wuhan University) | Xu, Baowen (Wuhan University) | Zhu, Yangping (Wuhan University) | Chen, Songcan (Nanjing University of Aeronautics and Astronautics)
Web page classification has attracted increasing research interest. It is intrinsically a multi-view and semi-supervised application, since web pages usually contain two or more types of data, such as text, hyperlinks and images, and unlabeled pages generally far outnumber labeled ones. Web page data is also commonly high-dimensional. Extracting useful features from such data in the multi-view semi-supervised scenario is therefore important for web page classification. To our knowledge, only one method has been presented specifically for this topic, and the few semi-supervised multi-view feature extraction methods developed for other applications still leave much room for improvement. In this paper, we first design a feature extraction schema called semi-supervised intra-view and inter-view manifold discriminant (SI2MD) learning, which makes full use of the intra-view and inter-view discriminant information of labeled samples and the local neighborhood structures of unlabeled samples. We then design a semi-supervised uncorrelation constraint for the SI2MD schema to remove the multi-view correlation in the semi-supervised scenario. By combining the SI2MD schema with this constraint, we propose an uncorrelated semi-supervised intra-view and inter-view manifold discriminant (USI2MD) learning approach for web page classification. Experiments on public web page databases validate the proposed approach.
Jul-15-2015
- Country:
- Asia > China
- Hubei Province > Wuhan (0.04)
- Jiangsu Province > Nanjing (0.04)
- Genre:
- Research Report (0.68)
- Technology: