Feng, Wei
Integrating Semantic Relatedness and Words' Intrinsic Features for Keyword Extraction
Zhang, Wei (Tsinghua University) | Feng, Wei (Tsinghua University) | Wang, Jianyong (Tsinghua University)
Keyword extraction attracts much attention for its significant role in various natural language processing tasks. While some existing methods for keyword extraction have considered using a single type of semantic relatedness between words or the inherent attributes of words, almost all of them ignore two important issues: 1) how to fuse multiple types of semantic relations between words into a uniform semantic measure and automatically learn the weights of the edges in each document's word graph, and 2) how to integrate the relations between words and words' intrinsic features into a unified model. In this work, we tackle both issues based on the supervised random walk model. We propose a supervised ranking-based method for keyword extraction, called SEAFARER. It not only automatically learns the weights of the edges in each document's unified graph, which encodes multiple semantic relations, but also combines the merits of the semantic relations on edges and the intrinsic attributes of nodes. We conducted an extensive experimental study on an established benchmark, and the results demonstrate that SEAFARER outperforms state-of-the-art supervised and unsupervised methods.
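As a rough illustration of this style of graph-based keyword ranking (not the authors' SEAFARER implementation), the sketch below fuses two hypothetical semantic relation matrices into one weighted word graph and ranks words with a random walk with restart, where the restart distribution comes from per-word intrinsic features; the relation matrices, fusion coefficients, and feature values are all made up for the example, and in a supervised setting the fusion weights would be learned rather than fixed.

```python
# Illustrative sketch only -- not the authors' SEAFARER implementation.
# Two hypothetical semantic relations are fused into one weighted word graph,
# and words are ranked by a random walk whose restart distribution comes from
# per-word intrinsic features. All numbers below are made up for the example.
import numpy as np

words = ["keyword", "extraction", "graph", "random", "walk"]
n = len(words)

# Toy relation matrices, e.g. co-occurrence counts and a similarity score.
cooccurrence = np.array([
    [0, 3, 1, 0, 0],
    [3, 0, 2, 1, 0],
    [1, 2, 0, 2, 1],
    [0, 1, 2, 0, 3],
    [0, 0, 1, 3, 0],
], dtype=float)
similarity = np.array([
    [0.0, 0.8, 0.2, 0.1, 0.1],
    [0.8, 0.0, 0.3, 0.2, 0.1],
    [0.2, 0.3, 0.0, 0.4, 0.3],
    [0.1, 0.2, 0.4, 0.0, 0.9],
    [0.1, 0.1, 0.3, 0.9, 0.0],
])

# Fused edge weights; in a supervised setting these coefficients are learned.
fusion = np.array([0.6, 0.4])
W = fusion[0] * cooccurrence + fusion[1] * similarity

# Row-normalize to obtain a transition matrix.
P = W / W.sum(axis=1, keepdims=True)

# Intrinsic node features (e.g. TF-IDF-like scores) define the restart vector.
features = np.array([0.9, 0.8, 0.3, 0.2, 0.2])
restart = features / features.sum()

# Random walk with restart (personalized PageRank-style ranking).
damping = 0.85
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = damping * P.T @ rank + (1 - damping) * restart

for word, score in sorted(zip(words, rank), key=lambda x: -x[1]):
    print(f"{word}: {score:.3f}")
```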
Higher-Order Markov Tag-Topic Models for Tagged Documents and Images
Zeng, Jia | Feng, Wei | Cheung, William K. | Li, Chun-Hung
This paper studies the topic modeling problem for tagged documents and images. Higher-order relations among tagged documents and images are a ubiquitous characteristic and play a positive role in extracting reliable and interpretable topics. In this paper, we propose tag-topic models (TTM) to depict such higher-order topic structural dependencies within the Markov random field (MRF) framework. First, we use a novel factor graph representation of latent Dirichlet allocation (LDA)-based topic models from the MRF perspective and present an efficient loopy belief propagation (BP) algorithm for approximate inference and parameter estimation. Second, we propose a factor hypergraph representation of TTM and model both pairwise and higher-order relations among tagged documents and images. An efficient loopy BP algorithm is developed to learn TTM, which encourages topic labeling smoothness among tagged documents and images. Extensive experimental results confirm that incorporating higher-order relations is effective in enhancing overall topic modeling performance, compared with current state-of-the-art topic models, on many text and image mining tasks of broad interest, such as word and link prediction, document classification, and tag recommendation.
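For illustration only, the sketch below shows simplified synchronous loopy-BP-style message updates for an LDA-like model on a toy corpus; the corpus, hyperparameters, and the exact form of the update are assumptions rather than the paper's TTM code, and the factor-hypergraph messages that tie tagged documents and images together are omitted. Each word token carries a distribution (message) over topics that is refreshed from the current document-topic and word-topic aggregates.

```python
# Simplified, illustrative sketch -- not the paper's TTM implementation.
# Synchronous loopy-BP-style updates for an LDA-like model on a toy corpus;
# the higher-order (hypergraph) messages among tagged documents/images
# described in the paper are not modeled here.
import numpy as np

rng = np.random.default_rng(0)
K, V = 3, 6                      # number of topics, vocabulary size
alpha, beta = 0.1, 0.01          # Dirichlet hyperparameters (assumed values)
docs = [[0, 1, 1, 2], [2, 3, 4], [4, 5, 5, 0]]   # toy word-id sequences

# One message per token: a distribution over the K topics.
msgs = [rng.dirichlet(np.ones(K), size=len(d)) for d in docs]

for _ in range(50):
    # Aggregate current messages into document-topic and word-topic counts.
    doc_topic = np.array([m.sum(axis=0) for m in msgs])     # D x K
    word_topic = np.zeros((V, K))
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            word_topic[w] += msgs[d][i]
    topic_total = word_topic.sum(axis=0)

    # Update each token's message, excluding its own contribution
    # (a "cavity"-style update common in BP treatments of LDA-like models).
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            m = msgs[d][i]
            new = (doc_topic[d] - m + alpha) * \
                  (word_topic[w] - m + beta) / (topic_total - m + V * beta)
            msgs[d][i] = new / new.sum()

print("document-topic proportions:")
print(np.array([m.sum(axis=0) for m in msgs]) /
      np.array([[len(d)] for d in docs]))
```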