Statistical Learning


Cross-People Mobile-Phone Based Activity Recognition

AAAI Conferences

Activity recognition using mobile phones has great potential in many applications, including mobile healthcare. To let a person easily check whether they are complying with a doctor's exercise prescription and adjust their amount of exercise accordingly, we can use a smartphone-based activity reporting system that accurately recognizes a range of daily activities and reports the duration of each. A triaxial accelerometer embedded in the smartphone is used to classify several activities, such as staying still, walking, running, and going upstairs and downstairs. However, a model learned from one person often yields inaccurate results when applied to a different person. To solve this cross-people activity recognition problem, we propose an algorithm called TransEMDT (Transfer learning EMbedded Decision Tree), which integrates a decision tree and the k-means clustering algorithm to adapt a personalized activity-recognition model. Tested on a real-world data set, the results show that our algorithm outperforms several traditional baseline algorithms.


Kinship Verification Through Transfer Learning

AAAI Conferences

Because of inevitable factors that affect faces, such as pose, expression, lighting, and aging, identity verification from faces is still an unsolved problem. Research on biometrics raises an even more challenging problem: is it possible to determine kinship merely from face images? Genetics studies have revealed a critical observation: faces of parents captured while they were young resemble their children's faces more closely than images captured when they are old. This observation motivates the following research. First, a new kinship database named UB KinFace, composed of child, young-parent, and old-parent face images, is collected from the Internet. Second, an extended transfer subspace learning method is proposed to mitigate the large divergence between the distributions of children and old parents. The key idea is to use an intermediate distribution close to both the source and target distributions to bridge them and reduce the divergence; the young-parent set is naturally suited to this role. Through this learning process, the large gap between the distributions is significantly reduced and the kinship verification problem becomes more discriminative. Experimental results show that our hypothesis on the role of young parents is valid and that transfer learning effectively enhances verification accuracy.


A New Search Engine Integrating Hierarchical Browsing and Keyword Search

AAAI Conferences

The original Yahoo! search engine consisted of a manually organized topic hierarchy of webpages for easy browsing. Modern search engines (such as Google and Bing), on the other hand, return a flat list of webpages based on keywords. It would be ideal if hierarchical browsing and keyword search could be seamlessly combined. The main difficulty is to automatically (i.e., not manually) classify and rank a massive number of webpages into various hierarchies (such as topics, media types, and regions of the world). In this paper we report our attempt to build such an integrated search engine, called SEE (Search Engine with hiErarchy). We implement a hierarchical classification system based on Support Vector Machines and embed it in SEE. We also design a novel user interface that lets users dynamically trade higher accuracy against more results in any (sub)category of the hierarchy. Although our current search engine is still small (indexing about 1.2 million webpages), the results, including a small user study, show great promise for integrating such techniques into next-generation search engines.
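The abstract pairs SVM-based hierarchical classification with an interface that trades accuracy against the number of results. The sketch below shows one simple way such a control can work, assuming a top-down scheme in which each category has a classifier score in [0, 1] and a page descends into a subcategory only when that score reaches a user-chosen threshold. The `Category` class, `assign`, and the keyword scorers (stand-ins for trained SVMs) are hypothetical and are not SEE's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Category:
    name: str
    # Hypothetical per-category scorer standing in for an SVM output mapped to [0, 1].
    score: Callable[[str], float]
    children: List["Category"] = field(default_factory=list)


def assign(page: str, node: Category, threshold: float) -> List[str]:
    """Top-down assignment: a page enters a child category only if that child's
    score reaches the threshold. A higher threshold favors accuracy; a lower one
    returns more results."""
    paths: List[str] = []
    for child in node.children:
        if child.score(page) >= threshold:
            deeper = assign(page, child, threshold)
            if deeper:
                paths.extend(f"{child.name}/{p}" for p in deeper)
            else:
                # No subcategory accepted the page, so it stops at this level.
                paths.append(child.name)
    return paths


def keyword_scorer(words: List[str]) -> Callable[[str], float]:
    """Toy scorer (hypothetical): fraction of the keywords found in the page text."""
    return lambda page: sum(w in page for w in words) / len(words)


root = Category("root", lambda page: 1.0, [
    Category("Sports", keyword_scorer(["game", "team"]), [
        Category("Soccer", keyword_scorer(["goal", "team"])),
    ]),
    Category("Science", keyword_scorer(["research", "lab"])),
])

print(assign("team wins game", root, threshold=0.5))  # ['Sports/Soccer']
print(assign("team wins game", root, threshold=0.9))  # ['Sports']
```

Raising the threshold from 0.5 to 0.9 stops the page at "Sports" instead of "Sports/Soccer": fewer, more confident assignments.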


Non-Linear Monte-Carlo Search in Civilization II

AAAI Conferences

This paper presents a new Monte-Carlo search algorithm for very large sequential decision-making problems. We apply non-linear regression within Monte-Carlo search, online, to estimate a state-action value function from the outcomes of random rollouts. This value function generalizes between related states and actions, and can therefore provide more accurate evaluations after fewer rollouts. A further significant advantage of this approach is its ability to automatically extract and leverage domain knowledge from external sources such as game manuals. We apply our algorithm to the game of Civilization II, a challenging multi-agent strategy game with an enormous state space and around 10^21 joint actions. We approximate the value function by a neural network, augmented by linguistic knowledge that is extracted automatically from the official game manual. We show that this non-linear value function is significantly more efficient than a linear value function, which is itself more efficient than Monte-Carlo tree search. Our non-linear Monte-Carlo search wins over 78% of games against the built-in AI of Civilization II.


Recommender Systems from "Words of Few Mouths"

AAAI Conferences

This paper identifies a widespread phenomenon in web data, which we call the "words of few mouths" phenomenon. In the context of online reviews, it refers to the case where a large fraction of the reviews each receive votes from only very few users. We discuss the challenges that "words of few mouths" poses for recommender systems built on users' opinions and advocate probabilistic methodologies to handle them. We develop a probabilistic model and a corresponding logistic-regression-based learning algorithm for review helpfulness prediction. Our experimental results indicate that the proposed model outperforms the current state-of-the-art algorithms not only in the presence of the "words of few mouths" phenomenon but also in its absence.
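The abstract does not spell out the model, so the following is only a sketch of one standard way to make logistic regression robust to sparse votes: treat the h_i helpful votes out of n_i total votes on review i as binomial draws, so a review with a single vote carries far less evidence than one with a hundred. The function name `fit_vote_logistic`, the learning rate, and plain gradient ascent are assumptions, not the paper's algorithm.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_vote_logistic(
    features: Sequence[Sequence[float]],
    helpful: Sequence[int],
    total: Sequence[int],
    lr: float = 1.0,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit p(helpful vote | x) = sigmoid(w.x + b) by gradient ascent on the binomial
    log-likelihood sum_i [h_i log p_i + (n_i - h_i) log(1 - p_i)]."""
    d = len(features[0])
    w = [0.0] * d
    b = 0.0
    n_all = sum(total)
    for _ in range(epochs):
        gw = [0.0] * d
        gb = 0.0
        for x, h, n in zip(features, helpful, total):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            # |h - n*p| <= n, so a review with few votes pulls on the fit only weakly.
            resid = h - n * p
            for j in range(d):
                gw[j] += resid * x[j]
            gb += resid
        # Average over all votes to keep the step size independent of data volume.
        w = [wj + lr * g / n_all for wj, g in zip(w, gw)]
        b += lr * gb / n_all
    return w, b


# Two reviews with 10 votes each: 9/10 and 1/10 found helpful.
w, b = fit_vote_logistic([[1.0], [-1.0]], [9, 1], [10, 10])
print(w, b)  # w[0] should approach log(9), the logit of 0.9; b stays 0 by symmetry
```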


Predicting Epidemic Tendency through Search Behavior Analysis

AAAI Conferences

The possibility of detecting influenza activity through search log analysis has been explored in recent years. However, previous studies have focused mainly on influenza, and little attention has been paid to other epidemics. Using web user behavior data, we consider the problem of predicting the tendency of hand-foot-and-mouth disease (HFMD), whose outbreak in 2010 caused great panic in China. In addition to search queries, we consider users' interactions with search engines. From the collected search logs, we cluster HFMD-related search queries, medical pages, and news reports into the following sets: epidemic-related queries (ERQs), epidemic-related pages (ERPs), and epidemic-related news (ERNs). We then use the frequency of each set as a separate feature and conduct a regression analysis against current HFMD occurrences. The experimental results show that these features perform well in terms of both accuracy and timeliness.
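The abstract does not state the regression form. Assuming ordinary least squares with an intercept on per-period counts of ERQs, ERPs, and ERNs, the fit can be sketched as below; `fit_epidemic_regression` and `predict` are hypothetical names, and the data are synthetic, constructed so that the true coefficients are known (they are not from the paper).

```python
import numpy as np


def fit_epidemic_regression(erq, erp, ern, cases):
    """Least-squares fit of cases ~ b0 + b1*ERQ + b2*ERP + b3*ERN (assumed model form)."""
    X = np.column_stack([np.ones(len(cases)), erq, erp, ern])
    coef, *_ = np.linalg.lstsq(X, np.asarray(cases, dtype=float), rcond=None)
    return coef


def predict(coef, erq, erp, ern):
    """Predicted HFMD occurrences for new per-period feature counts."""
    X = np.column_stack([np.ones(len(erq)), erq, erp, ern])
    return X @ coef


# Synthetic per-period frequencies; cases are generated from known coefficients.
erq = np.array([10, 20, 15, 30, 25, 40], dtype=float)
erp = np.array([5, 3, 8, 6, 10, 7], dtype=float)
ern = np.array([1, 4, 2, 0, 3, 5], dtype=float)
cases = 2.0 + 0.5 * erq + 1.0 * erp + 0.25 * ern

coef = fit_epidemic_regression(erq, erp, ern, cases)
print(coef)  # recovers approximately [2.0, 0.5, 1.0, 0.25]
```

With real search logs the counts would be noisy, so the residuals and a held-out period, rather than an exact recovery, would indicate how well the features track occurrences.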


Line Orthogonality in Adjacency Eigenspace with Application to Community Partition

AAAI Conferences

Unlike those of the Laplacian and normal matrices, the properties of the adjacency eigenspace have received much less attention. Recent work showed that nodes projected into the adjacency eigenspace exhibit an orthogonal line pattern, with nodes from the same community lying along the same line. In this paper, we conduct theoretical studies based on graph perturbation to demonstrate why this line orthogonality property holds in the adjacency eigenspace and why it generally disappears in the Laplacian and normal eigenspaces. Using the orthogonality property of the adjacency eigenspace, we present a graph partition algorithm, AdjCluster, which first projects node coordinates onto the unit sphere and then applies classic k-means to find clusters. Empirical evaluations on synthetic data and real-world social networks validate our theoretical findings and show the effectiveness of our graph partition algorithm.
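The two steps named for AdjCluster (project node coordinates onto the unit sphere, then run classic k-means) can be sketched in NumPy as follows. Using the eigenvectors of the k largest eigenvalues, farthest-first initialization, and the iteration cap are assumptions for illustration, not details taken from the paper; `adj_cluster` is a hypothetical name.

```python
import numpy as np


def adj_cluster(A, k, iters=100):
    """Sketch: embed nodes with top-k adjacency eigenvectors, normalize each row
    onto the unit sphere, then run k-means (Lloyd's algorithm)."""
    A = np.asarray(A, dtype=float)
    vals, vecs = np.linalg.eigh(A)            # A is symmetric (undirected graph)
    top = np.argsort(vals)[::-1][:k]          # assumption: k largest eigenvalues
    X = vecs[:, top]                          # row i = coordinates of node i
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)  # project onto the unit sphere

    # Deterministic farthest-first initialization.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)

    # Standard Lloyd iterations: assign to nearest center, then recompute means.
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Two 4-node cliques joined by the single edge (3, 4).
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0
labels = adj_cluster(A, k=2)
print(labels)  # nodes 0-3 and nodes 4-7 should land in different clusters
```

In this toy graph, row normalization places each clique's nodes close together on the unit circle (the bridge nodes only slightly off their clique's direction), so plain k-means separates the two communities.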


A Wikipedia Based Semantic Graph Model for Topic Tracking in Blogosphere

AAAI Conferences

There are two key issues for information diffusion in the blogosphere: (1) blog posts are usually short, noisy, and contain multiple themes; (2) information diffusion through the blogosphere is primarily driven by the “word-of-mouth” effect, which makes topics evolve very fast. This paper presents a novel topic tracking approach that addresses these issues by modeling a topic as a semantic graph in which the semantic relatedness between terms is learned from Wikipedia. For a given topic or post, the named entities, Wikipedia concepts, and their semantic relatedness are extracted to generate the graph model. Noise is filtered out through a graph clustering algorithm. To handle topic evolution, the topic model is enriched using Wikipedia as background knowledge. Furthermore, graph edit distance is used to measure the similarity between a topic and its posts. The proposed method is tested on real-world blog data, and experimental results show its advantage in tracking topics in short, noisy text.
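The abstract does not specify the edit costs. A simple, well-known special case arises when every node carries a unique label (here, a term): no node correspondence has to be searched, and with unit costs for inserting or deleting nodes and edges, the graph edit distance equals the size of the symmetric differences of the node and edge sets. The sketch below assumes that setting and ignores the relatedness weights; `make_graph`, `edit_distance`, and `similarity` (normalized by the largest possible distance) are hypothetical names, and the terms are made-up examples.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Graph = Tuple[Set[str], Set[FrozenSet[str]]]  # (term nodes, undirected edges)


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build a uniquely labeled, unweighted graph from term pairs."""
    edge_set = {frozenset(e) for e in edges}
    nodes = {term for e in edge_set for term in e}
    return nodes, edge_set


def edit_distance(g1: Graph, g2: Graph) -> int:
    """Unit-cost edit distance for graphs with unique node labels: every node or
    edge present in only one graph must be inserted or deleted once."""
    (v1, e1), (v2, e2) = g1, g2
    return len(v1 ^ v2) + len(e1 ^ e2)


def similarity(g1: Graph, g2: Graph) -> float:
    """1 minus the distance divided by its maximum (delete all of g1, insert all of g2)."""
    total = len(g1[0]) + len(g1[1]) + len(g2[0]) + len(g2[1])
    return 1.0 - edit_distance(g1, g2) / total if total else 1.0


topic = make_graph([("earthquake", "Japan"), ("Japan", "tsunami")])
post = make_graph([("earthquake", "Japan"), ("Japan", "nuclear")])
print(edit_distance(topic, post))  # 4: delete tsunami and its edge, insert nuclear and its edge
print(similarity(topic, post))     # 1 - 4/10 = 0.6
```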


Fast Algorithm for Affinity Propagation

AAAI Conferences

Affinity Propagation is a state-of-the-art clustering method recently proposed by Frey and Dueck. It has been successfully applied to a broad range of computer science research because its clustering performance is much better than that of traditional clustering methods such as k-means. To obtain high-quality clusters, the original Affinity Propagation algorithm iteratively exchanges real-valued messages between all pairs of data points until convergence. However, this algorithm does not scale to large datasets because computing the messages requires CPU time quadratic in the number of data points. This paper proposes an efficient Affinity Propagation algorithm that guarantees the same clustering result as the original algorithm after convergence. The heart of our approach is (1) to prune unnecessary message exchanges during the iterations and (2) to compute the converged values of the pruned messages after the iterations to determine clusters. Experimental evaluations on several different datasets demonstrate the effectiveness of our algorithm.
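For reference, the original message passing that the paper accelerates can be sketched in NumPy using the standard responsibility and availability updates of Frey and Dueck with damping. Every sweep updates all n × n messages, which is the quadratic cost described above; the pruning itself is not shown. The function name, parameter names, and the stopping rule (exemplar set unchanged for `convergence_iter` sweeps) are assumptions, and setting the preferences to the median similarity is a common convention rather than something the abstract specifies.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=15):
    """Original (unpruned) Affinity Propagation. S is an n x n similarity matrix
    whose diagonal holds the preferences. Returns (exemplars, labels)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    prev, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        max_other = np.repeat(first[:, None], n, axis=1)
        max_other[rows, best] = second
        R = damping * R + (1 - damping) * (S - max_other)

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new

        # Stop once the (non-empty) exemplar set is unchanged for convergence_iter sweeps.
        exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
        if prev is not None and exemplars.size and np.array_equal(exemplars, prev):
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            stable = 0
        prev = exemplars

    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Each point joins its most similar exemplar; exemplars label themselves.
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Two well-separated groups on a line; similarity = negative squared distance.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(x[:, None] - x[None, :]) ** 2
off_diag = S[~np.eye(len(x), dtype=bool)]
np.fill_diagonal(S, np.median(off_diag))  # common default preference
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)  # one exemplar per group is expected
```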


Entity Linking with Effective Acronym Expansion, Instance Selection and Topic Modeling

AAAI Conferences

Entity linking maps name mentions in documents to entries in a knowledge base by resolving name variations and ambiguities. In this paper, we propose three advancements for entity linking. First, expanding acronyms can effectively reduce the ambiguity of acronym mentions. However, only rule-based approaches that rely heavily on the presence of text markers have been used for entity linking. We propose a supervised learning algorithm to expand more complicated acronyms, which leads to a 15.1% accuracy improvement over state-of-the-art acronym expansion methods. Second, because entity linking annotation is expensive and labor intensive, we propose an instance selection strategy that makes effective use of automatically generated annotation, automating the annotation process without compromising accuracy. Our selection strategy chooses an informative and diverse set of instances for effective disambiguation. Lastly, topic modeling is used to model the semantic topics of the articles. Each of these advancements individually yields a statistically significant improvement in entity linking; collectively, they lead to the highest performance on the KBP-2010 task.