Community Detection in Social Networks Through Community Formation Games
Chen, Wei (Microsoft Research Asia) | Liu, Zhenming (Harvard University) | Sun, Xiaorui (Shanghai Jiao Tong University) | Wang, Yajun (Microsoft Research Asia)
We introduce a game-theoretic framework to address the community detection problem based on the structure of social networks. The dynamics of community formation are framed as a strategic game called the community formation game: given a social network, each node is selfish and selects communities to join or leave based on her own utility measurement. A community structure can be interpreted as an equilibrium of this game. We formulate the agents’ utility as the combination of a gain function and a loss function. Each agent can select multiple communities, which naturally captures the concept of “overlapping communities”. We propose a gain function based on Newman’s modularity function and a simple loss function that reflects the intrinsic costs incurred when people join communities. We conduct extensive experiments under this framework; our results show that our algorithm is effective in identifying overlapping communities, and is often better than the other algorithms we evaluated, especially when many people belong to multiple communities.
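For readers unfamiliar with the modularity function that the gain function builds on, the following is a minimal sketch of Newman's modularity for a non-overlapping partition of an undirected, unweighted graph. It is not the authors' gain or loss function, which the abstract does not specify; the function name `modularity` and the input layout are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman's modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, one entry per undirected edge (hypothetical layout).
    community_of: dict mapping each node to a single community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # number of edges with both endpoints in a community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1

    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for node, d in degree.items():
        degree_sum[community_of[node]] += d

    # Q = sum over communities of (internal edge fraction - expected fraction)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
```

Under this definition, splitting the two triangles into separate communities scores higher than merging everything into one community, which is the kind of signal a modularity-based gain would reward.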
Human-Guided Machine Learning for Fast and Accurate Network Alarm Triage
Amershi, Saleema (University of Washington) | Lee, Bongshin (Microsoft Research) | Kapoor, Ashish (Microsoft Research) | Mahajan, Ratul (Microsoft Research) | Christian, Blaine (Microsoft Corporation)
Network alarm triage refers to grouping and prioritizing a stream of low-level device health information to help operators find and fix problems. Today, this process tends to be largely manual because existing rule-based tools cannot easily evolve with the network. We present CueT, a system that uses interactive machine learning to constantly learn from the triaging decisions of operators. It then uses that learning in novel visualizations to help them quickly and accurately triage alarms. Unlike prior interactive machine learning systems, CueT handles a highly dynamic environment where the groups of interest are not known a priori and evolve constantly. Our evaluations with real operators and data from a large network show that CueT significantly improves the speed and accuracy of alarm triage.
CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method
Zheng, Yabin (Tsinghua University) | Li, Chen (University of California, Irvine) | Sun, Maosong (Tsinghua University)
Chinese Pinyin input methods are very important for Chinese language processing. In many cases, users may make typing errors. For example, a user wants to type in "shenme" (什么, meaning "what" in English) but may type in "shenem" instead. Existing Pinyin input methods fail in converting such a Pinyin sequence with errors to the right Chinese words. To solve this problem, we developed an efficient error-tolerant Pinyin input method called "CHIME" that can handle typing errors. By incorporating state-of-the-art techniques and language-specific features, the method achieves a better performance than state-of-the-art input methods. It can efficiently find relevant words in milliseconds for an input Pinyin sequence.
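The "shenem" example can be made concrete with an edit distance that counts an adjacent transposition as one error. The sketch below uses the optimal string alignment distance purely as an illustration; the abstract does not say which similarity measure or index CHIME uses, and the helper names and the tiny lexicon are hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "em" typed instead of "me".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def candidate_pinyin(typed, lexicon, max_distance=1):
    """Return lexicon entries within max_distance of the typed sequence."""
    return sorted(w for w in lexicon if osa_distance(typed, w) <= max_distance)


# Hypothetical three-entry lexicon, for illustration only.
lexicon = ["shenme", "shengme", "shenmi"]
```

Here `candidate_pinyin("shenem", lexicon)` keeps only "shenme", since the typo is a single swap of two adjacent letters. A linear scan like this would not meet the millisecond latency the abstract reports on a realistic lexicon; it only shows the matching criterion.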
Cross-People Mobile-Phone Based Activity Recognition
Zhao, Zhongtang (Chinese Academy of Sciences and Graduate University of the Chinese Academy of Sciences) | Chen, Yiqiang (Chinese Academy of Sciences) | Liu, Junfa (Chinese Academy of Sciences) | Shen, Zhiqi (Nanyang Technological University) | Liu, Mingjie (Institute of Computing Technology and Graduate University of the Chinese Academy of Sciences)
Activity recognition using mobile phones has great potential in many applications including mobile healthcare. In order to let a person easily know whether he is in strict compliance with the doctor's exercise prescription and adjust his exercise amount accordingly, we can use a smart-phone based activity reporting system to accurately recognize a range of daily activities and report the duration of each activity. A triaxial accelerometer embedded in the smart phone is used for the classification of several activities, such as staying still, walking, running, and going upstairs and downstairs. The model learnt from a specific person often cannot yield accurate results when used on a different person. To solve the cross-people activity recognition problem, we propose an algorithm known as TransEMDT (Transfer learning EMbedded Decision Tree) that integrates a decision tree and the k-means clustering algorithm for personalized activity-recognition model adaptation. Tested on a real-world data set, the results show that our algorithm outperforms several traditional baseline algorithms.
Kinship Verification Through Transfer Learning
Xia, Siyu (Southeast University and State University of New York at Buffalo) | Shao, Ming (State University of New York at Buffalo) | Fu, Yun (State University of New York at Buffalo)
Because of inevitable factors such as pose, expression, lighting and aging, identity verification through faces is still an unsolved problem. Research on biometrics raises an even more challenging problem: is it possible to determine kinship based merely on face images? Genetics studies have revealed a critical observation: the faces of parents captured while they were young resemble their children's faces more closely than images captured when the parents are old. This observation motivates the following research. First, a new kinship database named UB KinFace, composed of child, young parent and old parent face images, is collected from the Internet. Second, an extended transfer subspace learning method is proposed that aims to mitigate the enormous divergence of distributions between children and old parents. The key idea is to utilize an intermediate distribution close to both the source and target distributions to bridge them and reduce the divergence. Naturally the young parent set is suitable for this task. Through this learning process, the large gap between distributions can be significantly reduced and the kinship verification problem becomes more discriminative. Experimental results show that our hypothesis on the role of young parents is valid and that transfer learning is effective in enhancing the verification accuracy.
Embedding System Dynamics in Agent Based Models for Complex Adaptive Systems
Teose, Maarika (Cornell University) | Ahmadizadeh, Kiyan (Cornell University) | O'Mahony, Eoin (Cornell University) | Smith, Rebecca L. (Cornell University) | Lu, Zhao (Cornell University) | Ellner, Stephen P. (Cornell University) | Gomes, Carla (Cornell University) | Grohn, Yrjo
Complex adaptive systems (CAS) are composed of interacting agents, exhibit nonlinear properties such as positive and negative feedback, and tend to produce emergent behavior that cannot be wholly explained by deconstructing the system into its constituent parts. Both system dynamics (equation-based) approaches and agent-based approaches have been used to model such systems, and each has its benefits and drawbacks. In this paper, we introduce a class of agent-based models with an embedded system dynamics model, and detail the semantics of a simulation framework for these models. This model definition, along with the simulation framework, combines agent-based and system dynamics approaches in a way that retains the strengths of both paradigms. We show the applicability of our model by instantiating it for two example complex adaptive systems in the field of Computational Sustainability, drawn from ecology and epidemiology. We then present a more detailed application in epidemiology, in which we compare a previously unstudied intervention strategy to established ones. Our experimental results, unattainable using previous methods, yield insight into the effectiveness of these intervention strategies.
Extending Computer Assisted Assessment Systems with Natural Language Processing, User Modeling and Recommendations Based on Human Computer Interaction and Data Mining
Pascual-Nieto, Ismael (UNED) | Santos, Olga C. (UNED) | Perez-Marin, Diana (Universidad Rey Juan Carlos) | Boticario, Jesus G. (UNED)
Willow is a free-text Adaptive Computer Assisted Assessment system, which supports natural language processing and user modeling. In this paper we discuss the benefits coming from extending Willow with recommendations. The approach combines human computer interaction methods to elicit the recommendations with data mining techniques to adjust their definition. Following a scenario-based approach, 12 recommendations were designed and delivered in a large scale evaluation with 377 learners. A statistically significant positive impact was found on indicators dealing with the engagement in the course, the learning effectiveness and efficiency, as well as the knowledge acquisition. We present the overall system functionality, the interaction among the different subsystems involved and some evaluation findings.
Interest Prediction on Multinomial, Time-Evolving Social Graph
Nori, Nozomi (The University of Tokyo) | Bollegala, Danushka (The University of Tokyo) | Ishizuka, Mitsuru (The University of Tokyo)
We propose a method to predict users’ interests in social media, using time-evolving, multinomial relational data. We exploit various actions performed by users, and their preferences, to predict user interests. Actions performed by users in social media such as Twitter, Delicious and Facebook have two fundamental properties. (a) User actions can be represented as high-dimensional or multinomial relations, e.g. referring to URLs, bookmarking and tagging, or clicking a favorite button on a post. (b) User actions are time-varying and user-specific: each user has unique preferences that change over time. Consequently, it is appropriate to represent each user’s action at some point in time as multinomial relational data. We propose ActionGraph, a novel graph representation for modeling users’ multinomial, time-varying actions. Each user’s action at some time point is represented by an action node. ActionGraph is a bipartite graph whose edges connect an action node to the entities involved in it, referred to as object nodes. Using real-world social media data, we empirically justify the proposed graph structure. Our experimental results show that the proposed ActionGraph improves accuracy in a user interest prediction task, outperforming several baselines including standard tensor analysis, a previously proposed state-of-the-art LDA-based method and other graph-based variants. Moreover, the proposed method shows robust performance in the presence of sparse data.
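The bipartite structure described above can be sketched as two adjacency maps, one from action nodes to object nodes and one in the reverse direction. This is only an illustration of the data structure: the record layout, the choice to treat the acting user as an object node, and the name `build_action_graph` are assumptions, and the prediction method itself is not shown.

```python
from collections import defaultdict


def build_action_graph(actions):
    """Build a bipartite graph of action nodes and object nodes.

    actions: list of dicts with keys "user", "time" and "entities", where
    "entities" is a list of (kind, value) pairs (hypothetical record layout).
    Returns (action_to_objects, object_to_actions) adjacency maps.
    """
    action_to_objects = {}
    object_to_actions = defaultdict(set)
    for index, action in enumerate(actions):
        # One node per action, tagged with its time point.
        action_node = ("action", index, action["time"])
        # Assumption: the acting user is treated as one of the object nodes.
        objects = {("user", action["user"])}
        objects.update((kind, value) for kind, value in action["entities"])
        action_to_objects[action_node] = objects
        for obj in objects:
            object_to_actions[obj].add(action_node)
    return action_to_objects, object_to_actions


# Two bookmarking actions that share a URL but differ in user, tag and time.
sample_actions = [
    {"user": "alice", "time": 1, "entities": [("url", "u1"), ("tag", "ai")]},
    {"user": "bob", "time": 2, "entities": [("url", "u1"), ("tag", "ml")]},
]
action_to_objects, object_to_actions = build_action_graph(sample_actions)
```

Because edges only ever join an action node to an object node, two users become related through the objects their actions share, such as the common URL in the sample data.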
Learning to Identify Review Spam
Li, Fangtao (Tsinghua University) | Huang, Minlie (Tsinghua University) | Yang, Yi (Tsinghua University) | Zhu, Xiaoyan (Tsinghua University)
In the past few years, sentiment analysis and opinion mining have become popular and important tasks. These studies all assume that their opinion resources are real and trustworthy. However, they may encounter the fake opinion, or opinion spam, problem. In this paper, we study this issue in the context of our product review mining system. On product review sites, people may write fake reviews, called review spam, to promote their products or defame their competitors' products. It is important to identify and filter out the review spam. Previous work focuses only on heuristic rules, such as helpfulness voting or rating deviation, which limits the performance of this task. In this paper, we exploit machine learning methods to identify review spam. To this end, we manually build a spam collection from our crawled reviews. We first analyze the effect of various features in spam identification. We also observe that a review spammer consistently writes spam. This provides us with another view for identifying review spam: we can identify whether the author of the review is a spammer. Based on this observation, we provide a two-view semi-supervised method, co-training, to exploit the large amount of unlabeled data. The experimental results show that our proposed method is effective. Our machine learning methods achieve significant improvements in comparison to the heuristic baselines.
Resource-Bounded Crowd-Sourcing of Commonsense Knowledge
Kuo, Yen-Ling (National Taiwan University) | Hsu, Jane Yung-jen (National Taiwan University)
Knowledge acquisition is the essential process of extracting and encoding knowledge, both domain-specific and commonsense, to be used in intelligent systems. While many large knowledge bases have been constructed, none is close to complete. This paper presents an approach to improving a knowledge base efficiently under resource constraints. Using a guiding knowledge base, questions are generated from a weak form of similarity-based inference given the glossary mapping between two knowledge bases. The candidate questions are prioritized in terms of the concept coverage of the target knowledge. Experiments were conducted to find questions to grow the Chinese ConceptNet using the English ConceptNet as a guide. The results were evaluated by online users to verify that 94.17% of the questions and 85.77% of the answers are good. In addition, the answers collected in a six-week period showed consistent improvement to a 36.33% increase in concept coverage of the Chinese commonsense knowledge base against the English ConceptNet.