Jilin University
Group Sparse Bayesian Learning for Active Surveillance on Epidemic Dynamics
Pei, Hongbin (Jilin University) | Yang, Bo (Jilin University) | Liu, Jiming (Hong Kong Baptist University) | Dong, Lei (Peking University)
Predicting epidemic dynamics is of great value in understanding and controlling diffusion processes, such as infectious disease spread and information propagation. This task is intractable, especially when surveillance resources are very limited. To address the challenge, we study the problem of active surveillance, i.e., how to identify a small portion of system components as sentinels to effect monitoring, such that the epidemic dynamics of an entire system can be readily predicted from the partial data collected by such sentinels. We propose a novel measure, the gamma value, to identify the sentinels by modeling a sentinel network with row sparsity structure. We design a flexible group sparse Bayesian learning algorithm to mine the sentinel network suitable for handling both linear and non-linear dynamical systems by using the expectation maximization method and variational approximation. The efficacy of the proposed algorithm is theoretically analyzed and empirically validated using both synthetic and real-world data.
Event-Based Structural Change Detection in Urban-Scale Contact Network
Bai, Yuan (Jilin University) | Yang, Bo (Jilin University) | Eggo, Rosalind (London School of Hygiene and Tropical Medicine) | Du, Zhanwei (University of Texas at Austin)
The detection of structural changes is an important task in analyzing network evolution, especially for interactions between people, that may be driven by external events. Existing work relies on snapshot data and misses out some key functions of networks. Here, we study contact network evolution where no snapshot data are available. In spite of the challenge, this study demonstrates how contact networks can be used to predict and control infectious disease epidemics. We first model structural changes in contact networks during the 2009 influenza pandemic in Hong Kong, and then present a probabilistic framework to address it, aiming to answer when and how the underlying structure changes, utilizing multiple data sources including demographic data and epidemic surveillance data. The efficacy and public health utility of the method are demonstrated using both synthetic and real data.
Acquiring Planning Knowledge via Crowdsourcing
Gao, Jie (Jilin University) | Zhuo, Hankz Hankui (Sun Yat-sen University) | Kambhampati, Subbarao (Arizona State University) | Li, Lei (Sun Yat-sen University)
Plan synthesis often requires complete domain models and initial states as input. In many real-world applications, it is difficult to build domain models and provide a complete initial state beforehand. In this paper we propose to turn to the crowd for help before planning. We assume there are annotators available to provide the information needed for building domain models and initial states. However, there might be a substantial amount of discrepancy within the inputs from the crowd. It is thus challenging to address the planning problem with possibly noisy information provided by the crowd. We address the problem in two phases. We first build a set of Human Intelligence Tasks (HITs) and collect values from the crowd. We then estimate the actual values of variables and feed the values to a planner to solve the problem.
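The second phase above (estimating actual values from discrepant crowd input) can be illustrated with a minimal sketch. It assumes simple majority voting, which the abstract does not specify; the function name `estimate_values` and the example propositions are hypothetical.

```python
from collections import Counter


def estimate_values(answers):
    """Estimate one value per variable by majority vote (an assumed aggregation rule).

    answers: dict mapping a variable name to the list of annotator answers.
    Variables with no answers are left out of the result.
    """
    estimates = {}
    for variable, votes in answers.items():
        if not votes:
            continue
        # most_common(1) returns the single most frequent answer and its count.
        value, _count = Counter(votes).most_common(1)[0]
        estimates[variable] = value
    return estimates


# Hypothetical crowd answers about propositions in an initial state.
crowd_answers = {
    "at(truck, depot)": [True, True, False],
    "loaded(pkg1)": [False, False, True, False],
}
initial_state = estimate_values(crowd_answers)
print(initial_state)
```

The resulting dictionary is what would be handed to a planner as the estimated initial state; a weighted vote based on annotator quality would be a natural refinement.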
A Stochastic Model for Detecting Heterogeneous Link Communities in Complex Networks
He, Dongxiao (Tianjin University) | Liu, Dayou (Jilin University) | Jin, Di (Tianjin University) | Zhang, Weixiong (Washington University in Saint Louis)
Discovery of communities in networks is a fundamental data analysis problem. Most of the existing approaches have focused on discovering communities of nodes, while recent studies have shown great advantages and utilities of the knowledge of communities of links. Stochastic models provide a promising class of techniques for the identification of modular structures, but most stochastic models mainly focus on the detection of node communities rather than link communities. We propose a stochastic model, which not only describes the structure of link communities, but also considers the heterogeneous distribution of community sizes, a property which is often ignored by other models. We then learn the model parameters using a method of maximum likelihood based on an expectation-maximization algorithm. To deal with large complex real networks, we extend the method by a strategy of iterative bipartition. The extended method is not only efficient, but is also able to determine the number of communities for a given network. We test our approach on both synthetic benchmarks and real-world networks including an application to a large biological network, and also compare it with two existing methods. The results demonstrate the superior performance of our approach over the competing methods for detecting link communities.
Exploring Social Context for Topic Identification in Short and Noisy Texts
Wang, Xin (Jilin University;Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education) | Wang, Ying (Changchun Institute of Tech) | Zuo, Wanli (Jilin University) | Cai, Guoyong (Jilin University)
With the pervasion of social media, topic identification in short texts has attracted increasing attention in recent years. However, the texts of social media are by nature short and noisy, and their structures are sparse and dynamic, which makes it difficult to identify topic categories accurately from online social media. Inspired by social science findings that preference consistency and social contagion are observed in social media, we investigate topic identification in short and noisy texts by exploring social context from the perspective of social sciences. In particular, we present a mathematical optimization formulation that incorporates the preference consistency and social contagion theories into a supervised learning method, and conduct feature selection to tackle short and noisy texts in social media, which results in a Sociological framework for Topic Identification (STI). Experimental results on real-world datasets from Twitter and Citation Network demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of social context in topic identification.
Modelling Individual Negative Emotion Spreading Process with Mobile Phones
Du, Zhanwei (Jilin University) | Yang, Yongjian (Jilin University) | Ma, Chuang (Jilin University) | Bai, Yuan (Jilin University)
Individual mood is important for physical and emotional well-being, creativity and working memory. However, due to the lack of long-term, real-world daily tracking data at the individual level, most current work focuses on the population level or on small groups over short periods. An overlooked yet important task is to find the sentiment spreading mechanism at the individual level from daily behavior data. This paper studies this task by raising the following fundamental question, which has not been sufficiently answered by the literature so far: given a social network, how does sentiment spread? Current individual-level network spreading models always assume that one can infect others only after having been infected. Considering the characteristics of negative emotion spreading at the individual level, we relax this assumption and give an individual negative emotion spreading model. In this paper, we propose a Graph-Coupled Hidden Markov Sentiment Model for modeling the propagation of infectious negative sentiment locally within a social network. Taking the MIT Social Evolution dataset as an example, the experimental results verify the efficacy of our techniques on real-world data.
Bayesian Approach to Modeling and Detecting Communities in Signed Network
Yang, Bo (Jilin University) | Zhao, Xuehua (Jilin University) | Liu, Xueyan (Jilin University)
There has been an increasing interest in exploring signed networks with positive and negative links in that they contain more information than unsigned networks. As fundamental problems of signed network analysis, community detection and sign (or attitude) prediction are still primary challenges. To address them, we propose a generative Bayesian approach, in which 1) a signed stochastic blockmodel is proposed to characterize the community structure in the context of signed networks, by means of explicitly formulating the distributions of both density and frustration of signed links from a stochastic perspective, and 2) a model learning algorithm is proposed by theoretically deriving a variational Bayes EM for parameter estimation and a variation-based approximate evidence for model selection. Through comparisons with state-of-the-art methods on synthetic and real-world networks, the proposed approach shows its superiority in both community detection and sign prediction for exploratory networks.
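To make the notion of frustration concrete, the sketch below counts frustrated links for a given community assignment, using the common convention that a link is frustrated when it is negative inside a community or positive between communities. This is only an illustration of the quantity, not the paper's blockmodel or learning algorithm; the function name and the toy network are hypothetical.

```python
def count_frustrated(edges, community):
    """Count frustrated links under a community assignment.

    edges: iterable of (u, v, sign) with sign +1 or -1.
    community: dict mapping each node to its community id.
    A link is frustrated if it is negative within a community
    or positive between two different communities.
    """
    frustrated = 0
    for u, v, sign in edges:
        same = community[u] == community[v]
        if (same and sign < 0) or (not same and sign > 0):
            frustrated += 1
    return frustrated


# Toy signed network with two communities {a, b} and {c, d}.
edges = [("a", "b", 1), ("b", "c", -1), ("c", "d", 1), ("a", "d", 1)]
community = {"a": 0, "b": 0, "c": 1, "d": 1}
frustration = count_frustrated(edges, community)
print(frustration)
```

Here only the positive link between `a` and `d` crosses communities, so it is the single frustrated link.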
On the Scalable Learning of Stochastic Blockmodel
Yang, Bo (Jilin University) | Zhao, Xuehua (Jilin University)
The stochastic blockmodel (SBM) enables us to decompose and analyze an exploratory network without a priori knowledge about its intrinsic structure. However, the task of effectively and efficiently learning an SBM from a large-scale network is still challenging due to the high computational cost of its model selection and parameter estimation. To address this issue, we present a novel SBM learning algorithm referred to as BLOS (BLOckwise Sbm learning). Distinct from the literature, the model selection and parameter estimation of SBM are concurrently, rather than alternately, executed in BLOS by embedding the minimum message length criterion into a block-wise EM algorithm, which greatly reduces the time complexity of SBM learning without losing learning accuracy and modeling flexibility. Its effectiveness and efficiency have been tested through rigorous comparisons with the state-of-the-art methods on both synthetic and real-world networks.
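For readers unfamiliar with the model being learned, the following minimal sketch samples an undirected network from a standard SBM: each node belongs to a block, and each pair of nodes is linked with a probability that depends only on their two blocks. It illustrates the generative assumption only and is not the BLOS algorithm; the function name and parameters are hypothetical.

```python
import random


def sample_sbm(block_sizes, block_probs, seed=0):
    """Sample an undirected graph from a stochastic blockmodel.

    block_sizes: number of nodes in each block.
    block_probs: symmetric matrix; block_probs[a][b] is the probability
                 of a link between a node in block a and a node in block b.
    Returns (labels, edges), where labels[i] is the block of node i.
    """
    rng = random.Random(seed)
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Link probability depends only on the blocks of the two endpoints.
            if rng.random() < block_probs[labels[i]][labels[j]]:
                edges.append((i, j))
    return labels, edges


# Degenerate example: links are certain within blocks and impossible between them.
labels, edges = sample_sbm([3, 2], [[1.0, 0.0], [0.0, 1.0]])
print(labels, edges)
```

Learning an SBM is the inverse problem: recovering the block labels, the block probability matrix, and the number of blocks from the observed edges.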
Modeling and Mining Spatiotemporal Patterns of Infection Risk from Heterogeneous Data for Active Surveillance Planning
Yang, Bo (Jilin University) | Guo, Hua (Jilin University) | Yang, Yi (Jilin University) | Shi, Benyun (Hong Kong Baptist University) | Zhou, Xiaonong (Chinese CDC) | Liu, Jiming (Hong Kong Baptist University)
Active surveillance is a desirable way to prevent the spread of infectious diseases in that it aims to discover individual cases in a timely manner by actively searching for patients. However, in practice active surveillance is difficult to implement, especially when the monitoring space is large but available resources are limited. Therefore, it is extremely important for public health authorities to know how to distribute their very sparse resources to high-priority regions so as to maximize the outcomes of active surveillance. In this paper, we raise the problem of active surveillance planning and provide an effective method to address it via modeling and mining spatiotemporal patterns of infection risks from heterogeneous data sources. Taking malaria as an example, we perform an empirical study on real-world data to validate our method and provide our new findings.
Case-Based Meta-Prediction for Bioinformatics
Yun, Xi (The Graduate Center of The City University of New York) | Epstein, Susan L. (The Graduate Center and Hunter College of The City University of New York) | Han, Weiwei (Jilin University) | Xie, Lei (The Graduate Center and Hunter College of The City University of New York)
Before laboratory testing, bioinformatics problems often require a machine-learned predictor to identify the most likely choices among a wealth of possibilities. Researchers may advocate different predictors for the same problem, none of which is best in all situations. This paper introduces a case-based meta-predictor that combines a set of elaborate, pre-existing predictors to improve their accuracy on a difficult and important problem: protein-ligand docking. The method focuses on the reliability of its component predictors, and has broad potential applications in biology and chemistry. Despite noisy and biased input, the method outperforms its individual components on benchmark data. It provides a promising solution for the performance improvement of compound virtual screening, which would thereby reduce the time and cost of drug discovery.
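One way to picture a case-based, reliability-driven meta-predictor is sketched below: for a new case, look up the most similar past cases and trust the component predictor that was correct most often on them. This is an assumed, simplified reading of the idea and not the authors' method; the function name, feature vectors, and predictor names "A" and "B" are all hypothetical.

```python
def pick_predictor(new_case, past_cases, k=3):
    """Choose the component predictor that was most reliable on similar past cases.

    new_case: feature vector (tuple of floats) for the new problem.
    past_cases: list of (features, outcomes), where outcomes maps a
                predictor name to True if it was correct on that case.
    k: number of nearest past cases to consult.
    """
    def distance(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(past_cases, key=lambda c: distance(c[0], new_case))[:k]
    reliability = {}
    for _features, outcomes in nearest:
        for name, correct in outcomes.items():
            reliability[name] = reliability.get(name, 0) + (1 if correct else 0)
    # Return the predictor with the most correct answers among the neighbours.
    return max(reliability, key=reliability.get)


# Hypothetical case base: predictor "A" does well near the origin, "B" far from it.
past_cases = [
    ((0.0, 0.0), {"A": True, "B": False}),
    ((0.1, 0.0), {"A": True, "B": True}),
    ((5.0, 5.0), {"A": False, "B": True}),
    ((5.1, 5.0), {"A": False, "B": True}),
]
choice_near = pick_predictor((0.0, 0.1), past_cases, k=2)
choice_far = pick_predictor((5.0, 5.1), past_cases, k=2)
print(choice_near, choice_far)
```

The point of the design is that no single predictor needs to be best everywhere; the case base decides which one to rely on for each region of the problem space.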