A key challenge in reducing the burden of cardiovascular disease is matching patients to the treatments that are most appropriate for them. Different cardiac assessment tools have been developed to address this goal. Recent research has focused on heart rate motifs, i.e., short-term heart rate sequences that are over- or under-represented in long-term electrocardiogram (ECG) recordings of patients experiencing cardiovascular outcomes, which provide novel and valuable information for risk stratification. However, this approach can leverage only a small number of motifs for prediction and results in models that are difficult to interpret. We address these limitations by identifying latent structure in the large numbers of motifs found in long-term ECG recordings. In particular, we explore the application of topic models to heart rate time series to identify functional sets of heart rate sequences and to concisely describe patients using task-independent features for various cardiovascular outcomes. We evaluate the approach on a large collection of real-world ECG data, and investigate the performance of topic mixture features for the prediction of cardiovascular mortality. The topics provided an interpretable representation of the recordings and maintained valuable information for clinical assessment when compared with motif frequencies, even after accounting for commonly used clinical risk scores.
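The motif representation described above can be illustrated with a minimal sketch, assuming a heart rate series is first discretized into symbols and motifs are then counted as fixed-length symbol sequences. The bin edges, the motif length, the sample values, and the function names `symbolize` and `motif_counts` are hypothetical choices for illustration, not the settings or code used in the paper. The resulting counts play the role of word counts in a document, which is the kind of input a topic model could consume.

```python
from bisect import bisect_right
from collections import Counter


def symbolize(heart_rates, bin_edges):
    """Map each heart-rate value to the index of the bin it falls in."""
    return [bisect_right(bin_edges, hr) for hr in heart_rates]


def motif_counts(symbols, length):
    """Count every contiguous symbol sequence (motif) of the given length."""
    return Counter(
        tuple(symbols[i:i + length])
        for i in range(len(symbols) - length + 1)
    )


# Hypothetical values, for illustration only.
bin_edges = [60, 80, 100]  # assumed bins: <60, 60-79, 80-99, >=100 bpm
heart_rates = [58, 62, 75, 81, 79, 62, 75, 81]

symbols = symbolize(heart_rates, bin_edges)
counts = motif_counts(symbols, length=3)

# Each recording becomes a "bag of motifs", analogous to a bag of words.
for motif, count in sorted(counts.items()):
    print(motif, count)
```

Treating each recording as a bag of motifs is one simple design choice; it discards where in the recording a motif occurs and keeps only how often it occurs.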
Cardiovascular disease is the leading cause of death globally, resulting in 17 million deaths each year. Despite the availability of various treatment options, existing techniques based upon conventional medical knowledge often fail to identify patients who might have benefited from more aggressive therapy. In this paper, we describe and evaluate a novel unsupervised machine learning approach for cardiac risk stratification. The key idea of our approach is to avoid specialized medical knowledge, and assess patient risk using symbolic mismatch, a new metric to assess similarity in long-term time-series activity. We hypothesize that symbolic mismatch can identify high-risk patients as the individuals in a population with unusual long-term physiological activity. We describe related approaches that build on these ideas to provide improved medical decision making for patients who have recently suffered coronary attacks. We first describe how to compute the symbolic mismatch between pairs of long-term electrocardiographic (ECG) signals. This algorithm maps the original signals into a symbolic domain, and provides a quantitative assessment of the difference between these symbolic representations of the original signals. We then show how this measure can be used with each of a one-class SVM, a nearest neighbor classifier, and hierarchical clustering to improve risk stratification. We evaluated our methods on a population of 686 cardiac patients with available long-term electrocardiographic data. In a univariate analysis, all of the methods provided a statistically significant association with the occurrence of a major adverse cardiac event in the next 90 days. In a multivariate analysis that incorporated the most widely used clinical risk variables, the nearest neighbor and hierarchical clustering approaches identified, with statistical significance, patients with a roughly two-fold risk of suffering a major adverse cardiac event in the next 90 days.
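The idea of flagging patients with unusual activity can be sketched as follows, assuming a pairwise dissimilarity matrix has already been computed. The matrix values, the choice of k, the scoring rule (mean dissimilarity to the k nearest neighbors), and the function name `knn_anomaly_scores` are assumptions made for illustration. This is not the symbolic mismatch computation itself, and it is not necessarily the nearest neighbor procedure evaluated in the paper.

```python
def knn_anomaly_scores(dissimilarity, k):
    """Score each item by its mean dissimilarity to its k nearest neighbors."""
    scores = []
    for i, row in enumerate(dissimilarity):
        # Exclude the item's dissimilarity to itself, then sort ascending.
        others = sorted(d for j, d in enumerate(row) if j != i)
        scores.append(sum(others[:k]) / k)
    return scores


# Hypothetical symmetric dissimilarity matrix for four patients.
mismatch = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.1, 0.8],
    [0.2, 0.1, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]

scores = knn_anomaly_scores(mismatch, k=2)

# The patient with the highest score is the least similar to the rest.
most_unusual = max(range(len(scores)), key=scores.__getitem__)
print(most_unusual, scores)
```

Because the score depends only on pairwise dissimilarities, any other dissimilarity measure could be substituted without changing the scoring code.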
Socio-economic maps contain important information regarding the population of a country. Computing these maps is critical given that policy makers often make important decisions based upon such information. However, the compilation of socio-economic maps requires extensive resources and is highly expensive. On the other hand, the ubiquitous presence of cell phones is generating large amounts of spatio-temporal data that can reveal human behavioral traits related to specific socio-economic characteristics. Traditional inference approaches have taken advantage of these datasets to infer regional socio-economic characteristics. In this paper, we propose a novel approach whereby topic models are used to infer socio-economic levels from large-scale spatio-temporal data. Instead of using a pre-determined set of features, we use Latent Dirichlet Allocation (LDA) to extract latent recurring patterns of co-occurring behaviors across regions, which are then used in the prediction of socio-economic levels. We show that our approach improves state-of-the-art prediction results by 9%.
Alharbi, Basma (King Abdullah University of Science and Technology (KAUST)) | Qahtan, Abdulhakim (King Abdullah University of Science and Technology (KAUST)) | Zhang, Xiangliang (King Abdullah University of Science and Technology (KAUST))
Utilizing trajectories for modeling human mobility often involves extracting descriptive features for each individual, a procedure heavily based on experts' knowledge. In this work, our objective is to minimize human involvement and exploit the power of community in learning 'features' for individuals from their location traces. We propose a probabilistic graphical model that learns distributions of latent concepts, named motifs, from anonymized sequences of user locations. To handle variation in user activity level, our model learns motif distributions from sequence-level location co-occurrence of all users. To handle the large variation in location popularity, our model uses an asymmetric prior, conditioned on per-sequence features. We evaluate the new representation in a link prediction task and compare our results to those of baseline approaches.
Wikipedia is a collaborative setting with both combative and cooperative editing. We propose a new method for investigating the types of editor interactions using a novel representation of Wikipedia's revision history as a temporal, bipartite network with multiple node and edge types for users and revisions. From this representation we identify significant author interactions as network motifs and show how the motif types capture important, diverse editing behaviors. Two experiments demonstrate the further benefit of motifs. First, we demonstrate that using motifs yields a significant performance improvement over a purely revision-based analysis in classifying pages as combative or cooperative. Second, we use motifs as a basis for analyzing trends in the dynamics of editor behavior to explain Wikipedia's content growth.