Alharbi, Basma (King Abdullah University of Science and Technology (KAUST)) | Qahtan, Abdulhakim (King Abdullah University of Science and Technology (KAUST)) | Zhang, Xiangliang (King Abdullah University of Science and Technology (KAUST))
Utilizing trajectories for modeling human mobility often involves extracting descriptive features for each individual, a procedure heavily based on experts' knowledge. In this work, our objective is to minimize human involvement and exploit the power of community in learning `features' for individuals from their location traces. We propose a probabilistic graphical model that learns distribution of latent concepts, named motifs, from anonymized sequences of user locations. To handle variation in user activity level, our model learns motif distributions from sequence-level location co-occurrence of all users. To handle the big variation in location popularity, our model uses an asymmetric prior, conditioned on per-sequence features. We evaluate the new representation in a link prediction task and compare our results to those of baseline approaches.
Most of existing methods for DNA motif discovery consider only a single set of sequences to find an over-represented motif. In contrast, we consider multiple sets of sequences where we group sets associated with the same motif into a cluster, assuming that each set involves a single motif. Clustering sets of sequences yields clusters of coherent motifs, improving signal-to-noise ratio or enabling us to identify multiple motifs. We present a probabilistic model for DNA motif discovery where we identify multiple motifs through searching for patterns which are shared across multiple sets of sequences. Our model infers cluster-indicating latent variables and learns motifs simultaneously, where these two tasks interact with each other.
Networks are a fundamental and flexible way of representing various complex systems. Many domains such as communication, citation, procurement, biology, social media, and transportation can be modeled as a set of entities and their relationships. Temporal networks are a specialization of general networks where the temporal evolution of the system is as important to understand as the structure of the entities and relationships. We present the Independent Temporal Motif (ITeM) to characterize temporal graphs from different domains. The ITeMs are edge-disjoint temporal motifs that can be used to model the structure and the evolution of the graph. For a given temporal graph, we produce a feature vector of ITeM frequencies and apply this distribution to the task of measuring the similarity of temporal graphs. We show that ITeM has higher accuracy than other motif frequency-based approaches. We define various metrics based on ITeM that reveal salient properties of a temporal network. We also present importance sampling as a method for efficiently estimating the ITeM counts. We evaluate our approach on both synthetic and real temporal networks.
In this spirit, Sali and Blundell (1990) develop an elaborate scheme for the comparison of protein structures. The results of a comparison form a "generalized protein," which can be used in predicting 3D conformation of the sequence of the unknown. Similar to the work of Lathrop et al. (1987), proteins are described by a hierarchy, with each level being a sequence of typed elements. Elements of fragments are represented by a host of computed properties, rather than by a single identifier. Attributes of fragment elements can refer to other elements in the sequence, thus representing binary relationships such as hydrogen bonding between elements.
As the popularity of content sharing websites has increased, they have become targets for spam, phishing and the distribution of malware. On YouTube, the facility for users to post comments can be used by spam campaigns to direct unsuspecting users to malicious third-party websites. In this paper, we demonstrate how such campaigns can be tracked over time using network motif profiling, i.e. by tracking counts of indicative network motifs. By considering all motifs of up to five nodes, we identify discriminating motifs that reveal two distinctly different spam campaign strategies, and present an evaluation that tracks two corresponding active campaigns.