Clustering sequence sets for motif discovery

Neural Information Processing Systems

Most existing methods for DNA motif discovery consider only a single set of sequences when searching for an over-represented motif. In contrast, we consider multiple sets of sequences and group the sets associated with the same motif into a cluster, assuming that each set involves a single motif. Clustering sets of sequences yields clusters of coherent motifs, improving the signal-to-noise ratio or enabling us to identify multiple motifs. We present a probabilistic model for DNA motif discovery that identifies multiple motifs by searching for patterns shared across multiple sets of sequences. Our model infers cluster-indicating latent variables and learns motifs simultaneously, with the two tasks informing each other.


HONE: Higher-Order Network Embeddings

arXiv.org Machine Learning

This paper describes a general framework for learning Higher-Order Network Embeddings (HONE) from graph data based on network motifs. The HONE framework is highly expressive and flexible, with many interchangeable components. The experimental results demonstrate the effectiveness of learning higher-order network representations. In all cases, HONE outperforms recent embedding methods that are unable to capture higher-order structures, with a mean relative gain in AUC of $19\%$ (and up to $75\%$ gain) across a wide variety of networks and embedding methods.


Toward Unsupervised Activity Discovery Using Multi Dimensional Motif Detection in Time Series

AAAI Conferences

This paper addresses the problem of activity and event discovery in multi-dimensional time series data by proposing a novel method for locating multi-dimensional motifs in time series. While recent work has addressed finding single-dimensional and multi-dimensional motifs in time series, we address motifs in the general case, where the elements of multi-dimensional motifs have temporal, length, and frequency variations. The proposed method is validated on synthetic data and evaluated empirically on several wearable systems used by real subjects.
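As a rough illustration of what "locating a motif in a time series" means, the sketch below finds the closest pair of non-overlapping windows in a single-dimensional series by brute force. It is not the method proposed in the paper, which handles multi-dimensional motifs with temporal, length, and frequency variations. The function names `znorm` and `find_motif_pair`, the fixed window length `m`, and the toy series are all assumptions made for this example.

```python
import math

def znorm(window):
    """Z-normalize a window so that matching ignores offset and scale."""
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    std = math.sqrt(var)
    if std == 0:
        return [0.0] * len(window)  # constant window: no shape to compare
    return [(x - mean) / std for x in window]

def find_motif_pair(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    length-m windows, compared by Euclidean distance after z-normalization."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(windows)):
        # Start j at i + m so the two windows never overlap (no trivial matches).
        for j in range(i + m, len(windows)):
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best

# Toy series (hypothetical data): the shape 0, 1, 3, 1, 0 occurs at index 0 and index 8.
series = [0, 1, 3, 1, 0, 5, 2, 7, 0, 1, 3, 1, 0, 4]
distance, i, j = find_motif_pair(series, 5)
print(distance, i, j)
```

The brute-force search is quadratic in the series length, which is acceptable for a toy example but would likely need pruning or indexing on real sensor data.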


[Editors' Choice] Fluorine frolicking with eight friends

Science

Fluorine plays a supporting role in some of the best-known hypervalent compounds, such as PF5 and SF6. Goesten et al. now suggest that the halogen can also play the lead part in constrained environs. Using density functional theory, the authors report that all eight engage in stabilizing Si-F orbital interactions. Whereas hypervalency is more often associated with third- and fourth-row elements, in this motif, sterics preclude analogous bonding to the heavier halides.


Knowledge Discovery of Multilevel Protein Motifs

AAAI Conferences

Protein motifs can be classified into four categories. Sequence motifs are linear strings of residue identifiers with an implicit topological ordering. Sequence-structure motifs are sequence motifs with predefined secondary structural elements attached to one or more residues in the motif. The sequence is assumed to be predictive of the associated structure. Structure motifs are 3D structural objects, described by positions of residue objects in 3D Euclidean space.
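To make the notion of a sequence motif concrete, the sketch below treats a motif as a pattern over one-letter residue codes and scans a protein sequence for every occurrence. It is only an illustration, not the representation or discovery method used in the paper. The pattern is the PROSITE N-glycosylation site N-{P}-[ST]-{P} rewritten as a regular expression; the helper name `find_motif` and the example sequence are hypothetical.

```python
import re

# N-glycosylation site N-{P}-[ST]-{P}: Asn, any residue except Pro,
# Ser or Thr, any residue except Pro.
N_GLYCOSYLATION = "N[^P][ST][^P]"

def find_motif(sequence, pattern):
    """Return (start, matched_residues) for every occurrence of the pattern,
    including overlapping ones (hence the zero-width lookahead)."""
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]

# Hypothetical sequence, chosen so that one candidate site (NPTQ) is
# rejected because proline follows the asparagine.
sequence = "MKNGSAANPTQNLTR"
hits = find_motif(sequence, N_GLYCOSYLATION)
print(hits)  # [(2, 'NGSA'), (11, 'NLTR')]
```

The linear left-to-right order of the pattern is the "implicit topological ordering" mentioned above. Sequence-structure and structure motifs would need additional annotations or 3D coordinates that a plain string pattern cannot carry.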