Reviews: N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

Neural Information Processing Systems

This is critical because most molecule datasets are small. Learning an unsupervised representation allows the model to generalize better and potentially to use unlabeled data in a semi-supervised setting. Few methods currently learn unsupervised molecular representations, so I think this paper is original. The paper provides a theoretical analysis characterizing the model's representation power and its generalization bound, which is important for understanding the model. It would be good to see the average sparsity of c(n) on some molecule datasets.


N-Gram Graph: Simple Unsupervised Representation for Graphs, with Applications to Molecules

Neural Information Processing Systems

Machine learning techniques have recently been adopted in various applications in medicine, biology, chemistry, and material engineering. An important task is to predict the properties of molecules, which serves as the main subroutine in many downstream applications such as virtual screening and drug design. Despite the increasing interest, the key challenge is to construct proper representations of molecules for learning algorithms. This paper introduces the N-gram graph, a simple unsupervised representation for molecules. The method first embeds the vertices in the molecule graph. It then constructs a compact representation for the graph by assembling the vertex embeddings in short walks in the graph, which we show is equivalent to a simple graph neural network that needs no training. The representations can thus be efficiently computed and then used with supervised learning methods for prediction.
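
The walk-based assembly described in the abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation: the abstract does not say how embeddings along a walk are combined, so the element-wise product along each walk, the sum over all walks, and the concatenation over walk lengths are assumptions, as are the names `n_gram_graph_embedding`, `adjacency`, `vertex_embeddings`, and `max_walk_length`. The vertex embeddings are taken as given (the unsupervised step that learns them is not shown).

```python
from typing import List

Vector = List[float]


def n_gram_graph_embedding(
    adjacency: List[List[int]],
    vertex_embeddings: List[Vector],
    max_walk_length: int,
) -> Vector:
    """Assemble fixed vertex embeddings over short walks (hypothetical sketch).

    adjacency[u][v] is 1 if vertices u and v share an edge, else 0.
    vertex_embeddings[v] is the d-dimensional embedding of vertex v.
    Returns a vector of length d * max_walk_length: one d-dimensional
    block per walk length (1 vertex, 2 vertices, ...).
    """
    n = len(adjacency)
    d = len(vertex_embeddings[0])

    # walk_state[v] holds, for walks of the current length that end at v,
    # the sum of the element-wise products of the embeddings along each walk.
    walk_state = [list(vec) for vec in vertex_embeddings]
    graph_vector: Vector = []

    for step in range(max_walk_length):
        if step > 0:
            new_state = []
            for v in range(n):
                # Message-passing-like step: add up the states of v's neighbours ...
                aggregated = [0.0] * d
                for u in range(n):
                    if adjacency[u][v]:
                        for k in range(d):
                            aggregated[k] += walk_state[u][k]
                # ... then multiply element-wise by v's own embedding (assumption).
                new_state.append(
                    [aggregated[k] * vertex_embeddings[v][k] for k in range(d)]
                )
            walk_state = new_state
        # Sum over all vertices to get the block for this walk length.
        graph_vector.extend(
            sum(walk_state[v][k] for v in range(n)) for k in range(d)
        )
    return graph_vector


# Toy example: a 3-vertex path graph 0 - 1 - 2 with 1-dimensional embeddings.
path_adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
unit_embeddings = [[1.0], [1.0], [1.0]]
print(n_gram_graph_embedding(path_adjacency, unit_embeddings, 3))
# prints [3.0, 4.0, 6.0]
```

With all embeddings set to 1, each block simply counts walks: 3 vertices, 4 directed walks over one edge, and 6 over two edges. Nothing in the loop is learned, which is the sense in which such a construction resembles a graph neural network that needs no training; the resulting vector could then be passed to any supervised learner.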