Collaborating Authors

Dimension Reduction Approach for Interpretability of Sequence to Sequence Recurrent Neural Networks Machine Learning

Encoder-decoder recurrent neural network models (Seq2Seq) have achieved great success in ubiquitous areas of computation and applications. It was shown to be successful in modeling data with both temporal and spatial dependencies for translation or prediction tasks. In this study, we propose a dimension reduction approach to visualize and interpret the representation of the data by these models. We propose to view the hidden states of the encoder and the decoder as spatio-temporal snapshots of network dynamics and to apply proper orthogonal decomposition to their concatenation to compute a low-dimensional embedding for hidden state dynamics. Projection of the decoder states onto such interpretable embedding space shows that Seq2Seq training to predict sequences using gradient-descent back propagation effectively performs dimension reduction consisting of only a small percentage of dimensions of the network's hidden units. Furthermore, sequences are being clustered into well separable clusters in the low dimensional space each of which corresponds to a different type of dynamics. The projection methodology also clarifies the roles of the encoder and the decoder components of the network. We show that the projection of encoder hidden states onto the low dimensional space provides an initializing trajectory directing the sequence to the cluster which corresponds to that particular type of distinct dynamics and the projection of the decoder hidden states constitutes the embedded cluster attractor. Inspection of the low dimensional space and the projections onto it during training shows that the estimation of clusters separability in the embedding can be utilized to estimate the optimality of model training. We test and demonstrate our proposed interpretability methodology on synthetic examples (dynamics on a circle and an ellipse) and on 3D human body movement data.

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music Machine Learning

The Variational Autoencoder (VAE) has proven to be an effective model for producing semantically meaningful latent representations for natural data. However, it has thus far seen limited application to sequential data, and, as we demonstrate, existing recurrent VAE models have difficulty modeling sequences with long-term structure. To address this issue, we propose the use of a hierarchical decoder, which first outputs embeddings for subsequences of the input and then uses these embeddings to generate each subsequence independently. This structure encourages the model to utilize its latent code, thereby avoiding the "posterior collapse" problem which remains an issue for recurrent VAEs. We apply this architecture to modeling sequences of musical notes and find that it exhibits dramatically better sampling, interpolation, and reconstruction performance than a "flat" baseline model. An implementation of our "MusicVAE" is available online at

Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules


They used an autoencoder, as illustrated in the diagram hereafter, which is constituted of an encoder and a decoder. The encoder converts a molecule (represented as a SMILES string) into a continuous probabilistic representation. This continuous representation of molecules constitutes the latent space, which is of lower dimensionality than the starting space (thus, the input molecule is compressed into its latent representation). The decoder is able to generate a molecular structure (as a SMILES string) from any point in the continuous latent space. Additionally, the latent representation of a molcule can be used to predict its properties, such as its drug-likeness or synthetic accessibility, using a neural network.

Supervising the Decoder of Variational Autoencoders to Improve Scientific Utility Machine Learning

Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provide an accurate representation of the input data and yield a latent space that effectively predicts outcomes relevant to the scientific question. Supervised Variational Autoencoders (SVAEs) have previously been used for this purpose, where a carefully designed decoder can be used as an interpretable generative model while the supervised objective ensures a predictive latent representation. Unfortunately, the supervised objective forces the encoder to learn a biased approximation to the generative posterior distribution, which renders the generative parameters unreliable when used in scientific models. This issue has remained undetected as reconstruction losses commonly used to evaluate model performance do not detect bias in the encoder. We address this previously-unreported issue by developing a second order supervision framework (SOS-VAE) that influences the decoder to induce a predictive latent representation. This ensures that the associated encoder maintains a reliable generative interpretation. We extend this technique to allow the user to trade-off some bias in the generative parameters for improved predictive performance, acting as an intermediate option between SVAEs and our new SOS-VAE. We also use this methodology to address missing data issues that often arise when combining recordings from multiple scientific experiments. We demonstrate the effectiveness of these developments using synthetic data and electrophysiological recordings with an emphasis on how our learned representations can be used to design scientific experiments.

Implicit Deep Latent Variable Models for Text Generation Machine Learning

Deep latent variable models (LVM) such as variational auto-encoder (VAE) have recently played an important role in text generation. One key factor is the exploitation of smooth latent structures to guide the generation. However, the representation power of VAEs is limited due to two reasons: (1) the Gaussian assumption is often made on the variational posteriors; and meanwhile (2) a notorious "posterior collapse" issue occurs. In this paper, we advocate sample-based representations of variational distributions for natural language, leading to implicit latent features, which can provide flexible representation power compared with Gaussian-based posteriors. We further develop an LVM to directly match the aggregated posterior to the prior. It can be viewed as a natural extension of VAEs with a regularization of maximizing mutual information, mitigating the "posterior collapse" issue. We demonstrate the effectiveness and versatility of our models in various text generation scenarios, including language modeling, unaligned style transfer, and dialog response generation. The source code to reproduce our experimental results is available on GitHub.