Canonical transformation plays a fundamental role in simplifying and solving classical Hamiltonian systems. We construct flexible and powerful canonical transformations as generative models using symplectic neural networks. The model transforms physical variables towards a latent representation with an independent harmonic oscillator Hamiltonian. Correspondingly, the phase space density of the physical system flows towards a factorized Gaussian distribution in the latent space. Since the canonical transformation preserves the Hamiltonian evolution, the model captures nonlinear collective modes in the learned latent representation. We present an efficient implementation of symplectic neural coordinate transformations and two ways to train the model. The variational free energy calculation is based on the analytical form of physical Hamiltonian. While the phase space density estimation only requires samples in the coordinate space for separable Hamiltonians. We demonstrate appealing features of neural canonical transformation using toy problems including two-dimensional ring potential and harmonic chain. Finally, we apply the approach to real-world problems such as identifying slow collective modes in alanine dipeptide and conceptual compression of the MNIST dataset.
This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much a specific attribute is perceivable in the generated image. This property could allow for applications where users can modify an image using sliding knobs, like faders on a mixing console, to change the facial expression of a portrait, or to update the color of some objects. Compared to the state-of-the-art which mostly relies on training adversarial networks in pixel space by altering attribute values at train time, our approach results in much simpler training schemes and nicely scales to multiple attributes. We present evidence that our model can significantly change the perceived value of the attributes while preserving the naturalness of images.
Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of ``posterior collapse'' -— where the latents are ignored when they are paired with a powerful autoregressive decoder -— typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
To overcome the high dimensionality of data, learning latent feature representations for clustering has been widely studied recently. However, it is still challenging to learn "cluster-friendly" latent representations due to the unsupervised fashion of clustering. In this paper, we propose Disentangling Latent Space Clustering (DLS-Clustering), a new clustering mechanism that directly learning cluster assignment during the disentanglement of latent spacing without constructing the "cluster-friendly" latent representation and additional clustering methods. We achieve the bidirectional mapping by enforcing an inference network (i.e. encoder) and the generator of GAN to form a deterministic encoder-decoder pair with a maximum mean discrepancy (MMD)-based regularization. We utilize a weight-sharing procedure to disentangle latent space into the one-hot discrete latent variables and the continuous latent variables. The disentangling process is actually performing the clustering operation. Eventually the one-hot discrete latent variables can be directly expressed as clusters, and the continuous latent variables represent remaining unspecified factors. Experiments on six benchmark datasets of different types demonstrate that our method outperforms existing state-of-the-art methods. We further show that the latent representations from DLS-Clustering also maintain the ability to generate diverse and high-quality images, which can support more promising application scenarios.
We introduce Deep Reasoning Networks (DRNets), an end-to-end framework that combines deep learning with reasoning for solving complex tasks, typically in an unsupervised or weakly-supervised setting. DRNets exploit problem structure and prior knowledge by tightly combining logic and constraint reasoning with stochastic-gradient-based neural network optimization. We illustrate the power of DRNets on de-mixing overlapping hand-written Sudokus (Multi-MNIST-Sudoku) and on a substantially more complex task in scientific discovery that concerns inferring crystal structures of materials from X-ray diffraction data under thermodynamic rules (Crystal-Structure-Phase-Mapping). At a high level, DRNets encode a structured latent space of the input data, which is constrained to adhere to prior knowledge by a reasoning module. The structured latent encoding is used by a generative decoder to generate the targeted output. Finally, an overall objective combines responses from the generative decoder (thinking fast) and the reasoning module (thinking slow), which is optimized using constraint-aware stochastic gradient descent. We show how to encode different tasks as DRNets and demonstrate DRNets' effectiveness with detailed experiments: DRNets significantly outperform the state of the art and experts' capabilities on Crystal-Structure-Phase-Mapping, recovering more precise and physically meaningful crystal structures. On Multi-MNIST-Sudoku, DRNets perfectly recovered the mixed Sudokus' digits, with 100% digit accuracy, outperforming the supervised state-of-the-art MNIST de-mixing models. Finally, as a proof of concept, we also show how DRNets can solve standard combinatorial problems -- 9-by-9 Sudoku puzzles and Boolean satisfiability problems (SAT), outperforming other specialized deep learning models. DRNets are general and can be adapted and expanded to tackle other tasks.