Option Compatible Reward Inverse Reinforcement Learning

arXiv.org Machine Learning

Reinforcement learning with complex tasks is a challenging problem. Often, expert demonstrations of complex multitasking operations are required to train agents. However, it is difficult to design a reward function for given complex tasks. In this paper, we solve a hierarchical inverse reinforcement learning (IRL) problem within the framework of options. A gradient method for parametrized options is used to deduce a defining equation for the Q-feature space, which leads to a reward feature space. Using a second-order optimality condition for option parameters, an optimal reward function is selected. Experimental results in both discrete and continuous domains confirm that our segmented rewards provide a solution to the IRL problem for multitasking operations and show good performance and robustness against the noise created by expert demonstrations.


Deep Learning Models for Global Coordinate Transformations that Linearize PDEs

arXiv.org Machine Learning

Craig Gin 1, Bethany Lusch 2, Steven L. Brunton 1,3, and J. Nathan Kutz 1

1 Department of Applied Mathematics, University of Washington, Seattle, WA, 98195, USA
2 Argonne Leadership Computing Facility, Argonne National Laboratory, Lemont, IL, USA
3 Department of Mechanical Engineering, University of Washington, Seattle, WA, 98195, USA

(Received 11 November 2019)

We develop a deep autoencoder architecture that can be used to find a coordinate transformation which turns a nonlinear PDE into a linear PDE. Our architecture is motivated by the linearizing transformations provided by the Cole-Hopf transform for Burgers equation and the inverse scattering transform for completely integrable PDEs. By leveraging a residual network architecture, a near-identity transformation can be exploited to encode intrinsic coordinates in which the dynamics are linear. The resulting dynamics are given by a Koopman operator matrix K. The decoder allows us to transform back to the original coordinates as well. Multiple time step prediction can be performed by repeated multiplication by the matrix K in the intrinsic coordinates. We demonstrate our method on a number of examples, including the heat equation and Burgers equation, as well as the substantially more challenging Kuramoto-Sivashinsky equation, showing that our method provides a robust architecture for discovering interpretable, linearizing transforms for nonlinear PDEs.

Key Words: Koopman theory, deep neural nets, residual networks, linearizing transforms, Cole-Hopf transform

2010 Mathematics Subject Classification: 35A22, 35A35, 37M99, 65P99, 68T99

1 Introduction

Partial differential equations (PDEs) provide a theoretical framework for modeling spatiotemporal systems across the biological, physical and engineering sciences. Analytic solution techniques are readily available for PDEs that are linear and have constant coefficients [12]. These PDEs include canonical models such as the heat equation, wave equation and Laplace's equation which are amenable to standard separation of variable techniques and linear superposition. In contrast, there is no general mathematical architecture for solving nonlinear PDEs as methods like separation of variables fail to hold, thus recourse to computational solutions is necessary. There are a few, but notable, exceptions: (i) the Cole-Hopf transformation [14, 6] for solving diffusively regularized Burgers equation, and (ii) the Inverse Scattering Transform (IST) [1] for solving a class of completely integrable PDEs such as Korteweg deVries (KdV), nonlinear Schrödinger

arXiv:1911.02710v1

A deep autoencoder is used to find coordinate transformations to linearize PDEs. The encoder finds a set of intrinsic coordinates for which the dynamics are linear.
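The idea of advancing a state by repeated multiplication by a matrix K in intrinsic coordinates can be sketched in a few lines. The example below is only an illustration under strong assumptions: it uses the heat equation on a periodic grid, where the discrete Fourier transform already acts as a linearizing "encoder", so no network is trained. The names `encode`, `decode`, `predict`, and the grid size, time step, and diffusion coefficient are hypothetical choices, not the authors' architecture.

```python
import numpy as np

n = 64        # number of grid points on the periodic domain [0, 2*pi) (assumed)
dt = 0.01     # time step (assumed)
nu = 1.0      # diffusion coefficient (assumed)
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
wavenumbers = np.fft.fftfreq(n, d=1.0 / n)  # integer wavenumbers 0, 1, ..., -1


def encode(u):
    """Stand-in encoder: map the state to Fourier (intrinsic) coordinates."""
    return np.fft.fft(u)


def decode(z):
    """Stand-in decoder: map intrinsic coordinates back to the original state."""
    return np.fft.ifft(z).real


# For u_t = nu * u_xx, each Fourier mode decays by exp(-nu * k^2 * dt) per step,
# so the one-step linear operator K is diagonal in these coordinates.
K = np.diag(np.exp(-nu * wavenumbers**2 * dt))


def predict(u0, steps):
    """Multi-step prediction: encode once, multiply by K repeatedly, decode."""
    z = encode(u0)
    for _ in range(steps):
        z = K @ z
    return decode(z)


u0 = np.sin(x) + 0.5 * np.cos(3.0 * x)
steps = 50
u_pred = predict(u0, steps)

# Closed-form solution of the heat equation for this initial condition.
t = steps * dt
u_exact = np.exp(-nu * t) * np.sin(x) + 0.5 * np.exp(-9.0 * nu * t) * np.cos(3.0 * x)
max_error = float(np.max(np.abs(u_pred - u_exact)))
```

In the paper's setting the encoder and decoder would be learned networks and K a learned matrix; the loop in `predict` is the only part this sketch shares with that description.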


Uncertainty relations and fluctuation theorems for Bayes nets

arXiv.org Machine Learning

The pioneering paper [Ito and Sagawa, 2013] analyzed the non-equilibrium statistical physics of a set of multiple interacting systems, S, whose joint discrete-time evolution is specified by a Bayesian network. The major result of [Ito and Sagawa, 2013] was an integral fluctuation theorem (IFT) governing the sum of two quantities: the entropy production (EP) of an arbitrary single v in S, and the transfer entropy from v to the other systems. Here I extend the analysis in [Ito and Sagawa, 2013]. I derive several detailed fluctuation theorems (DFTs), concerning arbitrary subsets of all the systems (including the full set). I also derive several associated IFTs, concerning an arbitrary subset of the systems, thereby extending the IFT in [Ito and Sagawa, 2013]. In addition I derive "conditional" DFTs and IFTs, involving conditional probability distributions rather than (as in conventional fluctuation theorems) unconditioned distributions. I then derive thermodynamic uncertainty relations relating the total EP of the Bayes net to the set of all the precisions of probability currents within the individual systems. I end with an example of that uncertainty relation.


A Comprehensive Survey on Transfer Learning

arXiv.org Machine Learning

Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large amount of target domain data can be reduced for constructing target learners. Due to its wide application prospects, transfer learning has become a popular and promising area in machine learning. Although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances in transfer learning. With the rapid expansion of the transfer learning area, it is both necessary and challenging to comprehensively review the relevant studies. This survey attempts to connect and systematize the existing transfer learning research, as well as to summarize and interpret the mechanisms and the strategies in a comprehensive way, which may help readers better understand the current research status and ideas. Different from previous surveys, this survey paper reviews over forty representative transfer learning approaches from the perspectives of data and model. The applications of transfer learning are also briefly introduced. In order to show the performance of different transfer learning models, twenty representative transfer learning models are used for experiments. The models are evaluated on three different datasets, i.e., Amazon Reviews, Reuters-21578, and Office-31. The experimental results demonstrate the importance of selecting appropriate transfer learning models for different applications in practice.


Physics-Guided Architecture (PGA) of Neural Networks for Quantifying Uncertainty in Lake Temperature Modeling

arXiv.org Machine Learning

Water temperature is known to be a principal driver of the growth, survival, and reproduction of economically viable fish [21, 30] (see Appendix for more details). Increases in water temperature are also linked to the occurrence of aquatic invasive species [28, 29], which may displace fish and native aquatic organisms, and further result in harmful algal blooms [9, 26]. Hence, accurate and timely information about water temperature is necessary to monitor the ecological health of lakes and forecast future populations of fish and other aquatic taxa. Since observations of water temperatures are incomplete at broad spatial scales (or nonexistent for most lakes), physics-based models of lake temperature, e.g., the General Lake Model (GLM) [10], are commonly used for studying lake processes. A standard formulation in these models is to assume horizontal heterogeneity is limited and that the most relevant dynamics are captured in the vertical dimension of the lake, thereby modeling the lake as a series of vertical layers. These modeling studies often use the temperature of water at the centre of a lake at varying depth values and time points for model validation. We adopt the same formulation to model the temperature of water in a lake, $Y_{d,t}$, at depth $d$ and time $t$. In particular, we leverage two key physical principles of our problem to guide neural network approaches, briefly described in the following.


Generalized Transformation-based Gradient

arXiv.org Machine Learning

The reparameterization trick has become one of the most useful tools in the field of variational inference. However, the reparameterization trick is based on the standardization transformation, which restricts the scope of application of this method to distributions that have tractable inverse cumulative distribution functions or are expressible as deterministic transformations of such distributions. In this paper, we generalize the reparameterization trick by allowing a general transformation. We discover that the proposed model is a special case of control variates (CV), indicating that the proposed model can combine the advantages of CV and generalized reparameterization. Based on the proposed gradient model, we propose a new polynomial-based gradient estimator which has better theoretical performance than the reparameterization trick under certain conditions and can be applied to a larger class of variational distributions. In studies of synthetic and real data, we show that our proposed gradient estimator has a significantly lower gradient variance than other state-of-the-art methods, thus enabling a faster inference procedure.
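For reference, the standard reparameterization trick that this abstract takes as its starting point can be sketched for a Gaussian variational distribution. This is a minimal illustration and not the proposed polynomial-based estimator: the objective f(z) = z^2, the parameter values, and the function name `reparam_gradients` are assumptions chosen so that the exact gradients (2*mu and 2*sigma) are known in closed form.

```python
import numpy as np


def reparam_gradients(mu, sigma, num_samples, rng):
    """Monte Carlo gradients of E_{z ~ N(mu, sigma^2)}[z^2] w.r.t. mu and sigma.

    Uses the standardization z = mu + sigma * eps with eps ~ N(0, 1), so the
    randomness no longer depends on the parameters and the chain rule applies.
    """
    eps = rng.standard_normal(num_samples)
    z = mu + sigma * eps
    df_dz = 2.0 * z                      # derivative of f(z) = z^2
    grad_mu = np.mean(df_dz)             # dz/dmu = 1
    grad_sigma = np.mean(df_dz * eps)    # dz/dsigma = eps
    return float(grad_mu), float(grad_sigma)


rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7
grad_mu, grad_sigma = reparam_gradients(mu, sigma, 200_000, rng)

# Exact values for comparison: E[z^2] = mu^2 + sigma^2.
exact_grad_mu = 2.0 * mu
exact_grad_sigma = 2.0 * sigma
```

The estimates should agree with the exact values up to Monte Carlo error, which shrinks as `num_samples` grows.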


Invariance and identifiability issues for word embeddings

arXiv.org Machine Learning

Word embeddings are commonly obtained as optimizers of a criterion function $f$ of a text corpus, but assessed on word-task performance using a different evaluation function $g$ of the test data. We contend that a possible source of disparity in performance on tasks is the incompatibility between classes of transformations that leave $f$ and $g$ invariant. In particular, word embeddings defined by $f$ are not unique; they are defined only up to a class of transformations to which $f$ is invariant, and this class is larger than the class to which $g$ is invariant. One implication of this is that the apparent superiority of one word embedding over another, as measured by word task performance, may largely be a consequence of the arbitrary elements selected from the respective solution sets. We provide a formal treatment of the above identifiability issue, present some numerical examples, and discuss possible resolutions.
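The mismatch between the two invariance classes can be illustrated numerically. The sketch below rests on assumptions not stated in the abstract: the criterion $f$ is taken to depend on the embeddings only through the product of word and context matrices, and $g$ is taken to be pairwise cosine similarity. The matrices `U`, `V`, `A`, and `Q` are hypothetical random or hand-picked examples.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((5, 3))  # hypothetical word vectors (5 words, dim 3)
V = rng.standard_normal((5, 3))  # hypothetical context vectors


def cosine_matrix(E):
    """Pairwise cosine similarities between the rows of E (stand-in for g)."""
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    return unit @ unit.T


# An invertible but non-orthogonal transformation.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5]])
U_a = U @ A
V_a = V @ np.linalg.inv(A).T   # compensating transform keeps U V^T unchanged

# An orthogonal transformation obtained from a QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
U_q = U @ Q
V_q = V @ Q

# Assumed f depends only on U V^T; g is cosine similarity of word vectors.
f_same_under_A = np.allclose(U @ V.T, U_a @ V_a.T)
g_same_under_A = np.allclose(cosine_matrix(U), cosine_matrix(U_a))
f_same_under_Q = np.allclose(U @ V.T, U_q @ V_q.T)
g_same_under_Q = np.allclose(cosine_matrix(U), cosine_matrix(U_q))
```

Under these assumptions both `U` and `U_a` are equally good optimizers of $f$, yet they give different cosine similarities, whereas the orthogonal map `Q` leaves both quantities unchanged.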


Data Generation for Neural Programming by Example

arXiv.org Machine Learning

Programming by example is the problem of synthesizing a program from a small set of input/output pairs. Recent works applying machine learning methods to this task show promise, but are typically reliant on generating synthetic examples for training. A particular challenge lies in generating meaningful sets of inputs and outputs, which characterize a given program well and accurately demonstrate its behavior. Where examples used for testing are generated by the same method as the training data, the performance of a model may be partly reliant on this similarity. In this paper we introduce a novel approach using an SMT solver to synthesize inputs which cover a diverse set of behaviors for a given program. We carry out a case study comparing this method to existing synthetic data generation procedures in the literature, and find that data generated using our approach improves both the discriminatory power of example sets and the ability of trained machine learning models to generalize to unfamiliar data.
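Why the choice of inputs matters can be seen in a toy setting. The sketch below is a brute-force enumeration over a three-program candidate set, not the SMT-based method of the paper; the candidate programs and the names `CANDIDATES` and `consistent_programs` are hypothetical.

```python
# Hypothetical candidate programs over integers.
CANDIDATES = {
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "add_two": lambda x: x + 2,
}


def consistent_programs(examples):
    """Return the names of all candidates that reproduce every (input, output) pair."""
    return sorted(
        name
        for name, fn in CANDIDATES.items()
        if all(fn(inp) == out for inp, out in examples)
    )


# The input 2 does not discriminate: 2*2 == 2**2 == 2+2 == 4.
weak_examples = [(2, 4)]
# Adding the input 3 separates the candidates: only doubling maps 3 to 6.
strong_examples = [(2, 4), (3, 6)]

ambiguous = consistent_programs(weak_examples)
unique = consistent_programs(strong_examples)
```

Here `ambiguous` contains all three candidates while `unique` contains only `"double"`, which is the sense in which one example set has more discriminatory power than another.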


Map Enhanced Route Travel Time Prediction using Deep Neural Networks

arXiv.org Machine Learning

Travel time estimation is a fundamental problem in transportation science with extensive literature. The study of these techniques has intensified due to the availability of many large public trip datasets. Recently developed deep learning based models have improved generality and performance, and have focused on estimating times for individual sub-trajectories and aggregating them to predict the travel time of the entire trajectory. However, these techniques ignore the road network information. In this work, we propose and study techniques for incorporating road networks along with historical trip data into travel time prediction. We incorporate both node embeddings and road distance into the existing model. Experiments on large real-world benchmark datasets suggest improved performance, especially when the training data is small. As expected, the proposed method performs better than the baseline when there is a larger difference between the road distance and the Vincenty distance between start and end points.


Hyper-SAGNN: a self-attention based graph neural network for hypergraphs

arXiv.org Machine Learning

Graph representation learning for hypergraphs can be used to extract patterns among higher-order interactions that are critically important in many real world problems. Current approaches designed for hypergraphs, however, are unable to handle different types of hypergraphs and are typically not generic for various learning tasks. Indeed, models that can predict variable-sized heterogeneous hyperedges have not been available. Here we develop a new self-attention based graph neural network called Hyper-SAGNN applicable to homogeneous and heterogeneous hypergraphs with variable hyperedge sizes. We perform extensive evaluations on multiple datasets, including four benchmark network datasets and two single-cell Hi-C datasets in genomics. We demonstrate that Hyper-SAGNN significantly outperforms the state-of-the-art methods on traditional tasks while also achieving great performance on a new task called outsider identification. Hyper-SAGNN will be useful for graph representation learning to uncover complex higher-order interactions in different applications.