
Collaborating Authors

 Chen, Yifan


Solving and Learning Nonlinear PDEs with Gaussian Processes

arXiv.org Machine Learning

We introduce a simple, rigorous, and unified framework, based on Gaussian processes, for solving nonlinear partial differential equations (PDEs) and for solving inverse problems (IPs) involving the identification of parameters in PDEs. The proposed approach (1) provides a natural generalization of collocation kernel methods to nonlinear PDEs and IPs, (2) has guaranteed convergence, with a path to computing error bounds in the PDE setting, and (3) inherits the state-of-the-art computational complexity of linear solvers for dense kernel matrices. The main idea of our method is to approximate the solution of a given PDE by the MAP estimator of a Gaussian process conditioned on observations of the PDE at a finite number of collocation points. Although this optimization problem is infinite-dimensional, it can be reduced to a finite-dimensional one by introducing additional variables corresponding to the values of the derivatives of the solution at the collocation points; this generalizes the representer theorem arising in Gaussian process regression. The reduced optimization problem has a quadratic loss and nonlinear constraints, and it is in turn solved with a variant of the Gauss-Newton method. The resulting algorithm (a) can be interpreted as solving successive linearizations of the nonlinear PDE, and (b) is found in practice to converge in a small number of iterations (two to ten) in experiments on a range of PDEs. For IPs, whereas the traditional approach has been to alternate between identifying the parameters in the PDE and numerically approximating its solution, our algorithm tackles both simultaneously. Experiments on nonlinear elliptic PDEs, Burgers' equation, a regularized Eikonal equation, and an IP for permeability identification in Darcy flow illustrate the efficacy and scope of our framework.
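
The following is a minimal sketch of the reduced problem on a hypothetical 1D test case, -u'' + tau(u) = f on (0, 1) with u(0) = u(1) = 0. It is not the authors' code. The measurement vector stacks u at interior and boundary points together with u'' at interior points. Its covariance Theta comes from an assumed Gaussian kernel and its derivatives. The PDE constraint eliminates u'' = tau(u) - f, and each step is a plain Gauss-Newton step: tau is linearized around the current iterate and the quadratic z^T Theta^{-1} z is minimized. The lengthscale s, the nugget, the grid size, and the manufactured solution are all illustrative choices.

```python
import numpy as np

# Hypothetical test problem (not from the paper): -u'' + tau(u) = f on (0, 1)
# with u(0) = u(1) = 0, discretized by GP collocation on a uniform grid.


def kernel_blocks(x_all, x_int, s):
    """Covariance of [u(x_all), u''(x_int)] under a zero-mean GP prior with
    Gaussian kernel k(x, y) = phi(x - y), phi(r) = exp(-r**2 / (2 s**2))."""
    a = 1.0 / s**2
    phi = lambda r: np.exp(-0.5 * a * r**2)
    phi2 = lambda r: (a**2 * r**2 - a) * phi(r)                           # phi''
    phi4 = lambda r: (3 * a**2 - 6 * a**3 * r**2 + a**4 * r**4) * phi(r)  # phi''''
    r_aa = x_all[:, None] - x_all[None, :]
    r_ai = x_all[:, None] - x_int[None, :]
    r_ii = x_int[:, None] - x_int[None, :]
    top = np.hstack([phi(r_aa), phi2(r_ai)])        # cov(u, u), cov(u, u'')
    bottom = np.hstack([phi2(r_ai).T, phi4(r_ii)])  # cov(u'', u), cov(u'', u'')
    return np.vstack([top, bottom])


def gp_gauss_newton(f, tau, dtau, n_int=20, s=0.15, nugget=1e-7, iters=10):
    x_int = np.linspace(0.0, 1.0, n_int + 2)[1:-1]
    x_bd = np.array([0.0, 1.0])
    theta = kernel_blocks(np.concatenate([x_int, x_bd]), x_int, s)
    theta += nugget * np.eye(theta.shape[0])  # small uniform nugget for stability
    fv, g = f(x_int), np.zeros(2)             # homogeneous Dirichlet data
    z = np.zeros(n_int)                       # unknowns: u at interior points
    for _ in range(iters):
        # Measurement vector v = [z, g, u''] with u'' = tau(z) - f taken from
        # the PDE; linearizing tau around the current z gives v ~ A z + b.
        A = np.vstack([np.eye(n_int), np.zeros((2, n_int)), np.diag(dtau(z))])
        b = np.concatenate([np.zeros(n_int), g, tau(z) - dtau(z) * z - fv])
        # Minimize the quadratic (A z + b)^T theta^{-1} (A z + b) in closed form.
        TA, Tb = np.linalg.solve(theta, A), np.linalg.solve(theta, b)
        z = -np.linalg.solve(A.T @ TA, A.T @ Tb)
    return x_int, z


# Manufactured solution u(x) = sin(pi x) with tau(u) = u**3.
f = lambda x: np.pi**2 * np.sin(np.pi * x) + np.sin(np.pi * x) ** 3
x, u = gp_gauss_newton(f, tau=lambda v: v**3, dtau=lambda v: 3 * v**2)
print("max error at collocation points:", np.max(np.abs(u - np.sin(np.pi * x))))
```

The first iteration starts from z = 0, so it solves the linear problem -u'' = f. Each later iteration re-linearizes tau around the previous solution, which matches the "successive linearizations" reading described above.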


Fast Statistical Leverage Score Approximation in Kernel Ridge Regression

arXiv.org Machine Learning

The Nyström approximation is a fast randomized method for solving kernel ridge regression (KRR) problems by sub-sampling the n-by-n empirical kernel matrix appearing in the objective function. However, the performance of such a sub-sampling method relies heavily on correctly estimating the statistical leverage scores that form the sampling distribution, which can be as costly as solving the original KRR problem. In this work, we propose a linear-time (modulo poly-log terms) algorithm, with theoretical guarantees, that accurately approximates the statistical leverage scores in stationary-kernel-based KRR. In particular, by analyzing the first-order condition of the KRR objective, we derive an analytic formula, depending on both the input distribution and the spectral density of the stationary kernel, that captures the non-uniformity of the statistical leverage scores. Numerical experiments demonstrate that, at the same prediction accuracy, our method is orders of magnitude more efficient than existing methods at selecting representative sub-samples for the Nyström approximation.
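
To make the role of leverage scores concrete, the sketch below computes them exactly as l_i = [K (K + n lam I)^{-1}]_ii. This is the O(n^3) baseline that the abstract describes as expensive, not the paper's linear-time analytic approximation. It then samples Nyström landmarks in proportion to the scores and compares the Nyström KRR fit with exact KRR. The RBF kernel, the data, lam, and m are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X, Y, s=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * s**2))


def ridge_leverage_scores(K, lam):
    # l_i(lam) = [K (K + n lam I)^{-1}]_ii, computed exactly in O(n^3) time.
    n = K.shape[0]
    return np.diag(np.linalg.solve(K + n * lam * np.eye(n), K))


def nystrom_krr_fit(K, y, lam, idx, tol=1e-10):
    # KRR with the Nystrom kernel K_nm K_mm^+ K_mn, via features Phi = K_nm U L^{-1/2}.
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(K[np.ix_(idx, idx)])
    keep = evals > tol * evals.max()  # drop numerically null directions of K_mm
    Phi = K[:, idx] @ evecs[:, keep] / np.sqrt(evals[keep])
    w = np.linalg.solve(Phi.T @ Phi + n * lam * np.eye(Phi.shape[1]), Phi.T @ y)
    return Phi @ w


rng = np.random.default_rng(0)
n, lam, m = 300, 1e-3, 40
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(n)
K = rbf_kernel(X, X)
scores = ridge_leverage_scores(K, lam)
p = np.clip(scores, 0.0, None)
p /= p.sum()
idx = rng.choice(n, size=m, replace=False, p=p)  # leverage-based landmarks
fit_nys = nystrom_krr_fit(K, y, lam, idx)
fit_exact = K @ np.linalg.solve(K + n * lam * np.eye(n), y)
print("relative difference:", np.linalg.norm(fit_nys - fit_exact) / np.linalg.norm(fit_exact))
```

The feature-space form Phi (Phi^T Phi + n lam I)^{-1} Phi^T y is equivalent to KRR with the Nyström kernel. It avoids inverting the often ill-conditioned K_mm directly.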


Accumulations of Projections--A Unified Framework for Random Sketches in Kernel Ridge Regression

arXiv.org Machine Learning

Building a sketch of an n-by-n empirical kernel matrix is a common approach to accelerating the computation of many kernel methods. In this paper, we propose a unified framework for constructing sketching methods in kernel ridge regression (KRR), which views the sketching matrix S as an accumulation of m rescaled sub-sampling matrices with independent columns. Our framework includes two commonly used sketching methods, sub-sampling sketches (known as the Nyström method) and sub-Gaussian sketches, as special cases with m = 1 and m = infinity, respectively. Under the new framework, we provide a unified error analysis of the sketching approximation and show that our accumulation scheme improves the low accuracy of sub-sampling sketches when a certain incoherence characteristic is high, and accelerates the more accurate but computationally heavier sub-Gaussian sketches. By optimally choosing the number m of accumulations, we show that an optimal trade-off between computational efficiency and statistical accuracy can be achieved. In practice, the sketching method can be implemented as efficiently as sub-sampling sketches, since only a few extra matrix additions are needed. Our empirical evaluations also demonstrate that the proposed method can attain accuracy close to that of sub-Gaussian sketches while being as efficient as sub-sampling-based sketches.
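
The sketch below illustrates the accumulation idea under stated assumptions and is not the paper's exact construction. Each of the m sub-sampling matrices picks one row per column and attaches a random sign. The sum is scaled so that E[S S^T] = I, so m = 1 behaves like a signed Nyström-type sketch and large m approaches a Gaussian-like sketch. The sketched KRR then restricts the coefficient vector to alpha = S beta. The sign convention, the scaling, and the data are illustrative choices.

```python
import numpy as np


def accumulated_sketch(n, d, m, rng):
    """Hypothetical construction: S = m**-0.5 * sum_i S_i, where column k of each
    S_i is sqrt(n/d) * sign * e_J, with J uniform on {0..n-1} and a random sign.
    With this scaling E[S S^T] = I_n; m = 1 is a signed sub-sampling sketch."""
    S = np.zeros((n, d))
    cols = np.arange(d)
    for _ in range(m):
        rows = rng.integers(0, n, size=d)        # one sampled row per column
        signs = rng.choice([-1.0, 1.0], size=d)
        S[rows, cols] += signs * np.sqrt(n / d)  # (row, col) pairs are distinct
    return S / np.sqrt(m)


def sketched_krr_fit(K, y, lam, S):
    # Restrict alpha = S beta in KRR; the normal equations become
    # (S^T K K S + n lam S^T K S) beta = S^T K y.
    n = K.shape[0]
    KS = K @ S
    M = KS.T @ KS + n * lam * (S.T @ KS)
    beta = np.linalg.lstsq(M, KS.T @ y, rcond=None)[0]
    return KS @ beta


rng = np.random.default_rng(1)
n, d, lam = 300, 60, 1e-3
X = rng.uniform(-3, 3, size=n)
y = np.sin(2 * X) + 0.1 * rng.standard_normal(n)
K = np.exp(-((X[:, None] - X[None, :]) ** 2) / (2 * 0.5**2))
fit_exact = K @ np.linalg.solve(K + n * lam * np.eye(n), y)
for m in (1, 8):
    fit = sketched_krr_fit(K, y, lam, accumulated_sketch(n, d, m, rng))
    print(m, np.linalg.norm(fit - fit_exact) / np.linalg.norm(fit_exact))
```

Building S costs only m sparse updates per column, which is consistent with the abstract's point that accumulation adds little beyond a few matrix additions.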


Consistency of Empirical Bayes And Kernel Flow For Hierarchical Parameter Estimation

arXiv.org Machine Learning

Hierarchical modeling and learning have proven very powerful in the field of Gaussian process regression and kernel methods, especially for machine learning applications and, increasingly, within the field of inverse problems more generally. The classical approach to learning hierarchical information is through Bayesian formulations of the problem, implying a posterior distribution on the hierarchical parameters or, in the case of empirical Bayes, providing an optimization criterion for them. Recent developments in the machine learning literature have suggested new criteria for hierarchical learning, based on approximation-theoretic considerations that can be interpreted as variants of cross-validation and that exploit approximation consistency in data splitting. The purpose of this paper is to compare the empirical Bayesian and approximation-theoretic approaches to hierarchical learning in terms of large-data consistency, variance of the estimators, robustness of the estimators to model misspecification, and computational cost. Our analysis is rooted in the setting of Matérn-like Gaussian random field priors in regression, with smoothness, amplitude, and inverse lengthscale as hierarchical parameters. Numerical experiments validate the theory and extend the scope of the paper beyond the Matérn setting.
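
To contrast the two criteria, the sketch below evaluates both as functions of a lengthscale on data drawn from a Matérn-1/2 (exponential) GP. The first is the empirical Bayes negative log marginal likelihood. The second is assumed here to take the common Kernel Flow form rho = 1 - ||u_half||^2 / ||u_full||^2, the relative RKHS-norm loss when half the data is dropped, averaged over random halves. The paper analyzes variants of these, so treat the exact formulas, the fixed amplitude, the nugget, and the lengthscale grid as assumptions.

```python
import numpy as np


def matern12(x, y, ell, amp=1.0):
    # Matérn kernel with smoothness 1/2 (the exponential kernel).
    return amp * np.exp(-np.abs(x[:, None] - y[None, :]) / ell)


def eb_objective(x, y, ell, nugget=1e-10):
    # Empirical Bayes: negative log marginal likelihood of y under N(0, K(ell)).
    K = matern12(x, x, ell) + nugget * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)  # ||alpha||^2 = y^T K^{-1} y
    return 0.5 * alpha @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(x) * np.log(2 * np.pi)


def kernel_flow_objective(x, y, ell, splits, nugget=1e-10):
    # Assumed Kernel Flow form: rho = 1 - (y_c^T K_c^{-1} y_c) / (y^T K^{-1} y),
    # averaged over random half-subsets; lies in [0, 1] by a Schur-complement argument.
    K = matern12(x, x, ell) + nugget * np.eye(len(x))
    full = y @ np.linalg.solve(K, y)
    rhos = [1.0 - y[i] @ np.linalg.solve(K[np.ix_(i, i)], y[i]) / full for i in splits]
    return float(np.mean(rhos))


rng = np.random.default_rng(0)
n, ell_true = 200, 0.2
x = np.linspace(0.0, 1.0, n)
y = np.linalg.cholesky(matern12(x, x, ell_true) + 1e-10 * np.eye(n)) @ rng.standard_normal(n)
splits = [rng.choice(n, size=n // 2, replace=False) for _ in range(10)]  # shared across ells
ells = [0.05, 0.1, 0.2, 0.4, 0.8]
eb = [eb_objective(x, y, l) for l in ells]
kf = [kernel_flow_objective(x, y, l, splits) for l in ells]
print("EB argmin:", ells[int(np.argmin(eb))], "KF argmin:", ells[int(np.argmin(kf))])
```

The same splits are reused for every lengthscale, so differences in rho reflect the kernel rather than the random split. The variance of either estimator can be probed by repeating the experiment over many draws of y.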


One-to-one Mapping for Unpaired Image-to-image Translation

arXiv.org Artificial Intelligence

Recently, image-to-image translation has attracted significant interest in the literature, from the successful use of the generative adversarial network (GAN), to the introduction of the cycle-consistency constraint, to extensions to multiple domains. However, existing approaches do not guarantee that the mapping between two image domains is unique or one-to-one. Here we propose a self-inverse network learning approach for unpaired image-to-image translation. Building on top of CycleGAN, we learn a self-inverse function simply by augmenting the training samples through swapping inputs and outputs during training, with a separate cycle-consistency loss for each mapping direction. The outcome of such learning is a provably one-to-one mapping function. Our extensive experiments on a variety of datasets, including cross-modal medical image synthesis, object transfiguration, and semantic labeling, consistently demonstrate clear qualitative and quantitative improvements over the CycleGAN method. In particular, our method achieves the state-of-the-art result on the Cityscapes benchmark dataset for unpaired label-to-photo translation.
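
The swap augmentation can be seen in a deliberately tiny setting: a paired, linear toy problem rather than unpaired CycleGAN training. The ground-truth map is assumed to be a Householder reflection R, which satisfies R R = I. A single linear map is fitted on both X to Y and Y to X pairs. Because the swapped pairs force one function to serve both directions, the fitted map W recovers R and is its own inverse. The dimensions, the reflection, and the least-squares fit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 4, 50
# Hypothetical ground-truth map: a Householder reflection, so R @ R = I.
v = rng.standard_normal(dim)
v /= np.linalg.norm(v)
R = np.eye(dim) - 2.0 * np.outer(v, v)

X = rng.standard_normal((n, dim))  # samples from domain X (rows)
Y = X @ R.T                        # corresponding samples in domain Y


def fit_linear(inputs, targets):
    # Least-squares W minimizing ||inputs @ W.T - targets||_F.
    W_T, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return W_T.T


# Swap augmentation: one map trained on (X -> Y) and (Y -> X) pairs together.
W = fit_linear(np.vstack([X, Y]), np.vstack([Y, X]))
print("||W W - I|| =", np.linalg.norm(W @ W - np.eye(dim)))
```

In the image setting the single map is a network trained adversarially without pairs. Self-inverseness, f(f(x)) = x, is what makes the learned mapping one-to-one.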


Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation

arXiv.org Artificial Intelligence

A one-to-one mapping is necessary for many bidirectional image-to-image translation applications, such as MRI synthesis, because MRI images are unique to each patient. State-of-the-art approaches for image synthesis from domain X to domain Y learn a convolutional neural network that meticulously maps between the domains, and a different network is typically implemented to map in the opposite direction, from Y to X. In this paper, we explore the possibility of using only one network for bidirectional image synthesis; in other words, such an autonomous learning network implements a self-inverse function. A self-inverse network has several distinct advantages: only one network instead of two, better generalization, and a more restricted parameter space. Most importantly, a self-inverse function guarantees a one-to-one mapping, a property that cannot be guaranteed by earlier approaches that are not self-inverse. Experiments on three datasets show that, compared with baseline approaches that use two separate models for image synthesis in the two directions, our self-inverse network achieves better synthesis results in terms of standard metrics. Finally, our sensitivity analysis confirms the feasibility of learning a self-inverse function for bidirectional image translation.


Regularized Ensembles and Transferability in Adversarial Learning

arXiv.org Machine Learning

Despite the considerable success of convolutional neural networks in a broad array of domains, recent research has shown them to be vulnerable to small adversarial perturbations, commonly known as adversarial examples. Moreover, such examples have been shown to be remarkably portable, or transferable, from one model to another, enabling highly successful black-box attacks. We explore this issue of transferability and robustness along two dimensions: first, the impact of conventional $l_p$ regularization, as well as of replacing the top layer with a linear support vector machine (SVM); and second, the value of combining regularized models into an ensemble. We show that models trained with different regularizers present barriers to transferability, as does partial information about the models comprising the ensemble.
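
Below is a small sketch of how transferability can be measured, using linear SVMs trained directly on synthetic features instead of CNNs with an SVM top layer. Two linear SVMs are trained with l2 and l1 penalties, and FGSM-style adversarial examples are crafted against the l2 model only. The sketch then reports accuracy of each model and of their score-averaging ensemble on those examples. The data, penalties, step sizes, and eps are illustrative assumptions, and no particular outcome from the paper is implied.

```python
import numpy as np


def train_linear_svm(X, y, reg="l2", lam=1e-2, lr=0.1, epochs=500):
    """Full-batch subgradient descent on mean hinge loss + lam * (||w||_2^2 or ||w||_1)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # margin violators
        grad_w = -(y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        grad_w += lam * (2.0 * w if reg == "l2" else np.sign(w))
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def fgsm_linear(X, y, w, eps):
    # For a linear score w.x + b, this l_inf step of size eps lowers every
    # margin y * (w.x + b) by exactly eps * ||w||_1, the worst case in the ball.
    return X - eps * y[:, None] * np.sign(w)[None, :]


def ensemble_accuracy(X, y, models):
    # Ensemble prediction: sign of the averaged linear scores.
    score = np.mean([X @ w + b for w, b in models], axis=0)
    return float(np.mean(np.sign(score) == y))


rng = np.random.default_rng(0)
n, d, eps = 400, 20, 0.2
y = np.where(rng.uniform(size=n) < 0.5, 1.0, -1.0)
X = 0.3 * y[:, None] + rng.standard_normal((n, d))  # two Gaussian classes
model_a = train_linear_svm(X, y, reg="l2")
model_b = train_linear_svm(X, y, reg="l1")
X_adv = fgsm_linear(X, y, model_a[0], eps)  # crafted on model A only
for name, models in [("A", [model_a]), ("B", [model_b]), ("A+B", [model_a, model_b])]:
    print(name, ensemble_accuracy(X, y, models), ensemble_accuracy(X_adv, y, models))
```

The gap between model A's accuracy and model B's accuracy on X_adv is one simple measure of how well the attack transfers across regularizers.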


A post-processing method to improve the white matter hyperintensity segmentation accuracy for randomly-initialized U-net

arXiv.org Machine Learning

White matter hyperintensity (WMH) is commonly found in elderly individuals and appears to be associated with brain diseases. U-net is a convolutional network that has been widely used for biomedical image segmentation, and it has recently been applied successfully to WMH segmentation. The weights of a U-net are usually initialized randomly; however, the model may converge to different local optima from different random initializations. We find that combining thresholding and averaging of the outputs of U-nets with different random initializations can substantially improve WMH segmentation accuracy. Based on this observation, we propose a post-processing technique that specifies how the averaging and thresholding are performed. Specifically, we first convert the score maps from three U-nets to binary masks via thresholding and then average those binary masks to obtain the final WMH segmentation. Both quantitative analysis (via the Dice similarity coefficient) and qualitative analysis (via visual examination) reveal the superior performance of the proposed method. This post-processing technique is independent of the model used. As such, it can also be applied in settings where other deep learning models are employed, especially when random initialization is adopted and pre-training is unavailable.
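
The ordering of the two operations matters, and a per-pixel sketch shows why. "Threshold then average" binarizes each model's score map and then averages the masks. "Average then threshold" is the more common alternative. The abstract does not state the cut applied to the averaged binary masks, so a 0.5 cut is assumed here; with three models this amounts to a majority vote. The 0.5 score threshold and the toy score maps are also illustrative. A Dice coefficient helper is included for evaluation.

```python
import numpy as np


def threshold_then_average(score_maps, threshold=0.5, vote=0.5):
    """score_maps: array (n_models, ...) of per-pixel scores in [0, 1]."""
    masks = (np.asarray(score_maps) > threshold).astype(float)  # binarize each model
    return masks.mean(axis=0) > vote  # average binary masks; assumed 0.5 cut


def average_then_threshold(score_maps, threshold=0.5):
    return np.asarray(score_maps).mean(axis=0) > threshold


def dice(a, b):
    # Dice similarity coefficient 2|A & B| / (|A| + |B|); 1.0 if both are empty.
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


# Hypothetical scores from three randomly initialized models on four pixels.
maps = [[0.9, 0.1, 0.6, 0.95],
        [0.8, 0.7, 0.4, 0.30],
        [0.1, 0.6, 0.7, 0.30]]
print(threshold_then_average(maps))  # one confident model cannot outvote two
print(average_then_threshold(maps))  # a single very high score can dominate
```

In the last pixel, one overconfident model pushes the averaged score above 0.5, but it loses the vote after binarization. This is one plausible reason the thresholding-first order is more robust to a single badly converged model.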


Run-and-Inspect Method for Nonconvex Optimization and Global Optimality Bounds for R-Local Minimizers

arXiv.org Machine Learning

Many optimization algorithms converge to stationary points. When the underlying problem is nonconvex, they may get trapped at local minimizers and occasionally stagnate near saddle points. We propose the Run-and-Inspect Method, which adds an "inspect" phase to existing algorithms to help them escape from non-global stationary points. The inspection samples a set of points within a radius $R$ of the current point. When a sample point yields a sufficient decrease in the objective, we move there and resume the existing algorithm. If no sufficient decrease is found, the current point is called an approximate $R$-local minimizer. We show that an $R$-local minimizer is globally optimal, up to a specific error depending on $R$, if the objective function can be implicitly decomposed into a smooth convex function plus a restricted function that is possibly nonconvex and nonsmooth. For high-dimensional problems, we introduce blockwise inspections to overcome the curse of dimensionality while still maintaining optimality bounds up to a factor equal to the number of blocks. Our method performs well on a set of artificial and realistic nonconvex problems when coupled with gradient descent, coordinate descent, EM, and prox-linear algorithms.
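
Below is a minimal sketch of the run/inspect loop, assuming gradient descent as the "run" phase and uniform random sampling in a ball of radius R as the "inspect" phase. The test function is a hypothetical tilted double well, f(x) = (x^2 - 1)^2 + 0.3 x. Plain gradient descent started at x = 2 gets trapped in the shallower right-hand well. The inspection finds a point with sufficient decrease in the left well and resumes the run from there. R, the decrease threshold nu, the step size, and the sample count are illustrative, and blockwise inspection is not shown.

```python
import numpy as np


def gradient_descent(grad, x, lr=0.02, tol=1e-10, max_iter=100_000):
    # "Run" phase: plain gradient descent until the gradient is tiny.
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - lr * g
    return x


def sample_ball(center, radius, n_samples, rng):
    # Uniform samples in the Euclidean ball of the given radius around center.
    d = center.size
    dirs = rng.standard_normal((n_samples, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    return center + radii * dirs


def run_and_inspect(f, grad, x0, R, nu=0.05, n_samples=200, max_rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    x = gradient_descent(grad, np.asarray(x0, dtype=float))
    for _ in range(max_rounds):
        cand = sample_ball(x, R, n_samples, rng)  # "inspect" phase
        vals = np.array([f(c) for c in cand])
        best = int(np.argmin(vals))
        if vals[best] > f(x) - nu:
            break  # no sufficient decrease: approximate R-local minimizer
        x = gradient_descent(grad, cand[best])  # move there and resume the run
    return x


f = lambda x: float((x[0] ** 2 - 1) ** 2 + 0.3 * x[0])
grad = lambda x: np.array([4 * x[0] * (x[0] ** 2 - 1) + 0.3])
x_trapped = gradient_descent(grad, np.array([2.0]))
x_final = run_and_inspect(f, grad, [2.0], R=2.5)
print("gradient descent:", x_trapped, "run-and-inspect:", x_final)
```

The radius R trades off cost against the guarantee. A larger ball needs more samples in higher dimensions, which is the motivation the abstract gives for blockwise inspection.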