
Collaborating Authors

 Cai, Zhiqiang


ReLU neural network approximation to piecewise constant functions

arXiv.org Artificial Intelligence

This paper studies the approximation property of ReLU neural networks (NNs) to piecewise constant functions with unknown interfaces in bounded regions in $\mathbb{R}^d$. Under the assumption that the discontinuity interface $\Gamma$ may be approximated by a connected series of hyperplanes with a prescribed accuracy $\varepsilon >0$, we show that a three-layer ReLU NN is sufficient to accurately approximate any piecewise constant function and establish its error bound. Moreover, if the discontinuity interface is convex, an analytical formula of the ReLU NN approximation with exact weights and biases is provided.
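As a point of reference, the following minimal NumPy sketch (not the paper's construction; the function names and the toy target are illustrative) shows how two ReLU units already reproduce a jump across a single hyperplane $\{w\cdot x = b\}$ up to a transition band of width $\varepsilon$:

    import numpy as np

    def relu(t):
        return np.maximum(t, 0.0)

    def half_space_indicator(x, w, b, eps=1e-2):
        # two-ReLU ramp: 0 on {w.x <= b}, 1 on {w.x >= b + eps}, linear in between
        t = x @ w - b
        return relu(t / eps) - relu(t / eps - 1.0)

    # toy piecewise constant target: value 2.0 on {x1 + x2 >= 1}, value -1.0 elsewhere
    w, b = np.array([1.0, 1.0]), 1.0
    x = np.random.rand(5, 2)
    u_approx = -1.0 + 3.0 * half_space_indicator(x, w, b)

An additional layer can combine several such ramps (for instance through a minimum over the ramps associated with the approximating hyperplanes), which is one plausible reading of why a third layer suffices in the setting above.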


Fast Iterative Solver For Neural Network Method: II. 1D Diffusion-Reaction Problems And Data Fitting

arXiv.org Artificial Intelligence

This paper extends the damped block Newton (dBN) method recently introduced in [4] to 1D diffusion-reaction equations and least-squares data fitting problems. To determine the linear parameters (the weights and bias of the output layer) of the neural network (NN), the dBN method requires solving systems of linear equations involving the mass matrix. While the mass matrix for local hat basis functions is tri-diagonal and well-conditioned, the mass matrix for NNs is dense and ill-conditioned. For example, the condition number of the NN mass matrix for quasi-uniform meshes is at least ${\cal O}(n^4)$. We present a factorization of the mass matrix that enables solving the systems of linear equations in ${\cal O}(n)$ operations. To determine the non-linear parameters (the weights and bias of the hidden layer), one step of a damped Newton method is employed at each iteration. A Gauss-Newton method is used in place of Newton for instances in which the Hessian matrices are singular; this modified dBN is referred to as dBGN. For both methods, the computational cost per iteration is ${\cal O}(n)$. Numerical results demonstrate the ability of dBN and dBGN to efficiently achieve accurate results and to outperform BFGS on select examples.
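For intuition on the conditioning claim, a small NumPy experiment (illustrative only; the unit weights, uniform biases, and trapezoid quadrature are assumptions, not the paper's setup) assembles a shallow-ReLU mass matrix on $[0,1]$ and prints its condition number as $n$ grows:

    import numpy as np

    def relu_mass_matrix(n, quad_pts=2000):
        # M[i, j] = int_0^1 relu(x - b_i) relu(x - b_j) dx with uniform biases b_i = i/(n+1)
        x = np.linspace(0.0, 1.0, quad_pts)
        wq = np.full(quad_pts, x[1] - x[0]); wq[[0, -1]] *= 0.5   # trapezoid weights
        b = np.arange(1, n + 1) / (n + 1)
        phi = np.maximum(x[None, :] - b[:, None], 0.0)            # n x quad_pts basis values
        return (phi * wq) @ phi.T

    for n in (8, 16, 32, 64):
        print(n, np.linalg.cond(relu_mass_matrix(n)))             # grows rapidly with n

The rapid growth of these condition numbers is what motivates the factorization used to solve the linear systems in ${\cal O}(n)$ operations.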


Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation

arXiv.org Artificial Intelligence

Sampling invariant distributions from an Ito diffusion process presents a significant challenge in stochastic simulation. Traditional numerical solvers for stochastic differential equations require both a fine step size and a lengthy simulation period, resulting in both biased and correlated samples. Current deep learning-based methods solve the stationary Fokker--Planck equation to determine the invariant probability density function in the form of a deep neural network, but they generally do not directly address the problem of sampling from the computed density function. In this work, we introduce a framework that employs a weak generative sampler (WGS) to directly generate independent and identically distributed (iid) samples induced by a transformation map derived from the stationary Fokker--Planck equation. Our proposed loss function is based on the weak form of the Fokker--Planck equation, integrating normalizing flows to characterize the invariant distribution and facilitate sample generation from the base distribution. Our randomized test functions circumvent the need for minimax optimization in the traditional weak formulation. Distinct from conventional generative models, our method requires neither the computationally intensive calculation of the Jacobian determinant nor the invertibility of the transformation map. A crucial component of our framework is the adaptively chosen family of test functions in the form of Gaussian kernels with centres selected from the generated data samples. Experimental results on several benchmark examples demonstrate the effectiveness of our method, which offers both low computational cost and excellent capability in exploring multiple metastable states.
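The structure of such a weak-form loss can be illustrated with a short PyTorch sketch. This is an assumption-laden simplification, not the paper's WGS: it fixes an overdamped Langevin diffusion $dX_t=\mu(X_t)\,dt+\sqrt{2\beta^{-1}}\,dW_t$, uses Gaussian test functions with a single bandwidth, and omits the normalizing-flow parametrization of the map.

    import torch

    def weak_fp_loss(x, centres, mu, bandwidth=0.5, beta=1.0):
        # Gaussian test functions phi_k(x) = exp(-|x - c_k|^2 / (2 h^2)), with centres c_k
        # taken from generated samples; weak stationarity requires E[L phi_k] = 0 for all k
        d = x.shape[1]
        diff = x[:, None, :] - centres[None, :, :]                      # (N, K, d)
        phi = torch.exp(-(diff ** 2).sum(-1) / (2 * bandwidth ** 2))    # (N, K)
        grad_phi = -diff / bandwidth ** 2 * phi[..., None]              # (N, K, d)
        lap_phi = phi * ((diff ** 2).sum(-1) / bandwidth ** 4 - d / bandwidth ** 2)
        # generator of the diffusion applied to phi: L phi = mu . grad phi + beta^{-1} lap phi
        Lphi = (mu(x)[:, None, :] * grad_phi).sum(-1) + lap_phi / beta
        return (Lphi.mean(dim=0) ** 2).sum()    # squared weak residuals, summed over test functions

In a training loop, x would be the samples produced by the transformation map applied to base samples, and this loss would be minimized over the map's parameters.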


A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network

arXiv.org Artificial Intelligence

In this paper, we propose a structure-guided Gauss-Newton (SgGN) method for solving least squares problems using a shallow ReLU neural network. The method effectively takes advantage of both the least squares structure and the neural network structure of the objective function. By categorizing the weights and biases of the hidden and output layers of the network as nonlinear and linear parameters, respectively, the method iterates back and forth between the nonlinear and linear parameters. The nonlinear parameters are updated by a damped Gauss-Newton method and the linear ones are updated by a linear solver. Moreover, at the Gauss-Newton step, a special form of the Gauss-Newton matrix is derived for the shallow ReLU neural network and is used for efficient iterations. It is shown that the corresponding mass and Gauss-Newton matrices in the respective linear and nonlinear steps are symmetric and positive definite under reasonable assumptions. Thus, the SgGN method naturally produces an effective search direction without the need for additional techniques, such as the shifting used in the Levenberg-Marquardt method, to achieve invertibility of the Gauss-Newton matrix. The convergence and accuracy of the method are demonstrated numerically for several challenging function approximation problems, especially those with discontinuities or sharp transition layers that pose significant challenges for commonly used training algorithms in machine learning.
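The alternating linear/nonlinear structure can be sketched for a 1D shallow ReLU model. This is a simplified illustration with made-up names, not the paper's exact SgGN update; in particular, the Gauss-Newton step below is obtained from a plain least-squares solve of the linearized problem rather than from the paper's special Gauss-Newton matrix.

    import numpy as np

    def relu(t):
        return np.maximum(t, 0.0)

    def fit_alternating(x, y, w, b, n_iter=50, damping=0.5):
        # model: u(x) = sum_i c_i relu(w_i x + b_i) + c0
        n = len(w)
        for _ in range(n_iter):
            # linear step: output-layer coefficients (c, c0) from a least-squares solve
            A = np.column_stack([relu(np.outer(x, w) + b), np.ones_like(x)])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            c, c0 = coef[:n], coef[n]
            # nonlinear step: one damped Gauss-Newton update on the hidden parameters (w, b)
            act = (np.outer(x, w) + b > 0).astype(float)          # relu'(w_i x + b_i)
            J = np.hstack([act * (c * x[:, None]), act * c])      # Jacobian of the residual
            r = A @ coef - y
            step, *_ = np.linalg.lstsq(J, r, rcond=None)          # Gauss-Newton direction
            w, b = w - damping * step[:n], b - damping * step[n:]
        return w, b, c, c0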


Qubit-Wise Architecture Search Method for Variational Quantum Circuits

arXiv.org Artificial Intelligence

Considering the noise level limit, one crucial aspect of quantum machine learning is to design a high-performing variational quantum circuit architecture with a small number of quantum gates. Like classical neural architecture search (NAS), quantum architecture search (QAS) methods employ techniques such as reinforcement learning, evolutionary algorithms and supernet optimization to improve search efficiency. In this paper, we propose a novel qubit-wise architecture search (QWAS) method, which progressively searches one qubit's configuration per stage and combines with the Monte Carlo Tree Search algorithm to find good quantum architectures by partitioning the search space into several good and bad subregions. The numerical experimental results indicate that our proposed method can ...

To develop a strategy to design VQCs in an automated way, i.e. quantum architecture search (QAS), some researchers have turned their attention to the classical Neural Architecture Search (NAS) framework. NAS focuses on automating the design of neural network structures [Elsken et al., 2019], but often grapples with the challenge of evaluating a vast number of possible network architectures. The Monte Carlo Tree Search (MCTS) algorithm addresses this issue by iteratively exploring and evaluating segments of the search space, thereby identifying promising neural network structures without exhaustive enumeration [Silver et al., 2016; Wang et al., 2020]. However, the efficiency of the search is significantly influenced by the manually predefined action space before the tree construction. To address this issue, [Wang et al., 2021] proposed an improved MCTS-based algorithm called Latent Action Neural Architecture Search (LaNAS) that learns a latent action space that best fits the problem to be solved.
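A deliberately simplified skeleton of the stage-by-stage idea is sketched below. It is a greedy stand-in for the MCTS-guided selection described above; the gate names, the evaluator, and the candidate budget are all hypothetical.

    import random

    GATE_CHOICES = ["rx", "ry", "rz", "h", "cnot_to_next"]   # hypothetical per-qubit options

    def qubit_wise_search(n_qubits, evaluate, candidates_per_stage=8):
        # fix one qubit's configuration per stage, scoring candidates with an external
        # evaluator (e.g. validation accuracy of the resulting variational circuit)
        architecture = []
        for _ in range(n_qubits):
            scored = [
                (evaluate(architecture + [g]), g)
                for g in random.choices(GATE_CHOICES, k=candidates_per_stage)
            ]
            architecture.append(max(scored)[1])
        return architecture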


Least-Squares Neural Network (LSNN) Method For Scalar Nonlinear Hyperbolic Conservation Laws: Discrete Divergence Operator

arXiv.org Artificial Intelligence

A least-squares neural network (LSNN) method was introduced for solving scalar linear and nonlinear hyperbolic conservation laws (HCLs) in [7, 6]. This method is based on an equivalent least-squares (LS) formulation and uses ReLU neural networks as approximating functions, making it well suited to approximating discontinuous functions with unknown interface locations. In the design of the LSNN method for HCLs, the numerical approximation of differential operators is a critical factor, and standard numerical or automatic differentiation along coordinate directions can often lead to a failed NN-based method. To overcome this challenge, this paper rewrites HCLs in their space-time divergence form and introduces a new discrete divergence operator. As a result, the proposed LSNN method is free of artificial-viscosity penalization. Theoretically, the accuracy of the discrete divergence operator is estimated even for discontinuous solutions. Numerically, the LSNN method with the new discrete divergence operator was tested on several benchmark problems with both convex and non-convex fluxes, and was able to compute the correct physical solution for problems with rarefaction, shock, or compound waves. The method is capable of capturing the shock of the underlying problem without oscillation or smearing, even without any penalization of the entropy condition, total variation, and/or artificial viscosity.
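For reference, the space-time divergence form referred to above reads, for a one-dimensional scalar conservation law $u_t + f(u)_x = 0$,

$$\nabla_{(x,t)} \cdot \mathbf{F}(u) = 0, \qquad \mathbf{F}(u) = \big(f(u),\, u\big)^{T},$$

so the least-squares functional is posed for the divergence of the space-time flux field $\mathbf{F}(u)$ rather than for separate coordinate-wise derivatives of $u$, and the discrete divergence operator is a discretization of $\nabla_{(x,t)}\cdot$ applied to $\mathbf{F}$ of the NN approximation.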


Residual-Quantile Adjustment for Adaptive Training of Physics-informed Neural Network

arXiv.org Artificial Intelligence

Adaptive training methods for physics-informed neural networks (PINNs) require dedicated constructions of the distribution of weights assigned to each training sample. Efficiently seeking such an optimal weight distribution is not a simple task, and most existing methods choose the adaptive weights based on approximating the full distribution or the maximum of the residuals. In this paper, we show that the bottleneck in the adaptive choice of samples for training efficiency is the behavior of the tail of the residual distribution. We therefore propose the Residual-Quantile Adjustment (RQA) method for a better weight choice for each training sample. After initially setting the weights proportional to the $p$-th power of the residual, our RQA method reassigns all weights above the $q$-quantile ($90\%$, for example) to the median value, so that the weights follow a quantile-adjusted distribution derived from the residuals. This iterative reweighting technique is also very easy to implement. Experimental results show that the proposed method can outperform several adaptive methods on various partial differential equation (PDE) problems.
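The reweighting rule described above is simple enough to state directly. A minimal NumPy sketch (the exponent $p$, quantile $q$, and the final normalization are illustrative choices, not prescribed by the abstract):

    import numpy as np

    def rqa_weights(residuals, p=2.0, q=0.90):
        # start from weights proportional to the p-th power of the residual magnitude,
        # then reset every weight above the q-quantile to the median weight
        w = np.abs(residuals) ** p
        w = np.where(w > np.quantile(w, q), np.median(w), w)
        return w / w.sum()    # normalized per-sample training weights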


Deep least-squares methods: an unsupervised learning-based numerical method for solving elliptic PDEs

arXiv.org Machine Learning

The approach makes use of a deep neural network to approximate solutions of PDEs through a compositional construction and employs least-squares functionals as loss functions to determine the parameters of the network. There are various least-squares functionals for a partial differential equation. This paper focuses on the so-called first-order system least-squares (FOSLS) functional studied in [3], which is based on a first-order system formulation of scalar second-order elliptic PDEs. Numerical results for second-order elliptic PDEs in one dimension are presented.
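As a concrete, deliberately minimal example of such a least-squares loss, the following PyTorch sketch treats the 1D model problem $-u''=f$ on $(0,1)$ with $u(0)=u(1)=0$, rewritten as the first-order system $\sigma=u'$, $-\sigma'=f$. The network interface, boundary weight, and sampling are assumptions of this sketch, not the formulation of [3].

    import torch

    def fosls_loss(model, x, f, bc_weight=10.0):
        # model maps collocation points x (shape (N, 1)) to (u, sigma) per point
        x = x.requires_grad_(True)
        u, sigma = model(x).unbind(dim=1)
        du = torch.autograd.grad(u.sum(), x, create_graph=True)[0].squeeze(-1)
        dsigma = torch.autograd.grad(sigma.sum(), x, create_graph=True)[0].squeeze(-1)
        eq1 = sigma - du                              # constitutive relation sigma = u'
        eq2 = dsigma + f(x.squeeze(-1))               # equilibrium equation -sigma' = f
        x_bdry = torch.tensor([[0.0], [1.0]])
        bc = (model(x_bdry)[:, 0] ** 2).sum()         # u(0)^2 + u(1)^2
        return (eq1 ** 2).mean() + (eq2 ** 2).mean() + bc_weight * bc

Training then amounts to minimizing this loss over the network parameters with any standard optimizer.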


Comparison of Google Translation with Human Translation

AAAI Conferences

Google Translate provides a multilingual machine-translation service by automatically translating one written language into another. Google Translate is, however, allegedly limited in its translation accuracy. This study investigated the accuracy of Google Chinese-to-English translation from the perspectives of formality and cohesion, using two comparisons: Google translation with human expert translation, and Google translation with the Chinese source language. The text sample was a collection of 289 spoken and written text excerpts from the Selected Works of Mao Zedong in both Chinese and English versions. Google Translate was used to translate the Chinese texts into English. These texts were analyzed with automated text analysis tools: the Chinese and English LIWC, and the Chinese and English Coh-Metrix. Results of Pearson correlations on formality and cohesion showed that the Google English translation was highly correlated with both the human English translation and the original Chinese texts.


A Comparative Study on English and Chinese Word Uses with LIWC

AAAI Conferences

This paper compared linguistic and psychological word uses in the English and Chinese languages with the LIWC (Linguistic Inquiry and Word Count) programs. A principal component analysis uncovered six linguistic and psychological components, among which five were significantly correlated. The correlated components were ranked as Negative Valence (r=.92), Embodiment (r=.88), Narrative (r=.68), Achievement (r=.65), and Social Relation (r=.64). However, the results showed that the order of the representative features differs between the two languages and that certain word categories co-occurred with different components in English and Chinese. The differences were interpreted from the perspective of distinctive Eastern and Western cultures.