Collaborating Authors

 Li, Yinan


Quantum Compiling with Reinforcement Learning on a Superconducting Processor

arXiv.org Artificial Intelligence

Effectively implementing quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundred noisy qubits with limited coherence times and error-prone gate operations, so NISQ algorithms naturally require short circuits obtained via quantum compilation. Here, we develop a reinforcement learning (RL)-based quantum compiler for a superconducting processor and demonstrate its capability of discovering novel, hardware-amenable circuits of short length. We show that for the three-qubit quantum Fourier transformation, a compiled circuit using only seven CZ gates with unity circuit fidelity can be achieved. The compiler can also find optimal circuits under device topological constraints, with lengths considerably shorter than those produced by the conventional method. Our study exemplifies the codesign of software with hardware for efficient quantum compilation, offering valuable insights for the advancement of RL-based compilers.
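
To make the compilation task concrete, here is a minimal, self-contained sketch in which a plain random search stands in for the paper's RL agent: it hunts for a short sequence from a fixed single-qubit gate set whose product approximates a target unitary. The gate set, target, and search budget are all illustrative assumptions, not the paper's setup.

```python
# Toy compilation-as-search: find a short gate sequence approximating a
# target unitary. A random search stands in for the RL policy here.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
GATES = {"H": H, "T": T}  # illustrative gate set, not the paper's

def fidelity(U, V):
    # Gate fidelity |Tr(U^dagger V)| / d, insensitive to global phase.
    d = U.shape[0]
    return abs(np.trace(U.conj().T @ V)) / d

def random_search(target, max_len=6, trials=20000, seed=0):
    rng = np.random.default_rng(seed)
    names = list(GATES)
    best_seq, best_fid = [], 0.0
    for _ in range(trials):
        seq = list(rng.choice(names, size=rng.integers(1, max_len + 1)))
        U = np.eye(2, dtype=complex)
        for g in seq:
            U = GATES[g] @ U
        f = fidelity(target, U)
        # Prefer higher fidelity; break ties toward shorter circuits.
        if f > best_fid + 1e-9 or (np.isclose(f, best_fid) and len(seq) < len(best_seq)):
            best_seq, best_fid = seq, f
    return best_seq, best_fid

# Example: compile the X gate; H T T T T H = H Z H = X is an exact hit.
target = np.array([[0, 1], [1, 0]], dtype=complex)
print(random_search(target))
```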


HGE: Embedding Temporal Knowledge Graphs in a Product Space of Heterogeneous Geometric Subspaces

arXiv.org Artificial Intelligence

Temporal knowledge graphs represent temporal facts $(s,p,o,\tau)$ relating a subject $s$ and an object $o$ via a relation label $p$ at time $\tau$, where $\tau$ can be a time point or a time interval. Temporal knowledge graphs may exhibit static temporal patterns at distinct points in time and dynamic temporal patterns between different timestamps. In order to learn a rich set of static and dynamic temporal patterns and apply them for inference, several embedding approaches have been suggested in the literature. However, as most of them resort to a single underlying embedding space, their capability to model all kinds of temporal patterns is severely limited by having to adhere to the geometric properties of that one space. We lift this limitation with an embedding approach that maps temporal facts into a product space of several heterogeneous geometric subspaces with distinct geometric properties, i.e., Complex, Dual, and Split-complex spaces. In addition, we propose a temporal-geometric attention mechanism that integrates information from the different geometric subspaces according to the captured relational and temporal information. Experimental results on standard temporal benchmark datasets show that our approach compares favorably against state-of-the-art models.
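
As a pointer to what "heterogeneous geometric subspaces" means operationally, the sketch below implements the multiplication rule of the three algebras named in the abstract; only the square of the imaginary unit differs between them. The product-space scoring and the attention mechanism themselves are omitted.

```python
# The three commutative algebras behind the product space: complex
# (i^2 = -1), dual (eps^2 = 0), and split-complex (j^2 = +1) numbers,
# each represented as a (real, imaginary) pair.
def mul(x, y, kind):
    (a, b), (c, d) = x, y
    sq = {"complex": -1.0, "dual": 0.0, "split": 1.0}[kind]  # unit^2
    # (a + b*u)(c + d*u) = (a*c + sq*b*d) + (a*d + b*c)*u
    return (a * c + sq * b * d, a * d + b * c)

for kind in ("complex", "dual", "split"):
    print(kind, mul((1.0, 2.0), (3.0, 4.0), kind))
```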


Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach

arXiv.org Machine Learning

Active learning [Settles, 2009] is a practical machine learning paradigm motivated by the expense of label annotation and the wide availability of unlabeled data. Consider the binary classification setting: given an instance space $X$, a binary label space $Y = \{-1,+1\}$, and a data distribution $D$ over $X \times Y$, we would like to learn a classifier that accurately predicts the labels of examples drawn from $D$. As the performance measure of a classifier $h$, we define its error rate to be $\mathrm{err}(h) := \Pr_{(x,y) \sim D}[h(x) \neq y]$.
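
A minimal sketch of this setup follows, with a toy uncertainty-sampling query thrown in to show where active learning enters; the halfspace, data, and one-step perceptron update are illustrative assumptions, not the paper's algorithm.

```python
# Empirical error rate of a halfspace h(x) = sign(<w, x>), plus one step
# of a toy "query the most uncertain point" active-learning rule.
import numpy as np

def err(w, X, y):
    # Empirical analogue of err(h) = Pr_{(x,y)~D}[h(x) != y].
    return float(np.mean(np.sign(X @ w) != y))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_star = np.array([1.0, -2.0, 0.5])          # true halfspace (noiseless)
y = np.sign(X @ w_star)

w = rng.standard_normal(3)                   # current hypothesis
print("error before:", err(w, X, y))
idx = int(np.argmin(np.abs(X @ w)))          # point closest to the boundary
if np.sign(X[idx] @ w) != y[idx]:
    w = w + y[idx] * X[idx]                  # perceptron-style correction
print("error after one query:", err(w, X, y))
```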


Provable Advantage of Parameterized Quantum Circuit in Function Approximation

arXiv.org Artificial Intelligence

Understanding the power of parameterized quantum circuits (PQCs) in accomplishing machine learning tasks is one of the most important questions in quantum machine learning. In this paper, we analyze the expressivity of PQCs through the lens of function approximation. Previously established universal approximation theorems for PQCs are mainly nonconstructive, leading us to the following question: how large do PQCs need to be to approximate a target function up to a given error? We exhibit explicit constructions of data re-uploading PQCs for approximating continuous and smooth functions and establish quantitative approximation error bounds in terms of the width, the depth, and the number of trainable parameters of the PQCs. To achieve this, we utilize techniques from quantum signal processing and linear combinations of unitaries to construct PQCs that implement multivariate polynomials. We implement global and local approximation techniques using Bernstein polynomials and local Taylor expansions, and analyze their performance in the quantum setting. We also compare our proposed PQCs to nearly optimal deep neural networks in approximating high-dimensional smooth functions, showing that the ratio between the model sizes of the PQCs and the deep neural networks is exponentially small with respect to the input dimension. This suggests a potentially novel avenue for showcasing quantum advantages in quantum machine learning.
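
For intuition about data re-uploading, the toy single-qubit circuit below alternates an encoding rotation with a trainable rotation; its $\langle Z \rangle$ readout is a trigonometric polynomial in the input whose degree grows with the number of layers, which is the kind of expressivity the paper's constructions quantify. The specific gates and parameters are assumptions of this sketch, not the paper's circuits.

```python
# Toy data re-uploading PQC on one qubit: L layers of RY(theta_l) RZ(x)
# applied to |0>, read out as the <Z> expectation value.
import numpy as np

def RY(t):
    return np.array([[np.cos(t/2), -np.sin(t/2)],
                     [np.sin(t/2),  np.cos(t/2)]])

def RZ(t):
    return np.diag([np.exp(-1j*t/2), np.exp(1j*t/2)])

Z = np.diag([1.0, -1.0])

def pqc(x, thetas):
    psi = np.array([1.0, 0.0], dtype=complex)       # |0>
    for t in thetas:                                # re-upload x each layer
        psi = RY(t) @ (RZ(x) @ psi)
    return float(np.real(psi.conj() @ (Z @ psi)))   # <Z>

thetas = [0.3, -1.2, 0.7]                           # 3 trainable layers
print([round(pqc(x, thetas), 4) for x in (0.0, 0.5, 1.0)])
```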


Noise-Augmented $\ell_0$ Regularization of Tensor Regression with Tucker Decomposition

arXiv.org Machine Learning

Tensor data are multi-dimensional arrays. Low-rank decomposition-based regression methods with tensor predictors exploit the structural information in tensor predictors while significantly reducing the number of parameters in tensor regression. We propose a method named NA$_0$CT$^2$ (Noise Augmentation for $\ell_0$ regularization on Core Tensor in Tucker decomposition) to regularize the parameters in tensor regression (TR), coupled with Tucker decomposition. We establish theoretically that NA$_0$CT$^2$ achieves exact $\ell_0$ regularization in linear TR and generalized linear TR on the core tensor from the Tucker decomposition. To our knowledge, NA$_0$CT$^2$ is the first Tucker decomposition-based regularization method in TR to achieve $\ell_0$ regularization on the core tensor. NA$_0$CT$^2$ is implemented through an iterative procedure that involves two simple steps in each iteration -- generating noisy data based on the core tensor from the Tucker decomposition of the updated parameter estimate, and running a regular GLM on the noise-augmented data with vectorized predictors. We demonstrate the implementation of NA$_0$CT$^2$ and its $\ell_0$ regularization effect in both simulation studies and real data applications. The results suggest that NA$_0$CT$^2$ improves predictions compared to other decomposition-based TR approaches, with or without regularization, and that it also helps to identify important predictors, though it is not designed for that purpose.
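
The sketch below mimics the flavor of this two-step loop in the simplest possible setting: a vectorized linear model where deterministic zero-response augmentation rows, rescaled each iteration by the current estimate, implement an adaptive ridge penalty $\lambda/(b_j^2+\epsilon)$, a standard $\ell_0$ surrogate. The paper instead draws random noisy data shaped by the Tucker core; everything here is a simplifying assumption.

```python
# Iterative augment-then-refit loop: each round, append penalty rows whose
# scale adapts to the current estimate, then refit by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
b_true = np.zeros(d); b_true[:3] = [2.0, -1.5, 1.0]     # sparse truth
y = X @ b_true + 0.1 * rng.standard_normal(n)

b, lam, eps = np.zeros(d), 1.0, 1e-4
for _ in range(30):
    w = np.sqrt(lam / (b**2 + eps))      # adaptive scales: big on near-zero b_j
    Xa = np.vstack([X, np.diag(w)])      # augmentation rows with response 0
    ya = np.concatenate([y, np.zeros(d)])
    b, *_ = np.linalg.lstsq(Xa, ya, rcond=None)

print(np.round(b, 2))                    # off-support entries shrink to ~0
```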


Adaptive Noisy Data Augmentation for Regularized Estimation and Inference in Generalized Linear Models

arXiv.org Machine Learning

We propose the AdaPtive Noise Augmentation (PANDA) procedure to regularize the estimation and inference of generalized linear models (GLMs). PANDA iteratively optimizes the objective function given noise-augmented data until convergence to obtain the regularized model estimates. The augmented noises are designed to achieve various regularization effects, including $\ell_0$, bridge (lasso and ridge included), elastic net, adaptive lasso, and SCAD, as well as group lasso and fused ridge. We examine the tail bound of the noise-augmented loss function and establish the almost sure convergence of the noise-augmented loss function and its minimizer to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions for the regularized parameters, based on which inferences can be obtained simultaneously with variable selection. PANDA exhibits ensemble learning behaviors that help further decrease the generalization error. Computationally, PANDA is easy to code, leveraging existing software for implementing GLMs without resorting to complicated optimization techniques. We demonstrate the superior or similar performance of PANDA relative to existing approaches with the same types of regularizers in simulated and real-life data. We show that the inferences through PANDA achieve nominal or near-nominal coverage and are far more efficient compared to a popular existing post-selection procedure.
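
To see why appending noisy pseudo-data can act as a penalty, the sketch below reproduces the ridge special case: zero-response rows with Gaussian covariates scaled so that the extra squared-error term equals $\lambda \lVert b \rVert^2$ in expectation. The sizes and scales are illustrative; PANDA's other noise designs (for $\ell_0$, SCAD, etc.) are not shown.

```python
# Noise augmentation recovering ridge: E[Xe.T @ Xe] = lam * I, so OLS on the
# augmented data approaches the ridge solution as ne grows.
import numpy as np

rng = np.random.default_rng(1)
n, d, lam, ne = 100, 8, 5.0, 4000
X = rng.standard_normal((n, d))
b_true = rng.standard_normal(d)
y = X @ b_true + 0.1 * rng.standard_normal(n)

Xe = rng.standard_normal((ne, d)) * np.sqrt(lam / ne)   # noisy pseudo-rows
Xa = np.vstack([X, Xe])
ya = np.concatenate([y, np.zeros(ne)])                  # pseudo-responses = 0
b_aug, *_ = np.linalg.lstsq(Xa, ya, rcond=None)

b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(np.max(np.abs(b_aug - b_ridge)))                  # small for large ne
```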


Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-purpose Regularizer and Privacy Budget Retrieval and Recycling

arXiv.org Machine Learning

Empirical risk minimization (ERM) is a principle in statistical learning. Through ERM, we can empirically measure the performance of a family of learning algorithms on a set of observed training data without knowing the true distribution of the data, and derive theoretical bounds on the performance. ERM is routinely applied in a wide range of learning problems such as regression, classification, and clustering. In recent years, with the increasing popularity of privacy-preserving machine learning that satisfies formal privacy guarantees such as differential privacy (DP) [10], the topic of privacy-preserving ERM has also been investigated. Generally speaking, given an ERM problem, differentially private empirical risk minimization (DP-ERM) can be realized by perturbing the output (estimation or prediction), perturbing the objective function (input), or perturbing iteratively during the algorithmic optimization. For output perturbation, randomization mechanisms need to be applied every time a new output is released; for iterative algorithmic perturbation, each iteration incurs a privacy loss, so careful planning and implementation of privacy accounting methods is critical to minimizing the overall privacy loss. In this paper, we focus on differentially private perturbation of objective functions. Once an objective function is perturbed, the subsequent optimization does not incur additional privacy loss, and all outputs generated from the optimization are also differentially private.
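
A bare-bones sketch of objective perturbation: add a random linear term $b^\top \theta$ to a strongly convex regularized logistic loss and release its minimizer. The noise scale below is a placeholder; an actual DP guarantee requires calibrating $b$ to the loss sensitivity and the privacy budget, which this sketch does not do.

```python
# Objective perturbation for ERM: minimize loss + (lam/2)||theta||^2 + b.theta/n.
# Once b is fixed, the optimization itself incurs no further privacy loss.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # bounded rows (common DP setup)
theta_true = rng.standard_normal(d)
y = np.where(X @ theta_true + 0.5 * rng.standard_normal(n) > 0, 1.0, -1.0)

lam = 0.1
b = rng.standard_normal(d) * 1.0                 # placeholder DP noise vector

def grad(theta):
    m = y * (X @ theta)
    g_loss = -(X * (y / (1 + np.exp(m)))[:, None]).mean(axis=0)  # logistic loss
    return g_loss + lam * theta + b / n          # perturbed objective's gradient

theta = np.zeros(d)
for _ in range(500):                             # plain gradient descent
    theta -= 0.5 * grad(theta)
print(np.round(theta, 3))                        # the released private minimizer
```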


Improved Algorithms for Efficient Active Learning Halfspaces with Massart and Tsybakov noise

arXiv.org Machine Learning

We develop a computationally efficient PAC active learning algorithm for $d$-dimensional homogeneous halfspaces that can tolerate Massart noise~\citep{massart2006risk} and Tsybakov noise~\citep{tsybakov2004optimal}. Specialized to the $\eta$-Massart noise setting, our algorithm achieves an information-theoretically optimal label complexity of $\tilde{O}\left( \frac{d}{(1-2\eta)^2} \mathrm{polylog}(\frac{1}{\epsilon}) \right)$ under a wide range of unlabeled data distributions (specifically, the family of "structured distributions" defined in~\citet{diakonikolas2020polynomial}). Under the more challenging Tsybakov noise condition, we identify two subfamilies of noise conditions under which our algorithm achieves computational efficiency and provides label complexity guarantees strictly lower than those of passive learning algorithms.
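
For reference, the $\eta$-Massart model flips each clean label $\mathrm{sign}(\langle w^*, x \rangle)$ independently with some probability $\eta(x) \le \eta < 1/2$; the toy simulation below uses a constant flip rate, the simplest Massart instance, with illustrative sizes. This is the regime where the $\tilde{O}(d/(1-2\eta)^2)$ label complexity above applies.

```python
# Simulate eta-Massart noise for halfspace learning: flip each label
# independently with probability at most eta (here exactly eta).
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 3, 1000, 0.2
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
clean = np.sign(X @ w_star)
flips = rng.random(n) < eta            # adversary may choose any eta(x) <= eta
y = np.where(flips, -clean, clean)
print("empirical noise rate:", np.mean(y != clean))   # about eta
```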


Panda: AdaPtive Noisy Data Augmentation for Regularization of Undirected Graphical Models

arXiv.org Machine Learning

We propose PANDA, an AdaPtive Noise Augmentation technique to regularize the estimation and construction of undirected graphical models (UGMs). PANDA iteratively solves MLEs given noise-augmented data in the regression-based framework until convergence to achieve the designed regularization effects. The augmented noises can be designed to achieve various regularization effects on graph estimation, including the bridge, elastic net, adaptive lasso, and SCAD penalization; they can also offer group lasso and fused ridge when some nodes belong to the same group. We establish theoretically that the noise-augmented loss function and its minimizer converge almost surely to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions for the regularized regression coefficients through PANDA in GLMs, based on which inference for the parameters can be obtained simultaneously with variable selection. Our empirical results suggest that the inferences achieve nominal or near-nominal coverage and are far more efficient compared to some existing post-selection procedures. At the algorithmic level, PANDA can be easily programmed in any standard software without resorting to complicated optimization techniques. We show the non-inferior performance of PANDA in constructing graphs of different types in simulation studies, and also apply PANDA to autism spectrum disorder data to construct a mixed-node graph.
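
The regression-based framework PANDA plugs into is node-wise: regress each node on all the others and connect the pairs whose coefficients survive. The sketch below uses plain ridge in place of PANDA's noise-augmented regularization, with an assumed threshold and penalty, just to show the graph-construction scaffolding.

```python
# Node-wise regression for undirected graph estimation: regress each node on
# the rest, threshold the coefficients, and symmetrize into an adjacency matrix.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam, thr = 400, 5, 1.0, 0.1
Z = rng.standard_normal((n, p))
Z[:, 1] += 0.8 * Z[:, 0]                     # induce a true edge 0 -- 1

adj = np.zeros((p, p), dtype=bool)
for j in range(p):
    Xo = np.delete(Z, j, axis=1)             # all other nodes as predictors
    beta = np.linalg.solve(Xo.T @ Xo + lam * np.eye(p - 1), Xo.T @ Z[:, j])
    nbrs = np.delete(np.arange(p), j)[np.abs(beta) > thr]
    adj[j, nbrs] = True
adj = adj | adj.T                            # symmetrize ("or" rule)
print(adj.astype(int))                       # nonzero at (0, 1) and (1, 0)
```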


Whiteout: Gaussian Adaptive Noise Regularization in FeedForward Neural Networks

arXiv.org Machine Learning

Noise injection (NI) is an approach to mitigate over-fitting in feedforward neural networks (NNs). The Bernoulli NI procedure as implemented in dropout and shakeout has connections with $\ell_1$ and $\ell_2$ regularization on the NN model parameters and demonstrates the efficiency and feasibility of NI in regularizing NNs. We propose whiteout, a new NI regularization technique with adaptive Gaussian noise in NNs. Whiteout is more versatile than dropout and shakeout. We show that the optimization objective function associated with whiteout in generalized linear models has a closed-form penalty term that has connections with a wide range of regularization and includes the bridge, lasso, ridge, and elastic net penalization as special cases; it can also be extended to offer regularization similar to the adaptive lasso and group lasso. We prove that whiteout can also be viewed as robust learning of NNs in the presence of small perturbations in input and hidden nodes. We establish that the noise-perturbed empirical loss function with whiteout converges almost surely to the ideal loss function, and that the estimates of NN parameters obtained from minimizing the former are consistent with those obtained from minimizing the latter. Computationally, whiteout can be easily incorporated into the back-propagation algorithm. The superiority of whiteout over dropout and shakeout in learning NNs with relatively small-sized training data is demonstrated using the LSVT voice rehabilitation data and the LIBRAS hand movement data.
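
A rough sketch of adaptive Gaussian noise injection during training: perturb a layer with per-weight noise whose variance grows as the weight shrinks, so small weights are penalized more, and switch the noise off at test time as in dropout. The $(\sigma, \gamma, \lambda)$ parametrization follows the whiteout idea in spirit, but the exact form below (and injecting noise via the weights rather than the nodes) is an assumption of this sketch.

```python
# Forward pass with adaptive Gaussian noise injection, active only in training.
import numpy as np

rng = np.random.default_rng(0)

def noisy_layer(x, W, sigma=0.5, gamma=1.0, lam=0.01, train=True):
    if train:
        # Adaptive per-weight noise std: larger noise where |W| is small,
        # acting like an adaptive (bridge-type) penalty on the weights.
        std = np.sqrt(sigma**2 / (np.abs(W)**gamma + 1e-8) + lam)
        W = W + rng.standard_normal(W.shape) * std
    return np.tanh(x @ W)

x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2)) * 0.1
print(noisy_layer(x, W, train=True).shape)    # noisy training-time output
print(noisy_layer(x, W, train=False).shape)   # deterministic test-time output
```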