Goto

Collaborating Authors

 Country


Exact expressions for double descent and implicit regularization via surrogate random design

arXiv.org Machine Learning

Double descent refers to the phase transition that is exhibited by the generalization error of unregularized learning models when varying the ratio between the number of parameters and the number of training samples. The recent success of highly over-parameterized machine learning models such as deep neural networks has motivated a theoretical analysis of the double descent phenomenon in classical models such as linear regression which can also generalize well in the over-parameterized regime. We build on recent advances in Randomized Numerical Linear Algebra (RandNLA) to provide the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator. Our approach involves constructing what we call a surrogate random design to replace the standard i.i.d. design of the training sample. This surrogate design admits exact expressions for the mean squared error of the estimator while preserving the key properties of the standard design. We also establish an exact implicit regularization result for over-parameterized training samples. In particular, we show that, for the surrogate design, the implicit bias of the unregularized minimum norm estimator precisely corresponds to solving a ridge-regularized least squares problem on the population distribution.


No-Trick (Treat) Kernel Adaptive Filtering using Deterministic Features

arXiv.org Machine Learning

Kernel methods form a powerful, versatile, and theoretically-grounded unifying framework to solve nonlinear problems in signal processing and machine learning. The standard approach relies on the kernel trick to perform pairwise evaluations of a kernel function, which leads to scalability issues for large datasets due to its linear and superlinear growth with respect to the training data. A popular approach to tackle this problem, known as random Fourier features (RFFs), samples from a distribution to obtain the data-independent basis of a higher finite-dimensional feature space, where its dot product approximates the kernel function. Recently, deterministic, rather than random construction has been shown to outperform RFFs, by approximating the kernel in the frequency domain using Gaussian quadrature. In this paper, we view the dot product of these explicit mappings not as an approximation, but as an equivalent positive-definite kernel that induces a new finite-dimensional reproducing kernel Hilbert space (RKHS). This opens the door to no-trick (NT) online kernel adaptive filtering (KAF) that is scalable and robust. Random features are prone to large variances in performance, especially for smaller dimensions. Here, we focus on deterministic feature-map construction based on polynomial-exact solutions and show their superiority over random constructions. Without loss of generality, we apply this approach to classical adaptive filtering algorithms and validate the methodology to show that deterministic features are faster to generate and outperform state-of-the-art kernel methods based on random Fourier features.


Learning Pose Estimation for UAV Autonomous Navigation andLanding Using Visual-Inertial Sensor Data

arXiv.org Machine Learning

Abstract-- In this work, we propose a robust network-in-the-loop control system that allows an Unmanned-Aerial-Vehicles to navigate and land autonomously on a desired target. To estimate the global pose of the aerial vehicle, we develop a deep neural network architecture for visual-inertial odometry, which provides a robust alternative to traditional techniques for autonomous navigation of Unmanned-Aerial-Vehicles. We first provide experimental results on the accuracy of the estimation by comparing the prediction of our model to traditional visual-inertial approaches on the publicly available EuRoC MAV dataset. The results indicate a clear improvement in the accuracy of the pose estimation up to 25% against the baseline. Second, we use Airsim, a simulator available as a plugin for Unreal Engine, to create new datasets of photorealistic images and inertial measurement to train and test our model. We finally integrate the proposed architecture for global localization with the Airsim closed-loop control system, and we provide simulation results for the autonomous landing of the aerial vehicle. I. INTRODUCTION Unmanned-Aerial-Vehicles (UAVs) can provide significant support for many applications, such as rescue operations, environmental monitoring, package delivery, and surveillance. To guarantee a high safety level in the UAV operation, it is crucial to have continuous monitoring of the state of the vehicle. Currently, the most standard techniques deployed for pose estimation are Visual-Inertial Odometry (VIO) [1, 2] and Simultaneous Localization and Mapping (SLAM) [3-5].


A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation

arXiv.org Machine Learning

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.


Reducing Catastrophic Forgetting in Modular Neural Networks by Dynamic Information Balancing

arXiv.org Machine Learning

Lifelong learning is a very important step toward realizing robust autonomous artificial agents. Neural networks are the main engine of deep learning, which is the current state-of-the-art technique in formulating adaptive artificial intelligent systems. However, neural networks suffer from catastrophic forgetting when stressed with the challenge of continual learning. We investigate how to exploit modular topology in neural networks in order to dynamically balance the information load between different modules by routing inputs based on the information content in each module so that information interference is minimized. Our dynamic information balancing (DIB) technique adapts a reinforcement learning technique to guide the routing of different inputs based on a reward signal derived from a measure of the information load in each module. Our empirical results show that DIB combined with elastic weight consolidation (EWC) regularization outperforms models with similar capacity and EWC regularization across different task formulations and datasets.


Solving Forward and Inverse Problems Using Autoencoders

arXiv.org Machine Learning

This work develops a model-aware autoencoder networks as a new method for solving scientific forward and inverse problems. Autoencoders are unsupervised neural networks that are able to learn new representations of data through appropriately selected architecture and regularization. The resulting mappings to and from the latent representation can be used to encode and decode the data. In our work, we set the data space to be the parameter space of a parameter of interest we wish to invert for. Further, as a way to encode the underlying physical model into the autoencoder, we enforce the latent space of an autoencoder to be the space of observations of physically-governed phenomena. In doing so, we leverage the well known capability of a deep neural network as a universal function operator to simultaneously obtain both the parameter-to-observation and observation-to-parameter map. The results suggest that this simultaneous learning interacts synergistically to improve the the inversion capability of the autoencoder.


A priori generalization error for two-layer ReLU neural network through minimum norm solution

arXiv.org Machine Learning

We focus on estimating \emph{a priori} generalization error of two-layer ReLU neural networks (NNs) trained by mean squared error, which only depends on initial parameters and the target function, through the following research line. We first estimate \emph{a priori} generalization error of finite-width two-layer ReLU NN with constraint of minimal norm solution, which is proved by \cite{zhang2019type} to be an equivalent solution of a linearized (w.r.t. parameter) finite-width two-layer NN. As the width goes to infinity, the linearized NN converges to the NN in Neural Tangent Kernel (NTK) regime \citep{jacot2018neural}. Thus, we can derive the \emph{a priori} generalization error of two-layer ReLU NN in NTK regime. The distance between NN in a NTK regime and a finite-width NN with gradient training is estimated by \cite{arora2019exact}. Based on the results in \cite{arora2019exact}, our work proves an \emph{a priori} generalization error bound of two-layer ReLU NNs. This estimate uses the intrinsic implicit bias of the minimum norm solution without requiring extra regularity in the loss function. This \emph{a priori} estimate also implies that NN does not suffer from curse of dimensionality, and a small generalization error can be achieved without requiring exponentially large number of neurons. In addition the research line proposed in this paper can also be used to study other properties of the finite-width network, such as the posterior generalization error.


Analysis of the Optimization Landscapes for Overcomplete Representation Learning

arXiv.org Machine Learning

We study nonconvex optimization landscapes for learning overcomplete representations, including learning (i) sparsely used overcomplete dictionaries and (ii) convolutional dictionaries, where these unsupervised learning problems find many applications in high-dimensional data analysis. Despite the empirical success of simple nonconvex algorithms, theoretical justifications of why these methods work so well are far from satisfactory. In this work, we show these problems can be formulated as $\ell^4$-norm optimization problems with spherical constraint, and study the geometric properties of their nonconvex optimization landscapes. For both problems, we show the nonconvex objectives have benign (global) geometric structures, in the sense that every local minimizer is close to one of the target solutions and every saddle point exhibits negative curvature. This discovery enables the development of guaranteed global optimization methods using simple initializations. For both problems, we show the nonconvex objectives have benign geometric structures -- every local minimizer is close to one of the target solutions and every saddle point exhibits negative curvature -- either in the entire space or within a sufficiently large region. This discovery ensures local search algorithms (such as Riemannian gradient descent) with simple initializations approximately find the target solutions. Finally, numerical experiments justify our theoretical discoveries.


Domain-independent Dominance of Adaptive Methods

arXiv.org Machine Learning

From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned. We observe that the power of our method is partially explained by a decoupling of learning rate and adaptability, greatly simplifying hyperparameter search. In light of this observation, we demonstrate that, against conventional wisdom, Adam can also outperform SGD on vision tasks, as long as the coupling between its learning rate and adaptability is taken into account. In practice, AvaGrad matches the best results, as measured by generalization accuracy, delivered by any existing optimizer (SGD or adaptive) across image classification (CIFAR, ImageNet) and character-level language modelling (Penn Treebank) tasks. This later observation, alongside of AvaGrad's decoupling of hyperparameters, could make it the preferred optimizer for deep learning, replacing both SGD and Adam.


Blockchain Intelligence: When Blockchain Meets Artificial Intelligence

arXiv.org Artificial Intelligence

Blockchain is gaining extensive attention due to its provision of secure and decentralized resource sharing manner. However, the incumbent blockchain systems also suffer from a number of challenges in operational maintenance, quality assurance of smart contracts and malicious behaviour detection of blockchain data. The recent advances in artificial intelligence bring the opportunities in overcoming the above challenges. The integration of blockchain with artificial intelligence can be beneficial to enhance current blockchain systems. This article presents an introduction of the convergence of blockchain and artificial intelligence (namely blockchain intelligence). This article also gives a case study to further demonstrate the feasibility of blockchain intelligence and point out the future directions.