Goto

Collaborating Authors

 Country


Incorporating Expert Prior Knowledge into Experimental Design via Posterior Sampling

arXiv.org Machine Learning

Scientific experiments are usually expensive due to complex experimental preparation and processing. Experimental design is therefore involved with the task of finding the optimal experimental input that results in the desirable output by using as few experiments as possible. Experimenters can often acquire the knowledge about the location of the global optimum. However, they do not know how to exploit this knowledge to accelerate experimental design. In this paper, we adopt the technique of Bayesian optimization for experimental design since Bayesian optimization has established itself as an efficient tool for optimizing expensive black-box functions. Again, it is unknown how to incorporate the expert prior knowledge about the global optimum into Bayesian optimization process. To address it, we represent the expert knowledge about the global optimum via placing a prior distribution on it and we then derive its posterior distribution. An efficient Bayesian optimization approach has been proposed via posterior sampling on the posterior distribution of the global optimum. We theoretically analyze the convergence of the proposed algorithm and discuss the robustness of incorporating expert prior. We evaluate the efficiency of our algorithm by optimizing synthetic functions and tuning hyperparameters of classifiers along with a real-world experiment on the synthesis of short polymer fiber. The results clearly demonstrate the advantages of our proposed method.


An Optimal Statistical and Computational Framework for Generalized Tensor Estimation

arXiv.org Machine Learning

This paper describes a flexible framework for generalized low-rank tensor estimation problems that includes many important instances arising from applications in computational imaging, genomics, and network analysis. The proposed estimator consists of finding a low-rank tensor fit to the data under generalized parametric models. To overcome the difficulty of non-convexity in these problems, we introduce a unified approach of projected gradient descent that adapts to the underlying low-rank structure. Under mild conditions on the loss function, we establish both an upper bound on statistical error and the linear rate of computational convergence through a general deterministic analysis. Then we further consider a suite of generalized tensor estimation problems, including sub-Gaussian tensor denoising, tensor regression, and Poisson and binomial tensor PCA. We prove that the proposed algorithm achieves the minimax optimal rate of convergence in estimation error. Finally, we demonstrate the superiority of the proposed framework via extensive experiments on both simulated and real data.


Supervised Categorical Metric Learning with Schatten p-Norms

arXiv.org Machine Learning

Metric learning has been successful in learning new metrics adapted to numerical datasets. However, its development on categorical data still needs further exploration. In this paper, we propose a method, called CPML for \emph{categorical projected metric learning}, that tries to efficiently~(i.e. less computational time and better prediction accuracy) address the problem of metric learning in categorical data. We make use of the Value Distance Metric to represent our data and propose new distances based on this representation. We then show how to efficiently learn new metrics. We also generalize several previous regularizers through the Schatten $p$-norm and provides a generalization bound for it that complements the standard generalization bound for metric learning. Experimental results show that our method provides


Attacks Which Do Not Kill Training Make Adversarial Learning Stronger

arXiv.org Machine Learning

Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic so that it sometimes hurts the natural generalization. In this paper, we raise a fundamental question---do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training is to employ confident adversarial data for updating the current model. We propose a novel approach of friendly adversarial training (FAT): rather than employing most adversarial data maximizing the loss, we search for least adversarial (i.e., friendly adversarial) data minimizing the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by just stopping the most adversarial data searching algorithms such as PGD (projected gradient descent) early, which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound of the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively---adversarial robustness can indeed be achieved without compromising the natural generalization.


Deep Learning and Statistical Models for Time-Critical Pedestrian Behaviour Prediction

arXiv.org Machine Learning

The time it takes for a classifier to make an accurate prediction can be crucial in many behaviour recognition problems. For example, an autonomous vehicle should detect hazardous pedestrian behaviour early enough for it to take appropriate measures. In this context, we compare the switching linear dynamical system (SLDS) and a three-layered bi-directional long short-term memory (LSTM) neural network, which are applied to infer pedestrian behaviour from motion tracks. We show that, though the neural network model achieves an accuracy of 80%, it requires long sequences to achieve this (100 samples or more). The SLDS, has a lower accuracy of 74%, but it achieves this result with short sequences (10 samples). To our knowledge, such a comparison on sequence length has not been considered in the literature before. The results provide a key intuition of the suitability of the models in time-critical problems.


Device Heterogeneity in Federated Learning: A Superquantile Approach

arXiv.org Machine Learning

The proliferation of mobile phones, wearables and edge devices has led to an unprecedented growth in the generation of user interaction data. Systems which tap into the power of this rich data while respecting the privacy of users are geared to lead the next generation of intelligent applications and devices. The leading distributed learning framework in this setting is federated learning [30]. In federated learning, a number of client devices with privacy-sensitive data collaboratively learn a machine learning model under the orchestration of a central server, while keeping their data decentralized. This is achieved by pushing the actual computation to the devices while the server coordinates with the devices for aggregation of model updates.


Convex Geometry and Duality of Over-parameterized Neural Networks

arXiv.org Machine Learning

We develop a convex analytic framework for ReLU neural networks which elucidates the inner workings of hidden neurons and their function space characteristics. We show that neural networks with rectified linear units act as convex regularizers, where simple solutions are encouraged via extreme points of a certain convex set. For one dimensional regression and classification, as well as rank-one data matrices, we prove that finite two-layer ReLU networks with norm regularization yield linear spline interpolation. We characterize the classification decision regions in terms of a closed form kernel matrix and minimum L1 norm solutions. This is in contrast to Neural Tangent Kernel which is unable to explain neural network predictions with finitely many neurons. Our convex geometric description also provides intuitive explanations of hidden neurons as auto-encoders. In higher dimensions, we show that the training problem for two-layer networks can be cast as a convex optimization problem with infinitely many constraints. We then provide a family of convex relaxations to approximate the solution, and a cutting-plane algorithm to improve the relaxations. We derive conditions for the exactness of the relaxations and provide simple closed form formulas for the optimal neural network weights in certain cases. We also establish a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing. Extensive experimental results show that the proposed approach yields interpretable and accurate models.


EmbPred30: Assessing 30-days Readmission for Diabetic Patients using Categorical Embeddings

arXiv.org Machine Learning

Diabetes is a disease-causing high level of blood sugar. In type 1 Diabetes, body doesn't produce insulin, but if injected from external sources, will use it and in type 2, the body doesn't produce as well as use insulin. It is estimated that 30.3 million people of all ages in the US are suffering from Diabetes as of 2015, out of which 7.2 million are unaware[1]. As of 2016, it is ranked seventh in the list of global causes of mortality. Diabetes can be an underlying cause for many cardiovascular diseases, retinopathy, and nephropathy leading to frequent readmission in the hospital. The Centers for Medicare and Medicaid Services(CMS) labeled a 30-day readmission rate as a measure of healthcare quality offered by the hospital in order to provide the best inpatient care and improve the healthcare quality. Hospitals with high readmission rates will be penalized as per the Patient Protection and Affordable Care Act(ACA) of 2010[2]. During the recent studies[19], it was observed that a 30-day readmission rate for patients with Diabetes ranges between 14.4%-22.7%,


Reliable Estimation of Kullback-Leibler Divergence by Controlling Discriminator Complexity in the Reproducing Kernel Hilbert Space

arXiv.org Machine Learning

Several scalable methods to compute the Kullback Leibler (KL) divergence between two distributions using their samples have been proposed and applied in large-scale machine learning models. While they have been found to be unstable, the theoretical root cause of the problem is not clear. In this paper, we study in detail a generative adversarial network based approach that uses a neural network discriminator to estimate KL divergence. We argue that, in such case, high fluctuations in the estimates are a consequence of not controlling the complexity of the discriminator function space. We provide a theoretical underpinning and remedy for this problem through the following contributions. First, we construct a discriminator in the Reproducing Kernel Hilbert Space (RKHS). This enables us to leverage sample complexity and mean embedding to theoretically relate the error probability bound of the KL estimates to the complexity of the neural-net discriminator. Based on this theory, we then present a scalable way to control the complexity of the discriminator for a consistent estimation of KL divergence. We support both our proposed theory and method to control the complexity of the RKHS discriminator in controlled experiments.


Information Directed Sampling for Linear Partial Monitoring

arXiv.org Machine Learning

Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyper-parameters. We further extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.