Sample-based Distributional Policy Gradient

arXiv.org Machine Learning

Distributional reinforcement learning (DRL) is a recent reinforcement learning framework whose success has been supported by various empirical studies. It relies on the key idea of replacing the expected return with the return distribution, which captures the intrinsic randomness of the long-term rewards. Most of the existing literature on DRL focuses on problems with discrete action spaces and value-based methods. In this work, motivated by applications in robotics with continuous action space control settings, we propose the sample-based distributional policy gradient (SDPG) algorithm. It models the return distribution using samples via a reparameterization technique widely used in generative modeling and inference. We compare SDPG with distributed distributional deterministic policy gradients (D4PG), the state-of-the-art policy gradient method in DRL. We apply SDPG and D4PG to multiple OpenAI Gym environments and observe that our algorithm shows better sample efficiency as well as higher reward for most tasks.


iDLG: Improved Deep Leakage from Gradients

arXiv.org Machine Learning

It is widely believed that sharing gradients will not leak private training data in distributed learning systems such as Collaborative Learning and Federated Learning. Recently, Zhu et al. presented an approach that shows it is possible to obtain private training data from publicly shared gradients. In their Deep Leakage from Gradients (DLG) method, they synthesize dummy data and corresponding labels under the supervision of the shared gradients. However, DLG has difficulty converging and discovering the ground-truth labels consistently. In this paper, we find that sharing gradients definitely leaks the ground-truth labels. We propose a simple but reliable approach to extract accurate data from the gradients. In particular, our approach is certain to extract the ground-truth labels, unlike DLG, hence we name it Improved DLG (iDLG). Our approach is valid for any differentiable model trained with cross-entropy loss over one-hot labels. We mathematically illustrate how our method can extract ground-truth labels from the gradients and empirically demonstrate its advantages over DLG.
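
The label-leakage claim can be illustrated on a toy case. For a single sample and a final linear layer with logits z = W a, the cross-entropy gradient with respect to logit i is p_i - y_i, which is negative only at the ground-truth class. The sketch below is not the authors' code: the function names, the layer sizes, and the inner-product test on gradient rows are illustrative assumptions, and it assumes at least three classes and a nonzero activation vector.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def last_layer_grad(W, a, label):
    """Gradient of the cross-entropy loss w.r.t. W for logits z = W a (one sample)."""
    logits = [sum(w * x for w, x in zip(row, a)) for row in W]
    p = softmax(logits)
    # d(loss)/d(z_i) = p_i - y_i: negative only for the ground-truth class.
    g = [p_i - (1.0 if i == label else 0.0) for i, p_i in enumerate(p)]
    # Row i of the weight gradient is g_i * a.
    return [[g_i * x for x in a] for g_i in g]


def infer_label(grad_W):
    """Return the row whose inner product with every other row is non-positive."""
    n = len(grad_W)

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    for i in range(n):
        if all(dot(grad_W[i], grad_W[j]) <= 0 for j in range(n) if j != i):
            return i
    return None


# Toy setup (hypothetical sizes): 5 classes, 8 features feeding the last layer.
random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
a = [random.uniform(-1, 1) for _ in range(8)]
recovered = [infer_label(last_layer_grad(W, a, c)) for c in range(5)]
print(recovered)
```

Because row i of the gradient equals (p_i - y_i) a, the inner product of rows i and j is (p_i - y_i)(p_j - y_j) times the squared norm of a, so only the ground-truth row has the opposite sign from all the others.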


A Group Norm Regularized LRR Factorization Model for Spectral Clustering

arXiv.org Machine Learning

Spectral clustering is an important, classic graph clustering method. Its clustering results depend heavily on the affinity matrix produced from the data. Solving Low-Rank Representation (LRR) problems is a very effective way to obtain the affinity matrix. This paper proposes an LRR factorization model based on group norm regularization and uses the Augmented Lagrangian Method (ALM) to solve it. We adopt group norm regularization to make the columns of the factor matrix sparse, thereby achieving low rank. Because no Singular Value Decomposition (SVD) is required, the computational complexity of each step is greatly reduced. We obtain affinity matrices from different LRR models and then run clustering tests on synthetic noisy data and real data (Hopkins155 and EYaleB). Compared to traditional models and algorithms, ours solves for the affinity matrix faster and is more robust to noise, and the final clustering results are better. Surprisingly, the numerical results show that our algorithm converges very fast, satisfying the convergence condition in only about ten steps. The group norm regularized LRR factorization model, together with the algorithm designed for it, is an effective and fast way to obtain a better affinity matrix.
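
To show why the quality of the pairwise-similarity matrix matters, here is a minimal sketch of the spectral step alone: splitting points into two clusters from a given symmetric similarity matrix. It does not implement the LRR factorization model or the ALM solver. The function name, the power-iteration approach, and the toy matrix are assumptions made for illustration.

```python
import math


def fiedler_partition(W, iters=500):
    """Two-way spectral split from a symmetric, non-negative similarity matrix W.

    Uses power iteration on S + I, where S = D^{-1/2} W D^{-1/2}, after
    projecting out the known top eigenvector of S. Assumes every row sum
    (degree) is positive.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

    # The top eigenvector of S (eigenvalue 1) is proportional to sqrt(degree).
    norm = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / norm for d in deg]

    def deflate(v):
        dot = sum(x * y for x, y in zip(v, v1))
        return [x - dot * y for x, y in zip(v, v1)]

    def normalize(v):
        s = math.sqrt(sum(x * x for x in v))
        return [x / s for x in v]

    # Fixed, non-symmetric start vector keeps the result deterministic.
    v = normalize(deflate([float(i + 1) for i in range(n)]))
    for _ in range(iters):
        # Multiply by S + I; the shift makes all eigenvalues non-negative.
        v = [sum(S[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        v = normalize(deflate(v))

    # The sign pattern of the second eigenvector defines the two clusters.
    return [0 if x < 0 else 1 for x in v]


# Toy similarity matrix: two tight groups of three points, weakly linked.
strong, weak = 1.0, 0.01
W = [[0.0 if i == j else (strong if (i < 3) == (j < 3) else weak)
      for j in range(6)] for i in range(6)]
labels = fiedler_partition(W)
print(labels)
```

With a clean block structure the split is unambiguous; a noisier similarity matrix shrinks the gap between the relevant eigenvalues and makes the partition less reliable, which is the motivation for learning a better matrix in the first place.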


Limited Angle Tomography for Transmission X-Ray Microscopy Using Deep Learning

arXiv.org Machine Learning

In transmission X-ray microscopy (TXM) systems, the rotation of a scanned sample might be restricted to a limited angular range to avoid collisions with other system parts or high attenuation at certain tilting angles. Image reconstruction from such limited angle data suffers from artifacts due to missing data. In this work, deep learning is applied to limited angle reconstruction in TXMs for the first time. Because sufficient real data for training is hard to obtain, training a deep neural network on synthetic data is investigated. In particular, the U-Net, the state-of-the-art neural network in biomedical imaging, is trained on synthetic ellipsoid data and multi-category data to reduce artifacts in filtered back-projection (FBP) reconstruction images. The proposed method is evaluated on synthetic data and real scanned chlorella data in $100^\circ$ limited angle tomography. For synthetic test data, the U-Net significantly reduces the root-mean-square error (RMSE) from $2.55 \times 10^{-3}$ $\mu$m$^{-1}$ in the FBP reconstruction to $1.21 \times 10^{-3}$ $\mu$m$^{-1}$ in the U-Net reconstruction, and also improves the structural similarity (SSIM) index from 0.625 to 0.920. With penalized weighted least squares denoising of the measured projections, the RMSE and SSIM are further improved to $1.16 \times 10^{-3}$ $\mu$m$^{-1}$ and 0.932, respectively. For real test data, the proposed method remarkably improves the 3-D visualization of the subcellular structures in the chlorella cell, which indicates its important value for nano-scale imaging in biology, nanoscience and materials science.
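
As a reminder of what the RMSE figures measure, the following sketch computes the root-mean-square error between a reconstruction and a reference image stored as nested lists. The function name and the toy 2x2 images are hypothetical; the abstract does not say how the authors computed the metric in practice.

```python
import math


def rmse(reconstruction, reference):
    """Root-mean-square error between two images given as lists of rows."""
    recon = [v for row in reconstruction for v in row]
    ref = [v for row in reference for v in row]
    if len(recon) != len(ref):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(recon, ref)) / len(recon))


# Toy 2x2 images (hypothetical values): only one pixel differs, by 2.
error = rmse([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]])
print(error)
```

Applied to the figures quoted in the abstract, going from 2.55e-3 to 1.21e-3 per micrometre is a reduction of roughly 53%.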


Taylor Moment Expansion for Continuous-Discrete Gaussian Filtering and Smoothing

arXiv.org Machine Learning

The paper is concerned with non-linear Gaussian filtering and smoothing in continuous-discrete state-space models, where the dynamic model is formulated as an Itô stochastic differential equation (SDE), and the measurements are obtained at discrete time instants. We propose a novel Taylor moment expansion (TME) Gaussian filter and smoother, which approximate the moments of the SDE with a temporal Taylor expansion. Unlike classical linearisation or Itô-Taylor approaches, the Taylor expansion is formed for the moment functions directly and in the time variable, not by Taylor-expanding the non-linear functions in the model. We analyse the theoretical properties, including the positive definiteness of the covariance estimate and the stability of the TME Gaussian filter and smoother. Through numerical experiments, we demonstrate that the proposed TME Gaussian filter and smoother significantly outperform state-of-the-art methods in terms of estimation accuracy and numerical stability.
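
A Taylor expansion of a moment in the time variable can be sketched for a scalar SDE dX = f(X) dt + sigma dW with polynomial drift. One standard way to write it is E[phi(X(t + dt)) | X(t) = x] ≈ sum over r from 0 to M of (dt^r / r!) A^r phi(x), where A phi = f phi' + (sigma^2 / 2) phi'' is the generator of the SDE. The code below is an illustration under those assumptions (scalar state, constant diffusion, polynomials stored as coefficient lists); it is not the authors' filter or smoother, and all function names are hypothetical.

```python
import math


def poly_deriv(p):
    """Derivative of a polynomial given as coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [0.0]


def poly_mul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(n)]


def poly_eval(p, x):
    return sum(c * x ** k for k, c in enumerate(p))


def generator(phi, drift, sigma):
    """Apply A phi = drift * phi' + 0.5 * sigma^2 * phi'' to a polynomial phi."""
    d1 = poly_deriv(phi)
    d2 = poly_deriv(d1)
    return poly_add(poly_mul(drift, d1), [0.5 * sigma ** 2 * c for c in d2])


def tme_moment(phi, drift, sigma, x, dt, order):
    """Truncated expansion: sum_{r=0}^{order} dt^r / r! * (A^r phi)(x)."""
    total, term = 0.0, list(phi)
    for r in range(order + 1):
        total += dt ** r / math.factorial(r) * poly_eval(term, x)
        term = generator(term, drift, sigma)
    return total


# Check on an Ornstein-Uhlenbeck process dX = -theta X dt + sigma dW,
# whose conditional mean and second moment are known in closed form.
theta, sigma, x0, dt = 0.5, 0.3, 2.0, 0.1
drift = [0.0, -theta]
mean_tme = tme_moment([0.0, 1.0], drift, sigma, x0, dt, order=3)
second_tme = tme_moment([0.0, 0.0, 1.0], drift, sigma, x0, dt, order=3)
decay = math.exp(-2 * theta * dt)
mean_exact = x0 * math.exp(-theta * dt)
second_exact = x0 ** 2 * decay + sigma ** 2 / (2 * theta) * (1 - decay)
print(mean_tme, mean_exact)
print(second_tme, second_exact)
```

Note that the expansion is applied to the moment functions (here x and x^2) rather than to the drift, which matches the distinction the abstract draws against linearisation-based approaches.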


Contextual Constrained Learning for Dose-Finding Clinical Trials

arXiv.org Machine Learning

Clinical trials in the medical domain are constrained by budgets. The number of patients that can be recruited is therefore limited. When a patient population is heterogeneous, this creates difficulties in learning subgroup-specific responses to a particular drug, especially across a variety of dosages. In addition, patient recruitment can be made difficult by the fact that clinical trials do not aim to provide a benefit to any given patient in the trial. In this paper, we propose C3T-Budget, a contextual constrained clinical trial algorithm for dose-finding under both budget and safety constraints. The algorithm aims to maximize drug efficacy within the clinical trial while also learning about the drug being tested. C3T-Budget recruits patients with consideration of the remaining budget, the remaining time, and the characteristics of each group, such as the population distribution, estimated expected efficacy, and estimation credibility. In addition, the algorithm aims to avoid unsafe dosages. These characteristics are further illustrated in a simulated clinical trial study, which corroborates the theoretical analysis and demonstrates efficient budget usage as well as a balanced learning-treatment trade-off.


To Transfer or Not to Transfer: Misclassification Attacks Against Transfer Learned Text Classifiers

arXiv.org Machine Learning

Transfer learning (transferring learned knowledge) has brought a paradigm shift in the way models are trained. The lucrative benefits of improved accuracy and reduced training time have shown promise in training models with constrained computational resources and fewer training samples. Specifically, publicly available text-based models such as GloVe and BERT that are trained on large corpora have seen ubiquitous adoption in practice. In this paper, we ask, "can transfer learning in text prediction models be exploited to perform misclassification attacks?" As our main contribution, we present novel attack techniques that utilize unintended features learnt in the teacher (public) model to generate adversarial examples for student (downstream) models. To the best of our knowledge, ours is the first work to show that transfer learning from state-of-the-art word-based and sentence-based teacher models increases the susceptibility of student models to misclassification attacks. First, we propose a novel word-score based attack algorithm for generating adversarial examples against student models trained using a context-free word-level embedding model. On binary classification tasks trained using the GloVe teacher model, we achieve an average attack accuracy of 97% for IMDB Movie Reviews and 80% for Fake News Detection. For multi-class tasks, we divide the Newsgroup dataset into 6 and 20 classes and achieve an average attack accuracy of 75% and 41%, respectively. Next, we present length-based and sentence-based misclassification attacks for the Fake News Detection task trained using a context-aware BERT model and achieve 78% and 39% attack accuracy, respectively. Thus, our results motivate the need for designing training techniques that are robust to unintended feature learning, specifically for transfer learned models.


A Nonparametric Off-Policy Policy Gradient

arXiv.org Machine Learning

Samuele Tosatto (1), João Carvalho (1), Hany Abdulsamad (1), Jan Peters (1, 2). (1) Technische Universität Darmstadt; (2) Max Planck Institute for Intelligent Systems. Preliminary work.

Abstract. Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is especially observed in many widely popular policy gradient algorithms that perform updates using on-policy samples. The price of such inefficiency becomes evident in real world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited. We address this issue by building on the general sample efficiency of off-policy algorithms. With nonparametric regression and density estimation methods we construct a nonparametric Bellman equation in a principled manner, which allows us to obtain closed-form estimates of the value function, and to analytically express the full policy gradient. We provide a theoretical analysis of our estimate to show that it is consistent under mild smoothness assumptions and empirically show that our approach has better sample efficiency than state-of-the-art policy gradient methods.

1 Introduction. Reinforcement learning has made overwhelming progress in recent years (Mnih et al., 2015; Haarnoja et al., 2018; Schulman et al., 2015). However, the vast majority of reinforcement learning approaches are on-policy algorithms with limited applicability to real world scenarios, due to high sample complexity. In contrast, off-policy techniques are theoretically more sample efficient, because they decouple the procedures ...

Figure 1: Example showing the bias of offline-DPG (left) and the variance of PWIS-G(PO)MDP (right) in the policy-parameter space of a 2d-LQR setting. Both algorithms diverge while they move away from the "on-policy" region.


Gradient Boosting on Decision Trees for Mortality Prediction in Transcatheter Aortic Valve Implantation

arXiv.org Machine Learning

Current prognostic risk scores in cardiac surgery are based on statistics and do not yet benefit from machine learning. Statistical predictors are not robust enough to correctly identify patients who would benefit from Transcatheter Aortic Valve Implantation (TAVI). This research aims to create a machine learning model to predict the one-year mortality of a patient after TAVI. We adopt a modern gradient boosting on decision trees algorithm, specifically designed for categorical features. In combination with a recent technique for model interpretation, we developed a feature analysis and selection stage that identifies the most important features for the prediction. We base our prediction model on the most relevant features, after interpreting and discussing the feature analysis results with clinical experts. We validated our model on 270 TAVI cases, reaching an AUC of 0.83. Our approach outperforms several widespread prognostic risk scores, such as logistic EuroSCORE II, the STS risk score and the TAVI2-score, which are broadly adopted by cardiologists worldwide.


On a Generalization of the Average Distance Classifier

arXiv.org Machine Learning

In high dimension, low sample size (HDLSS) settings, the simple average distance classifier based on the Euclidean distance performs poorly if differences between the locations get masked by the scale differences. To rectify this issue, modifications to the average distance classifier were proposed by Chan and Hall (2009). However, the existing classifiers cannot discriminate when the populations differ in aspects other than locations and scales. In this article, we propose some simple transformations of the average distance classifier to tackle this issue. The resulting classifiers perform quite well even when the underlying populations have the same location and scale. The high-dimensional behavior of the proposed classifiers is studied theoretically. Numerical experiments with a variety of simulated as well as real data sets exhibit the usefulness of the proposed methodology.

1 INTRODUCTION. Let us consider a classification problem involving two unknown multivariate distribution functions $F_1$ and $F_2$ on $\mathbb{R}^D$.
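
For orientation, the baseline that the abstract starts from can be sketched as follows: assign a test point to the class whose training observations are closest on average in Euclidean distance. This is only the simple average distance classifier, not the Chan and Hall (2009) modification or the transformations proposed in the article. The function names and the toy data are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_distance(z, sample):
    """Mean Euclidean distance from the point z to the observations in sample."""
    return sum(euclidean(z, x) for x in sample) / len(sample)


def classify(z, class1, class2):
    """Return 1 or 2, picking the class with the smaller average distance to z."""
    return 1 if average_distance(z, class1) <= average_distance(z, class2) else 2


# Toy training data (hypothetical): two classes that differ in location.
class1 = [[0.0, 0.0], [0.5, -0.2], [-0.3, 0.4]]
class2 = [[5.0, 5.0], [5.4, 4.8], [4.7, 5.3]]
predictions = [classify([0.2, 0.1], class1, class2),
               classify([4.8, 5.1], class1, class2)]
print(predictions)
```

The toy classes differ clearly in location, so the baseline separates them. The failure mode described in the abstract arises when one class is much more spread out than the other, since that inflates its average distances regardless of where the test point lies.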