Goto

Collaborating Authors

 Country


Kernel Bi-Linear Modeling for Reconstructing Data on Manifolds: The Dynamic-MRI Case

arXiv.org Machine Learning

This paper establishes a kernel-based framework for reconstructing data on manifolds, tailored to fit the dynamic-(d)MRIdata recovery problem. The proposed methodology exploits simple tangent-space geometries of manifolds in reproducing kernel Hilbert spaces, and follows classical kernel-approximation arguments to form the data-recovery task as a bilinear inverse problem. Departing from mainstream approaches, the proposed methodology uses no training data, employs no graph Laplacian matrix to penalize the optimization task, uses no costly (kernel) preimaging step to map feature points back to the input space, and utilizes complex-valued kernel functions to account for k-space data. The framework is validated on synthetically generated dMRI data, where comparisons against state-of-the-art schemes highlight the rich potential of the proposed approach in data-recovery problems. I. INTRODUCTION High-quality and artifact-free image reconstruction is a perennial task in a plethora of imaging technologies, such as dMRI; a high-fidelity visualization technology with widespread applications in cardiac-cine, dynamic-contrast enhanced and neuroimaging [1], [2].


State-only Imitation with Transition Dynamics Mismatch

arXiv.org Machine Learning

Imitation Learning (IL) is a popular paradigm for training agents to achieve complicated goals by leveraging expert behavior, rather than dealing with the hardships of designing a correct reward function. With the environment modeled as a Markov Decision Process (MDP), most of the existing IL algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitator policy is to be learned. This is uncharacteristic of many real-life scenarios where discrepancies between the expert and the imitator MDPs are common, especially in the transition dynamics function. Furthermore, obtaining expert actions may be costly or infeasible, making the recent trend towards state-only IL (where expert demonstrations constitute only states or observations) ever so promising. Building on recent adversarial imitation approaches that are motivated by the idea of divergence minimization, we present a new state-only IL algorithm in this paper. It divides the overall optimization objective into two subproblems by introducing an indirection step and solves the subproblems iteratively. We show that our algorithm is particularly effective when there is a transition dynamics mismatch between the expert and imitator MDPs, while the baseline IL methods suffer from performance degradation. To analyze this, we construct several interesting MDPs by modifying the configuration parameters for the MuJoCo locomotion tasks from OpenAI Gym 1 .


Optimality and Stability in Non-Convex-Non-Concave Min-Max Optimization

arXiv.org Machine Learning

The existence of a saddle point in convex-concave games follows from the celebrated minimax theorem (von Neumann, 1928; Sion et al., 1958) and numerical algorithms for finding it has a long history in optimization (Dem'yanov and Malozemov, 1974; Nemirovsky and Yudin, 1983; Zhang et al., 2019; Lin et al., 2020). Recent success in generative adversarial networks (GANs) (Goodfellow et al., 2014; Heusel et al., 2017) and reinforcement learning (Sutton et al., 1998; Du et al., 2017; Dai et al., 2018) has imposed new challenges for non-convex-non-concave (NCNC) settings. An important question is: what is a local optimal point in min-max optimization? Daskalakis and Panageas (2018) used a local version of saddle points to define local optimality and studied the local convergence behavior of gradient descent (GD) (Arrow et al., 1958) and optimistic gradient descent (OGD) (Popov, 1980; Daskalakis et al., 2018). Following this work, Jin et al. (2019) proposed a new definition of local optimality called local min-max points, and studied how GD converges to such points. One of the difficulties in the NCNC settings is the absence of strong duality (e.g. Boyd and Vandenberghe (2004)), which implies an intrinsic order for min-max optimization, known as Stackelberg games (von Stackelberg, 1934) where a leader takes action first and a follower acts upon the leader's strategy. The global solution to Stackelberg games is well-defined, which we call a global min-max point (a.k.a.


Tensor Decompositions in Deep Learning

arXiv.org Machine Learning

The paper surveys the topic of tensor decompositions in modern machine learning applications. It focuses on three active research topics of significant relevance for the community. After a brief review of consolidated works on multi-way data analysis, we consider the use of tensor decompositions in compressing the parameter space of deep learning models. Lastly, we discuss how tensor methods can be leveraged to yield richer adaptive representations of complex data, including structured information. The paper concludes with a discussion on interesting open research challenges.


Representation Learning Through Latent Canonicalizations

arXiv.org Machine Learning

We seek to learn a representation on a large annotated data source that generalizes to a target domain using limited new supervision. Many prior approaches to this problem have focused on learning "disentangled" representations so that as individual factors vary in a new domain, only a portion of the representation need be updated. In this work, we seek the generalization power of disentangled representations, but relax the requirement of explicit latent disentanglement and instead encourage linearity of individual factors of variation by requiring them to be manipulable by learned linear transformations. We dub these transformations latent canonicalizers, as they aim to modify the value of a factor to a pre-determined (but arbitrary) canonical value (e.g., recoloring the image foreground to black). Assuming a source domain with access to meta-labels specifying the factors of variation within an image, we demonstrate experimentally that our method helps reduce the number of observations needed to generalize to a similar target domain when compared to a number of supervised baselines.


Improving Robustness of Deep-Learning-Based Image Reconstruction

arXiv.org Machine Learning

Deep-learning-based methods for different applications have been shown vulnerable to adversarial examples. These examples make deployment of such models in safety-critical tasks questionable. Use of deep neural networks as inverse problem solvers has generated much excitement for medical imaging including CT and MRI, but recently a similar vulnerability has also been demonstrated for these tasks. We show that for such inverse problem solvers, one should analyze and study the effect of adversaries in the measurement-space, instead of the signal-space as in previous work. In this paper, we propose to modify the training strategy of end-to-end deep-learning-based inverse problem solvers to improve robustness. We introduce an auxiliary network to generate adversarial examples, which is used in a min-max formulation to build robust image reconstruction networks. Theoretically, we show for a linear reconstruction scheme the min-max formulation results in a singular-value(s) filter regularized solution, which suppresses the effect of adversarial examples occurring because of ill-conditioning in the measurement matrix. We find that a linear network using the proposed min-max learning scheme indeed converges to the same solution. In addition, for non-linear Compressed Sensing (CS) reconstruction using deep networks, we show significant improvement in robustness using the proposed approach over other methods. We complement the theory by experiments for CS on two different datasets and evaluate the effect of increasing perturbations on trained networks. We find the behavior for ill-conditioned and well-conditioned measurement matrices to be qualitatively different.


Streaming Active Deep Forest for Evolving Data Stream Classification

arXiv.org Machine Learning

In recent years, Deep Neural Networks (DNNs) have gained progressive momentum in many areas of machine learning. The layer-by-layer process of DNNs has inspired the development of many deep models, including deep ensembles. The most notable deep ensemble-based model is Deep Forest, which can achieve highly competitive performance while having much fewer hyper-parameters comparing to DNNs. In spite of its huge success in the batch learning setting, no effort has been made to adapt Deep Forest to the context of evolving data streams. In this work, we introduce the Streaming Deep Forest (SDF) algorithm, a high-performance deep ensemble method specially adapted to stream classification. We also present the Augmented Variable Uncertainty (AVU) active learning strategy to reduce the labeling cost in the streaming context. We compare the proposed methods to state-of-the-art streaming algorithms in a wide range of datasets. The results show that by following the AVU active learning strategy, SDF with only 70\% of labeling budget significantly outperforms other methods trained with all instances.


Uncertainty Quantification for Sparse Deep Learning

arXiv.org Machine Learning

Deep learning methods continue to have a decided impact on machine learning, both in theory and in practice. Statistical theoretical developments have been mostly concerned with approximability or rates of estimation when recovering infinite dimensional objects (curves or densities). Despite the impressive array of available theoretical results, the literature has been largely silent about uncertainty quantification for deep learning. This paper takes a step forward in this important direction by taking a Bayesian point of view. We study Gaussian approximability of certain aspects of posterior distributions of sparse deep ReLU architectures in non-parametric regression. Building on tools from Bayesian non-parametrics, we provide semi-parametric Bernstein-von Mises theorems for linear and quadratic functionals, which guarantee that implied Bayesian credible regions have valid frequentist coverage. Our results provide new theoretical justifications for (Bayesian) deep learning with ReLU activation functions, highlighting their inferential potential.


Memory-Constrained No-Regret Learning in Adversarial Bandits

arXiv.org Machine Learning

An adversarial bandit problem with memory constraints is studied where only the statistics of a subset of arms can be stored. A hierarchical learning policy that requires only a sublinear order of memory space in terms of the number of arms is developed. Its sublinear regret orders with respect to the time horizon are established for both weak regret and shifting regret. This work appears to be the first on memory-constrained bandit problems under the adversarial setting.


Disentangling Adaptive Gradient Methods from Learning Rates

arXiv.org Machine Learning

We investigate several confounding factors in the evaluation of optimization algorithms for deep learning. Primarily, we take a deeper look at how adaptive gradient methods interact with the learning rate schedule, a notoriously difficult-to-tune hyperparameter which has dramatic effects on the convergence and generalization of neural network training. We introduce a "grafting" experiment which decouples an update's magnitude from its direction, finding that many existing beliefs in the literature may have arisen from insufficient isolation of the implicit schedule of step sizes. Alongside this contribution, we present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods, aimed at bringing more clarity to this space.