Exact asymptotics for phase retrieval and compressed sensing with random generative priors

arXiv.org Machine Learning

We consider the problem of compressed sensing and of (real-valued) phase retrieval with a random measurement matrix. We derive sharp asymptotics for the information-theoretically optimal performance and for the best known polynomial-time algorithm for an ensemble of generative priors consisting of fully connected deep neural networks with random weight matrices and arbitrary activations. We compare the performance to that of sparse separable priors and conclude that generative priors might be advantageous in terms of algorithmic performance. In particular, while sparsity does not allow compressive phase retrieval to be performed efficiently close to its information-theoretic limit, we find that compressive phase retrieval becomes tractable under the random generative prior.
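To make the measurement setting concrete, the sketch below draws a signal from a random generative prior and observes it through a random Gaussian matrix. It is a minimal illustration under assumptions not taken from the abstract: the layer widths, the `tanh` activation, the N(0, 1/n) weight scaling, and the helper names `sample_signal` and `observe` are all hypothetical. It only generates data; it does not implement the optimal estimator or the polynomial-time algorithm analysed in the paper.

```python
import math
import random


def random_dense_layer(n_in, n_out, rng):
    """Weight matrix (n_out x n_in) with i.i.d. N(0, 1/n_in) entries."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(n_in) for _ in range(n_in)]
            for _ in range(n_out)]


def matvec(W, v):
    """Plain-Python matrix-vector product."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def sample_signal(widths, activation, rng):
    """Draw a latent z ~ N(0, I_k) and push it through a random fully connected net."""
    h = [rng.gauss(0.0, 1.0) for _ in range(widths[0])]
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = random_dense_layer(n_in, n_out, rng)
        h = [activation(a) for a in matvec(W, h)]
    return h


def observe(x, m, rng):
    """Random Gaussian sensing matrix A (m x n); returns A and both measurement types."""
    n = len(x)
    A = [[rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)] for _ in range(m)]
    y_cs = matvec(A, x)            # compressed sensing: y = A x
    y_pr = [abs(v) for v in y_cs]  # real-valued phase retrieval: y = |A x|
    return A, y_cs, y_pr


rng = random.Random(0)
x = sample_signal([5, 20, 40], math.tanh, rng)  # k = 5 latent dims, signal dim n = 40
A, y_cs, y_pr = observe(x, m=30, rng=rng)       # m = 30 < n measurements
print(len(x), len(y_cs), len(y_pr))             # 40 30 30
```

Phase retrieval discards the sign of each measurement, which is why the prior on x carries so much of the recovery burden.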


A probability theoretic approach to drifting data in continuous time domains

arXiv.org Machine Learning

December 5, 2019

Abstract: The notion of drift refers to the phenomenon that the distribution underlying the observed data changes over time. Although many attempts have been made to deal with drift, formal notions of drift are application-dependent and formulated with varying degrees of abstraction and mathematical coherence. In this contribution, we provide a probability-theoretic framework that formalizes drift in continuous time and subsumes popular notions of drift. It gives rise to a new characterization of drift in terms of stochastic dependency between data and time. This particularly intuitive formalization enables us to design a new, efficient drift detection method. Further, it yields a technique to decompose observed data into a drifting and a non-drifting part.

Keywords: Online learning, learning theory, stochastic processes, learning with drift, continuous time models, drift decomposition

1 Introduction

One fundamental assumption in classical machine learning is that observed data are i.i.d. Yet this assumption is often violated as soon as machine learning faces real-world problems: models are subject to seasonal changes, changing demands of individual customers, ageing of sensors, etc. In such settings, lifelong model adaptation rather than classical batch learning is required for optimal performance. Since drift, i.e., the fact that data are no longer identically distributed, is a major issue in many real-world applications of machine learning, many attempts have been made to deal with this setting (Ditzler et al., 2015). Depending on the domain of the data and the application, the presence of drift is modelled in different ways. For example, covariate shift refers to the situation in which training and test sets have different marginal distributions (Gretton et al., 2009).

Learning for data streams extends this setting to an unlimited (but usually countable) stream of observed data, mostly in supervised learning scenarios (Gama et al., 2014). Learning technologies for such situations often rely on windowing techniques and adapt the model based on the characteristics of the data in an observed time window. Active methods explicitly detect drift, usually drift of the classification error, and trigger model adaptation accordingly, while passive methods continuously adjust the model (Ditzler et al., 2015).
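One way to read "drift as stochastic dependency between data and time" is as an independence test between observations and their time stamps. The sketch below uses an HSIC statistic with Gaussian kernels and a permutation test; this is a swapped-in, standard dependence test chosen for illustration, not the detection method proposed in the paper, and the bandwidths `sigma_x`, `sigma_t`, the permutation count, and the helper `drift_test` are arbitrary assumptions.

```python
import math
import random


def gaussian_gram(values, sigma):
    """Gram matrix of a 1-D Gaussian kernel."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for b in values] for a in values]


def center(K):
    """Return H K H with H = I - (1/n) 1 1^T (double centering)."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic(Kc, L):
    """Biased HSIC estimate tr(K H L H) / n^2, given the already centered Kc."""
    n = len(L)
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / n ** 2


def drift_test(xs, ts, sigma_x=1.0, sigma_t=0.2, n_perm=200, seed=0):
    """Permutation test of H0: data xs are independent of time stamps ts (no drift)."""
    rng = random.Random(seed)
    Kc = center(gaussian_gram(xs, sigma_x))  # data kernel is fixed; only time is shuffled
    stat = hsic(Kc, gaussian_gram(ts, sigma_t))
    shuffled = list(ts)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if hsic(Kc, gaussian_gram(shuffled, sigma_t)) >= stat:
            exceed += 1
    return stat, (1 + exceed) / (1 + n_perm)


rng = random.Random(1)
n = 60
ts = [i / (n - 1) for i in range(n)]
drifting = [3.0 * t + rng.gauss(0.0, 0.5) for t in ts]  # mean moves with time
stationary = [rng.gauss(0.0, 0.5) for _ in ts]           # same distribution at every t
print(drift_test(drifting, ts)[1])    # expected to be small
print(drift_test(stationary, ts)[1])  # typically not small
```

Shuffling the time stamps destroys any data-time dependency while keeping both marginals, which is exactly the no-drift null hypothesis.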


Quantum-Inspired Hamiltonian Monte Carlo for Bayesian Sampling

arXiv.org Machine Learning

Hamiltonian Monte Carlo (HMC) is an efficient Bayesian sampling method that can make distant proposals in the parameter space by simulating a Hamiltonian dynamical system. Despite its popularity in machine learning and data science, HMC is inefficient at sampling from spiky and multimodal distributions. Motivated by the energy-time uncertainty relation from quantum mechanics, we propose a Quantum-Inspired Hamiltonian Monte Carlo algorithm (QHMC). This algorithm allows a particle to have a random mass drawn from a probability distribution rather than a fixed mass. We prove the convergence properties of QHMC in the spatial domain and in the time sequence. We further show why such a random mass can improve performance when sampling from a broad class of distributions. To handle the large training data sets of large-scale machine learning, we develop a stochastic gradient version of QHMC using the Nosé-Hoover thermostat, called QSGNHT, and we also provide theoretical justifications for its steady-state distributions. Finally, we demonstrate the effectiveness of QHMC and QSGNHT experimentally on synthetic examples, bridge regression, image denoising and neural network pruning. The proposed QHMC and QSGNHT achieve much more stable and accurate sampling results on these test cases.
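The central idea, resampling the particle mass at every iteration instead of fixing it, can be sketched as a one-dimensional HMC sampler. This is a minimal sketch assuming a log-normal mass distribution, a fixed leapfrog step size and trajectory length, and a standard normal target; it omits the paper's convergence analysis and the stochastic-gradient QSGNHT variant, and the function name `hmc_random_mass` is hypothetical.

```python
import math
import random


def hmc_random_mass(u, grad_u, x0, n_samples, step=0.1, n_leapfrog=20,
                    log_mass_mean=0.0, log_mass_std=1.0, seed=0):
    """1-D HMC whose particle mass is redrawn from a log-normal law at every iteration."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        m = math.exp(rng.gauss(log_mass_mean, log_mass_std))  # random mass for this trajectory
        p = rng.gauss(0.0, math.sqrt(m))                        # momentum p ~ N(0, m)
        h_old = u(x) + p * p / (2.0 * m)
        # Leapfrog integration of dx/dt = p/m, dp/dt = -U'(x).
        x_new, p_new = x, p - 0.5 * step * grad_u(x)
        for i in range(n_leapfrog):
            x_new += step * p_new / m
            if i < n_leapfrog - 1:
                p_new -= step * grad_u(x_new)
        p_new -= 0.5 * step * grad_u(x_new)
        h_new = u(x_new) + p_new * p_new / (2.0 * m)
        # Metropolis correction: each fixed-mass kernel leaves exp(-U) invariant,
        # so drawing the mass independently of x keeps the target invariant too.
        if math.isfinite(h_new) and rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Target: standard normal, U(x) = x^2 / 2.
draws = hmc_random_mass(lambda x: 0.5 * x * x, lambda x: x, x0=0.0, n_samples=5000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 3), round(var, 3))  # should be close to 0 and 1
```

Because the mass is drawn independently of the current position, each iteration is an ordinary Metropolis-corrected HMC step for some fixed mass, so the target distribution is preserved; the random mass only changes how far and how fast trajectories travel.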


Deep Distributional Sequence Embeddings Based on a Wasserstein Loss

arXiv.org Machine Learning

Deep metric learning employs deep neural networks to embed instances into a metric space such that distances between instances of the same class are small and distances between instances from different classes are large. In most existing deep metric learning techniques, the embedding of an instance is given by a feature vector produced by a deep neural network and Euclidean distance or cosine similarity defines distances between these vectors. In this paper, we study deep distributional embeddings of sequences, where the embedding of a sequence is given by the distribution of learned deep features across the sequence. This has the advantage of capturing statistical information about the distribution of patterns within the sequence in the embedding. When embeddings are distributions rather than vectors, measuring distances between embeddings involves comparing their respective distributions. We propose a distance metric based on Wasserstein distances between the distributions and a corresponding loss function for metric learning, which leads to a novel end-to-end trainable embedding model. We empirically observe that distributional embeddings outperform standard vector embeddings and that training with the proposed Wasserstein metric outperforms training with other distance functions.
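For scalar features, the Wasserstein-1 distance between two empirical distributions has a closed form as the area between their CDFs, which makes the idea of comparing sequences by their feature distributions easy to illustrate. The sketch assumes one scalar feature per sequence step (the paper works with learned multi-dimensional deep features) and pairs the distance with a generic contrastive margin loss; both `wasserstein_1d` and `contrastive_loss` are hypothetical helpers, not the paper's exact metric or loss.

```python
def wasserstein_1d(a, b):
    """W1 between the empirical distributions of 1-D samples a and b: integral of |F_a - F_b|."""
    a, b = sorted(a), sorted(b)
    points = sorted(a + b)
    i = j = 0
    total = 0.0
    for k in range(len(points) - 1):
        x = points[k]
        while i < len(a) and a[i] <= x:  # i = number of a-samples <= x
            i += 1
        while j < len(b) and b[j] <= x:  # j = number of b-samples <= x
            j += 1
        # Both empirical CDFs are constant on [points[k], points[k + 1]).
        total += abs(i / len(a) - j / len(b)) * (points[k + 1] - x)
    return total


def contrastive_loss(seq_a, seq_b, same_class, margin=1.0):
    """Hypothetical contrastive loss on the distributional distance (not the paper's exact loss)."""
    d = wasserstein_1d(seq_a, seq_b)
    return d ** 2 if same_class else max(0.0, margin - d) ** 2


# Per-step scalar features of two sequences; b is a shifted by +1, so W1 = 1.
a = [0.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]
print(wasserstein_1d(a, b))
print(contrastive_loss(a, b, same_class=False, margin=1.5))  # (1.5 - 1)^2
```

The CDF formulation also handles sequences of different lengths, since each empirical distribution is normalized by its own sample count.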


Active Learning of SVDD Hyperparameter Values

arXiv.org Machine Learning

Support Vector Data Description (SVDD) is a popular method for outlier detection. However, its usefulness largely depends on selecting good hyperparameter values, a difficult problem that has received significant attention in the literature. Existing methods to estimate hyperparameter values are purely heuristic, and the conditions under which they work well are unclear. In this article, we propose LAMA (Local Active Min-Max Alignment), the first principled approach to estimate SVDD hyperparameter values by active learning. The core idea is based on kernel alignment, which we adapt to active learning with small sample sizes. In contrast to many existing approaches, LAMA provides estimates for both SVDD hyperparameters. These estimates are evidence-based, i.e., they rely on actual class labels, and come with a quality score. This eliminates the need for manual validation, an issue with current heuristics. LAMA outperforms state-of-the-art competitors in extensive experiments on real-world data. In several cases, LAMA even yields results close to the empirical upper bound.
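Kernel alignment measures how well a kernel's Gram matrix agrees with the label structure. The sketch below applies the classical kernel-target alignment to pick a Gaussian kernel width from a small labeled set, such as one gathered by active-learning queries. It is an assumption-laden simplification: it is not LAMA's local min-max alignment, it does not estimate SVDD's second hyperparameter (the cost or outlier-fraction parameter), and it does not choose which points to query. The grid `gammas`, the toy data, and the helper names are made up.

```python
import math


def gaussian_kernel(X, gamma):
    """Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
            for xi in X]


def kernel_target_alignment(K, y):
    """<K, y y^T>_F / (||K||_F * ||y y^T||_F) for labels y_i in {-1, +1}."""
    n = len(y)
    num = sum(y[i] * y[j] * K[i][j] for i in range(n) for j in range(n))
    k_norm = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    return num / (k_norm * n)  # ||y y^T||_F = n when every |y_i| = 1


def best_gamma(X, y, gammas):
    """Pick the kernel width whose Gram matrix best aligns with the labels."""
    return max(gammas, key=lambda g: kernel_target_alignment(gaussian_kernel(X, g), y))


# Labeled points, e.g. returned by active-learning queries: +1 inlier, -1 outlier.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0], [-3.0, 3.0]]
y = [1, 1, 1, 1, -1, -1]
gammas = [0.01, 0.1, 1.0, 10.0]
print(best_gamma(X, y, gammas))
```

By the Cauchy-Schwarz inequality the alignment always lies in [-1, 1], which makes scores comparable across candidate widths.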


Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning

arXiv.org Machine Learning

Graph representation learning aims to encode all nodes of a graph into low-dimensional vectors that serve as input to many computer vision tasks. However, most existing algorithms ignore the inherent data distribution and even noise. This may significantly aggravate over-fitting and deteriorate testing accuracy. In this paper, we propose a Distribution-induced Bidirectional Generative Adversarial Network (named DBGAN) for graph representation learning. Instead of the widely used normal distribution assumption, the prior distribution of the latent representation in our DBGAN is estimated in a structure-aware way, which implicitly bridges the graph and feature spaces by prototype learning. Thus, discriminative and robust representations are generated for all nodes. Furthermore, to improve their generalization ability while preserving representation ability, sample-level and distribution-level consistency are balanced via a bidirectional adversarial learning framework. An extensive set of experiments demonstrates that our DBGAN obtains a remarkably more favorable trade-off between representation and robustness than currently available alternatives across various tasks, while also being dimension-efficient.


ADEPOS: A Novel Approximate Computing Framework for Anomaly Detection Systems and its Implementation in 65nm CMOS

arXiv.org Machine Learning

To overcome the energy and bandwidth limitations of traditional IoT systems, edge computing, or information extraction at the sensor node, has become popular. It is therefore important to create very low-energy information extraction and pattern recognition systems. In this paper, we present an approximate computing method to reduce the computation energy of a specific type of IoT system used for anomaly detection (e.g., in predictive maintenance or epileptic seizure detection). Termed Anomaly Detection Based Power Savings (ADEPOS), our proposed method uses low-precision computing and low-complexity neural networks at the beginning, when healthy data are easy to distinguish. On the detection of anomalies, however, the complexity of the network and the computing precision are adaptively increased for accurate predictions. We show that ensemble approaches are well suited for adaptively changing network size. To validate our proposed scheme, a chip has been fabricated in the UMC 65nm process that includes an MSP430 microprocessor along with an on-chip switching-mode DC-DC converter for dynamic voltage and frequency scaling. Using the NASA bearing dataset for machine health monitoring, we show that ADEPOS achieves an 8.95X energy saving over the lifetime without losing any detection accuracy. The energy savings are obtained by reducing the execution time of the neural network on the microprocessor.
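The escalation idea, spending little computation while data look healthy and more when an anomaly is suspected, can be illustrated with a cascade over ensemble members. This is a hypothetical, per-sample simplification: ADEPOS adapts network size and computing precision on a fabricated chip over the machine's lifetime, whereas here the members are toy distance scores, `threshold` is arbitrary, `cascade_detect` is an invented helper, and the number of members evaluated is only a rough stand-in for energy.

```python
def cascade_detect(sample, members, threshold):
    """Evaluate ensemble members one by one and stop as soon as the sample looks healthy.

    Returns (is_anomaly, number_of_members_evaluated); the count is a crude proxy for energy.
    """
    scores = []
    for k, member in enumerate(members, start=1):
        scores.append(member(sample))
        if sum(scores) / k < threshold:  # partial ensemble already says "healthy"
            return False, k
    return True, len(members)            # full ensemble still sees an anomaly


# Hypothetical members: distance of a scalar reading from slightly different "healthy" centres.
members = [lambda x, c=c: abs(x - c) for c in (0.0, 0.1, -0.1)]
stream = [0.02, -0.05, 0.01, 4.0, 0.03]
results = [cascade_detect(s, members, threshold=1.0) for s in stream]
print(results)  # [(False, 1), (False, 1), (False, 1), (True, 3), (False, 1)]
print(sum(k for _, k in results) / len(stream))  # 1.4 members evaluated per sample
```

Since healthy samples dominate a machine's early lifetime, most inputs exit after the cheapest member, and the full ensemble runs only when readings drift away from normal.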


A Variational Perturbative Approach to Planning in Graph-based Markov Decision Processes

arXiv.org Machine Learning

Coordinating multiple interacting agents to achieve a common goal is a difficult task with broad applicability. This problem remains hard to solve even when interactions are limited to those mediated by a static interaction graph. We present a novel approximate solution method for multi-agent Markov decision problems on graphs, based on variational perturbation theory. We adopt the strategy of planning via inference, which has been explored in various prior works. We employ a non-trivial extension of a novel high-order variational method that allows for approximate inference in large networks and has been shown to surpass the accuracy of existing variational methods. To compare our method with two state-of-the-art methods for multi-agent planning on graphs, we apply it to several standard GMDP problems. We show that in cases where the goal is encoded as a non-local cost function, our method performs well, while the state-of-the-art methods approach the performance of random guessing. In a final experiment, we demonstrate that our method brings significant improvement for synchronization tasks.


Handwriting-Based Gender Classification Using End-to-End Deep Neural Networks

arXiv.org Machine Learning

Handwriting-based gender classification is a well-researched problem that has been approached mainly with traditional machine learning techniques. In this paper, we propose a novel deep learning-based approach for this task. Specifically, we present a convolutional neural network (CNN) that performs automatic feature extraction from a given handwritten image, followed by classification of the writer's gender. We also introduce a new dataset of labeled handwritten samples in Hebrew and English from 405 participants. Comparing gender classification accuracy on this dataset against human examiners, we find that the proposed deep learning-based approach is substantially more accurate than humans.


Learning with Multiplicative Perturbations

arXiv.org Machine Learning

Adversarial Training (AT) and Virtual Adversarial Training (VAT) are regularization techniques that train Deep Neural Networks (DNNs) with adversarial examples generated by adding small but worst-case perturbations to input examples. In this paper, we propose xAT and xVAT, new adversarial training algorithms that generate multiplicative perturbations to input examples for robust training of DNNs. Such perturbations are much more perceptible and interpretable than their additive counterparts exploited by AT and VAT. Furthermore, the multiplicative perturbations can be generated transductively or inductively, while the standard AT and VAT only support a transductive implementation. We conduct a series of experiments that analyze the behavior of the multiplicative perturbations and demonstrate that xAT and xVAT match or outperform state-of-the-art classification accuracies across multiple established benchmarks while being about 30% faster than their additive counterparts. Furthermore, the resulting DNNs also exhibit distinct weight distributions.

1. Introduction

Over the past few years, Deep Neural Networks (DNNs) have achieved state-of-the-art performance on a wide range of learning tasks. However, the success of DNNs relies heavily on large sets of labeled examples; when trained on small datasets, DNNs are prone to overfitting if not properly regularized. For many practical applications, collecting a large number of labeled examples is expensive, time-consuming, or both. To address this issue, researchers have investigated a host of techniques, such as Dropout [24], AT [4, 25], VAT [14], and Mixup [29], to regularize the training of DNNs.
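The difference between additive and multiplicative perturbations can be seen on a linear logistic model, where the input gradient is available in closed form. The sketch below takes a single normalized gradient step of size `eps` in each parameterization, in the style of a fast gradient method; it is an illustrative assumption, not the xAT/xVAT procedure, which may generate perturbations differently (for example, inductively). The weights, inputs, and helper names are hypothetical.

```python
import math


def logistic_loss(w, x, y):
    """log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    z = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-z))


def input_grad(w, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    z = y * sum(wi * xi for wi, xi in zip(w, x))
    s = 1.0 / (1.0 + math.exp(z))  # sigmoid(-z)
    return [-y * s * wi for wi in w]


def l2_scaled(v, eps):
    """Rescale v to L2 norm eps (leaves an all-zero vector unchanged)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [eps * a / norm for a in v]


def additive_perturbation(w, x, y, eps):
    """x' = x + r, with r the worst-case direction to first order, ||r|| = eps."""
    r = l2_scaled(input_grad(w, x, y), eps)
    return [xi + ri for xi, ri in zip(x, r)]


def multiplicative_perturbation(w, x, y, eps):
    """x' = x * (1 + r) elementwise, with r the worst-case direction to first order."""
    g = input_grad(w, x, y)
    g_r = [gi * xi for gi, xi in zip(g, x)]  # chain rule: d x'_i / d r_i = x_i
    r = l2_scaled(g_r, eps)
    return [xi * (1.0 + ri) for xi, ri in zip(x, r)]


w = [1.0, -2.0, 0.5]
x = [0.5, 0.2, -1.0]
y = 1
eps = 0.1
clean = logistic_loss(w, x, y)
x_add = additive_perturbation(w, x, y, eps)
x_mul = multiplicative_perturbation(w, x, y, eps)
print(clean, logistic_loss(w, x_add, y), logistic_loss(w, x_mul, y))
```

By the chain rule, the gradient with respect to a multiplicative factor r_i is the input gradient scaled by x_i, so features equal to zero are never perturbed; the additive version has no such constraint.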