Optimal Discriminant Functions Based On Sampled Distribution Distance for Modulation Classification

arXiv.org Machine Learning

In this letter, we derive the optimal discriminant functions for modulation classification based on the sampled distribution distance. The proposed method distinguishes among candidate constellations using a low-complexity approach based on the distribution distance at specific testpoints along the cumulative distribution function. Based on the Bayesian decision criterion, the method asymptotically achieves the minimum classification error possible for a given set of testpoints. Testpoint locations are also optimized to improve classification performance. The method provides significant gains over existing approaches that also use the distribution of the signal features.
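The testpoint idea can be illustrated with a small sketch. It evaluates the empirical CDF of a received real-valued feature (for example, the in-phase component) only at a handful of testpoints and picks the candidate constellation whose theoretical CDF is closest in summed absolute difference. This is a plain minimum-distance rule, not the Bayesian optimal discriminant functions derived in the paper. The constellations (BPSK, 4-PAM), the noise level SIGMA and the testpoint locations are hypothetical choices, and the testpoints are fixed rather than optimized.

```python
import random
from bisect import bisect_right
from statistics import NormalDist

def mixture_cdf(levels, sigma):
    """CDF of an equiprobable Gaussian mixture centred on the constellation levels."""
    components = [NormalDist(mu, sigma) for mu in levels]
    return lambda t: sum(d.cdf(t) for d in components) / len(components)

def ecdf_at(samples, testpoints):
    """Empirical CDF of the received samples, evaluated only at the testpoints."""
    s = sorted(samples)
    return [bisect_right(s, t) / len(s) for t in testpoints]

def classify(samples, candidate_cdfs, testpoints):
    """Pick the candidate whose CDF is closest (L1 distance over the testpoints)."""
    emp = ecdf_at(samples, testpoints)

    def distance(name):
        cdf = candidate_cdfs[name]
        return sum(abs(e - cdf(t)) for e, t in zip(emp, testpoints))

    return min(candidate_cdfs, key=distance)

# Hypothetical setup: unit-power constellations plus Gaussian noise of known sigma.
SIGMA = 0.3
BPSK = [-1.0, 1.0]
PAM4 = [a / 5 ** 0.5 for a in (-3, -1, 1, 3)]
CANDIDATES = {"BPSK": mixture_cdf(BPSK, SIGMA), "4-PAM": mixture_cdf(PAM4, SIGMA)}
TESTPOINTS = [-1.0, -0.5, 0.0, 0.5, 1.0]

def received(levels, n, rng):
    """Draw n noisy symbols uniformly from the given constellation."""
    return [rng.choice(levels) + rng.gauss(0.0, SIGMA) for _ in range(n)]

rng = random.Random(0)
print(classify(received(BPSK, 2000, rng), CANDIDATES, TESTPOINTS))
print(classify(received(PAM4, 2000, rng), CANDIDATES, TESTPOINTS))
```

Only five CDF values per candidate are compared, which is where the low complexity comes from; the testpoint at 0 carries no information here because both symmetric constellations have CDF 0.5 there.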


Preference-Based Unawareness

arXiv.org Artificial Intelligence

Unawareness refers to the lack of conception rather than the lack of information. There is a fundamental difference between not knowing about which events obtain and the inability to conceive of some events. Unawareness is an interdisciplinary topic that fascinates economists, computer scientists, logicians, and philosophers alike. Traditionally, computer scientists, logicians and philosophers are interested in epistemic models while most economists are mainly interested in the behavioral implications. In the literature, unawareness has been defined epistemically using syntactic and semantic approaches.


No More Pesky Learning Rates

arXiv.org Machine Learning

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.
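A minimal one-parameter sketch of setting the learning rate from local gradient statistics, on a noisy quadratic. It assumes the per-parameter rule in the full paper has the form eta = g_bar^2 / (h * v_bar), where g_bar and v_bar are running averages of the gradient and the squared gradient, h is the curvature, and the averaging memory tau shrinks when consecutive gradients agree. Here h is simply assumed known rather than estimated online, and the problem, constants and function name are illustrative.

```python
import random

def vsgd_quadratic(h=2.0, mu=3.0, noise=1.0, steps=5000, n0=10, seed=0):
    """Minimize E[0.5 * h * (theta - x)^2], x ~ N(mu, noise^2), with an adaptive rate.

    Sketch assumptions: one parameter, known curvature h, and running averages
    initialized from n0 gradient samples taken at the starting point.
    """
    rng = random.Random(seed)
    theta = 0.0

    def grad(th):
        # Gradient of a single noisy sample's loss 0.5 * h * (th - x)^2.
        return h * (th - rng.gauss(mu, noise))

    init = [grad(theta) for _ in range(n0)]
    g_bar = sum(init) / n0                   # running mean of the gradient
    v_bar = sum(g * g for g in init) / n0    # running mean of the squared gradient
    tau = float(n0)                          # adaptive memory size

    for _ in range(steps):
        g = grad(theta)
        g_bar += (g - g_bar) / tau
        v_bar += (g * g - v_bar) / tau
        ratio = g_bar * g_bar / max(v_bar, 1e-12)  # in [0, 1]: signal vs. noise
        eta = ratio / h                              # large when gradients agree, small when noisy
        theta -= eta * g
        tau = (1.0 - ratio) * tau + 1.0              # short memory when gradients agree
    return theta

print(vsgd_quadratic())  # expected to end near mu = 3.0
```

Because eta * h = g_bar^2 / v_bar lies in [0, 1], each update moves theta part of the way toward the current sample's optimum: the rate rises while gradients consistently point the same way and decays as they become noise-dominated, without any hand-tuned schedule.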


Canonical dual solutions to nonconvex radial basis neural network optimization problem

arXiv.org Machine Learning

Radial Basis Function Neural Networks (RBFNNs) are widely used tools for regression problems. One of their principal drawbacks is that training with supervision of both the centers and the weights is a highly non-convex optimization problem, which leads to fundamental difficulties for traditional optimization theory and methods. This paper presents a generalized canonical duality theory for solving this challenging problem. We demonstrate that, by sequential canonical dual transformations, the nonconvex optimization problem of the RBFNN can be reformulated as a canonical dual problem (without a duality gap). Both the global optimal solution and local extrema can be classified. Several applications to one of the most widely used radial basis functions, the Gaussian function, are illustrated. Our results show that even in the one-dimensional case, the global minimizer of the nonconvex problem may not be the best solution for the RBFNN, and that canonical duality theory is a promising tool for solving general neural network training problems.
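To illustrate the non-convexity mentioned above, including a local minimizer that is not the global one, the sketch fits a single Gaussian RBF unit, with its weight fixed at 1 for simplicity, to a hypothetical target made of two bumps of different heights, then scans the training loss over the centre c. It only shows the shape of the objective; it does not implement the canonical dual transformation.

```python
import math

def rbf_loss(c, xs, ys, w=1.0):
    """Squared training error of a one-unit Gaussian RBF model w * exp(-(x - c)^2)."""
    return sum((w * math.exp(-(x - c) ** 2) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical target: two Gaussian bumps with heights 1.0 and 0.8.
xs = [0.25 * i for i in range(-24, 25)]                   # inputs on [-6, 6]
ys = [math.exp(-(x + 2) ** 2) + 0.8 * math.exp(-(x - 2) ** 2) for x in xs]

cs = [0.05 * i for i in range(-80, 81)]                   # candidate centres on [-4, 4]
losses = [rbf_loss(c, xs, ys) for c in cs]

# Interior grid points lower than both neighbours are local minima in c.
local_minima = [cs[i] for i in range(1, len(cs) - 1)
                if losses[i] < losses[i - 1] and losses[i] < losses[i + 1]]
best = cs[losses.index(min(losses))]
print(local_minima, best)  # two local minima, near -2 and +2; the global one is near -2
```

A gradient method started at c > 0 would settle in the worse minimum near +2, which is the kind of behaviour a global characterization of extrema is meant to address.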


Gaussian Process Vine Copulas for Multivariate Dependence

arXiv.org Machine Learning

Copulas allow marginal distributions to be learned separately from the multivariate dependence structure (copula) that links them together into a density function. Vine factorizations ease the learning of high-dimensional copulas by constructing a hierarchy of conditional bivariate copulas. However, to simplify inference, it is common to assume that each of these conditional bivariate copulas is independent of its conditioning variables. In this paper, we relax this assumption by discovering the latent functions that specify the shape of a conditional copula given its conditioning variables. We learn these functions by following a Bayesian approach based on sparse Gaussian processes with expectation propagation for scalable, approximate inference. Experiments on real-world datasets show that, when modeling all conditional dependencies, we obtain better estimates of the underlying copula of the data.
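The simplifying assumption can be made concrete with a bivariate Gaussian copula whose correlation depends on a conditioning variable z through a latent function. In this sketch the function latent_f and the tanh link that maps it into (-1, 1) are hypothetical stand-ins for the sparse-GP latent functions learned in the paper, which may use other copula families and parameterizations. Under the simplified vine assumption, the printed densities would be identical for every z.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gaussian_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho at (u, v)."""
    x, y = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    denom = 1.0 - rho * rho
    exponent = (2 * rho * x * y - rho * rho * (x * x + y * y)) / (2 * denom)
    return math.exp(exponent) / math.sqrt(denom)

def latent_f(z):
    """Hypothetical latent function of the conditioning variable z."""
    return 1.5 * math.sin(z)

def conditional_density(u, v, z):
    # The tanh link keeps the copula parameter inside (-1, 1); this link is an assumption.
    return gaussian_copula_density(u, v, math.tanh(latent_f(z)))

# Same (u, v), different conditioning values: the dependence changes with z.
for z in (-1.0, 0.0, 1.0):
    print(z, round(conditional_density(0.9, 0.9, z), 3))
```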


Clustering validity based on the most similarity

arXiv.org Machine Learning

One basic requirement of many studies is the need to classify data. Clustering is one method for summarizing networks. Clustering methods can be divided into two categories: model-based approaches and algorithmic approaches. Since most clustering methods depend on their input parameters, it is important to evaluate the results of a clustering algorithm under different input parameters in order to choose the most appropriate one. Several clustering validity techniques, based on the inner density and outer density of clusters, provide metrics for choosing the most appropriate clustering independently of the input parameters. Because previous methods depend on the input parameters, one challenge with large systems is that the data are completed incrementally, which affects the final choice of the most appropriate clustering. These methods take high intensity within a cluster and low intensity between different clusters as the criterion for choosing the optimal clustering. This criterion has a serious problem: not all data are available at the first stage. In this paper, we introduce an efficient measure based on the clustering that occurs the maximum number of times across various initial values.
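One possible reading of the proposed measure is: run the clustering algorithm from many initial values and keep the partition that is produced most often. The sketch below assumes that reading and uses plain 1-D k-means as the base algorithm; both choices, the toy data and the function names are assumptions rather than the paper's exact procedure. Partitions are compared as sets of index sets, so clusterings that differ only by a permutation of labels count as the same.

```python
import random
from collections import Counter

def kmeans_1d(xs, k, rng, iters=100):
    """Plain Lloyd's k-means on 1-D data from a random initialization."""
    centers = rng.sample(xs, k)
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = sum(members) / len(members)
    return labels

def as_partition(labels):
    """Label-independent form of a clustering: a set of index sets."""
    groups = {}
    for i, l in enumerate(labels):
        groups.setdefault(l, set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())

def most_repeated_clustering(xs, k, runs=50, seed=0):
    """Return the partition produced most often across initializations, with its count."""
    rng = random.Random(seed)
    counts = Counter(as_partition(kmeans_1d(xs, k, rng)) for _ in range(runs))
    return counts.most_common(1)[0]

data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]  # hypothetical data with two obvious groups
partition, count = most_repeated_clustering(data, k=2)
print(sorted(sorted(g) for g in partition), count)
```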


Layer-wise learning of deep generative models

arXiv.org Machine Learning

When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure admitting a performance guarantee compared to the global optimum. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret auto-encoders in this setting as generative models, by showing that they train a lower bound of this criterion. We test the new learning procedure against a state-of-the-art method (stacked RBMs), and find it to improve performance. Both theory and experiments highlight the importance, when training deep architectures, of using an inference model (from data to hidden variables) richer than the generative model (from hidden variables to data).


Density Ratio Hidden Markov Models

arXiv.org Machine Learning

Masashi Sugiyama, Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan (sugi@cs.titech.ac.jp)

Hidden Markov models and their variants are the predominant sequential classification method in such domains as speech recognition, bioinformatics and natural language processing. Because they are generative rather than discriminative models, however, their classification performance is a drawback. In this paper we apply ideas from the field of density ratio estimation to bypass the difficult step of learning likelihood functions in HMMs. By reformulating inference and model fitting in terms of density ratios and applying a fast kernel-based estimation method, we show that it is possible to obtain a striking increase in discriminative performance while retaining the probabilistic qualities of the HMM. We demonstrate experimentally that this formulation makes more efficient use of training data than alternative approaches.

1 Introduction

Inference of a sequence of estimated classes from a sequence of noisy observations is fundamental in many applications. The hidden Markov model (HMM) and its variants are the usual methods employed to do this, and have been used with conspicuous success in such domains as speech recognition, bioinformatics and natural language processing. As well as being computationally efficient, they are a popular choice due to their intuitive probabilistic interpretation.
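A sketch of why density ratios can stand in for likelihoods during HMM filtering: if every emission likelihood p(x | s) at a time step is divided by the same positive quantity p(x), the normalized forward recursion is unchanged. The ratio r(x, s) = p(x | s) / p(x) used here is an assumption, since the abstract does not spell out the exact ratio, and the toy two-state model, whose ratios are computed from known distributions instead of being estimated with a kernel method, is hypothetical.

```python
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]            # TRANS[r][s] = p(state s | previous state r)
EMIT = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]   # EMIT[s][x] = p(symbol x | state s)

def forward_filter(score, obs):
    """Normalized forward recursion giving p(state_t | x_1..x_t) for each t.

    score(s, x) only needs to be proportional, across states, to p(x | s).
    """
    n = len(INIT)
    alpha = [INIT[s] * score(s, obs[0]) for s in range(n)]
    filtered = []
    for t, x in enumerate(obs):
        if t > 0:
            pred = [sum(alpha[r] * TRANS[r][s] for r in range(n)) for s in range(n)]
            alpha = [pred[s] * score(s, x) for s in range(n)]
        z = sum(alpha)
        alpha = [a / z for a in alpha]
        filtered.append(alpha)
    return filtered

# Observation marginal under the chain's stationary distribution (2/3, 1/3).
P_X = [2 / 3 * EMIT[0][x] + 1 / 3 * EMIT[1][x] for x in range(3)]

def likelihood(s, x):
    return EMIT[s][x]

def density_ratio(s, x):
    # r(x, s) = p(x | s) / p(x): the per-step factor 1 / p(x) is shared by all states.
    return EMIT[s][x] / P_X[x]

obs = [0, 0, 2, 1, 2, 2, 0]
a = forward_filter(likelihood, obs)
b = forward_filter(density_ratio, obs)
# Differences between the two filters are only floating-point round-off.
print(max(abs(p - q) for pa, pb in zip(a, b) for p, q in zip(pa, pb)))
```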


MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

arXiv.org Machine Learning

The classical mixture of Gaussians model is related to K-means via small-variance asymptotics: as the covariances of the Gaussians tend to zero, the negative log-likelihood of the mixture of Gaussians model approaches the K-means objective, and the EM algorithm approaches the K-means algorithm. Kulis & Jordan (2012) used this observation to obtain a novel K-means-like algorithm from a Gibbs sampler for the Dirichlet process (DP) mixture. We instead consider applying small-variance asymptotics directly to the posterior in Bayesian nonparametric models. This framework is independent of any specific Bayesian inference algorithm, and it has the major advantage that it generalizes immediately to a range of models beyond the DP mixture. To illustrate, we apply our framework to the feature learning setting, where the beta process and Indian buffet process provide an appropriate Bayesian nonparametric prior. We obtain a novel objective function that goes beyond clustering to learn (and penalize new) groupings for which we relax the mutual exclusivity and exhaustivity assumptions of clustering. We demonstrate several other algorithms, all of which are scalable and simple to implement. Empirical results demonstrate the benefits of the new framework.
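The DP-means algorithm of Kulis & Jordan (2012), cited above, shows what small-variance asymptotics yield for the DP mixture: k-means with a penalty lam that opens a new cluster whenever a point's squared distance to every centre exceeds lam. This sketch is that earlier algorithm, not the feature-learning objective derived in this paper; the data and the penalty value are illustrative.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def dp_means(points, lam, max_iter=100):
    """DP-means: k-means that opens a new cluster when a point is farther than lam
    (in squared distance) from every existing centre."""
    centers = [mean(points)]          # start with one cluster at the global mean
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            d = [sq_dist(p, c) for c in centers]
            j = min(range(len(centers)), key=d.__getitem__)
            if d[j] > lam:            # too far from every centre: open a new cluster here
                centers.append(tuple(p))
                j = len(centers) - 1
            if labels[i] != j:
                labels[i], changed = j, True
        # Recompute centres as cluster means, dropping clusters that became empty.
        keep = sorted(set(labels))
        remap = {old: new for new, old in enumerate(keep)}
        centers = [mean([p for p, l in zip(points, labels) if l == old]) for old in keep]
        labels = [remap[l] for l in labels]
        if not changed:
            break
    return centers, labels

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, labels = dp_means(points, lam=4.0)
print(len(centers), labels)
```

Unlike k-means, the number of clusters is not fixed in advance; larger values of lam trade fewer clusters against a larger within-cluster error.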


Augment-and-Conquer Negative Binomial Processes

arXiv.org Machine Learning

By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs sampling inference. We show that the gamma-NB process can be reduced to the hierarchical Dirichlet process with normalization, highlighting its unique theoretical, structural and computational advantages. A variety of NB processes with distinct sharing mechanisms are constructed and applied to topic modeling, with connections to existing algorithms, showing the importance of inferring both the NB dispersion and probability parameters.
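Two standard negative binomial facts illustrate the kind of augmentation involved: m ~ NB(r, p) can be drawn as a gamma-mixed Poisson, and drawing l given m and r from the Chinese restaurant table (CRT) distribution gives l ~ Pois(-r ln(1 - p)) marginally. The sketch checks both empirically. It assumes the parameterization with mean r p / (1 - p), and it does not reproduce the paper's Gibbs samplers or its NB processes.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def negative_binomial(r, p, rng):
    """m ~ NB(r, p) via the mixture lam ~ Gamma(r, scale=p/(1-p)), m ~ Pois(lam)."""
    return poisson(rng.gammavariate(r, p / (1.0 - p)), rng)

def crt(m, r, rng):
    """CRT draw: l = sum over n = 1..m of Bernoulli(r / (n - 1 + r))."""
    return sum(rng.random() < r / (n - 1 + r) for n in range(1, m + 1))

rng = random.Random(0)
r, p, n = 2.0, 0.5, 20000
ms = [negative_binomial(r, p, rng) for _ in range(n)]
ls = [crt(m, r, rng) for m in ms]
print(sum(ms) / n, r * p / (1 - p))        # sample mean vs. exact NB mean
print(sum(ls) / n, -r * math.log(1 - p))   # sample mean vs. exact Poisson rate of l
```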