Genre
Generalized Beta Divergence
Divergences and distributions are deeply related concepts studied extensively in various fields. This paper is another attempt that casts their relations specifically into that of beta divergences and dispersion models and studies accordingly. The main consequence of this study is that beta divergence and (half of) statistical deviance are represented identical equations and they are, therefore, equivalent measures. In this respect, formulation of beta divergence is generalized and thus is extended beyond its Tweedie related classical forms [1], [2], [3], [4] and is aligned with exponential dispersion models. This is achieved by defining beta divergences as a function of so-called variance functions of exponential dispersion models. One immediate consequence is that we can compute beta divergence for non-Tweedie models such as negative binomial or hyperbolic secant distribution.
A class of random fields on complete graphs with tractable partition function
The aim of this short note is to draw attention to a method by which the partition function and marginal probabilities for a certain class of random fields on complete graphs can be computed in polynomial time. This class includes Ising models with homogeneous pairwise potentials but arbitrary (inhomogeneous) unary potentials. Similarly, the partition function and marginal probabilities can be computed in polynomial time for random fields on complete bipartite graphs, provided they have homogeneous pairwise potentials. We expect that these tractable classes of large scale random fields can be very useful for the evaluation of approximation algorithms by providing exact error estimates.
Joint estimation of sparse multivariate regression and conditional graphical models
Multivariate regression model is a natural generalization of the classical univari- ate regression model for fitting multiple responses. In this paper, we propose a high- dimensional multivariate conditional regression model for constructing sparse estimates of the multivariate regression coefficient matrix that accounts for the dependency struc- ture among the multiple responses. The proposed method decomposes the multivariate regression problem into a series of penalized conditional log-likelihood of each response conditioned on the covariates and other responses. It allows simultaneous estimation of the sparse regression coefficient matrix and the sparse inverse covariance matrix. The asymptotic selection consistency and normality are established for the diverging dimension of the covariates and number of responses. The effectiveness of the pro- posed method is also demonstrated in a variety of simulated examples as well as an application to the Glioblastoma multiforme cancer data.
The Rise and Fall of Semantic Rule Updates Based on SE-Models
Logic programs under the stable model semantics, or answer-set programs, provide an expressive rule-based knowledge representation framework, featuring a formal, declarative and well-understood semantics. However, handling the evolution of rule bases is still a largely open problem. The AGM framework for belief change was shown to give inappropriate results when directly applied to logic programs under a non-monotonic semantics such as the stable models. The approaches to address this issue, developed so far, proposed update semantics based on manipulating the syntactic structure of programs and rules. More recently, AGM revision has been successfully applied to a significantly more expressive semantic characterisation of logic programs based on SE-models. This is an important step, as it changes the focus from the evolution of a syntactic representation of a rule base to the evolution of its semantic content. In this paper, we borrow results from the area of belief update to tackle the problem of updating (instead of revising) answer-set programs. We prove a representation theorem which makes it possible to constructively define any operator satisfying a set of postulates derived from Katsuno and Mendelzon's postulates for belief update. We define a specific operator based on this theorem, examine its computational complexity and compare the behaviour of this operator with syntactic rule update semantics from the literature. Perhaps surprisingly, we uncover a serious drawback of all rule update operators based on Katsuno and Mendelzon's approach to update and on SE-models.
Stability of Multi-Task Kernel Regression Algorithms
Audiffren, Julien, Kadri, Hachem
We study the stability properties of nonlinear multi-task regression in reproducing Hilbert spaces with operator-valued kernels. Such kernels, a.k.a. multi-task kernels, are appropriate for learning prob- lems with nonscalar outputs like multi-task learning and structured out- put prediction. We show that multi-task kernel regression algorithms are uniformly stable in the general case of infinite-dimensional output spaces. We then derive under mild assumption on the kernel generaliza- tion bounds of such algorithms, and we show their consistency even with non Hilbert-Schmidt operator-valued kernels . We demonstrate how to apply the results to various multi-task kernel regression methods such as vector-valued SVR and functional ridge regression.
Spherical perceptron as a storage memory with limited errors
It has been known for a long time that the classical spherical perceptrons can be used as storage memories. Seminal work of Gardner, \cite{Gar88}, started an analytical study of perceptrons storage abilities. Many of the Gardner's predictions obtained through statistical mechanics tools have been rigorously justified. Among the most important ones are of course the storage capacities. The first rigorous confirmations were obtained in \cite{SchTir02,SchTir03} for the storage capacity of the so-called positive spherical perceptron. These were later reestablished in \cite{TalBook} and a bit more recently in \cite{StojnicGardGen13}. In this paper we consider a variant of the spherical perceptron that operates as a storage memory but allows for a certain fraction of errors. In Gardner's original work the statistical mechanics predictions in this directions were presented sa well. Here, through a mathematically rigorous analysis, we confirm that the Gardner's predictions in this direction are in fact provable upper bounds on the true values of the storage capacity. Moreover, we then present a mechanism that can be used to lower these bounds. Numerical results that we present indicate that the Garnder's storage capacity predictions may, in a fairly wide range of parameters, be not that far away from the true values.
Discrete perceptrons
Perceptrons have been known for a long time as a promising tool within the neural networks theory. The analytical treatment for a special class of perceptrons started in seminal work of Gardner \cite{Gar88}. Techniques initially employed to characterize perceptrons relied on a statistical mechanics approach. Many of such predictions obtained in \cite{Gar88} (and in a follow-up \cite{GarDer88}) were later on established rigorously as mathematical facts (see, e.g. \cite{SchTir02,SchTir03,TalBook,StojnicGardGen13,StojnicGardSphNeg13,StojnicGardSphErr13}). These typically related to spherical perceptrons. A lot of work has been done related to various other types of perceptrons. Among the most challenging ones are what we will refer to as the discrete perceptrons. An introductory statistical mechanics treatment of such perceptrons was given in \cite{GutSte90}. Relying on results of \cite{Gar88}, \cite{GutSte90} characterized many of the features of several types of discrete perceptrons. We in this paper, consider a similar subclass of discrete perceptrons and provide a mathematically rigorous set of results related to their performance. As it will turn out, many of the statistical mechanics predictions obtained for discrete predictions will in fact appear as mathematically provable bounds. This will in a way emulate a similar type of behavior we observed in \cite{StojnicGardGen13,StojnicGardSphNeg13,StojnicGardSphErr13} when studying spherical perceptrons.
A Behavioural Foundation for Natural Computing and a Programmability Test
What does it mean to claim that a physical or natural system computes? One answer, endorsed here, is that computing is about programming a system to behave in different ways. This paper offers an account of what it means for a physical system to compute based on this notion. It proposes a behavioural characterisation of computing in terms of a measure of programmability, which reflects a system's ability to react to external stimuli. The proposed measure of programmability is useful for classifying computers in terms of the apparent algorithmic complexity of their evolution in time. I make some specific proposals in this connection and discuss this approach in the context of other behavioural approaches, notably Turing's test of machine intelligence. I also anticipate possible objections and consider the applicability of these proposals to the task of relating abstract computation to nature-like computation.
Spectral Experts for Estimating Mixtures of Linear Regressions
Chaganty, Arun Tejasvi, Liang, Percy
Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative latent-variable model. Our approach relies on a low-rank linear regression to recover a symmetric tensor, which can be factorized into the parameters using a tensor power method. We prove rates of convergence for our estimator and provide an empirical evaluation illustrating its strengths relative to local optimization (EM).
Bayesian test of significance for conditional independence: The multinomial model
Andrade, Pablo de Morais, Stern, Julio Michael, Pereira, Carlos Alberto de Bragança
Conditional independence tests (CI tests) have received special attention lately in Machine Learning and Computational Intelligence related literature as an important indicator of the relationship among the variables used by their models. In the field of Probabilistic Graphical Models (PGM)--which includes Bayesian Networks (BN) models--CI tests are especially important for the task of learning the PGM structure from data. In this paper, we propose the Full Bayesian Significance Test (FBST) for tests of conditional independence for discrete datasets. FBST is a powerful Bayesian test for precise hypothesis, as an alternative to frequentist's significance tests (characterized by the calculation of the \emph{p-value}).