Goto

Collaborating Authors

 Genre


On some interrelations of generalized $q$-entropies and a generalized Fisher information, including a Cram\'er-Rao inequality

arXiv.org Machine Learning

In this communication, we describe some interrelations between generalized $q$-entropies and a generalized version of Fisher information. In information theory, the de Bruijn identity links the Fisher information and the derivative of the entropy. We show that this identity can be extended to generalized versions of entropy and Fisher information. More precisely, a generalized Fisher information naturally pops up in the expression of the derivative of the Tsallis entropy. This generalized Fisher information also appears as a special case of a generalized Fisher information for estimation problems. Indeed, we derive here a new Cram\'er-Rao inequality for the estimation of a parameter, which involves a generalized form of Fisher information. This generalized Fisher information reduces to the standard Fisher information as a particular case. In the case of a translation parameter, the general Cram\'er-Rao inequality leads to an inequality for distributions which is saturated by generalized $q$-Gaussian distributions. These generalized $q$-Gaussians are important in several areas of physics and mathematics. They are known to maximize the $q$-entropies subject to a moment constraint. The Cram\'er-Rao inequality shows that the generalized $q$-Gaussians also minimize the generalized Fisher information among distributions with a fixed moment. Similarly, the generalized $q$-Gaussians also minimize the generalized Fisher information among distributions with a given $q$-entropy.


Some results on a $\chi$-divergence, an~extended~Fisher information and~generalized~Cram\'er-Rao inequalities

arXiv.org Machine Learning

We propose a modified $\chi^{\beta}$-divergence, give some of its properties, and show that this leads to the definition of a generalized Fisher information. We give generalized Cram\'er-Rao inequalities, involving this Fisher information, an extension of the Fisher information matrix, and arbitrary norms and power of the estimation error. In the case of a location parameter, we obtain new characterizations of the generalized $q$-Gaussians, for instance as the distribution with a given moment that minimizes the generalized Fisher information. Finally we indicate how the generalized Fisher information can lead to new uncertainty relations.


Integrating Prior Knowledge Into Prognostic Biomarker Discovery based on Network Structure

arXiv.org Machine Learning

Background: Predictive, stable and interpretable gene signatures are generally seen as an important step towards a better personalized medicine. During the last decade various methods have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinics is the typical low reproducibility of these signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Results: We propose a novel algorithm, called FrSVM, which integrates protein-protein interaction network information into gene selection for prognostic biomarker discovery. Our method is a simple filter based approach, which focuses on central genes with large differences in their expression. Compared to several other competing methods our algorithm reveals a significantly better prediction performance and higher signature stability. More- over, obtained gene lists are highly enriched with known disease genes and drug targets. We extendd our approach further by integrating information on candidate disease genes and targets of disease associated Transcript Factors (TFs).


Adapting the Stochastic Block Model to Edge-Weighted Networks

arXiv.org Machine Learning

We generalize the stochastic block model to the important case in which edges are annotated with weights drawn from an exponential family distribution. This generalization introduces several technical difficulties for model estimation, which we solve using a Bayesian approach. We introduce a variational algorithm that efficiently approximates the model's posterior distribution for dense graphs. In specific numerical experiments on edge-weighted networks, this weighted stochastic block model outperforms the common approach of first applying a single threshold to all weights and then applying the classic stochastic block model, which can obscure latent block structure in networks. This model will enable the recovery of latent structure in a broader range of network data than was previously possible.


Characterizing A Database of Sequential Behaviors with Latent Dirichlet Hidden Markov Models

arXiv.org Machine Learning

This paper proposes a generative model, the latent Dirichlet hidden Markov models (LDHMM), for characterizing a database of sequential behaviors (sequences). LDHMMs posit that each sequence is generated by an underlying Markov chain process, which are controlled by the corresponding parameters (i.e., the initial state vector, transition matrix and the emission matrix). These sequence-level latent parameters for each sequence are modelled as latent Dirichlet random variables and parameterized by a set of deterministic database-level hyper-parameters. Through this way, we expect to model the sequence in two levels: the database level by deterministic hyper-parameters and the sequence-level by latent parameters. To learn the deterministic hyper-parameters and approximate posteriors of parameters in LDHMMs, we propose an iterative algorithm under the variational EM framework, which consists of E and M steps. We examine two different schemes, the fullyfactorized and partially-factorized forms, for the framework, based on different assumptions. We present empirical results of behavior modeling and sequence classification on three real-world data sets, and compare them to other related models. The experimental results prove that the proposed LDHMMs produce better generalization performance in terms of log-likelihood and deliver competitive results on the sequence classification problem.


Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

arXiv.org Machine Learning

It is widely recognized that incorporating latent or hidden variables is a crucial aspect of modeling. Latent variables can provide a succinct representation of the observed data through dimensionality reduction; the possibly many observed variables are summarized by fewer hidden effects. Further, they are central to predicting causal relationships and interpreting the hidden effects as unobservable concepts. For instance in sociology, human behavior is affected by abstract notions such as social attitudes, beliefs, goals and plans. As another example, medical knowledge is organized into casual hierarchies of invading organisms, physical disorders, pathological states and symptoms, and only the symptoms are observed. In addition to incorporating latent variables, it is also important to model the complex dependencies among the variables. A popular class of models for incorporating such dependencies are the Bayesian networks, also known as belief networks. They incorporate a set of causal and conditional independence relationships through directed acyclic graphs (DAG) [49]. They have widespread applicability in artificial intelligence [19, 25, 41, 42], in the social sciences [13, 18, 40, 50, 51, 64], and as structural equation models in economics [12, 18, 33, 51, 60, 65].


Density-sensitive semisupervised inference

arXiv.org Machine Learning

Semisupervised methods are techniques for using labeled data $(X_1,Y_1),\ldots,(X_n,Y_n)$ together with unlabeled data $X_{n+1},\ldots,X_N$ to make predictions. These methods invoke some assumptions that link the marginal distribution $P_X$ of X to the regression function f(x). For example, it is common to assume that f is very smooth over high density regions of $P_X$. Many of the methods are ad-hoc and have been shown to work in specific examples but are lacking a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution $P_X$. Our model includes a parameter $\alpha$ that controls the strength of the semisupervised assumption. We then use the data to adapt to $\alpha$.


A Primal Condition for Approachability with Partial Monitoring

arXiv.org Machine Learning

In approachability with full monitoring there are two types of conditions that are known to be equivalent for convex sets: a primal and a dual condition. The primal one is of the form: a set C is approachable if and only all containing half-spaces are approachable in the one-shot game; while the dual one is of the form: a convex set C is approachable if and only if it intersects all payoff sets of a certain form. We consider approachability in games with partial monitoring. In previous works (Perchet 2011; Mannor et al. 2011) we provided a dual characterization of approachable convex sets; we also exhibited efficient strategies in the case where C is a polytope. In this paper we provide primal conditions on a convex set to be approachable with partial monitoring. They depend on a modified reward function and lead to approachability strategies, based on modified payoff functions, that proceed by projections similarly to Blackwell's (1956) strategy; this is in contrast with previously studied strategies in this context that relied mostly on the signaling structure and aimed at estimating well the distributions of the signals received. Our results generalize classical results by Kohlberg 1975 (see also Mertens et al. 1994) and apply to games with arbitrary signaling structure as well as to arbitrary convex sets.


A Supervised Neural Autoregressive Topic Model for Simultaneous Image Classification and Annotation

arXiv.org Machine Learning

Hugo Larochelle D epartment d'Informatique Universit e de Sherbrooke, Sherbrooke (QC), Canada, J1K 2R1 hugo.larochelle@usherbrooke.ca March 22, 2018 Abstract Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to perform scene recognition and annotation. Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for document modeling. In this work, we show how to successfully apply and extend this model to the context of visual scene modeling. Specifically, we propose SupDocNADE, a supervised extension of DocNADE, that increases the discriminative power of the hidden topic features by incorporating label information into the training objective of the model. We also describe how to leverage information about the spatial position of the visual words and how to embed additional image annotations, so as to simultaneously perform image classification and annotation. We test our model on the Scene15, LabelMe and UIUC-Sports datasets and show that it compares favorably to other topic models such as the supervised variant of LDA. 1 Introduction Image classification and annotation are two important tasks in computer vision. In image classification, one tries to describe the image globally with a single descriptive label (such as coast, outdoor, inside city, etc.), while annotation focuses on tagging the local content within the image (such as whether it contains "sky", a "car ", a "tree ", etc.). Since these two problems are related, it is natural to attempt to solve them jointly. For example, an image labeled asstreet is more likely to be annotated with " car ", "pedestrian " or "building" than with "beach " or "see water ".


PAWL-Forced Simulated Tempering

arXiv.org Machine Learning

In this short note, we show how the parallel adaptive Wang-Landau (PAWL) algorithm of Bornn et al. (2013) can be used to automate and improve simulated tempering algorithms. While Wang-Landau and other stochastic approximation methods have frequently been applied within the simulated tempering framework, this note demonstrates through a simple example the additional improvements brought about by parallelization, adaptive proposals and automated bin splitting. The central idea underlying Wang-Landau ([6]) and related algorithms is that instead of generating samples from a target density π, it is sometimes more efficient to instead sample a strategically biased density π. Algorithm 1 provides psuedo-code for the algorithm. Here c is a user-specified threshold.