Sequential Monte Carlo Bandits

arXiv.org Machine Learning

In this paper we propose a flexible and efficient framework for handling multi-armed bandits, combining sequential Monte Carlo algorithms with hierarchical Bayesian modeling techniques. The framework naturally encompasses restless bandits, contextual bandits, and other bandit variants under a single inferential model. Despite the model's generality, we propose efficient Monte Carlo algorithms to make inference scalable, based on recent developments in sequential Monte Carlo methods. Through two simulation studies, the framework is shown to outperform other empirical methods, while also naturally scaling to more complex problems with which existing approaches cannot cope. Additionally, we successfully apply our framework to online video-based advertising recommendation, and show its increased efficacy compared to current state-of-the-art bandit algorithms.


Spectral Clustering with Epidemic Diffusion

arXiv.org Machine Learning

Spectral clustering is widely used to partition graphs into distinct modules or communities. Existing methods for spectral clustering use the eigenvalues and eigenvectors of the graph Laplacian, an operator that is closely associated with random walks on graphs. We propose a new spectral partitioning method that exploits the properties of epidemic diffusion. An epidemic is a dynamic process that, unlike the random walk, simultaneously transitions to all the neighbors of a given node. We show that the replicator, an operator describing epidemic diffusion, is equivalent to the symmetric normalized Laplacian of a reweighted graph with edges reweighted by the eigenvector centralities of their incident nodes. Thus, more weight is given to edges connecting more central nodes. We describe a method that partitions the nodes based on the componentwise ratio of the replicator's second eigenvector to the first, and compare its performance to traditional spectral clustering techniques on synthetic graphs with known community structure. We demonstrate that the replicator gives preference to dense, clique-like structures, enabling it to more effectively discover communities that may be obscured by dense intercommunity linking.
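The partitioning rule described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the replicator is taken to be the symmetric normalized Laplacian of the graph reweighted as W_ij = A_ij v_i v_j, where v is the eigenvector centrality; under that assumption the operator reduces algebraically to I - A/λ_max, so its first two eigenvectors are the top two eigenvectors of the adjacency matrix A. The toy graph (two 4-cliques joined by one edge), the power-iteration solver, and all function names are hypothetical choices made for illustration.

```python
import math


def matvec(adj, x):
    """Multiply the adjacency matrix (list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in adj]


def normalize(x):
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def project_out(x, v):
    """Remove from x its component along the unit vector v."""
    dot = sum(a * b for a, b in zip(x, v))
    return [a - dot * b for a, b in zip(x, v)]


def top_two_eigenvectors(adj, iters=2000):
    """Power iteration with deflation on A + cI.

    The shift c = max degree makes A + cI positive semidefinite, so the
    iteration converges to the algebraically largest eigenvalues of A.
    """
    n = len(adj)
    shift = max(sum(row) for row in adj)

    def step(x):
        return [y + shift * xi for y, xi in zip(matvec(adj, x), x)]

    v1 = normalize([1.0] * n)
    for _ in range(iters):
        v1 = normalize(step(v1))

    v2 = [float(i + 1) for i in range(n)]  # asymmetric starting vector
    for _ in range(iters):
        v2 = normalize(step(project_out(v2, v1)))
    v2 = normalize(project_out(v2, v1))
    return v1, v2


# Hypothetical toy graph: cliques {0,1,2,3} and {4,5,6,7}, bridged by edge 3-4.
n = 8
adj = [[0.0] * n for _ in range(n)]
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i][j] = 1.0
adj[3][4] = adj[4][3] = 1.0

v1, v2 = top_two_eigenvectors(adj)
# Componentwise ratio of the second eigenvector to the first; split on its sign.
ratio = [b / a for a, b in zip(v1, v2)]
labels = [r > 0 for r in ratio]
print(labels)
```

On this toy graph the sign of the ratio separates the two cliques. Which clique receives `True` depends on the arbitrary sign of the second eigenvector.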


Labeled Directed Acyclic Graphs: a generalization of context-specific independence in directed graphical models

arXiv.org Artificial Intelligence

Directed acyclic graphs have gained widespread popularity as representations of complex multivariate systems (Koski and Noble (2009); Koller and Friedman (2009)). Despite their advantageous properties for representing dependencies among variables in a modular fashion, several proposals for making them more flexible and parsimonious have been presented (Boutilier et al (1996); Friedman and Goldszmidt (1996); Chickering et al (1997); Eriksen (1999); Poole and Zhang (2003); Koller and Friedman (2009)). In particular, an important notion is to allow the dependencies to have local structures, such that a node need not explicitly depend on all the combinations of values of its parents. This leads to context-specific independence which can substantially reduce the parametric dimensionality of a network model and lead to a more expressive interpretation of the dependence structure (Boutilier et al (1996); Friedman and Goldszmidt (1996); Poole and Zhang (2003); Koller and Friedman (2009)). Context-specific independencies have also been seemingly separately considered for undirected graphical models by multiple authors (Corander (2003); Højsgaard (2003, 2004)).


Multivariate regression and fit function uncertainty

arXiv.org Machine Learning

This article describes a multivariate polynomial regression method where the uncertainties of the input parameters are approximated with Gaussian distributions, derived from the central limit theorem for large weighted sums, directly from the training sample. The estimated uncertainties can be propagated into the optimal fit function, as an alternative to the statistical bootstrap method. This uncertainty can be propagated further into a loss-function-like quantity, with which it is possible to calculate the expected loss function, and which allows the optimal polynomial degree to be selected with statistical significance. Combined with simple phase space splitting methods, it is possible to model most features of the training data even with low degree polynomials or constants.


Developments in the theory of randomized shortest paths with a comparison of graph node distances

arXiv.org Machine Learning

There have lately been several suggestions for parametrized distances on a graph that generalize the shortest path distance and the commute time or resistance distance. The need for developing such distances has arisen from the observation that the above-mentioned common distances in many situations fail to take into account the global structure of the graph. In this article, we develop the theory of one family of graph node distances, known as the randomized shortest path dissimilarity, which has its foundation in statistical physics. We show that the randomized shortest path dissimilarity can be easily computed in closed form for all pairs of nodes of a graph. Moreover, we come up with a new definition of a distance measure that we call the free energy distance. The free energy distance can be seen as an upgrade of the randomized shortest path dissimilarity, as it defines a metric and, in addition, satisfies the graph-geodetic property. The derivation and computation of the free energy distance are also straightforward. We then make a comparison between a set of generalized distances that interpolate between the shortest path distance and the commute time, or resistance distance. This comparison focuses on the applicability of the distances in graph node clustering and classification. The comparison, in general, shows that the parametrized distances perform well in the tasks. In particular, we see that the results obtained with the free energy distance are among the best in all the experiments.


Prior-free and prior-dependent regret bounds for Thompson Sampling

arXiv.org Machine Learning

We consider the stochastic multi-armed bandit problem with a prior distribution on the reward distributions. We are interested in studying prior-free and prior-dependent regret bounds, very much in the same spirit as the usual distribution-free and distribution-dependent bounds for the non-Bayesian stochastic bandit. Building on the techniques of Audibert and Bubeck [2009] and Russo and Roy [2013] we first show that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by $14 \sqrt{n K}$. This result is unimprovable in the sense that there exists a prior distribution such that any algorithm has a Bayesian regret bounded from below by $\frac{1}{20} \sqrt{n K}$. We also study the case of priors for the setting of Bubeck et al. [2013] (where the optimal mean is known as well as a lower bound on the smallest gap) and we show that in this case the regret of Thompson Sampling is in fact uniformly bounded over time, thus showing that Thompson Sampling can greatly take advantage of the nice properties of these priors.
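As a rough illustration of the algorithm analysed in the abstract above, the following minimal sketch runs Thompson Sampling on a Bernoulli bandit and prints the result next to the quantity $14 \sqrt{n K}$. It is not taken from the paper. The Bernoulli rewards, the uniform Beta(1, 1) prior, the values of `n` and `K`, and the function name are all assumptions made for illustration. A single simulated run gives a pseudo-regret for one draw from the prior, not the Bayesian regret (an expectation over the prior) that the bound concerns.

```python
import math
import random


def thompson_sampling(means, horizon, rng):
    """Run Beta-Bernoulli Thompson Sampling and return the pseudo-regret."""
    k = len(means)
    successes = [0] * k  # number of rewards equal to 1, per arm
    failures = [0] * k   # number of rewards equal to 0, per arm
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior
        # (Beta(1, 1), the uniform prior, is assumed here).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - means[arm]
    return regret


rng = random.Random(0)
n, K = 2000, 5
# Draw the arm means from the same uniform prior the algorithm assumes.
true_means = [rng.random() for _ in range(K)]
regret = thompson_sampling(true_means, n, rng)
bound = 14 * math.sqrt(n * K)
print(f"pseudo-regret = {regret:.1f}, 14*sqrt(nK) = {bound:.1f}")
```

Averaging the returned value over many independent draws of `true_means` would give a Monte Carlo estimate of the Bayesian regret under this particular prior.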


Learning Lambek grammars from proof frames

arXiv.org Artificial Intelligence

In addition to their limpid interface with semantics, categorial grammars enjoy another important property: learnability. This was first noticed by Buszkowski and Penn and further studied by Kanazawa, for Bar-Hillel categorial grammars. What about Lambek categorial grammars? In a previous paper we showed that product-free Lambek grammars were learnable from structured sentences, the structures being incomplete natural deductions. These grammars were shown to be unlearnable from strings by Foret and Le Nir. In the present paper we show that Lambek grammars, possibly with product, are learnable from proof frames that are incomplete proof nets. After a short reminder on grammatical inference à la Gold, we provide an algorithm that learns Lambek grammars with product from proof frames and we prove its convergence. We do so for 1-valued (also known as rigid) Lambek grammars with product, since standard techniques can extend our result to $k$-valued grammars. Because of the correspondence between cut-free proof nets and normal natural deductions, our initial result on product-free Lambek grammars can be recovered. We are sad to dedicate the present paper to Philippe Darondeau, with whom we started to study such questions in Rennes at the beginning of the millennium, and who passed away prematurely. We are glad to dedicate the present paper to Jim Lambek for his 90th birthday: he is the living proof that research is an eternal learning process.


Online Learning of Dynamic Parameters in Social Networks

arXiv.org Machine Learning

This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady-state mean-square deviation (MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time.


Joint Bayesian estimation of close subspaces from noisy measurements

arXiv.org Machine Learning

In this letter, we consider two sets of observations defined as subspace signals embedded in noise and we wish to analyze the distance between these two subspaces. The latter entails evaluating the angles between the subspaces, an issue reminiscent of the well-known Procrustes problem. A Bayesian approach is investigated where the subspaces of interest are considered as random with a joint prior distribution (namely a Bingham distribution), which allows the closeness of the two subspaces to be adjusted. Within this framework, the minimum mean-square distance estimator of both subspaces is formulated and implemented via a Gibbs sampler. A simpler scheme based on alternative maximum a posteriori estimation is also presented. The new schemes are shown to provide more accurate estimates of the angles between the subspaces, compared to singular value decomposition based independent estimation of the two subspaces.


On statistics, computation and scalability

arXiv.org Machine Learning

When coupled with the requirement that an answer to an inferential question be delivered within a certain time budget, this question has significant repercussions for the field of statistics. With the goal of identifying "time-data tradeoffs," we investigate some of the statistical consequences of computational perspectives on scalability, in particular divide-and-conquer methodology and hierarchies of convex relaxations. The fields of computer science and statistics have undergone mostly separate evolutions during their respective histories. This is changing, due in part to the phenomenon of "Big Data." Indeed, science and technology are currently generating very large datasets and the gatherers of these data have increasingly ambitious inferential goals, trends which point towards a future in which statistics will be forced to deal with problems of scale in order to remain relevant. Currently the field seems little prepared to meet this challenge.