Distribution of Mutual Information
The mutual information of two random variables i and j with joint probabilities {π_ij} is commonly used in learning Bayesian nets as well as in many other fields. The chances π_ij are usually estimated by the empirical sampling frequency n_ij/n, leading to a point estimate I(n_ij/n) for the mutual information. To answer questions like "is I(n_ij/n) consistent with zero?" or "what is the probability that the true mutual information is much larger than the point estimate?", one has to know the distribution of the mutual information.
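As an illustrative sketch (not the paper's derivation), the point estimate I(n_ij/n) can be computed directly from a table of counts; the function name and example counts below are hypothetical:

```python
import numpy as np

def empirical_mutual_information(n):
    """Point estimate I(n_ij/n) of mutual information from a count table n."""
    n = np.asarray(n, dtype=float)
    p = n / n.sum()                     # joint frequencies n_ij / n
    pi = p.sum(axis=1, keepdims=True)   # row marginals
    pj = p.sum(axis=0, keepdims=True)   # column marginals
    mask = p > 0                        # 0 log 0 = 0 convention
    return float((p[mask] * np.log(p[mask] / (pi @ pj)[mask])).sum())

# Counts from independent variables give a point estimate of zero:
counts = np.array([[25, 25], [25, 25]])
print(empirical_mutual_information(counts))  # 0.0
```

The point estimate alone does not answer the distributional questions above: finite-sample fluctuations in n_ij make I(n_ij/n) a random quantity, which is exactly why its distribution matters.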
The Concave-Convex Procedure (CCCP)
Yuille, Alan L., Rangarajan, Anand
We introduce the Concave-Convex Procedure (CCCP), which constructs discrete-time iterative dynamical systems that are guaranteed to monotonically decrease global optimization/energy functions. It can be applied to (almost) any optimization problem, and many existing algorithms can be interpreted in terms of CCCP. In particular, we prove relationships to some applications of Legendre transform techniques. We then illustrate CCCP by applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms.

1 Introduction There is a lot of interest in designing discrete-time dynamical systems for inference and learning (see, for example, [10], [3], [7], [13]).
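A minimal sketch of the CCCP idea on a hypothetical one-dimensional energy E(x) = x^4 - 2x^2, split into a convex part x^4 and a concave part -2x^2. Each update linearizes the concave part at the current iterate and minimizes the resulting convex surrogate, which here has the closed form x_{t+1} = x_t^(1/3); the energy is guaranteed not to increase:

```python
import math

def energy(x):
    return x**4 - 2 * x**2

def cccp_step(x_t):
    # Surrogate: x^4 + x * d/dx(-2x^2)|_{x_t} = x^4 - 4*x_t*x.
    # Stationarity 4x^3 - 4*x_t = 0 gives the update x = cbrt(x_t).
    return math.copysign(abs(x_t) ** (1 / 3), x_t)

x = 0.5
for _ in range(50):
    x_new = cccp_step(x)
    assert energy(x_new) <= energy(x) + 1e-12  # monotone decrease
    x = x_new
print(x)  # converges toward the minimizer x = 1
```

The example and its splitting are assumptions for illustration; the paper's contribution is the general procedure and its applications, not this toy energy.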
Small-World Phenomena and the Dynamics of Information
The problem of searching for information in networks like the World Wide Web can be approached in a variety of ways, ranging from centralized indexing schemes to decentralized mechanisms that navigate the underlying network without knowledge of its global structure. The decentralized approach appears in a variety of settings: in the behavior of users browsing the Web by following hyperlinks; in the design of focused crawlers [4, 5, 8] and other agents that explore the Web's links to gather information; and in the search protocols underlying decentralized peer-to-peer systems such as Gnutella [10], Freenet [7], and recent research prototypes [21, 22, 23], through which users can share resources without a central server. In recent work, we have been investigating the problem of decentralized search in large information networks [14, 15]. Our initial motivation was an experiment that dealt directly with the search problem in a decidedly pre-Internet context: Stanley Milgram's famous study of the small-world phenomenon [16, 17]. Milgram was seeking to determine whether most pairs of people in society were linked by short chains of acquaintances, and for this purpose he recruited individuals to try forwarding a letter to a designated "target" through people they knew on a first-name basis. The starting individuals were given basic information about the target -- his name, address, occupation, and a few other personal details -- and had to choose a single acquaintance to send the letter to, with the goal of reaching the target as quickly as possible; subsequent recipients followed the same procedure, and the chain closed in on its destination. Of the chains that completed, the median number of steps required was six -- a result that has since entered popular culture as the "six degrees of separation" principle [11]. Milgram's experiment contains two striking discoveries -- that short chains are pervasive, and that people are able to find them.
This latter point is concerned precisely with a type of decentralized navigation in a social network, consisting of people as nodes and links joining pairs of people who know each other. From an algorithmic perspective, it is an interesting question to understand the structure of networks in which this phenomenon emerges -- in which message-passing with purely local information can be efficient.
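To make decentralized navigation concrete, here is a small simulation sketch in the spirit of the lattice models studied in [14, 15]: a ring lattice where each node knows its two lattice neighbours plus one long-range contact drawn with probability inversely proportional to distance, and messages are forwarded greedily. All names and parameters are illustrative:

```python
import random

n = 1024  # number of nodes on the ring

def ring_dist(a, b):
    """Lattice (ring) distance between nodes a and b."""
    d = abs(a - b)
    return min(d, n - d)

def sample_contact(u):
    """Draw u's long-range contact with probability proportional to
    1/d(u, v): the harmonic (inverse-distance) distribution."""
    nodes = [v for v in range(n) if v != u]
    weights = [1 / ring_dist(u, v) for v in nodes]
    return random.choices(nodes, weights=weights)[0]

def greedy_route(source, target, long_range):
    """Greedy decentralized search: each node forwards the message to
    whichever of its contacts is closest to the target."""
    cur, steps = source, 0
    while cur != target:
        contacts = [(cur - 1) % n, (cur + 1) % n, long_range[cur]]
        cur = min(contacts, key=lambda c: ring_dist(c, target))
        steps += 1
    return steps

random.seed(0)
long_range = [sample_contact(u) for u in range(n)]
print(greedy_route(0, n // 2, long_range))  # typically far fewer than n/2 steps
```

The local links guarantee progress on every step, so routing always terminates; the interesting question, and the subject of the analysis, is how the long-range link distribution determines whether greedy routing is polylogarithmic or polynomial in n.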
Means, Correlations and Bounds
Leisink, Martijn, Kappen, Bert
The partition function for a Boltzmann machine can be bounded from above and below. We can use this to bound the means and the correlations. For networks with small weights, the values of these statistics can be restricted to nontrivial regions (i.e., a subset of [-1, 1]). Experimental results show that reasonable bounding occurs for weight sizes where mean field expansions generally give good results.

1 Introduction Over the last decade, bounding techniques have become a popular tool to deal with graphical models that are too complex for exact computation. A nice property of bounds is that they give at least some information you can rely on.
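For intuition about the quantities being bounded, the sketch below computes the partition function and the means exactly by brute-force enumeration for a toy two-unit Boltzmann machine (weights and thresholds are made up). Bounding techniques matter precisely because this enumeration is exponential in the number of units:

```python
import itertools
import math
import numpy as np

def exact_stats(w, theta):
    """Exact partition function Z and means <s_i> for a small Boltzmann
    machine over states s in {-1,+1}^n with E(s) = -0.5 s'Ws - theta's."""
    n = len(theta)
    Z, mean = 0.0, np.zeros(n)
    for s in itertools.product([-1, 1], repeat=n):
        s = np.array(s, dtype=float)
        u = math.exp(0.5 * s @ w @ s + theta @ s)  # unnormalized probability
        Z += u
        mean += u * s
    return Z, mean / Z

w = np.array([[0.0, 0.2], [0.2, 0.0]])  # small symmetric coupling
theta = np.array([0.1, -0.1])
Z, m = exact_stats(w, theta)
print(m)  # means lie strictly inside [-1, 1]
```

Upper and lower bounds on Z translate into upper and lower bounds on such means and correlations, which is the restriction to a nontrivial subset of [-1, 1] described above.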
Learning a Gaussian Process Prior for Automatically Generating Music Playlists
Platt, John C., Burges, Christopher J. C., Swenson, Steven, Weare, Christopher, Zheng, Alice
This paper presents AutoDJ: a system for automatically generating music playlists based on one or more seed songs selected by a user. AutoDJ uses Gaussian Process Regression to learn a user preference function over songs. This function takes music metadata as inputs. This paper further introduces Kernel Meta-Training, which is a method of learning a Gaussian Process kernel from a distribution of functions that generates the learned function. For playlist generation, AutoDJ learns a kernel from a large set of albums. This learned kernel is shown to be more effective at predicting users' playlists than a reasonable hand-designed kernel.
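A minimal sketch of the Gaussian Process Regression step AutoDJ builds on, using a simple hand-designed RBF kernel over a single hypothetical metadata feature (not the kernel learned by Kernel Meta-Training):

```python
import numpy as np

def gp_regress(X_train, y_train, X_test, kernel, noise=1e-3):
    """Gaussian Process Regression: posterior mean at the test inputs."""
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, y_train)

def rbf(A, B, length=1.0):
    """Hand-designed squared-exponential kernel over feature vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length**2))

X = np.array([[0.0], [1.0], [2.0]])  # one metadata feature per song (toy)
y = np.array([1.0, 0.5, 0.0])        # user preference scores from seed songs
print(gp_regress(X, y, np.array([[1.0]]), rbf))  # ≈ [0.5]
```

In the playlist setting, the preference function is regressed from the seed songs and then evaluated on the rest of the collection; the paper's contribution is learning the kernel itself from a distribution of album playlists rather than hand-designing it as above.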
Predictive Representations of State
Littman, Michael L., Sutton, Richard S.
We show that states of a dynamical system can be usefully represented by multi-step, action-conditional predictions of future observations. State representations that are grounded in data in this way may be easier to learn, generalize better, and be less dependent on accurate prior models than, for example, POMDP state representations. Building on prior work by Jaeger and by Rivest and Schapire, in this paper we compare and contrast a linear specialization of the predictive approach with the state representations used in POMDPs and in k-order Markov models. Ours is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls). We show that any system has a linear predictive state representation with a number of predictions no greater than the number of states in its minimal POMDP model.
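A predictive representation summarizes history by the predicted probabilities of "tests": action-conditional sequences of future observations. The sketch below computes one such test probability for a toy two-state POMDP with made-up parameters; it illustrates the quantity being represented, not the paper's construction of a core set of tests:

```python
import numpy as np

# A tiny two-state POMDP (hypothetical numbers): T[a] is the transition
# matrix under action a, O[a] gives P(observation = 1 | state, action a).
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([0.8, 0.3]), 1: np.array([0.6, 0.6])}

def prediction_probability(belief, actions, observations):
    """P(o_1..o_k | history, a_1..a_k): the probability of an observation
    sequence under a given action sequence, from the current belief."""
    p = np.asarray(belief, dtype=float)
    prob = 1.0
    for a, o in zip(actions, observations):
        p = p @ T[a]                       # advance the state distribution
        po = O[a] if o == 1 else 1 - O[a]  # per-state observation likelihood
        prob *= p @ po                     # chance of this observation
        p = p * po / (p @ po)              # condition on having seen it
    return prob

print(prediction_probability([1.0, 0.0], actions=[0, 0], observations=[1, 1]))
# ≈ 0.552
```

A linear PSR maintains a vector of such predictions for a fixed core set of tests and updates it linearly after each action-observation pair, instead of maintaining a belief over hidden states.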
Latent Dirichlet Allocation
Blei, David M., Ng, Andrew Y., Jordan, Michael I.
We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

1 Introduction Recent years have seen the development and successful application of several latent factor models for discrete data. One notable example, Hofmann's pLSI/aspect model [3], has received the attention of many researchers, and applications have emerged in text modeling [3], collaborative filtering [7], and link analysis [1].
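The generative process described above can be sketched directly; the topic-word probabilities below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_document(alpha, beta, doc_len):
    """LDA generative process: draw topic proportions theta ~ Dirichlet(alpha),
    then for each word draw a topic z ~ Multinomial(theta) and a word
    w ~ Multinomial(beta[z])."""
    theta = rng.dirichlet(alpha)                  # per-document topic mixture
    words = []
    for _ in range(doc_len):
        z = rng.choice(len(alpha), p=theta)       # latent topic for this word
        w = rng.choice(beta.shape[1], p=beta[z])  # word from that topic
        words.append(w)
    return words

alpha = np.array([1.0, 1.0])          # 2 topics, symmetric Dirichlet prior
beta = np.array([[0.7, 0.2, 0.1],     # topic-word distributions over a
                 [0.1, 0.2, 0.7]])    # 3-word vocabulary
print(generate_document(alpha, beta, 10))
```

Inference runs this process in reverse: given documents, the variational algorithms estimate beta and the posterior over each document's theta and topic assignments.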
An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games
Littman, Michael L., Kearns, Michael J., Singh, Satinder P.
The algorithm is the first to compute equilibria both efficiently and exactly for a nontrivial class of graphical games.

1 Introduction Seeking to replicate the representational and computational benefits that graphical models have provided to probabilistic inference, several recent works have introduced graph-theoretic frameworks for the study of multi-agent systems (LaMura 2000; Koller and Milch 2001; Kearns et al. 2001). In the simplest of these formalisms, each vertex represents a single agent, and the edges represent pairwise interactions between agents. As with many familiar network models, the macroscopic behavior of a large system is thus implicitly described by its local interactions, and the computational challenge is to extract the global states of interest. Classical game theory is typically used to model multi-agent interactions, and the global states of interest are thus the so-called Nash equilibria, in which no agent has a unilateral incentive to deviate. In a recent paper (Kearns et al. 2001), we introduced such a graphical formalism for multi-agent game theory, and provided two algorithms for computing Nash equilibria when the underlying graph is a tree (or is sufficiently sparse).
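For concreteness, the defining property of a Nash equilibrium (no unilateral incentive to deviate) is easy to verify for pure strategies; the sketch below checks it for a hypothetical two-player coordination game. Finding equilibria in tree-structured multi-agent games, the subject of the paper, is the far harder problem:

```python
import numpy as np

def is_nash(payoff_a, payoff_b, sa, sb, tol=1e-9):
    """Check whether the pure profile (sa, sb) is a Nash equilibrium of a
    two-player game: neither agent can gain by deviating alone."""
    return (payoff_a[sa, sb] >= payoff_a[:, sb].max() - tol and
            payoff_b[sa, sb] >= payoff_b[sa, :].max() - tol)

# Coordination game: both agents prefer matching actions.
A = np.array([[2, 0], [0, 1]])  # row player's payoffs
B = np.array([[2, 0], [0, 1]])  # column player's payoffs
print(is_nash(A, B, 0, 0))  # True: (0, 0) is an equilibrium
print(is_nash(A, B, 0, 1))  # False: a deviation improves some agent's payoff
```

In a graphical game each agent's payoff depends only on its neighbours in the graph, so the deviation check is local; the tree algorithm exploits exactly this locality.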