Statistical Learning
Efficient Bayesian Inference for Generalized Bradley-Terry Models
Caron, Francois, Doucet, Arnaud
The Bradley-Terry model is a popular approach to describe probabilities of the possible outcomes when elements of a set are repeatedly compared with one another in pairs. It has found many applications including animal behaviour, chess ranking and multiclass classification. Numerous extensions of the basic model have also been proposed in the literature including models with ties, multiple comparisons, group comparisons and random graphs. From a computational point of view, Hunter (2004) has proposed efficient iterative MM (minorization-maximization) algorithms to perform maximum likelihood estimation for these generalized Bradley-Terry models whereas Bayesian inference is typically performed using MCMC (Markov chain Monte Carlo) algorithms based on tailored Metropolis-Hastings (M-H) proposals. We show here that these MM\ algorithms can be reinterpreted as special instances of Expectation-Maximization (EM) algorithms associated to suitable sets of latent variables and propose some original extensions. These latent variables allow us to derive simple Gibbs samplers for Bayesian inference. We demonstrate experimentally the efficiency of these algorithms on a variety of applications.
A state-space mixed membership blockmodel for dynamic network tomography
Xing, Eric P., Fu, Wenjie, Song, Le
In a dynamic social or biological environment, the interactions between the actors can undergo large and systematic changes. In this paper we propose a model-based approach to analyze what we will refer to as the dynamic tomography of such time-evolving networks. Our approach offers an intuitive but powerful tool to infer the semantic underpinnings of each actor, such as its social roles or biological functions, underlying the observed network topologies. Our model builds on earlier work on a mixed membership stochastic blockmodel for static networks, and the state-space model for tracking object trajectory. It overcomes a major limitation of many current network inference techniques, which assume that each actor plays a unique and invariant role that accounts for all its interactions with other actors; instead, our method models the role of each actor as a time-evolving mixed membership vector that allows actors to behave differently over time and carry out different roles/functions when interacting with different peers, which is closer to reality. We present an efficient algorithm for approximate inference and learning using our model; and we applied our model to analyze a social network between monks (i.e., the Sampson's network), a dynamic email communication network between the Enron employees, and a rewiring gene interaction network of fruit fly collected during its full life cycle. In all cases, our model reveals interesting patterns of the dynamic roles of the actors.
Compressive Spectral Clustering — Error Analysis
Hunter, Blake A (University of California, Davis) | Strohmer, Thomas (University of California, Davis)
Compressive spectral clustering combines the distance preserving measurements of compressed sensing with the power of spectral clustering. Our analysis provides rigorous bounds on how small errors in the affinity matrix can affect the spectral coordinates and clusterability. This work generalizes the current perturbation results of two-class spectral clustering to incorporate multiclass clustering using k eigenvectors.
Isometric Correction for Manifold Learning
Behmardi, Behrouz (Oregon State University) | Raich, Raviv (Oregon State University)
In this paper, we present a method for isometric correction of manifold learning techniques. We first present an isometric nonlinear dimension reduction method. Our proposed method overcomes the issues associated with well-known isometric embedding techniques such as ISOMAP and maximum variance unfolding (MVU), i.e., computational complexity and the geodesic convexity requirement. Based on the proposed algorithm, we derive our isometric correction method. Our approach follows an isometric solution to the problem of local tangent space alignment. We provide a derivation of a fast iterative solution. The performance of our algorithm is illustrated on both synthetic and real datasets compared to other methods.
Invited Talk Abstracts
Ma, Yi (University of Illinois at Urbana-Champaign) | Sha, Fei (University of Southern California) | Carin, Lawrence (Duke University) | Lerman, Gilad (University of Minnesota) | Lawrence, Neil (University of Manchester)
Both Lawrence Carin tools utilize the same transformed Robust PCA model for the visual data: D A E, and use practically the same Hierarchical Bayesian methods are employed to learn a reversible algorithm for extracting the low-rank structures A from the statistical embedding. The proposed embedding visual data D, despite image domain transformation T and procedure is connected to spectral embedding methods (e.g., corruptions E. We will show how these two seemingly simple diffusion maps and Isomap), yielding a new statistical spectral tools can help unleash tremendous information in images framework. The proposed approach allows one to discard and videos that we used to struggle to get. We believe these the training data when embedding new data, allows synthesis new tools will bring disruptive changes to many challenging of high-dimensional data from the embedding space, tasks in computer vision and image processing, including and provides accurate estimation of the latent-space dimensionality.
A Discriminative Model for Understanding Natural Language Route Directions
Kollar, Thomas (Massachusetts Institute of Technology) | Tellex, Stefanie (Massachusetts Institute of Technology) | Roy, Nicholas (Massachusetts Institute of Technology)
To be useful teammates to human partners, robots must be able to follow spoken instructions given in natural language. However, determining the correct sequence of actions in response to a set of spoken instructions is a complex decision-making problem. There is a "semantic gap" between the high-level symbolic models of the world that people use, and the low-level models of geometry, state dynamics, and perceptions that robots use. In this paper, we show how this gap can be bridged by inferring the best sequence of actions from a linguistic description and environmental features. This work improves upon previous work in three ways. First, by using a conditional random field (CRF), we learn the relative weight of environmental and linguistic features, enabling the system to learn the meanings of words and reducing the modeling effort in learning how to follow commands. Second, a number of long-range features are added, which help the system to use additional structure in the problem. Finally, given a natural language command, we infer both the referred path and landmark directly, thereby requiring the algorithm to pick a landmark by which it should navigate. The CRF is demonstrated to have 15% error on a held-out dataset, when compared with 39% error for a Markov random field (MRF). Finally, by analyzing the additional annotations necessary for this work, we find that natural language route directions map sequentially onto the corresponding path and landmarks 99.6% of the time. In addition, the size of the referred landmark varies from 0m 2 to 1964m 2 and the length of the referred path varies from 0 m to 40.83 m .
Dictionary Optimization for Block-Sparse Representations
Rosenblum, Kevin (Technion - Israel Institute of Technology) | Zelnik-Manor, Lihi (Technion - Israel Institute of Technology) | Eldar, Yonina C. (Technion - Israel Institute of Technology)
Recent work has demonstrated that using a carefully designed dictionary instead of a predefined one, can improve the sparsity in jointly representing a class of signals. This has motivated the derivation of learning methods for designing a dictionary which leads to the sparsest representation for a given set of signals. In some applications, the signals of interest can have further structure, so that they can be well approximated by a union of a small number of subspaces (e.g., face recognition and motion segmentation). This implies the existence of a dictionary which enables block-sparse representations of the input signals once its atoms are properly sorted into blocks. In this paper, we propose an algorithm for learning a block-sparsifying dictionary of a given set of signals. We do not require prior knowledge on the association of signals into groups (subspaces). Instead, we develop a method that automatically detects the underlying block structure. This is achieved by iteratively alternating between updating the block structure of the dictionary and updating the dictionary atoms to better fit the data. Our experiments show that for block-sparse data the proposed algorithm significantly improves the dictionary recovery ability and lowers the representation error compared to dictionary learning methods that do not employ block structure.
Hierarchical Clustering Via Localized Diffusion Folders
David, Gil (Yale University) | Averbuch, Amir (Tel-Aviv University) | Coifman, Ronald R. (Yale University)
Data clustering is a common technique for statistical data analysis. It is used in many fields including machine learning, data mining, customer segmentation, trend analysis, pattern recognition and image analysis. The proposed Localized Diffusion Folders methodology performs hierarchical clustering of high-dimensional datasets. The diffusion folders are multi-level data partitioning into local neighborhoods that are generated by several random selections of data points and folders in a diffusion graph and by defining local diffusion distances between them. This multi-level partitioning defines an improved localized geometry of the data and a localized Markov transition matrix that is used for the next time step in the diffusion process. The result of this clustering method is a bottom-up hierarchical clustering of the data while each level in the hierarchy contains localized diffusion folders of folders from the lower levels. This methodology preserves the local neighborhood of each point while eliminating noisy connections between distinct points and areas in the graph. The performance of the algorithms is demonstrated on real data and it is compared to existing methods.
Robots that Learn to Communicate: A Developmental Approach to Personally and Physically Situated Human-Robot Conversations
Iwahashi, Naoto (National Institute of Information and Communications Technology) | Sugiura, Komei (National Institute of Information and Communications Technology) | Taguchi, Ryo (Nagoya Institute of Technology) | Nagai, Takayuki (University of Electyro-Communications) | Taniguchi, Tadahiro (Ritsumeika University)
This paper summarizes the online machine learning method LCore, which enables robots to learn to communicate with users from scratch through verbal and behavioral interaction in the physical world. LCore combines speech, visual, and tactile information obtained through the interaction, and enables robots to learn beliefs regarding speech units, words, the concepts of objects, motions, grammar, and pragmatic and communicative capabilities. The overall belief system is represented by a dynamic graphical model in an integrated way. Experimental results show that through a small, practical number of learning episodes with a user, the robot was eventually able to understand even fragmental and ambiguous utterances, respond to them with confirmation questions and/or actions, generate directive utterances, and answer questions, appropriately for the given situation. This paper discusses the importance of a developmental approach to realize personally and physically situated human-robot conversations.
Extracting Action and Event Semantics from Web Text
Sil, Avirup (Temple University) | Huang, Fei (Temple University) | Yates, Alexander (Temple University)
Most information extraction research identifies the state of the world in text, including the entities and the relationships that exist between them. Much less attention has been paid to the understanding of dynamics, or how the state of the world changes over time. Because intelligent behavior seeks to change the state of the world in rational and utility-maximizing ways, common-sense knowledge about dynamics is essential for intelligent agents. In this paper, we describe a novel system, Prepost , that tackles the problem of extracting the preconditions and effects of actions and events, two important kinds of knowledge for connecting world state and the actions that affect it. In experiments on Web text, Prepost is able to improve by 79% over a baseline technique for identifying the effects of actions (64% improvement for preconditions).