Learning Graphical Models
Extension of SBL Algorithms for the Recovery of Block Sparse Signals with Intra-Block Correlation
Zhang, Zhilin, Rao, Bhaskar D.
We examine the recovery of block sparse signals and extend the framework in two important directions; one by exploiting signals' intra-block correlation and the other by generalizing signals' block structure. We propose two families of algorithms based on the framework of block sparse Bayesian learning (BSBL). One family, directly derived from the BSBL framework, requires knowledge of the block structure. Another family, derived from an expanded BSBL framework, is based on a weaker assumption on the block structure, and can be used when the block structure is completely unknown. Using these algorithms we show that exploiting intra-block correlation is very helpful in improving recovery performance. These algorithms also shed light on how to modify existing algorithms or design new ones to exploit such correlation and improve performance.
Noisy Matrix Completion under Sparse Factor Models
Soni, Akshay, Jain, Swayambhoo, Haupt, Jarvis, Gonella, Stefano
This paper examines a general class of noisy matrix completion tasks where the goal is to estimate a matrix from observations obtained at a subset of its entries, each of which is subject to random noise or corruption. Our specific focus is on settings where the matrix to be estimated is well-approximated by a product of two (a priori unknown) matrices, one of which is sparse. Such structural models - referred to here as "sparse factor models" - have been widely used, for example, in subspace clustering applications, as well as in contemporary sparse modeling and dictionary learning tasks. Our main theoretical contributions are estimation error bounds for sparsity-regularized maximum likelihood estimators for problems of this form, which are applicable to a number of different observation noise or corruption models. Several specific implications are examined, including scenarios where observations are corrupted by additive Gaussian noise or additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations. We also propose a simple algorithmic approach based on the alternating direction method of multipliers for these tasks, and provide experimental evidence to support our error analyses.
Risk Event and Probability Extraction for Modeling Medical Risks
Jochim, Charles (IBM Research – Ireland) | Sacaleanu, Bogdan (IBM Research – Ireland) | Deleris, Léa A. (IBM Research – Ireland)
In this paper we address the task of extracting risk events and probabilities from free text, focusing in particular on the biomedical domain. While our initial motivation is to enable the determination of the parameters of a Bayesian belief network, our approach is not specific to that use case. We are the first to investigate this task as a sequence tagging problem where we label spans of text as events A or B that are then used to construct probability statements of the form P(A|B)=x. We show that our approach significantly outperforms an entity extraction baseline on a new annotated medical risk event corpus. We also explore semi-supervised methods that lead to modest improvement, encouraging further work in this direction.
Extraction of (Key,Value) Pairs from Unstructured Ads
Chakraborty, Sunandan (New York University) | Subramanian, Lakshminarayanan (New York University) | Nyarko, Yaw (New York University)
In this paper, we focus on the problem of extracting structured labeled data from short unstructured ad-postings from online sources like Craigslist, where ads are posted on various topics, such as job postings, rentals, car sales etc. A fundamental challenge in addressing this problem is that most ad-postings are highly unstructured, short-text postings written in an informal manner with no inherent grammar or well-defined dictionary. In this paper, we propose unsupervised and supervised algorithms for extracting structured data from unstructured ads in the form of (key, value) pairs where the keys naturally represent topic-specific features in the ads. The unsupervised algorithm is centered around building an affinity graph, using the words from a topic-specific corpus of such ads where the edge weights represent affinities between words; the (key, value) extraction algorithm identifies specific groups of words in the affinity graph corresponding to different classes of key attributes. The supervised algorithm uses a Conditional Random Field based training algorithm to identify specific structured (key, value) pairs based on pre-defined topic-specific structural data representations of ads. Based on a corpus of car and apartment ad-postings from Craigslist, the unsupervised algorithm reported an accuracy of 67.74% and 68.74% for car and apartment ads respectively. The supervised algorithm demonstrated an improved performance with accuracies of 74.07% and 72.59% respectively.
Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing
Zhang, Yuchen, Chen, Xi, Zhou, Dengyong, Jordan, Michael I.
Crowdsourcing is a popular paradigm for effectively collecting labels at low cost. The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of parameters. Then the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.
Discovering and Characterizing Emerging Events in Big Data
Dorr, Bonnie J. (Institute for Human and Machine Cognition (IHMC)) | Petrovic, Milenko (Institute for Human and Machine Cognition (IHMC)) | Allen, James F. (Institute for Human and Machine Cognition (IHMC)) | Teng, Choh Man (Institute for Human and Machine Cognition (IHMC)) | Dalton, Adam (Institute for Human and Machine Cognition (IHMC))
We describe a novel system for discovering and characterizing emerging events. We define event emergence to be a developing situation comprised of a series of sub-events. To detect sub-events from a very large, continuous textual input stream, we use two techniques: (1) frequency-based detection of sub-events that are potentially entailed by an emerging event; and (2) anomaly-based detection of other sub-events that are potentially indicative of an emerging event. Identifying emerging events from detected sub-events involves connecting sub-events to each other and to the relevant emerging events within the event models and estimating the likelihood of possible emerging events. Each sub-event can be part of a number of emerging events and supports various event models to varying degrees. We adopt a coherent and compact model that probabilistically identifies emerging events. The innovative aspect of our work is a well-defined framework where statistical Big Data techniques are informed by event semantics and inference techniques (and vice versa). Our work is strongly grounded in semantics and knowledge representation, which enables us to produce more reliable results than would otherwise be possible with a purely statistical approach.
Hidden Parameter Markov Decision Processes: An Emerging Paradigm for Modeling Families of Related Tasks
Konidaris, George (Duke University) | Doshi-Velez, Finale (Harvard Medical School)
The goal of transfer is to use knowledge obtained by solving one task to improvea robot's (or software agent's) performance in future tasks. In general, we do not expect this to work; for transfer to be feasible, there must be something in common between the source task(s) and goal task(s). The question at the core of the transfer learning enterprise is therefore: what makes two tasks related?, or more generally, how do you define a family of related tasks? Given a precise definition of how a particular family of tasks is related, we can formulate clear optimizationmethods for selecting source tasks and determining what knowledge should be imported from the source task(s), and how it should be used in the target task(s). This paper describes one model that has appeared in several different research scenarios where an agent is faced with afamily of tasks that have similar, but not identical, dynamics (or reward functions). For example, a human learning to play baseball may, over the course of their career,be exposed to several different bats, each with slightly different weights and lengths.A human who has learned to play baseball well with one bat would be expected to be able to pick up any similar bat and use it.Similarly, when learning to drive a car, one may learn in more than one car, and then be expected to be able to drive any make and model of car (within reasonablevariations) with little or no relearning. These examples are instances of exactly the kind of flexible, reliable,and sample-efficient behavior that we should be aiming to achieve in robotics applications. One way to model such a family of tasks is to posit that they are generated by asmall set of latent parameters (e.g., the length and weight of the bat, or parametersdescribing the various physical properties of the car's steering system and clutch) thatare fixed for each problem instance (e.g., for each bat, or car), but are not directlyobservable by the agent. Defining a distributionover these latent parameters results in a family of related tasks, and transferis feasible to the extent that the number of latent variables is small, the task dynamics(or reward function) vary smoothly with them, and to the extent to which they can eitherbe ignored or identified using transition data from the task.This model has appeared under several different names in the literature; we refer to it as a hidden-parameterMarkov decision process (or HIP-MDP).
Discovering Subgoals in Complex Domains
desJardins, Marie (University of Maryland, Baltimore County) | Tembo, Tenji (University of Maryland, Baltimore County) | Topin, Nicholay (University of Maryland, Baltimore County) | Bishoff, Michael (University of Maryland, Baltimore County) | Squire, Shawn (University of Maryland, Baltimore County) | MacGlashan, James (Brown University) | Carignan, Rose (University of Maryland, Baltimore County) | Haltmeyer, Nicholas (University of Maryland, Baltimore County)
We present ongoing research to develop novel option discovery methods for complex domains that are represented as Object-Oriented Markov Decision Processes (OO-MDPs) (Diuk, Cohen, and Littman, 2008). We describe Portable Multi-policy Option Discovery for Automated Learning (P-MODAL), an initial framework that extends Pickett and Barto’s (2002) PolicyBlocks approach to OO-MDPs. We also discuss future work that will use additional representations and techniques to handle scalability and learning challenges.
Affordances as Transferable Knowledge for Planning Agents
Barth-Maron, Gabriel (Brown University) | Abel, David (Brown University) | MacGlashan, James (Brown University) | Tellex, Stefanie (Brown University)
Robotic agents often map perceptual input to simplified representations that do not reflect the complexity and richness of the world. This simplification is due in large part to the limitations of planning algorithms, which fail in large stochastic state spaces on account of the well-known "curse of dimensionality." Existing approaches to address this problem fail to prevent autonomous agents from considering many actions which would be obviously irrelevant to a human solving the same problem. We formalize the notion of affordances as knowledge added to an Markov Decision Process (MDP) that prunes actions in a state- and reward- general way. This pruning significantly reduces the number of state-action pairs the agent needs to evaluate in order to act near-optimally. We demonstrate our approach in the Minecraft domain as a model for robotic tasks, showing significant increase in speed and reduction in state-space exploration during planning. Further, we provide a learning framework that enables an agent to learn affordances through experience, opening the door for agents to learn to adapt and plan through new situations. We provide preliminary results indicating that the learning process effectively produces affordances that help solve an MDP faster, suggesting that affordances serve as an effective, transferable piece of knowledge for planning agents in large state spaces.