Learning Graphical Models
On the underestimation of model uncertainty by Bayesian K-nearest neighbors
Su, Wanhua, Chipman, Hugh, Zhu, Mu
When using the K-nearest neighbors method, one often ignores uncertainty in the choice of K. To account for such uncertainty, Holmes and Adams (2002) proposed a Bayesian framework for K-nearest neighbors (KNN). Their Bayesian KNN (BKNN) approach uses a pseudo-likelihood function, and standard Markov chain Monte Carlo (MCMC) techniques to draw posterior samples. Holmes and Adams (2002) focused on the performance of BKNN in terms of misclassification error but did not assess its ability to quantify uncertainty. We present some evidence to show that BKNN still significantly underestimates model uncertainty.
Creating Relational Data from Unstructured and Ungrammatical Data Sources
Michelson, M., Knoblock, C. A.
In order for agents to act on behalf of users, they will have to retrieve and integrate vast amounts of textual data on the World Wide Web. However, much of the useful data on the Web is neither grammatical nor formally structured, making querying difficult. Examples of these types of data sources are online classifieds like Craigslist and auction item listings like eBay. We call this unstructured, ungrammatical data "posts." The unstructured nature of posts makes query and integration difficult because the attributes are embedded within the text. Also, these attributes do not conform to standardized values, which prevents queries based on a common attribute value. The schema is unknown and the values may vary dramatically making accurate search difficult. Creating relational data for easy querying requires that we define a schema for the embedded attributes and extract values from the posts while standardizing these values. Traditional information extraction (IE) is inadequate to perform this task because it relies on clues from the data, such as structure or natural language, neither of which are found in posts. Furthermore, traditional information extraction does not incorporate data cleaning, which is necessary to accurately query and integrate the source. The two-step approach described in this paper creates relational data sets from unstructured and ungrammatical text by addressing both issues. To do this, we require a set of known entities called a "reference set." The first step aligns each post to each member of each reference set. This allows our algorithm to define a schema over the post and include standard values for the attributes defined by this schema. The second step performs information extraction for the attributes, including attributes not easily represented by reference sets, such as a price. In this manner we create a relational structure over previously unstructured data, supporting deep and accurate queries over the data as well as standard values for integration. Our experimental results show that our technique matches the posts to the reference set accurately and efficiently and outperforms state-of-the-art extraction systems on the extraction task from posts.
First Order Decision Diagrams for Relational MDPs
Wang, C., Joshi, S., Khardon, R.
Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy.
Gesture Salience as a Hidden Variable for Coreference Resolution and Keyframe Extraction
Eisenstein, J., Barzilay, R., Davis, R.
Gesture is a non-verbal modality that can contribute crucial information to the understanding of natural language. But not all gestures are informative, and non-communicative hand motions may confuse natural language processing (NLP) and impede learning. People have little difficulty ignoring irrelevant hand movements and focusing on meaningful gestures, suggesting that an automatic system could also be trained to perform this task. However, the informativeness of a gesture is context-dependent and labeling enough data to cover all cases would be expensive. We present conditional modality fusion, a conditional hidden-variable model that learns to predict which gestures are salient for coreference resolution, the task of determining whether two noun phrases refer to the same semantic entity. Moreover, our approach uses only coreference annotations, and not annotations of gesture salience itself. We show that gesture features improve performance on coreference resolution, and that by attending only to gestures that are salient, our method achieves further significant gains. In addition, we show that the model of gesture salience learned in the context of coreference accords with human intuition, by demonstrating that gestures judged to be salient by our model can be used successfully to create multimedia keyframe summaries of video. These summaries are similar to those created by human raters, and significantly outperform summaries produced by baselines from the literature.
Learning Balanced Mixtures of Discrete Distributions with Small Sample
We study the problem of partitioning a small sample of $n$ individuals from a mixture of $k$ product distributions over a Boolean cube $\{0, 1\}^K$ according to their distributions. Each distribution is described by a vector of allele frequencies in $\R^K$. Given two distributions, we use $\gamma$ to denote the average $\ell_2^2$ distance in frequencies across $K$ dimensions, which measures the statistical divergence between them. We study the case assuming that bits are independently distributed across $K$ dimensions. This work demonstrates that, for a balanced input instance for $k = 2$, a certain graph-based optimization function returns the correct partition with high probability, where a weighted graph $G$ is formed over $n$ individuals, whose pairwise hamming distances between their corresponding bit vectors define the edge weights, so long as $K = \Omega(\ln n/\gamma)$ and $Kn = \tilde\Omega(\ln n/\gamma^2)$. The function computes a maximum-weight balanced cut of $G$, where the weight of a cut is the sum of the weights across all edges in the cut. This result demonstrates a nice property in the high-dimensional feature space: one can trade off the number of features that are required with the size of the sample to accomplish certain tasks like clustering.
CUI Networks: A Graphical Representation for Conditional Utility Independence
We introduce CUI networks, a compact graphical representation of utility functions over multiple attributes. CUI networks model multiattribute utility functions using the well-studied and widely applicable utility independence concept. We show how conditional utility independence leads to an effective functional decomposition that can be exhibited graphically, and how local, compact data at the graph nodes can be used to calculate joint utility. We discuss aspects of elicitation, network construction, and optimization, and contrast our new representation with previous graphical preference modeling.
iBOA: The Incremental Bayesian Optimization Algorithm
Pelikan, Martin, Sastry, Kumara, Goldberg, David E.
This paper proposes the incremental Bayesian optimization algorithm (iBOA), which modifies standard BOA by removing the population of solutions and using incremental updates of the Bayesian network. iBOA is shown to be able to learn and exploit unrestricted Bayesian networks using incremental techniques for updating both the structure as well as the parameters of the probabilistic model. This represents an important step toward the design of competent incremental estimation of distribution algorithms that can solve difficult nearly decomposable problems scalably and reliably.
Planning with Durative Actions in Stochastic Domains
Probabilistic planning problems are typically modeled as a Markov Decision Process (MDP). MDPs, while an otherwise expressive model, allow only for sequential, non-durative actions. This poses severe restrictions in modeling and solving a real world planning problem. We extend the MDP model to incorporate -- 1) simultaneous action execution, 2) durative actions, and 3) stochastic durations. We develop several algorithms to combat the computational explosion introduced by these features. The key theoretical ideas used in building these algorithms are -- modeling a complex problem as an MDP in extended state/action space, pruning of irrelevant actions, sampling of relevant actions, using informed heuristics to guide the search, hybridizing different planners to achieve benefits of both, approximating the problem and replanning. Our empirical evaluation illuminates the different merits in using various algorithms, viz., optimality, empirical closeness to optimality, theoretical error bounds, and speed.
Parameterizations and fitting of bi-directed graph models to categorical data
Lupparelli, Monia, Marchetti, Giovanni M., Bergsma, Wicher P.
We discuss two parameterizations of models for marginal independencies for discrete distributions which are representable by bi-directed graph models, under the global Markov property. Such models are useful data analytic tools especially if used in combination with other graphical models. The first parameterization, in the saturated case, is also known as the multivariate logistic transformation, the second is a variant that allows, in some (but not all) cases, variation independent parameters. An algorithm for maximum likelihood fitting is proposed, based on an extension of the Aitchison and Silvey method.
Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation
Cawley, Gavin C., Talbot, Nicola L., Girolami, Mark
Multinomial logistic regression provides the standard penalised maximumlikelihood solution to multi-class pattern recognition problems. More recently, the development of sparse multinomial logistic regression models has found application in text processing and microarray classification, where explicit identification of the most informative features is of value. In this paper, we propose a sparse multinomial logistic regression method, in which the sparsity arises from the use of a Laplace prior, but where the usual regularisation parameter is integrated out analytically. Evaluation over a range of benchmark datasets reveals this approach results in similar generalisation performance to that obtained using cross-validation, but at greatly reduced computational expense.