Structure Selection from Streaming Relational Data
Mihalkova, Lilyana, Moustafa, Walaa Eldin
Statistical relational learning techniques have been successfully applied in a wide range of relational domains. In most of these applications, human designers have drawn on their background knowledge through a trial-and-error cycle: a human engineer manually defines relational features, parameters for those features are learned on the training data, the resulting model is validated, and the cycle repeats as the engineer adjusts the feature set. This paper seeks to streamline application development in large relational domains by introducing a lightweight approach that efficiently evaluates relational features on pieces of the relational graph that are streamed to it one at a time. We evaluate our approach on two social media tasks and demonstrate that it leads to more accurate models that are learned faster.
Selectivity in Probabilistic Causality: Drawing Arrows from Inputs to Stochastic Outputs
Dzhafarov, Ehtibar N., Kujala, Janne V.
Given a set of several inputs into a system (e.g., independent variables characterizing stimuli) and a set of several stochastically non-independent outputs (e.g., random variables describing different aspects of responses), how can one determine, for each output, which of the inputs influence it? The problem has applications ranging from modeling pairwise comparisons to reconstructing mental processing architectures to conjoint testing. A necessary and sufficient condition for a given pattern of selective influences is provided by the Joint Distribution Criterion, according to which the problem of "what influences what" is equivalent to that of the existence of a joint distribution for a certain set of random variables. For inputs and outputs with finite sets of values, this criterion translates into a test of consistency of a certain system of linear equations and inequalities (the Linear Feasibility Test), which can be performed by means of linear programming. The Joint Distribution Criterion also leads to a metatheoretical principle for generating a broad class of necessary conditions (tests) for diagrams of selective influences. Among them is the class of distance-type tests, based on the observation that certain functionals on jointly distributed random variables satisfy the triangle inequality.
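To make the idea of a distance-type test concrete, here is a minimal sketch for the simplest case: two inputs (alpha, beta), each with two values, and two numeric outputs A and B, where the hypothesis is that A depends only on alpha and B only on beta. It assumes the functional E|X - Y|, which is a pseudometric on jointly distributed random variables and therefore satisfies the triangle inequality. If the hypothesized joint distribution of A1, A2, B1, B2 exists, then each observed distance D(i, j) = E|A - B| at treatment (i, j) can be at most the sum of the other three (a chain A_i, B_j', A_i', B_j). The function names and the dictionary encoding of the observed distributions are hypothetical. This is one necessary condition only: failing it rules out the selective-influence pattern, while passing it does not establish the joint distribution.

```python
def expected_abs_difference(pmf):
    """E|A - B| under one observed joint pmf, given as {(a, b): probability}."""
    return sum(p * abs(a - b) for (a, b), p in pmf.items())


def distance_test_2x2(pmfs, tol=1e-12):
    """Chain-inequality test for a hypothetical 2x2 design.

    pmfs maps each treatment (i, j), with i the value of alpha and j the
    value of beta (both in {1, 2}), to the observed joint pmf of (A, B).
    Returns False if some distance exceeds the sum of the other three,
    which would refute "A depends only on alpha, B only on beta".
    """
    treatments = [(1, 1), (1, 2), (2, 1), (2, 2)]
    d = {t: expected_abs_difference(pmfs[t]) for t in treatments}
    total = sum(d.values())
    # D(i, j) <= sum of the other three distances, for every treatment.
    return all(d[t] <= (total - d[t]) + tol for t in treatments)
```

For example, if A and B are independent fair coins at every treatment, all four distances equal 0.5 and the test passes; if A = B at three treatments but A != B at the fourth, the fourth distance (1) exceeds the sum of the others (0) and the test fails.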
Lifted Graphical Models: A Survey
Mihalkova, Lilyana, Getoor, Lise
This article presents a survey of work on lifted graphical models. We review a general form for a lifted graphical model, a par-factor graph, and show how a number of existing statistical relational representations map to this formalism. We discuss inference algorithms, including lifted inference algorithms, that efficiently compute the answers to probabilistic queries. We also review work on learning lifted graphical models from data. We believe that the need for statistical relational models (whether or not they go by that name) will grow in the coming decades, as we are inundated with data that mixes structured and unstructured content, with entities and relations extracted noisily from text, and with the need to reason effectively about this data. We hope that this synthesis of ideas from many different research groups will provide an accessible starting point for new researchers in this expanding field.
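As a rough illustration of what a par-factor is, the sketch below treats one as a single potential function shared by every grounding of its logical variables. Grounding substitutes domain constants for the logical variables (subject to a constraint), and the unnormalized weight of a possible world is the product of the shared potential over all groundings. The "smokers and friends" rule, the potential value 2.0, and all function names are hypothetical examples, not taken from the survey. Note that this naive version enumerates every grounding; lifted inference algorithms exist precisely to avoid that enumeration by exploiting the symmetry among groundings.

```python
from itertools import product
from math import prod


def ground_parfactor(logvars, domain, constraint):
    """Enumerate substitutions {logvar: constant} that satisfy the constraint."""
    substitutions = (dict(zip(logvars, consts))
                     for consts in product(domain, repeat=len(logvars)))
    return [theta for theta in substitutions if constraint(theta)]


def smokers_potential(world, theta):
    """Hypothetical potential for Friends(X, Y) and Smokes(X) => Smokes(Y)."""
    x, y = theta["X"], theta["Y"]
    body = (x, y) in world["friends"] and x in world["smokes"]
    satisfied = (not body) or (y in world["smokes"])
    return 2.0 if satisfied else 1.0


def world_weight(potential, groundings, world):
    """Unnormalized weight: the shared potential multiplied over all groundings."""
    return prod(potential(world, theta) for theta in groundings)
```

With three people and the constraint X != Y there are six groundings; in a world where only one person smokes and is friends with a non-smoker, exactly one grounding is violated, so the weight is 2.0 to the fifth power.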
Prediction of peptide bonding affinity: kernel methods for nonlinear modeling
Bergeron, Charles, Hepburn, Theresa, Sundling, C. Matthew, Krein, Michael, Katt, Bill, Sukumar, Nagamani, Breneman, Curt M., Bennett, Kristin P.
This paper presents regression models obtained from a process of blind prediction of peptide binding affinity from provided descriptors for several distinct datasets as part of the 2006 Comparative Evaluation of Prediction Algorithms (COEPRA) contest. This paper finds that kernel partial least squares, a nonlinear partial least squares (PLS) algorithm, outperforms PLS, and that the incorporation of transferable atom equivalent features improves predictive capability.
Low-rank Matrix Recovery from Errors and Erasures
Chen, Yudong, Jalali, Ali, Sanghavi, Sujay, Caramanis, Constantine
This paper considers the recovery of a low-rank matrix from an observed version that simultaneously contains both (a) erasures: most entries are not observed, and (b) errors: values at a constant fraction of (unknown) locations are arbitrarily corrupted. We provide a new unified performance guarantee stating when the natural convex relaxation of minimizing rank plus support succeeds in exact recovery. Our result allows for the simultaneous presence of random and deterministic components in both the error and erasure patterns. On the one hand, corollaries obtained by specializing this single result in different ways recover (up to poly-log factors) all the existing results in matrix completion and in sparse and low-rank matrix recovery. On the other hand, our results also provide the first guarantees for (a) recovery when we observe a vanishing fraction of entries of a corrupted matrix, and (b) deterministic matrix completion.
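For reference, the "natural convex relaxation of minimizing rank plus support" is usually written as below: the nuclear norm replaces rank and the entrywise l1 norm replaces support size. The notation here is assumed for illustration and may differ from the paper's: M is the observed matrix, L the low-rank component, S the sparse error component, Omega the set of observed (non-erased) entries, P_Omega the operator that keeps only those entries and zeroes the rest, and lambda > 0 a trade-off weight.

```latex
\begin{aligned}
\min_{L,\,S}\quad & \|L\|_{*} + \lambda \,\|S\|_{1} \\
\text{subject to}\quad & \mathcal{P}_{\Omega}(L + S) = \mathcal{P}_{\Omega}(M)
\end{aligned}
```

Exact recovery means that the minimizer (L, S) coincides with the true low-rank matrix and the true error matrix restricted to the observed entries.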
Event in Compositional Dynamic Semantics
We present a framework that constructs an event-style discourse semantics. The discourse dynamics are encoded in continuation semantics, and various rhetorical relations are embedded in the resulting interpretation of the framework. We assume that discourse and sentence are distinct semantic objects that play different roles in meaning evaluation. Moreover, two sets of composition functions, for handling different discourse relations, are introduced. The paper first gives the necessary background and motivation for event and dynamic semantics, and then introduces the framework with detailed examples.
Using Supervised Learning to Improve Monte Carlo Integral Estimation
Tracey, Brendan, Wolpert, David, Alonso, Juan J.
Monte Carlo (MC) techniques are often used to estimate integrals of a multivariate function using randomly generated samples of the function. In light of the increasing interest in uncertainty quantification and robust design applications in aerospace engineering, the calculation of expected values of such functions (e.g., performance measures) becomes important. However, MC techniques often suffer from high variance and slow convergence as the number of samples increases. In this paper we present Stacked Monte Carlo (StackMC), a new method for post-processing an existing set of MC samples to improve the associated integral estimate. StackMC is based on the supervised learning techniques of fitting functions and cross validation. It should reduce the variance of any type of Monte Carlo integral estimate (simple sampling, importance sampling, quasi-Monte Carlo, MCMC, etc.) without adding bias. We report on an extensive set of experiments confirming that the StackMC estimate of an integral is more accurate than both the associated unprocessed Monte Carlo estimate and an estimate based on a functional fit to the MC samples. These experiments run over a wide variety of integration spaces, numbers of sample points, dimensions, and fitting functions. In particular, we apply StackMC to estimating the expected value of the fuel burn metric of future commercial aircraft and to estimating sonic boom loudness measures. We compare the efficiency of StackMC with that of more standard methods and show that significant gains in accuracy are obtained at negligible additional computational cost.
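The core idea the abstract names, combining a fitted function with cross validation, can be sketched as a K-fold control-variate estimator. This is a simplified illustration under stated assumptions, not the paper's full StackMC construction (which, for instance, may weight the fit differently): samples are uniform on [0, 1], the fitting function is a least-squares line, and all function names are hypothetical. For each fold, the line g is fit on the other folds, its integral is computed exactly, and the mean of f - g over the held-out fold corrects for the fit's error. Because g never sees the held-out points, each fold's estimate stays unbiased for independent samples, while the residual f - g typically has much lower variance than f itself.

```python
import random
from statistics import fmean


def draw_uniform_samples(f, n, seed=0):
    """Draw n points uniformly on [0, 1] and evaluate f at each (illustrative setup)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return xs, [f(x) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a + b * x."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def stacked_mc_estimate(xs, ys, k=5):
    """K-fold fitted-function (control-variate) estimate of the integral over [0, 1]."""
    n = len(xs)
    folds = [list(range(start, n, k)) for start in range(k)]
    estimates = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        integral_g = a + b / 2.0  # exact integral of a + b*x over [0, 1]
        # Correct the fit with the mean residual on points the fit never saw.
        residual = fmean(ys[i] - (a + b * xs[i]) for i in held_out)
        estimates.append(integral_g + residual)
    return fmean(estimates)
```

For a linear integrand such as 3 + 2x the fit is exact and the estimate equals the true integral 4 up to rounding; for x squared, repeated trials show a much smaller squared error than the plain sample mean, because only the small residual around the best-fit line is left to Monte Carlo.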
Submodular Optimization for Efficient Semi-supervised Support Vector Machines
Emara, Wael, Kantardzic, Mehmed
In this work we present a quadratic programming approximation of the Semi-Supervised Support Vector Machine (S3VM) problem, namely approximate QP-S3VM, that can be efficiently solved using off-the-shelf optimization packages. We prove that this approximate formulation establishes a relation between the low-density separation and the graph-based models of semi-supervised learning (SSL), which is important for developing a unifying framework for semi-supervised learning methods. Furthermore, we propose the novel idea of representing SSL problems as submodular set functions and using efficient submodular optimization algorithms to solve them. Using this idea, we develop a representation of the approximate QP-S3VM as the maximization of a submodular set function, which makes it possible to optimize it using efficient greedy algorithms. We demonstrate that the proposed methods are accurate and provide a significant improvement in time complexity over the state of the art in the literature.
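The greedy algorithms mentioned above follow a simple pattern: repeatedly add the element with the largest marginal gain. The sketch below shows that generic pattern under a cardinality limit k; for monotone submodular objectives it is the classic greedy method with a (1 - 1/e) approximation guarantee. The objective used here is a toy set-coverage function chosen for illustration only; it is not the approximate QP-S3VM objective from the paper, and the function names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, Set


def greedy_maximize(ground: Iterable[Hashable],
                    f: Callable[[Set[Hashable]], float],
                    k: int) -> Set[Hashable]:
    """Greedily add up to k elements, each time taking the largest marginal gain."""
    ground = list(ground)
    selected: Set[Hashable] = set()
    current = f(selected)
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in ground:
            if e in selected:
                continue
            gain = f(selected | {e}) - current
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no remaining element strictly improves f
            break
        selected.add(best)
        current = f(selected)
    return selected


def make_coverage(sets: Dict[str, Set[int]]) -> Callable[[Set[Hashable]], float]:
    """Toy monotone submodular objective: number of items covered by the chosen sets."""
    def f(chosen: Set[Hashable]) -> float:
        covered: Set[int] = set()
        for name in chosen:
            covered |= sets[name]
        return float(len(covered))
    return f
```

With sets a = {1, 2, 3}, b = {3, 4}, c = {4, 5, 6, 7} and d = {1}, greedy first picks c (gain 4), then a (gain 3), and stops there even when allowed a third pick, because b and d add nothing new.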
Detection and emergence
Bonabeau, Eric, Dessalles, Jean-Louis
Two different conceptions of emergence are reconciled as two instances of the phenomenon of detection. In the process of comparing these two conceptions, we find that the notions of complexity and detection allow us to form a unified definition of emergence that clearly delineates the role of the observer.
Promoting scientific thinking with robots
Carbajal, Juan Pablo, Assaf, Dorit, Benker, Emanuel
This article describes an exemplary robot exercise that was conducted in a class for mechatronics students. The goal of this exercise was to engage students in scientific thinking and reasoning, activities that do not always play an important role in their curriculum. The robotic platform presented here is simple in its construction and can be customized to the needs of the teacher. Therefore, it can be used for exercises in many different fields of science, not necessarily related to robotics. Here we present a situation where the robot is treated like an alien creature whose behavior we want to understand, resembling an ethological research activity. This robot exercise is suited to a wide range of courses, from general introductions to science to hardware-oriented lectures.