Government
Prediction and Fault Detection of Environmental Signals with Uncharacterised Faults
Osborne, Michael Alan (University of Oxford) | Garnett, Roman (Carnegie Mellon University) | Swersky, Kevin (University of Toronto) | Freitas, Nando de (University of British Columbia)
Many signals of interest are corrupted by faults of an unknown type. We propose an approach that uses Gaussian processes and a general "fault bucket" to capture a priori uncharacterised faults, along with an approximate method for marginalising the potential faultiness of all observations. This gives rise to an efficient, flexible algorithm for the detection and automatic correction of faults. Our method is deployed in the domain of water monitoring and management, where it is able to solve several fault detection, correction, and prediction problems. The method works well despite the fact that the data is plagued with numerous difficulties, including missing observations, multiple discontinuities, nonlinearity and many unanticipated types of fault.
Global Climate Model Tracking Using Geospatial Neighborhoods
McQuade, Scott (The George Washington University) | Monteleoni, Claire (The George Washington University)
A key problem in climate science is how to combine the predictions of the multi-model ensemble of global climate models. Recent work in machine learning (Monteleoni et al. 2011) showed the promise of an algorithm for online learning with experts for this task. We extend the Tracking Climate Models (TCM) approach to (1) take into account climate model predictions at higher spatial resolutions and (2) model geospatial neighborhood influence between regions. Our algorithm enables neighborhood influence by modifying the transition dynamics of the Hidden Markov Model used by TCM, allowing the performance of spatial neighbors to influence the temporal switching probabilities for the best expert (climate model) at a given location. In experiments on historical data at a variety of spatial resolutions, our algorithm demonstrates improvements over TCM when tracking global temperature anomalies.
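The idea of letting spatial neighbors shape the switching probabilities can be illustrated with a minimal sketch. This is not the TCM implementation or the authors' exact update: it assumes a Fixed-Share style expert tracker, and the names `fixed_share_step`, `neighbor_weights`, and `beta` (the strength of neighborhood influence) are hypothetical. With `beta = 0` the transition reduces to the uniform switching of plain Fixed-Share.

```python
import math


def fixed_share_step(weights, losses, alpha, neighbor_weights=(), beta=0.0):
    """One update of a Fixed-Share style tracker at a single location.

    weights:          current probabilities over experts (sum to 1, length >= 2)
    losses:           loss of each expert on the newest observation
    alpha:            probability that the best expert switches
    neighbor_weights: weight vectors of neighboring regions (hypothetical)
    beta:             neighborhood influence in [0, 1) (hypothetical)
    """
    n = len(weights)

    # Loss update: experts with larger loss lose probability mass.
    post = [w * math.exp(-loss) for w, loss in zip(weights, losses)]
    total = sum(post)
    post = [p / total for p in post]

    # Switching preference q: uniform, blended with the neighbors' average.
    q = [(1.0 - beta) / n] * n
    if neighbor_weights:
        for k in range(n):
            avg = sum(nb[k] for nb in neighbor_weights) / len(neighbor_weights)
            q[k] += beta * avg

    # Transition: stay with probability 1 - alpha, otherwise switch to
    # expert k in proportion to q[k] among the other experts.
    q_sum = sum(q)
    new = [0.0] * n
    for i in range(n):
        new[i] += (1.0 - alpha) * post[i]
        for k in range(n):
            if k != i:
                new[k] += alpha * post[i] * q[k] / (q_sum - q[i])
    return new


# Three experts; the neighboring region strongly favors expert 2.
plain = fixed_share_step([1 / 3] * 3, [0.1, 0.5, 0.5], alpha=0.2)
influenced = fixed_share_step([1 / 3] * 3, [0.1, 0.5, 0.5], alpha=0.2,
                              neighbor_weights=[[0.0, 0.0, 1.0]], beta=0.5)
print(plain)
print(influenced)
```

In this sketch the neighbor's preference raises the weight of expert 2 relative to the plain update, even though experts 1 and 2 incurred the same loss locally.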
Dynamically Switching between Synergistic Workflows for Crowdsourcing
Lin, Christopher H. (University of Washington) | Mausam, Mausam (University of Washington) | Weld, Daniel S. (University of Washington)
To ensure quality results from unreliable crowdsourced workers, task designers often construct complex workflows and aggregate worker responses from redundant runs. Frequently, they experiment with several alternative workflows to accomplish the task, and eventually deploy the one that achieves the best performance during early trials. Surprisingly, this seemingly natural design paradigm does not achieve the full potential of crowdsourcing. In particular, using a single workflow (even the best) to accomplish a task is suboptimal. We show that alternative workflows can compose synergistically to yield much higher quality output. We formalize the insight with a novel probabilistic graphical model. Based on this model, we design and implement AGENTHUNT, a POMDP-based controller that dynamically switches between these workflows to achieve higher returns on investment. Additionally, we design offline and online methods for learning model parameters. Live experiments on Amazon Mechanical Turk demonstrate the superiority of AGENTHUNT for the task of generating NLP training data, yielding up to 50% error reduction and greater net utility compared to previous methods.
ET-LDA: Joint Topic Modeling for Aligning Events and their Twitter Feedback
Hu, Yuheng (Arizona State University) | John, Ajita (Avaya Labs) | Wang, Fei (IBM T. J. Watson Research Lab) | Kambhampati, Subbarao (Arizona State University)
During broadcast events such as the Superbowl, the U.S. Presidential and Primary debates, etc., Twitter has become the de facto platform for crowds to share perspectives and commentaries about them. Given an event and an associated large-scale collection of tweets, there are two fundamental research problems that have been receiving increasing attention in recent years. One is to extract the topics covered by the event and the tweets; the other is to segment the event. So far these problems have been viewed separately and studied in isolation. In this work, we argue that these problems are in fact inter-dependent and should be addressed together. We develop a joint Bayesian model that performs topic modeling and event segmentation in one unified framework. We evaluate the proposed model both quantitatively and qualitatively on two large-scale tweet datasets associated with two events from different domains to show that it improves significantly over baseline models.
Querying Linked Ontological Data through Distributed Summarization
Fokoue, Achille (IBM T. J. Watson Research Center) | Meneguzzi, Felipe (Carnegie Mellon University) | Sensoy, Murat (University of Aberdeen) | Pan, Jeff Z. (University of Aberdeen)
As the semantic web expands, ontological data becomes distributed over a large network of data sources on the Web. Consequently, evaluating queries that aim to tap into this distributed semantic database necessitates the ability to consult multiple data sources efficiently. In this paper, we propose methods and heuristics to efficiently query distributed ontological data based on a series of properties of summarized data. In our approach, each source summarizes its data as another RDF graph, and relevant sections of these summaries are merged and analyzed at query evaluation time. We show how the analysis of these summaries enables more efficient source selection, query pruning and transformation of expensive distributed joins into local joins.
Automorphism Groups of Graphical Models and Lifted Variational Inference
Bui, Hung Hai, Huynh, Tuyen N., Riedel, Sebastian
Classical approaches to probabilistic inference - an area now reasonably well understood - have traditionally exploited low tree-width and sparsity of the graphical model for efficient exact and approximate inference. A more recent approach known as lifted inference [2, 12, 6, 7] has demonstrated the possibility of performing very efficient inference in highly-connected but symmetric models, such as those arising in the context of relational (or first-order) probabilistic models. While it is clear that symmetry is the essential element in lifted inference, there is currently no formally defined notion of symmetry of a probabilistic model, and thus no formal account of what "exploiting symmetry" means in lifted inference. The symmetry of an object is typically defined mathematically via the set of transformations that preserve the object of interest. Since this set forms a mathematical group (the so-called automorphism group of that object), the theory of groups and group actions is essential in the study of symmetry. In this paper, we first introduce the concept of the automorphism group of an exponential family or a graphical model, thus formalizing the notion of symmetry of a general graphical model. This automorphism group provides a precise mathematical framework for lifted inference in graphical models.
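To make "the set of transformations that preserve the object" concrete, here is a minimal sketch for the simplest case, a plain undirected graph. It is only an illustration under that assumption and does not reproduce the paper's definition for exponential families; the function name `automorphisms` is hypothetical, and the brute-force search over all permutations is practical only for very small graphs.

```python
from itertools import permutations


def automorphisms(num_nodes, edges):
    """Return every node permutation that maps the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    found = []
    for perm in permutations(range(num_nodes)):
        # Apply the permutation to both endpoints of every edge.
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
        if mapped == edge_set:
            found.append(perm)
    return found


# A 4-cycle: its symmetries are the rotations and reflections of a square.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
group = automorphisms(4, cycle)
print(len(group))  # 8
```

The permutations found are closed under composition and inversion, which is what makes them a group. For the 4-cycle they form the dihedral group of order 8.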
An Integrated, Conditional Model of Information Extraction and Coreference with Applications to Citation Matching
Wellner, Ben, McCallum, Andrew, Peng, Fuchun, Hay, Michael
Although information extraction and coreference resolution appear together in many applications, most current systems perform them as independent steps. This paper describes an approach to integrated inference for extraction and coreference based on conditionally-trained undirected graphical models. We discuss the advantages of conditional probability training, and of a coreference model structure based on graph partitioning. On a data set of research paper citations, we show significant reduction in error by using extraction uncertainty to improve coreference citation matching accuracy, and using coreference to improve the accuracy of the extracted fields.
Applying Discrete PCA in Data Analysis
Buntine, Wray L., Jakulin, Aleks
Methods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper we explore a number of extensions to the common theory, and present some applications of these methods to common statistical tasks. We show that these methods can be interpreted as a discrete version of ICA. We develop a hierarchical version yielding components at different levels of detail, and additional techniques for Gibbs sampling. We compare the algorithms on a text prediction task using support vector machines, and on an information retrieval task.
Dynamic Programming for Structured Continuous Markov Decision Problems
Feng, Zhengzhu, Dearden, Richard, Meuleau, Nicolas, Washington, Richard
We describe an approach for exploiting structure in Markov Decision Processes with continuous state variables. At each step of the dynamic programming, the state space is dynamically partitioned into regions where the value function is the same throughout the region. We first describe the algorithm for piecewise constant representations. We then extend it to piecewise linear representations, using techniques from POMDPs to represent and reason about linear surfaces efficiently. We show that for complex, structured problems, our approach exploits the natural structure so that optimal solutions can be computed efficiently.
Super-Mixed Multiple Attribute Group Decision Making Method Based on Hybrid Fuzzy Grey Relation Approach Degree
Multiple attribute decision making (MADM), in which attribute values may be real numbers, interval real numbers, linguistic values, or uncertain linguistic values, has already been applied in practice, for example in the evaluation of enterprise effect, the selection of investment projects, the selection of personnel, research on military equipment schemes, the evaluation of strategy effect, reliability assessment, and maintainability assessment (Yongqi Xia, 2004; Dang Luo, Sifeng Liu, 2005; Yongqing Wei, Peide Liu, 2009). Fei Ye (2010) studied an extended TOPSIS method with interval-valued intuitionistic fuzzy numbers for virtual enterprise partner selection. Chuanming Ding (2007,a) defined a new similarity degree for various types of attribute and normalized the calculation of the similarity degree for each type of attribute value in a unified metric space. Using this similarity degree, each plan was compared with the ideal plan, and a decision making method was given. Chuanming Ding (2007,b), building on TOPSIS (Technique for Order Preference by Similarity to Ideal Solution), transformed the attribute values of each plan into four-dimensional attribute values, unified the various types of attribute value, defined a four-dimensional approach degree, and used this approach degree to solve the mixed-type multiple attribute decision-making problem involving real numbers, interval real numbers, linguistic values, and uncertain linguistic values. Yongqi Xia (2004) studied a method that considers the degree of information insufficiency and risk preference, on the basis of the grey-fuzzy comprehensive evaluation method of interval value preference. In this method, the weights and attribute values are represented by pairs of two interval numbers, so that membership and grey degree are considered at the same time.
Sifeng Liu, Yaoguo Dang, Jiangling Wang, Zhengpeng Wu (2009), based on the definition of entropy, proposed a method for obtaining weights that considers the characteristics of grey cluster decision-making and 2-tuple linguistic assessment, and proposed a 2-tuple linguistic assessment method based on grey clustering. Zhen Zhang, Chonghui Guo (2012) transformed the uncertain linguistic evaluation information of each decision maker into trapezoidal fuzzy numbers and then, by solving two optimization models, expressed the collective evaluation of the alternatives as trapezoidal fuzzy numbers.
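Several of the methods surveyed above build on TOPSIS. As a point of reference, the following is a minimal sketch of classical TOPSIS for real-valued attributes only; it does not cover the interval, linguistic, or grey extensions discussed here. The function name `topsis`, the `benefit` flags, and the example data are assumptions made for illustration, and the sketch assumes no attribute column is all zeros and that the alternatives are not all identical.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness of each alternative to the ideal solution (0 = worst, 1 = best).

    matrix:  rows are alternatives, columns are real-valued attributes
    weights: one weight per attribute
    benefit: True if larger is better for that attribute, False for a cost
    """
    rows, cols = len(matrix), len(matrix[0])

    # Vector-normalize each column, then apply the attribute weights.
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(rows)))
             for j in range(cols)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(cols)]
         for i in range(rows)]

    # Ideal and anti-ideal values per attribute.
    columns = [[v[i][j] for i in range(rows)] for j in range(cols)]
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in v:
        d_plus = math.sqrt(sum((row[j] - ideal[j]) ** 2 for j in range(cols)))
        d_minus = math.sqrt(sum((row[j] - anti[j]) ** 2 for j in range(cols)))
        scores.append(d_minus / (d_plus + d_minus))
    return scores


# Three plans scored on quality (benefit) and price (cost); data is made up.
plans = [[9.0, 2.0], [5.0, 5.0], [1.0, 9.0]]
scores = topsis(plans, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(plans)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

In this example the first plan is best on both attributes, so it coincides with the ideal solution and ranks first, while the last plan coincides with the anti-ideal solution and ranks last.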