Industry
Automorphism Groups of Graphical Models and Lifted Variational Inference
Bui, Hung Hai, Huynh, Tuyen N., Riedel, Sebastian
Classical approaches to probabilistic inference, an area now reasonably well understood, have traditionally exploited low tree-width and sparsity of the graphical model for efficient exact and approximate inference. A more recent approach known as lifted inference [2, 12, 6, 7] has demonstrated the possibility of performing very efficient inference in highly connected but symmetric models, such as those arising in the context of relational (or first-order) probabilistic models. While it is clear that symmetry is the essential element in lifted inference, there is currently no formally defined notion of symmetry of a probabilistic model, and thus no formal account of what "exploiting symmetry" means in lifted inference. The symmetry of a mathematical object is typically defined via the set of transformations that preserve it. Since this set forms a group (the so-called automorphism group of that object), group theory and the theory of group actions are essential in the study of symmetry. In this paper, we first introduce the concept of the automorphism group of an exponential family or a graphical model, thus formalizing the notion of symmetry of a general graphical model. This automorphism group provides a precise mathematical framework for lifted inference in graphical models.
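To make the notion concrete, here is a minimal Python sketch under a simplifying assumption: for a pairwise (Ising-style) model with node fields and edge couplings, a permutation of the variables is treated as a symmetry if it preserves every field and maps each edge onto an edge with the same coupling. The automorphism group of an exponential family defined in the paper is more general; the functions `is_automorphism`, `automorphism_group`, and `compose` and the example model are hypothetical illustrations, and brute-force enumeration only suits tiny models.

```python
from itertools import permutations

def is_automorphism(perm, fields, edges):
    """True if the variable permutation `perm` (i -> perm[i]) preserves every
    node field and maps every edge onto an edge with the same coupling."""
    if any(fields[perm[i]] != h for i, h in enumerate(fields)):
        return False
    for (i, j), w in edges.items():
        a, b = sorted((perm[i], perm[j]))
        if edges.get((a, b)) != w:
            return False
    # perm is a bijection, so edges map injectively into the (finite) edge
    # set; hence the edge set is mapped onto itself.
    return True

def automorphism_group(fields, edges):
    """Brute-force enumeration over all permutations of the variables."""
    return [p for p in permutations(range(len(fields)))
            if is_automorphism(p, fields, edges)]

def compose(p, q):
    """(p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

# Hypothetical example: a 4-cycle Ising model with identical fields and couplings.
fields = [0.1, 0.1, 0.1, 0.1]
edges = {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.5, (0, 3): 0.5}
group = automorphism_group(fields, edges)
print(len(group))  # 8: the symmetries of a square
```

Composing any two permutations in `group` yields another member of `group`, which is the closure property that makes this set of parameter-preserving transformations a group.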
Hierarchical Clustering using Randomly Selected Similarities
The problem of hierarchically clustering items from pairwise similarities arises across many scientific disciplines, from biology to networking. Often, applications of clustering techniques are limited by the cost of obtaining similarities between pairs of items. While prior work has shown how to reconstruct a clustering from a significantly reduced set of pairwise similarities via adaptive measurements, these techniques are applicable only when the user can choose which similarities to observe. In this paper, we examine reconstructing a hierarchical clustering from similarities observed at random. We derive precise bounds showing that a significant fraction of the hierarchical clustering can be recovered using fewer than all the pairwise similarities. We find that the correct hierarchical clustering, down to a constant fraction of the total number of items (i.e., clusters of size O(N)), can be found using only O(N log N) randomly selected pairwise similarities in expectation.
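For a sense of scale, the Python sketch below draws similarities at random under one assumed sampling scheme: each unordered pair is observed independently with a probability chosen so that about c·N·ln N pairs are seen in expectation. The constant `c`, the natural logarithm, and the helper `sample_pairs` are illustrative assumptions, since the abstract states only the O(N log N) order; the sketch does not perform the reconstruction itself.

```python
import math
import random

def sample_pairs(n, c=1.0, seed=0):
    """Observe each unordered pair independently with probability p, where p is
    chosen so that the expected number of observed pairs is c * n * ln(n)."""
    total = n * (n - 1) // 2
    p = min(1.0, c * n * math.log(n) / total)
    rng = random.Random(seed)
    observed = [(i, j) for i in range(n) for j in range(i + 1, n)
                if rng.random() < p]
    return observed, p, total

observed, p, total = sample_pairs(1000)
print(total)                          # 499500 possible pairs
print(round(1000 * math.log(1000)))   # 6908 expected observations
print(len(observed) / total)          # close to p, about 0.0138
```

With N = 1000, this budget covers well under 2% of all pairs, which is the gap between O(N log N) and the N(N-1)/2 similarities a full clustering would normally consume.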
Models of Disease Spectra
Rezek, Iead, Beckmann, Christian
Case vs. control comparisons have been the classical approach to the study of neurological diseases. However, most patients will not fall cleanly into either group. Instead, clinicians typically find patients who cannot be classified as having clearly progressed into the disease state. For those subjects, very little can be said about their brain function on the basis of analyses of group differences. Describing this intermediate brain function requires models that interpolate between disease states. We have chosen Gaussian process (GP) regression to obtain a continuous spectrum of brain activation and to extract the unknown disease progression profile. Our models incorporate the spatial distribution of measures of activation, e.g. the correlation of an fMRI trace with an input stimulus, and thus constitute ultra-high-dimensional multivariate GP regressors. We applied GPs to model fMRI image phenotypes across Alzheimer's Disease (AD) behavioural measures, e.g. MMSE and ACE scores, and obtained predictions at unobserved MMSE/ACE values. The overall model confirmed the known reduction in the spatial extent of activity in response to reading versus false-font stimulation. The predictive uncertainty showed confidence intervals widening at behavioural scores distant from those used for GP training. Thus, the model indicated which type of patient (i.e., which behavioural score) would need to be included in the training data to improve the model's predictions.
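A minimal one-dimensional NumPy sketch illustrates why predictive uncertainty grows away from the training scores: a zero-mean GP with a squared-exponential kernel, fit to a few behavioural scores, returns nearly the prior variance at a score far from all of them. The kernel, its hyperparameters, the function names, and the data are hypothetical, and the paper's models are spatially multivariate rather than scalar as here.

```python
import numpy as np

def rbf(a, b, length=5.0, var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.1, length=5.0, var=1.0):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf(x_train, x_train, length, var) + noise ** 2 * np.eye(len(x_train))
    Ks = rbf(x_test, x_train, length, var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    post_var = var - np.sum(v ** 2, axis=0)  # prior variance minus explained part
    return mean, post_var

# Hypothetical data: one activation summary per training behavioural score.
scores = np.array([20.0, 22.0, 24.0, 26.0, 28.0, 30.0])
activation = np.array([0.20, 0.30, 0.45, 0.50, 0.60, 0.70])
mean, post_var = gp_predict(scores, activation, np.array([24.0, 0.0]))
```

Here `post_var[0]` (at a training score) falls below the noise variance, while `post_var[1]` (at a score of 0, far from the training range) stays close to the prior variance of 1; that contrast is what signals where new training subjects would help most.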
Recovering Epipolar Geometry from Images of Smooth Surfaces
We present four methods for recovering the epipolar geometry from images of smooth surfaces. Existing methods for recovering epipolar geometry rely on corresponding feature points, which cannot be found in such images. The first method is based on finding corresponding characteristic points created by illumination (ICPM, illumination characteristic points' method (PM)). The second method is based on corresponding tangency points created by tangents from the epipoles to the outlines of smooth bodies (OTPM, outline tangent PM). These two methods are exact and give correct results for real images, because the positions of the corresponding illumination characteristic points and of the corresponding outlines are known with small errors. However, the second method is limited either to special types of scenes or to restricted camera motion. We also consider two more methods, termed CCPM (curve characteristic PM) and CTPM (curve tangent PM), for estimating the epipolar geometry of images of smooth bodies from a set of level curves of constant illumination intensity (isophotes). The CCPM method searches for corresponding points on isophote curves by correlating the curvatures of these curves. The CTPM method is based on the property that an epipolar line tangent to an isophote curve maps to the epipolar line tangent to the corresponding isophote curve in the other image. The standard method (SM) is based on knowledge of pairs of almost exactly corresponding points. The methods have been implemented and tested against SM on pairs of real images. Unfortunately, the last two methods give only a finite subset of solutions that includes the "good" solution; the exception is the case of "epipoles at infinity". The main reason is the inaccuracy of the constant-brightness assumption for smooth bodies. Outlines and illumination characteristic points, however, are not affected by this inaccuracy, so the first pair of methods gives exact results.
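The tangency condition behind OTPM can be illustrated with a short NumPy sketch: given a candidate epipole and an outline sampled densely and in order, it returns the outline points where the line through the epipole touches the outline, detected as sign changes of the 2-D cross product between the epipole-to-point vector and the local tangent. The function name and the circular outline are hypothetical, and the sketch does not estimate the epipoles themselves, which is the actual task of the methods above.

```python
import numpy as np

def tangency_points(curve, epipole):
    """curve: (N, 2) array of points sampled in order around a closed outline.
    Returns points where the line from `epipole` is tangent to the outline."""
    # Central-difference tangent at each sample (wrapping around the closed curve).
    tangent = np.roll(curve, -1, axis=0) - np.roll(curve, 1, axis=0)
    d = curve - epipole
    cross = d[:, 0] * tangent[:, 1] - d[:, 1] * tangent[:, 0]
    s = np.sign(cross)
    idx = np.where(s != np.roll(s, -1))[0]  # sign change between i and i+1
    # Report the midpoint of each bracketing pair of samples.
    return 0.5 * (curve[idx] + curve[(idx + 1) % len(curve)])

# Hypothetical outline: unit circle, with a candidate epipole at (2, 0).
theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
outline = np.column_stack([np.cos(theta), np.sin(theta)])
pts = tangency_points(outline, np.array([2.0, 0.0]))
```

For this circle the analytic tangency points are (0.5, ±√3/2), and the sampled result agrees to within the sampling resolution; in OTPM such points, found in both images, supply the correspondences that ordinary feature matching cannot.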
Expectation-Propagation for Likelihood-Free Inference
Barthelmé, Simon, Chopin, Nicolas
Many models of interest in the natural and social sciences have no closed-form likelihood function, which means that they cannot be treated using the usual techniques of statistical inference. When such models can be efficiently simulated, Bayesian inference is still possible thanks to the Approximate Bayesian Computation (ABC) algorithm. Although many refinements have been suggested, ABC inference is still far from routine. ABC is often excruciatingly slow due to very low acceptance rates. In addition, ABC requires introducing a vector of "summary statistics", the choice of which is relatively arbitrary and often requires some trial and error, making the whole process quite laborious for the user. In this work we introduce the EP-ABC algorithm, an adaptation to the likelihood-free context of the variational approximation algorithm known as Expectation Propagation (Minka, 2001). The main advantage of EP-ABC is that it is faster than standard algorithms by a few orders of magnitude, while producing an overall approximation error that is typically negligible. A second advantage of EP-ABC is that it replaces the usual global ABC constraint on the vector of summary statistics computed on the whole dataset with n local constraints that apply separately to each data point. As a consequence, it is often possible to do away with summary statistics entirely. In that case, EP-ABC directly approximates the evidence (marginal likelihood) of the model. Comparisons are performed in three real-world applications that are typical of likelihood-free inference, including one novel application in neuroscience that is possibly too challenging for standard ABC techniques.
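A toy version of the idea can be sketched in Python for a model where y_i ~ N(θ, 1) with a Gaussian prior on a scalar θ, chosen so the result can be checked against the exact posterior. Each EP site is updated by sampling θ from the cavity distribution, simulating one pseudo-observation per draw, keeping draws whose simulation falls within ε of the i-th data point (a local ABC constraint on the raw data, with no summary statistics), and moment matching. The function `ep_abc_gaussian`, its defaults (ε, number of simulations, number of passes), and the data are assumptions for illustration; this is not the authors' implementation.

```python
import numpy as np

def ep_abc_gaussian(y, eps=0.1, prior_mean=0.0, prior_var=100.0,
                    n_sim=20000, n_passes=3, seed=0):
    """Toy EP-ABC for y_i ~ N(theta, 1) with a Gaussian prior on scalar theta.
    Sites are stored in natural parameters: precision r and precision*mean b."""
    rng = np.random.default_rng(seed)
    n = len(y)
    r_prior, b_prior = 1.0 / prior_var, prior_mean / prior_var
    r_site = np.zeros(n)
    b_site = np.zeros(n)
    for _ in range(n_passes):
        for i in range(n):
            # Cavity: current global approximation with site i removed.
            r_cav = r_prior + r_site.sum() - r_site[i]
            b_cav = b_prior + b_site.sum() - b_site[i]
            if r_cav <= 0:
                continue  # improper cavity; skip this update
            theta = rng.normal(b_cav / r_cav, np.sqrt(1.0 / r_cav), n_sim)
            y_sim = rng.normal(theta, 1.0)              # one simulated point per draw
            acc = theta[np.abs(y_sim - y[i]) <= eps]    # local ABC constraint
            if acc.size < 10:
                continue  # too few acceptances for stable moments
            m_new, v_new = acc.mean(), acc.var()
            # Moment matching: new site = tilted moments minus cavity.
            r_site[i] = 1.0 / v_new - r_cav
            b_site[i] = m_new / v_new - b_cav
    r_post = r_prior + r_site.sum()
    b_post = b_prior + b_site.sum()
    return b_post / r_post, 1.0 / r_post

y_obs = np.array([1.0, 1.2, 0.8, 1.1, 0.9])
post_mean, post_var = ep_abc_gaussian(y_obs)
```

The exact posterior here has mean 5/5.01 ≈ 0.998 and variance 1/5.01 ≈ 0.2, and the Gaussian EP-ABC approximation should land close to both, up to Monte Carlo noise.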
Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation
Boyd, Kendrick, Costa, Vitor Santos, Davis, Jesse, Page, David
Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR curves vary as class skew changes. What was not recognized before this paper is that there is a region of PR space that is completely unachievable, and the size of this region depends only on the skew. This paper precisely characterizes the size of that region and discusses its implications for empirical evaluation methodology in machine learning.
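The boundary of this region follows from the definitions: with P positives, N negatives, and skew π = P/(P+N), a recall r fixes TP = rP, and precision is lowest when all N negatives are predicted positive, giving a minimum precision of πr/(πr + 1 − π). The Python sketch below computes that bound and the area beneath it. The closed form 1 + (1 − π)ln(1 − π)/π is our own integration of that curve over recall in [0, 1], checked numerically in the sketch rather than quoted from the paper, and the function names are hypothetical.

```python
import math

def min_precision(recall, pi):
    """Lowest achievable precision at a given recall when a fraction `pi`
    of examples is positive: every negative is predicted positive, so
    precision = r*P / (r*P + N)."""
    return pi * recall / (pi * recall + (1.0 - pi))

def unachievable_area(pi):
    """Area under min_precision over recall in [0, 1], in closed form."""
    if pi <= 0.0:
        return 0.0
    if pi >= 1.0:
        return 1.0
    return 1.0 + (1.0 - pi) * math.log(1.0 - pi) / pi

def numeric_area(pi, steps=100_000):
    """Midpoint-rule check of the closed form."""
    h = 1.0 / steps
    return h * sum(min_precision((k + 0.5) * h, pi) for k in range(steps))

print(round(unachievable_area(0.5), 4))   # 0.3069 (balanced classes)
print(round(unachievable_area(0.01), 4))  # 0.005 (1% positives)
```

Because the area depends only on π, the same PR area can mean very different things at different skews, which is why the unachievable region matters for comparing results across data sets.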
Reasoning about Agent Programs using ATL-like Logics
Yadav, Nitin, Sardina, Sebastian
We propose a variant of Alternating-time Temporal Logic (ATL) grounded in the agents' operational know-how, as defined by their libraries of abstract plans. Inspired by ATLES, itself a variant of ATL, our logic makes it possible to refer explicitly to "rational" strategies for agents developed under the Belief-Desire-Intention (BDI) agent programming paradigm. This allows us to express and verify properties of BDI systems using ATL-type logical frameworks. Keywords: Agent Programming, Reactive plans, ATL, Model Checking.
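For readers new to ATL model checking, the standard least-fixed-point computation for a formula <<A>> F goal ("coalition A can force reaching goal") over a concurrent game structure can be sketched in Python as below. This is plain ATL with unrestricted strategies; the logic proposed here instead restricts agents to the "rational" strategies induced by their BDI plan libraries, which the sketch does not model. The game, agent names, and functions are hypothetical.

```python
from itertools import product

def pre(states, agents, coalition, moves, delta, target):
    """States from which `coalition` has a joint move that forces the next
    state into `target`, whatever the remaining agents do.
    moves[q][a]: list of a's moves at q; delta[q][joint]: successor of q,
    where `joint` is a tuple of moves ordered like `agents`."""
    others = [a for a in agents if a not in coalition]
    result = set()
    for q in states:
        for cm in product(*(moves[q][a] for a in coalition)):
            forced = True
            for om in product(*(moves[q][a] for a in others)):
                choice = dict(zip(coalition, cm))
                choice.update(zip(others, om))
                joint = tuple(choice[a] for a in agents)
                if delta[q][joint] not in target:
                    forced = False
                    break
            if forced:
                result.add(q)
                break
    return result

def coalition_can_reach(states, agents, coalition, moves, delta, goal):
    """Least fixed point: the states satisfying <<coalition>> F goal."""
    win = set(goal)
    while True:
        new = win | pre(states, agents, coalition, moves, delta, win)
        if new == win:
            return win
        win = new

agents = ["robot", "env"]
states = ["s0", "s1", "done"]
moves = {
    "s0": {"robot": ["go", "wait"], "env": ["block", "pass"]},
    "s1": {"robot": ["go"], "env": ["idle"]},
    "done": {"robot": ["noop"], "env": ["noop"]},
}
delta = {
    "s0": {("go", "pass"): "done", ("go", "block"): "s1",
           ("wait", "block"): "s0", ("wait", "pass"): "s0"},
    "s1": {("go", "idle"): "done"},
    "done": {("noop", "noop"): "done"},
}
print(sorted(coalition_can_reach(states, agents, ["robot"], moves, delta, {"done"})))  # ['done', 's0', 's1']
print(sorted(coalition_can_reach(states, agents, ["env"], moves, delta, {"done"})))    # ['done', 's1']
```

Restricting which strategies count as admissible, as the proposed logic does for BDI agents, would shrink the move sets that `pre` is allowed to choose from for the coalition.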
Ultrametric Model of Mind, II: Application to Text Content Analysis
In a companion paper, Murtagh (2012), we discussed how Matte Blanco's work linked the unrepressed unconscious (in the human) to symmetric logic and thought processes. We showed how ultrametric topology provides a most useful representational and computational framework for this. Now we look at the extent to which we can find ultrametricity in text. We use coherent and meaningful collections of nearly 1000 texts to show how we can measure inherent ultrametricity. On the basis of our findings, we hypothesize that inherent ultrametricity is a basis for further exploring unconscious thought processes.
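One simple way to quantify how close a set of dissimilarities is to an ultrametric is to inspect triangles: a metric is ultrametric exactly when, in every triangle, the two largest sides are equal (every triangle is isosceles with a small base). The Python sketch below reports the fraction of triangles meeting that condition, up to a tolerance. It is an illustration in that spirit, not the coefficient used in the paper; it assumes a precomputed distance matrix between texts (how texts are embedded is not shown), and the function name and tolerance are hypothetical.

```python
from itertools import combinations

def ultrametricity(dist, tol=1e-9):
    """Fraction of triangles whose two largest sides are equal within a
    relative tolerance `tol`, i.e. triangles satisfying the ultrametric
    inequality d(x, z) <= max(d(x, y), d(y, z)) for every labelling."""
    n = len(dist)
    total = ok = 0
    for i, j, k in combinations(range(n), 3):
        a, b, c = sorted((dist[i][j], dist[j][k], dist[i][k]))
        total += 1
        if c - b <= tol * max(c, 1e-12):
            ok += 1
    return ok / total if total else 1.0

# A dendrogram-derived (ultrametric) distance: {0,1} and {2,3} merge at 1, all at 3.
tree = [[0, 1, 3, 3],
        [1, 0, 3, 3],
        [3, 3, 0, 1],
        [3, 3, 1, 0]]
# Points on a line: every triangle is degenerate and far from isosceles.
line = [[abs(i - j) for j in range(4)] for i in range(4)]
print(ultrametricity(tree), ultrametricity(line))  # 1.0 0.0
```

Real text collections fall between these extremes, and the interesting quantity is how far above the level expected for unstructured data a given collection sits.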
Automated Inference System for End-To-End Diagnosis of Network Performance Issues in Client-Terminal Devices
Widanapathirana, Chathuranga, Şekercioǧlu, Y. Ahmet, Ivanovich, Milosh V., Fitzpatrick, Paul G., Li, Jonathan C.
Traditional methods for diagnosing network problems in Client-Terminal Devices (CTDs) tend to be labor-intensive and time-consuming, and they contribute to increased customer dissatisfaction. In this paper, we propose an automated solution for rapidly diagnosing the root causes of network performance issues in CTDs. Based on a new intelligent inference technique, we create the Intelligent Automated Client Diagnostic (IACD) system, which relies only on the collection of Transmission Control Protocol (TCP) packet traces. Using soft-margin Support Vector Machine (SVM) classifiers, the system (i) distinguishes link problems from client problems and (ii) identifies characteristics unique to the specific fault to report the root cause. The modular design of the system enables support for new access link and fault types. Experimental evaluation demonstrated the capability of the IACD system to distinguish between faulty and healthy links and to diagnose client faults with 98% accuracy. The system can perform fault diagnosis independently of the user's specific TCP implementation, enabling diagnosis of a diverse range of client devices.
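The two-stage use of soft-margin SVMs described above can be sketched in dependency-free Python: a first classifier separates link problems from client problems, and one-vs-rest classifiers then pick the client fault. Everything concrete here is assumed for illustration: the three numeric features, the labels `fault_A`/`fault_B`, the Pegasos-style subgradient training loop, and the function names. The abstract does not specify IACD's features, kernel, or training procedure.

```python
def score(w, x):
    """Linear decision value; the last weight is the bias (constant feature 1.0)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def train_linear_svm(X, y, lam=0.01, epochs=3000):
    """Soft-margin linear SVM trained with Pegasos-style subgradient steps on
    (lam/2)*||w||^2 + mean hinge loss, cycling through the data in order.
    Labels must be +1 or -1; the bias is folded into w and also regularized."""
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * score(w, x) < 1.0  # hinge loss is active
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, list(x) + [1.0])]
    return w

def diagnose(x, link_model, fault_models):
    """Stage 1: non-negative score -> link problem.
    Stage 2: the client fault whose one-vs-rest model scores highest."""
    if score(link_model, x) >= 0.0:
        return "link problem"
    return max(fault_models, key=lambda name: score(fault_models[name], x))

# Hypothetical 3-feature summaries of TCP traces (not the paper's features).
link = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.5]]
fault_a = [[0.1, 0.8, 0.9], [0.2, 0.9, 0.8]]
fault_b = [[0.1, 0.9, 0.1], [0.2, 0.8, 0.2]]
clients = fault_a + fault_b
link_model = train_linear_svm(link + clients, [1, 1] + [-1] * 4)
fault_models = {
    "fault_A": train_linear_svm(clients, [1, 1, -1, -1]),
    "fault_B": train_linear_svm(clients, [-1, -1, 1, 1]),
}
print(diagnose([0.15, 0.85, 0.15], link_model, fault_models))
```

In this sketch, supporting a new fault type means training one more one-vs-rest model and adding it to `fault_models`, which loosely mirrors the modular design claimed for IACD.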
Towards a Self-Organized Agent-Based Simulation Model for Exploration of Human Synaptic Connections
Gürcan, Önder, Bernon, Carole, Türker, Kemal S.
In this paper, we present the early design of our self-organized agent-based simulation model for exploring synaptic connections, which faithfully generates what is observed in natural situations. While we take inspiration from neuroscience, our intent is not to create a veridical model of processes in neurodevelopmental biology, nor to represent a real biological system. Instead, our goal is to design a simulation model that learns to act in the same way as the human nervous system, using findings obtained from human subjects with reflex methodologies, in order to estimate unknown connections.