Mathematical & Statistical Methods


MAP Estimation, Message Passing, and Perfect Graphs

arXiv.org Artificial Intelligence

Efficiently finding the maximum a posteriori (MAP) configuration of a graphical model is an important problem that is often solved using message passing algorithms. The optimality of such algorithms is well established only for singly-connected graphs and other limited settings. This article extends the set of graphs where MAP estimation is in P, and where message passing recovers the exact solution, to so-called perfect graphs. The result leverages recent progress in characterizing perfect graphs (the strong perfect graph theorem), linear programming relaxations of MAP estimation, and recent convergent message passing schemes. The article converts graphical models into NAND Markov random fields, which are straightforward to relax into linear programs. Integrality of the relaxation can then be established in general by testing the resulting graph for perfection, which can be done with a polynomial time algorithm. Alternatively, known decomposition tools from perfect graph theory may be used to prove perfection for certain families of graphs. The result is a general graph-theoretic framework for determining when MAP estimation in any graphical model is in P, has an integral linear programming relaxation, and is exactly recoverable by message passing.
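
For orientation, the standard local (pairwise) linear programming relaxation of MAP estimation, which constructions of this kind feed into, has the following textbook form (a generic sketch, not the paper's exact program): over pseudo-marginals $\mu \ge 0$,

$$\max_{\mu \ge 0} \; \sum_{i}\sum_{x_i} \theta_i(x_i)\,\mu_i(x_i) \;+\; \sum_{(i,j)}\sum_{x_i,x_j} \theta_{ij}(x_i,x_j)\,\mu_{ij}(x_i,x_j)$$

$$\text{s.t.}\quad \sum_{x_i} \mu_i(x_i) = 1 \;\;\forall i, \qquad \sum_{x_j} \mu_{ij}(x_i,x_j) = \mu_i(x_i) \;\;\forall (i,j),\,x_i.$$

When this LP has an integral optimum, as the article establishes for perfect graphs, the relaxation is tight and the optimal $\mu$ directly encodes the exact MAP configuration.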


Approximate Computation and Implicit Regularization for Very Large-scale Data Analysis

arXiv.org Machine Learning

Database theory and database practice are typically the domain of computer scientists who adopt what may be termed an algorithmic perspective on their data. This perspective is very different from the more statistical perspective adopted by statisticians, scientific computing researchers, machine learners, and others who work on what may be broadly termed statistical data analysis. In this article, I will address fundamental aspects of this algorithmic-statistical disconnect, with an eye to bridging the gap between these two very different approaches. A concept that lies at the heart of this disconnect is that of statistical regularization, a notion that has to do with how robust the output of an algorithm is to the noise properties of the input data. Although it is nearly completely absent from computer science, which historically has taken the input data as given and modeled algorithms discretely, regularization in one form or another is central to nearly every application domain that applies algorithms to noisy data. By using several case studies, I will illustrate, both theoretically and empirically, the nonobvious fact that approximate computation, in and of itself, can implicitly lead to statistical regularization. This and other recent work suggests that, by exploiting in a more principled way the statistical properties implicit in worst-case algorithms, one can in many cases satisfy the bicriteria of having algorithms that are scalable to very large-scale databases and that also have good inferential or predictive properties.
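
A minimal illustration of the phenomenon (my own sketch, not one of the article's case studies): stopping gradient descent early on a least squares problem behaves much like explicit ridge regularization, so truncating the computation acts as an implicit regularizer.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 50
    A = rng.normal(size=(n, d))
    x_true = rng.normal(size=d)
    b = A @ x_true + 0.5 * rng.normal(size=n)   # noisy observations

    # Exact least-squares solution (fits the noise).
    x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

    # Approximate computation: a few gradient-descent steps from zero.
    x = np.zeros(d)
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / sigma_max(A)^2
    for _ in range(25):                         # stopping early = implicit regularization
        x = x - step * A.T @ (A @ x - b)

    # Explicit regularization for comparison: ridge regression.
    lam = 5.0
    x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

    for name, sol in [("least squares", x_ls), ("early-stopped GD", x), ("ridge", x_ridge)]:
        print(f"{name:18s} error vs truth: {np.linalg.norm(sol - x_true):.3f}")

The early-stopped iterate typically lands closer to the truth than the exact minimizer, mirroring the ridge solution: the approximation itself is doing the statistical work.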


Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences

arXiv.org Machine Learning

Randomized algorithms that base iteration-level decisions on samples from some pool are ubiquitous in machine learning and optimization. Examples include stochastic gradient descent and randomized coordinate descent. This paper makes progress at theoretically evaluating the difference in performance between sampling with and without replacement in such algorithms. Focusing on least mean squares optimization, we formulate a noncommutative arithmetic-geometric mean inequality that would prove that the expected convergence rate of without-replacement sampling is faster than that of with-replacement sampling. We demonstrate that this inequality holds for many classes of random matrices and for some pathological examples as well. We provide a deterministic worst-case bound on the discrepancy between the two sampling models, and explore some of the impediments to proving the inequality in full generality. We detail the consequences of this inequality for stochastic gradient descent and the randomized Kaczmarz algorithm for solving linear systems.
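
For orientation, the conjectured inequality is roughly of the following form (paraphrased; consult the paper for the precise statement): for positive semidefinite matrices $A_1,\dots,A_n$,

$$\left\| \frac{(n-k)!}{n!} \sum_{\substack{j_1,\dots,j_k \\ \text{distinct}}} A_{j_1} \cdots A_{j_k} \right\| \;\le\; \left\| \frac{1}{n^k} \sum_{j_1,\dots,j_k=1}^{n} A_{j_1} \cdots A_{j_k} \right\| \;=\; \left\| \Big(\tfrac{1}{n}\textstyle\sum_{j=1}^{n} A_j\Big)^{k} \right\|.$$

The left side averages products over index tuples drawn without replacement, the right side over tuples drawn with replacement; the final identity holds because matrix products distribute over sums, so the with-replacement average is just the $k$-th power of the mean matrix.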


Maximum Covariance Unfolding: Manifold Learning for Bimodal Data

Neural Information Processing Systems

We propose maximum covariance unfolding (MCU), a manifold learning algorithm for simultaneous dimensionality reduction of data from different input modalities. Given high dimensional inputs from two different but naturally aligned sources, MCU computes a common low dimensional embedding that maximizes the cross-modal (inter-source) correlations while preserving the local (intra-source) distances. In this paper, we explore two applications of MCU. First we use MCU to analyze EEG-fMRI data, where an important goal is to visualize the fMRI voxels that are most strongly correlated with changes in EEG traces. To perform this visualization, we augment MCU with an additional step for metric learning in the high dimensional voxel space. Second, we use MCU to perform cross-modal retrieval of matched image and text samples from Wikipedia. To manage large applications of MCU, we develop a fast implementation based on ideas from spectral graph theory. These ideas transform the original problem for MCU, one of semidefinite programming, into a simpler problem in semidefinite quadratic linear programming.
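
Schematically, and by analogy with maximum variance unfolding, the optimization can be pictured as follows (my paraphrase with invented notation $K$, $N_x$, $N_y$, not the paper's exact program): letting $K \succeq 0$ be the joint Gram matrix of the embedded points from both modalities, with cross-modal block $K^{xy}$,

$$\max_{K \succeq 0} \; \operatorname{tr}\big(K^{xy}\big) \qquad \text{s.t.} \quad \textstyle\sum_{i,j} K_{ij} = 0, \qquad K_{ii} - 2K_{ij} + K_{jj} = d_{ij}^{2} \;\;\forall (i,j) \in N_x \cup N_y,$$

where $N_x$ and $N_y$ are the intra-source neighborhood graphs whose local distances $d_{ij}$ must be preserved, and the low dimensional embedding is read off from the top eigenvectors of the optimal $K$.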


Computable de Finetti measures

arXiv.org Machine Learning

We prove a computable version of de Finetti's theorem on exchangeable sequences of real random variables. As a consequence, exchangeable stochastic processes expressed in probabilistic functional programming languages can be automatically rewritten as procedures that do not modify non-local state. Along the way, we prove that a distribution on the unit interval is computable if and only if its moments are uniformly computable.
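
For context, the classical theorem being made computable states that every exchangeable sequence of real random variables is a mixture of i.i.d. sequences: there is a measure $\mu$ on the space of distributions (the de Finetti measure) such that

$$\Pr(X_1 \in A_1, \dots, X_n \in A_n) \;=\; \int \prod_{i=1}^{n} \nu(A_i)\, \mu(d\nu) \qquad \text{for all } n \text{ and Borel sets } A_i.$$

The computable version shows that $\mu$ itself can be recovered effectively from a computable exchangeable process, which is what licenses the automatic program transformation described above.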


A Graph Theory Approach for Generating Multiple Choice Exams

AAAI Conferences

It is costly and time consuming to develop Multiple Choice Questions (MCQ) by hand. Using web-based resources to automate components of MCQ development would greatly benefit the education community by reducing duplication of effort. As in many areas of Natural Language Processing (NLP), human-judged data is needed to train automated systems, but the majority of such data is proprietary. We present a graph-based representation for gathering training data from existing, web-based resources that increases access to such data and better directs the development of good questions.


Infinitely exchangeable random graphs generated from a Poisson point process on monotone sets and applications to cluster analysis for networks

arXiv.org Machine Learning

We construct an infinitely exchangeable process on the set $\mathcal{E}$ of subsets of the power set of the natural numbers $\mathbb{N}$ via a Poisson point process with mean measure $\Lambda$ on the power set of $\mathbb{N}$. Each $E\in\mathcal{E}$ has a least monotone cover in $\mathcal{F}$, the collection of monotone subsets of $\mathcal{E}$, and every monotone subset maps to an undirected graph $G\in\mathcal{G}$, the space of undirected graphs with vertex set $\mathbb{N}$. We exhibit a natural mapping $\mathcal{E}\rightarrow\mathcal{F}\rightarrow\mathcal{G}$ which induces an infinitely exchangeable measure on the projective system of graphs in $\mathcal{G}$ under permutation and restriction mappings, given an infinitely exchangeable family of measures on the corresponding projective system of subsets in $\mathcal{E}$ with permutation and restriction maps. We show potential connections of this process to applications in cluster analysis, machine learning, classification and Bayesian inference.
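
For reference, infinite exchangeability of the induced graph measure means invariance in distribution under finite permutations of the vertex set:

$$(G^{\sigma})_{ij} := G_{\sigma(i)\,\sigma(j)}, \qquad G^{\sigma} \overset{d}{=} G \quad \text{for every permutation } \sigma \text{ of } \mathbb{N} \text{ fixing all but finitely many vertices.}$$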


A Combinatorial Optimisation Approach to Designing Dual-Parented Long-Reach Passive Optical Networks

arXiv.org Artificial Intelligence

We present an application focused on the design of resilient long-reach passive optical networks. We specifically consider dual-parented networks, whereby each customer must be connected to two metro sites via local exchange sites. An important property of such a placement is resilience to single metro node failure. The objective of the application is to determine the optimal position of a set of metro nodes such that the total optical fibre length is minimized. We prove that this problem is NP-complete. We present two alternative combinatorial optimisation approaches to finding an optimal metro node placement: a mixed integer linear programming (MIP) formulation of the problem, and a hybrid approach that uses clustering as a preprocessing step. We consider a detailed case study based on a network for Ireland. The hybrid approach scales well and finds solutions that are close to optimal, with a runtime that is two orders of magnitude better than that of the MIP model.
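
A schematic of the kind of MIP involved (a generic dual-parented facility-location sketch with invented symbols, not the paper's exact formulation): with binary $y_j$ marking opened metro sites, $x_{ij}$ assigning exchange site $i$ to metro node $j$, fibre lengths $d_{ij}$, and $p$ the number of metro nodes to be placed (an assumption of this sketch),

$$\min \sum_{i,j} d_{ij}\,x_{ij} \quad \text{s.t.} \quad \sum_j x_{ij} = 2 \;\;\forall i, \qquad x_{ij} \le y_j \;\;\forall i,j, \qquad \sum_j y_j = p, \qquad x_{ij}, y_j \in \{0,1\},$$

where the dual-parenting constraint $\sum_j x_{ij} = 2$ encodes resilience to single metro node failure.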


Robust Kernel Density Estimation

arXiv.org Machine Learning

We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical $M$-estimation. We interpret the KDE based on a radial, positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via $M$-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the $M$-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection.
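
A compact sketch of the kernelized IRWLS idea (my own minimal implementation, assuming a Gaussian kernel and the Huber loss; the function name rkde_weights is mine, and the paper's convergence conditions are not reproduced here):

    import numpy as np

    def rkde_weights(X, bandwidth=1.0, huber_c=1.0, iters=50):
        """Robust KDE weights via kernelized IRWLS (Huber M-estimator)."""
        n = len(X)
        diff = X[:, None, :] - X[None, :, :]
        K = np.exp(-np.sum(diff**2, axis=-1) / (2 * bandwidth**2))
        w = np.full(n, 1.0 / n)                 # start from the ordinary KDE
        for _ in range(iters):
            # RKHS distance of each feature map to the current weighted mean:
            # d_i^2 = k(x_i, x_i) - 2 (K w)_i + w^T K w
            d2 = np.diag(K) - 2 * K @ w + w @ K @ w
            d = np.sqrt(np.maximum(d2, 1e-12))
            # Huber psi(d)/d: 1 inside the threshold, c/d outside.
            phi = np.where(d <= huber_c, 1.0, huber_c / d)
            w = phi / phi.sum()                 # re-normalized weights
        return w

    # The RKDE at a point x is then sum_i w_i * N(x; x_i, bandwidth^2 I),
    # which downweights outlying samples relative to the ordinary KDE.

Points far from the bulk of the data in the RKHS receive small weights, which is exactly the mechanism that makes the estimator robust to contamination.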


Addressing Execution and Observation Error in Security Games

AAAI Conferences

Attacker-defender Stackelberg games have become a popular game-theoretic approach for security, with deployments for the LAX Police, the FAMS and the TSA. Unfortunately, most of the existing solution approaches do not model two key uncertainties of the real world: there may be noise in the defender's execution of the suggested mixed strategy, and/or the observations made by an attacker can be noisy. In this paper, we analyze a framework to model these uncertainties, and demonstrate that previous strategies perform poorly in such uncertain settings. We also analyze RECON, a novel algorithm that computes strategies for the defender that are robust to such uncertainties, and explore heuristics that further improve RECON's efficiency.