Goto

Collaborating Authors

 Mathematical & Statistical Methods


Computable de Finetti measures

arXiv.org Machine Learning

We prove a computable version of de Finetti's theorem on exchangeable sequences of real random variables. As a consequence, exchangeable stochastic processes expressed in probabilistic functional programming languages can be automatically rewritten as procedures that do not modify non-local state. Along the way, we prove that a distribution on the unit interval is computable if and only if its moments are uniformly computable. Key words: de Finetti's theorem, exchangeability, computable probability theory, probabilistic programming languages, mutation 2010 MSC: 03D78, 60G09, 68Q10, 03F60, 68N18 1. Introduction The classical de Finetti theorem states that an exchangeable sequence of real random variables is a mixture of independent and identically distributed (i.i.d.) sequences of random variables. Moreover, there is an (almost surely unique) measure-valued random variable, called the directing random measure, conditioned on which the random sequence is i.i.d. The distribution of the directing random measure is called the de Finetti measure or the mixing measure. This paper examines the computable probability theory of exchangeable sequences of real-valued random variables. We prove a computable version of de Finetti's theorem: the distribution of an exchangeable sequence of real random variables is computable if and only if its de Finetti measure is computable. The classical proofs do not readily effectivize; instead, we show how to directly compute the de Finetti measure (as characterized by the classical theorem) in terms of a computable representation of the distribution of the exchangeable sequence. A key step in the proof is to describe the de Finetti measure in terms of the moments of a set of random variables derived from the exchangeable sequence. When the directing random measure is (almost surely) continuous, we can show that these moments are computable, which suffices to complete the proof of the main theorem in this case.


Scenario trees and policy selection for multistage stochastic programming using machine learning

arXiv.org Machine Learning

Stochastic optimization using scenario trees has proven to be a powerful algorithmic strategy, but has suffered from the rapid growth in the size of scenario trees as the number of stages grows (Birge and Louveaux, 1997; Shapiro et al., 2009). A number of authors have undertaken research to limit the size of the scenario tree, but problem size still grows exponentially with the number of stages (Frauendorfer, 1996; Dupacova et al., 2000; Høyland and Wallace, 2001; Pennanen, 2009; Heitsch and Römisch, 2009). As a result, most authors either severely limit the number of decision stages or sharply limit the number of scenarios per stage (Birge, 1997; Wallace and Ziemba, 2005; Dempster et al., 2008; Kallrath et al., 2009). Such approximations make it possible to optimize first-stage decisions with a stochastic look-ahead, but without tight guarantees on the value of the computed decisions for the true multistage problem (as a matter of fact, bounding techniques also tend to break down on problems with many stages). Some authors have proposed to assess the quality of scenario-tree based methods by outof-sample validation (Kouwenberg, 2001; Chiralaksanakul, 2003; Hilli and Pennanen, 2008). The validation scheme consists of solving the multistage program posed on a scenario tree spanning the planning horizon T, implementing the decision relative to time step 1, sampling the realization of the stochastic process at time 1, updating the conditional distributions of the stochastic process from time 2 to time T, rebuilding a scenario tree spanning time periods 2 to T, solving the new multistage program over the remaining horizon (with previously 1 implemented decisions fixed to their value), and continuing this process until the last decision at time T has been found.


A Graph Theory Approach for Generating Multiple Choice Exams

AAAI Conferences

It is costly and time consuming to develop Multiple Choice Questions (MCQ) by hand. Using web-based resources to automate components of MCQ development would greatly benefit the education community through reducing reduplication of effort. Similar to many areas of Natural Language Processing (NLP), human-judged data is needed to train automated systems, but the majority of such data is proprietary. We present a graph-based representation for gathering training data from existing, web-based resources that increases access to such data and better directs the development of good questions.


Infinitely exchangeable random graphs generated from a Poisson point process on monotone sets and applications to cluster analysis for networks

arXiv.org Machine Learning

We construct an infinitely exchangeable process on the set $\cate$ of subsets of the power set of the natural numbers $\mathbb{N}$ via a Poisson point process with mean measure $\Lambda$ on the power set of $\mathbb{N}$. Each $E\in\cate$ has a least monotone cover in $\catf$, the collection of monotone subsets of $\cate$, and every monotone subset maps to an undirected graph $G\in\catg$, the space of undirected graphs with vertex set $\mathbb{N}$. We show a natural mapping $\cate\rightarrow\catf\rightarrow\catg$ which induces an infinitely exchangeable measure on the projective system $\catg^{\rest}$ of graphs $\catg$ under permutation and restriction mappings given an infinitely exchangeable family of measures on the projective system $\cate^{\rest}$ of subsets with permutation and restriction maps. We show potential connections of this process to applications in cluster analysis, machine learning, classification and Bayesian inference.


A Combinatorial Optimisation Approach to Designing Dual-Parented Long-Reach Passive Optical Networks

arXiv.org Artificial Intelligence

We present an application focused on the design of resilient long-reach passive optical networks. We specifically consider dual-parented networks whereby each customer must be connected to two metro sites via local exchange sites. An important property of such a placement is resilience to single metro node failure. The objective of the application is to determine the optimal position of a set of metro nodes such that the total optical fibre length is minimized. We prove that this problem is NP-Complete. We present two alternative combinatorial optimisation approaches to finding an optimal metro node placement using: a mixed integer linear programming (MIP) formulation of the problem; and, a hybrid approach that uses clustering as a preprocessing step. We consider a detailed case-study based on a network for Ireland. The hybrid approach scales well and finds solutions that are close to optimal, with a runtime that is two orders-of-magnitude better than the MIP model.


Robust Kernel Density Estimation

arXiv.org Machine Learning

We propose a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical $M$-estimation. We interpret the KDE based on a radial, positive semi-definite kernel as a sample mean in the associated reproducing kernel Hilbert space. Since the sample mean is sensitive to outliers, we estimate it robustly via $M$-estimation, yielding a robust kernel density estimator (RKDE). An RKDE can be computed efficiently via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Necessary and sufficient conditions are given for kernelized IRWLS to converge to the global minimizer of the $M$-estimator objective function. The robustness of the RKDE is demonstrated with a representer theorem, the influence function, and experimental results for density estimation and anomaly detection.


Computing Randomized Security Strategies in Networked Domains

AAAI Conferences

Traditionally, security decisions have been made without explicitly accounting for adaptive, intelligent attackers. Recent game theoretic security models have explicitly included attacker response in computing randomized security policies. Techniques to date, however, generally fail to explicitly account for interdependence between the targets to be secured, which is of vital importance in a variety of domains, including cyber, supply chain, and critical infrastructure security. We introduce a novel framework for computing optimal randomized security policies in networked domains which extends previous approaches in two ways. First, we extend previous linear programming techniques for Stackelberg security games to incorporate benefits and costs of arbitrary security configurations on individual assets. Second, we offer a principled model of failure cascades that allows us to capture both the direct and indirect value of assets. Finally, we use our framework to analyze four models, two based on random graph generation models, a simple model of interdependence between critical infrastructure and key resource sectors, and a model of the Fedwire interbank payment network.


Addressing Execution and Observation Error in Security Games

AAAI Conferences

Attacker-defender Stackelberg games have become a popular game-theoretic approach for security with deployments for LAX Police, the FAMS and the TSA. Unfortunately, most of the existing solution approaches do not model two key uncertainties of the real-world: there may be noise in the defender’s execution of the suggested mixed strategy and/or the observations made by an attacker can be noisy. In this paper, we analyze a framework to model these uncertainties, and demonstrate that previous strategies perform poorly in such uncertain settings. We also analyze RECON, a novel algorithm that computes strategies for the defender that are robust to such uncertainties, and explore heuristics that further improve RECON’s efficiency.


Fast Newton-CG Method for Batch Learning of Conditional Random Fields

AAAI Conferences

We propose a fast batch learning method for linear-chain Conditional Random Fields (CRFs) based on Newton-CG methods. Newton-CG methods are a variant of Newton method for high-dimensional problems. They only require the Hessian-vector products instead of the full Hessian matrices. To speed up Newton-CG methods for the CRF learning, we derive a novel dynamic programming procedure for the Hessian-vector products of the CRF objective function. The proposed procedure can reuse the byproducts of the time-consuming gradient computation for the Hessian-vector products to drastically reduce the total computation time of the Newton-CG methods. In experiments with tasks in natural language processing, the proposed method outperforms a conventional quasi-Newton method. Remarkably, the proposed method is competitive with online learning algorithms that are fast but unstable.


A Fast Dual Projected Newton Method for L1-Regularized Least Squares

AAAI Conferences

L1-regularized least squares, with the ability of discovering sparse representations, is quite prevalent in the field of machine learning, statistics and signal processing. In this paper, we propose a novel algorithm called Dual Projected Newton Method (DPNM) to solve the L1-regularized least squares problem. In DPNM, we first derive a new dual problem as a box constrained quadratic programming. Then, a projected Newton method is utilized to solve the dual problem, achieving a quadratic convergence rate . Moreover, we propose to utilize some practical techniques, thus it greatly reduces the computational cost and makes DPNM more efficient. Experimental results on six real-world data sets indicate that DPNM is very efficient for solving the L1-regularized least squares problem, by comparing it with state of the art methods.