Mathematical & Statistical Methods
Algorithms for Stochastic Physical Search on General Graphs
Brown, Daniel S. (Air Force Research Laboratories) | Hudack, Jeffrey (Air Force Research Laboratories) | Banerjee, Bikramjit (University of Southern Mississippi)
Stochastic Physical Search (SPS) refers to the search for an item in a physical environment where the item's price is stochastic, and where the cost to obtain the item includes both travel and purchase costs. This type of problem models task planning scenarios where the cost of completing an objective at a location is drawn from a probability distribution, reflecting the influence of unknown factors. Prior work on this domain has focused on solutions where the expected cost is minimized. Recently, SPS problems with other objectives have been proposed and theoretically analyzed, in particular when either the budget or the desired probability of success is fixed. However, general optimal solvers for these new variants do not yet exist. We present algorithms for optimal solution of these variants on general graphs. We formulate them as mixed integer linear programming problems, and solve them using an off-the-shelf MILP solver. We then develop custom branch and bound algorithms which result in a dramatic reduction in computation speed. Using these algorithms, we generate empirical insights into the hardness landscape of the fixed budget and fixed probability of success SPS variants.
Spectral Sparsification of Random-Walk Matrix Polynomials
Cheng, Dehua, Cheng, Yu, Liu, Yan, Peng, Richard, Teng, Shang-Hua
We consider a fundamental algorithmic question in spectral graph theory: Compute a spectral sparsifier of random-walk matrix-polynomial $$L_\alpha(G)=D-\sum_{r=1}^d\alpha_rD(D^{-1}A)^r$$ where $A$ is the adjacency matrix of a weighted, undirected graph, $D$ is the diagonal matrix of weighted degrees, and $\alpha=(\alpha_1...\alpha_d)$ are nonnegative coefficients with $\sum_{r=1}^d\alpha_r=1$. Recall that $D^{-1}A$ is the transition matrix of random walks on the graph. The sparsification of $L_\alpha(G)$ appears to be algorithmically challenging as the matrix power $(D^{-1}A)^r$ is defined by all paths of length $r$, whose precise calculation would be prohibitively expensive. In this paper, we develop the first nearly linear time algorithm for this sparsification problem: For any $G$ with $n$ vertices and $m$ edges, $d$ coefficients $\alpha$, and $\epsilon > 0$, our algorithm runs in time $O(d^2m\log^2n/\epsilon^{2})$ to construct a Laplacian matrix $\tilde{L}=D-\tilde{A}$ with $O(n\log n/\epsilon^{2})$ non-zeros such that $\tilde{L}\approx_{\epsilon}L_\alpha(G)$. Matrix polynomials arise in mathematical analysis of matrix functions as well as numerical solutions of matrix equations. Our work is particularly motivated by the algorithmic problems for speeding up the classic Newton's method in applications such as computing the inverse square-root of the precision matrix of a Gaussian random field, as well as computing the $q$th-root transition (for $q\geq1$) in a time-reversible Markov model. The key algorithmic step for both applications is the construction of a spectral sparsifier of a constant degree random-walk matrix-polynomials introduced by Newton's method. Our algorithm can also be used to build efficient data structures for effective resistances for multi-step time-reversible Markov models, and we anticipate that it could be useful for other tasks in network analysis.
Reconstruction in the Labeled Stochastic Block Model
Lelarge, Marc, Massouliรฉ, Laurent, Xu, Jiaming
The labeled stochastic block model is a random graph model representing networks with community structure and interactions of multiple types. In its simplest form, it consists of two communities of approximately equal size, and the edges are drawn and labeled at random with probability depending on whether their two endpoints belong to the same community or not. It has been conjectured in \cite{Heimlicher12} that correlated reconstruction (i.e.\ identification of a partition correlated with the true partition into the underlying communities) would be feasible if and only if a model parameter exceeds a threshold. We prove one half of this conjecture, i.e., reconstruction is impossible when below the threshold. In the positive direction, we introduce a weighted graph to exploit the label information. With a suitable choice of weight function, we show that when above the threshold by a specific constant, reconstruction is achieved by (1) minimum bisection, (2) a semidefinite relaxation of minimum bisection, and (3) a spectral method combined with removal of edges incident to vertices of high degree. Furthermore, we show that hypothesis testing between the labeled stochastic block model and the labeled Erd\H{o}s-R\'enyi random graph model exhibits a phase transition at the conjectured reconstruction threshold.
From Predictive to Prescriptive Analytics
Bertsimas, Dimitris, Kallus, Nathan
In this paper, we combine ideas from machine learning (ML) and operations research and management science (OR/MS) in developing a framework, along with specific methods, for using data to prescribe decisions in OR/MS problems. In a departure from other work on data-driven optimization and reflecting our practical experience with the data available in applications of OR/MS, we consider data consisting, not only of observations of quantities with direct effect on costs/revenues, such as demand or returns, but predominantly of observations of associated auxiliary quantities. The main problem of interest is a conditional stochastic optimization problem, given imperfect observations, where the joint probability distributions that specify the problem are unknown. We demonstrate that our proposed solution methods are generally applicable to a wide range of decision problems. We prove that they are computationally tractable and asymptotically optimal under mild conditions even when data is not independent and identically distributed (iid) and even for censored observations. As an analogue to the coefficient of determination $R^2$, we develop a metric $P$ termed the coefficient of prescriptiveness to measure the prescriptive content of data and the efficacy of a policy from an operations perspective. To demonstrate the power of our approach in a real-world setting we study an inventory management problem faced by the distribution arm of an international media conglomerate, which ships an average of 1 billion units per year. We leverage both internal data and public online data harvested from IMDb, Rotten Tomatoes, and Google to prescribe operational decisions that outperform baseline measures. Specifically, the data we collect, leveraged by our methods, accounts for an 88% improvement as measured by our coefficient of prescriptiveness.
The continuum-of-urns scheme, generalized beta and Indian buffet processes, and hierarchies thereof
We describe the combinatorial stochastic process underlying a sequence of conditionally independent Bernoulli processes with a shared beta process hazard measure. As shown by Thibaux and Jordan [TJ07], in the special case when the underlying beta process has a constant concentration function and a finite and nonatomic mean, the combinatorial structure is that of the Indian buffet process (IBP) introduced by Griffiths and Ghahramani [GG05]. By reinterpreting the beta process introduced by Hjort [Hjo90] as a measurable family of Dirichlet processes, we obtain a simple predictive rule for the general case, which can be thought of as a continuum of Blackwell-MacQueen urn schemes (or equivalently, one-parameter Hoppe urn schemes). The corresponding measurable family of Perman-Pitman-Yor processes leads to a continuum of two-parameter Hoppe urn schemes, whose ordinary component is the three-parameter IBP introduced by Teh and G\"or\"ur [TG09], which exhibits power-law behavior, as further studied by Broderick, Jordan, and Pitman [BJP12]. The idea extends to arbitrary measurable families of exchangeable partition probability functions and gives rise to generalizations of the beta process with matching buffet processes. Finally, in the same way that hierarchies of Dirichlet processes were given Chinese restaurant franchise representations by Teh, Jordan, Beal, and Blei [Teh+06], one can construct representations of sequences of Bernoulli processes directed by hierarchies of beta processes (and their generalizations) using the stochastic process we uncover.
Beta-Negative Binomial Process and Exchangeable ๏ฟผRandom Partitions for Mixed-Membership Modeling
The beta-negative binomial process (BNBP), an integer-valued stochastic process, is employed to partition a count vector into a latent random count matrix. As the marginal probability distribution of the BNBP that governs the exchangeable random partitions of grouped data has not yet been developed, current inference for the BNBP has to truncate the number of atoms of the beta process. This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a novel nonparametric Bayesian topic model that is distinct from existing ones, with simple implementation, fast convergence, good mixing, and state-of-the-art predictive performance.
Bayesian Sampling Using Stochastic Gradient Thermostats
Ding, Nan, Fang, Youhan, Babbush, Ryan, Chen, Changyou, Skeel, Robert D., Neven, Hartmut
Dynamics-based sampling methods, such as Hybrid Monte Carlo (HMC) and Langevin dynamics (LD), are commonly used to sample target distributions. Recently, such approaches have been combined with stochastic gradient techniques to increase sampling efficiency when dealing with large datasets. An outstanding problem with this approach is that the stochastic gradient introduces an unknown amount of noise which can prevent proper sampling after discretization. To remedy this problem, we show that one can leverage a small number of additional variables in order to stabilize momentum fluctuations induced by the unknown noise. Our method is inspired by the idea of a thermostat in statistical physics and is justified by a general theory.
Feature Cross-Substitution in Adversarial Classification
The success of machine learning, particularly in supervised settings, has led to numerous attempts to apply it in adversarial settings such as spam and malware detection. The core challenge in this class of applications is that adversaries are not static data generators, but make a deliberate effort to evade the classifiers deployed to detect them. We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries. In particular, we demonstrate severe shortcomings of feature reduction in adversarial settings using several natural adversarial objective functions, an observation that is particularly pronounced when the adversary is able to substitute across similar features (for example, replace words with synonyms or replace letters in words). We offer a simple heuristic method for making learning more robust to feature cross-substitution attacks. We then present a more general approach based on mixed-integer linear programming with constraint generation, which implicitly trades off overfitting and feature selection in an adversarial setting using a sparse regularizer along with an evasion model. Our approach is the first method for combining an adversarial classification algorithm with a very general class of models of adversarial classifier evasion. We show that our algorithmic approach significantly outperforms state-of-the-art alternatives.
Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling
The beta-negative binomial process (BNBP), an integer-valued stochastic process, is employed to partition a count vector into a latent random count matrix. As the marginal probability distribution of the BNBP that governs the exchangeable random partitions of grouped data has not yet been developed, current inference for the BNBP has to truncate the number of atoms of the beta process. This paper introduces an exchangeable partition probability function to explicitly describe how the BNBP clusters the data points of each group into a random number of exchangeable partitions, which are shared across all the groups. A fully collapsed Gibbs sampler is developed for the BNBP, leading to a novel nonparametric Bayesian topic model that is distinct from existing ones, with simple implementation, fast convergence, good mixing, and state-of-the-art predictive performance.