Mathematical & Statistical Methods
Fairness constraints can help exact inference in structured prediction
Many inference problems in structured prediction can be modeled as maximizing a score function over a space of labels, where graphs are a natural representation for decomposing the total score into a sum of unary (node) and pairwise (edge) scores. Given a generative model with an undirected connected graph G and a true vector of binary labels y, it has been previously shown that when G has good expansion properties, such as complete graphs or d-regular expanders, one can exactly recover y (with high probability and in polynomial time) from a single noisy observation of each edge and node. We analyze the generative model previously studied by Globerson et al. (2015) under a notion of statistical parity. That is, given a fair binary node labeling, we ask whether it is possible to recover the fair assignment, with high probability and in polynomial time, from single edge and node observations. We find that, in contrast to the known trade-offs between fairness and model performance, the addition of the fairness constraint improves the probability of exact recovery. We explain this phenomenon and empirically show that graphs with poor expansion properties, such as grids, become capable of achieving exact recovery. Finally, as a byproduct of our analysis, we provide a tighter minimum-eigenvalue bound than the one that can be derived from Weyl's inequality.
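A compact way to state this setup (a sketch in placeholder notation, not symbols taken from the paper): the score decomposes over nodes and edges, and the fairness constraint restricts the search to labelings satisfying parity, written here in its simplest balanced form:

\[
\hat{y} \;=\; \operatorname*{arg\,max}_{y \in \{-1,+1\}^{|V|}}
\;\sum_{i \in V} \phi_i(y_i) \;+\; \sum_{(i,j) \in E} \phi_{ij}(y_i, y_j)
\qquad \text{subject to} \qquad
\sum_{i \in V} \mathbb{1}[y_i = +1] \;=\; \tfrac{|V|}{2}.
\]

The balanced constraint above is one concrete instance of statistical parity; the paper's exact fairness notion may instead equalize label proportions across protected groups.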
Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification
Francesca Mignacco
We analyze in closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture in which each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow; in the full-batch limit, we recover the standard gradient flow. We apply dynamical mean-field theory from statistical physics to track the dynamics of the algorithm in the high-dimensional limit via a self-consistent stochastic process. We explore the performance of the algorithm as a function of the control parameters, shedding light on how it navigates the loss landscape.
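As a rough illustration of the setting (a simulation sketch, not the paper's DMFT analysis), the following runs mini-batch SGD with logistic loss on a single-layer classifier for a two-cluster high-dimensional Gaussian mixture; the dimensions, learning rate, and noise level are illustrative assumptions.

import numpy as np

# Minimal sketch: mini-batch SGD for a single-layer classifier on a
# two-cluster Gaussian mixture with labels +/-1.
rng = np.random.default_rng(0)
d, n, lr, batch = 500, 20000, 0.05, 32
mu = rng.normal(size=d) / np.sqrt(d)          # cluster mean direction

def sample(m):
    y = rng.choice([-1.0, 1.0], size=m)
    x = y[:, None] * mu[None, :] + 0.5 * rng.normal(size=(m, d))
    return x, y

w = np.zeros(d)
for step in range(n // batch):
    x, y = sample(batch)
    margin = y * (x @ w)
    # gradient of the logistic loss log(1 + exp(-y w.x))
    grad = -(y[:, None] * x / (1.0 + np.exp(margin))[:, None]).mean(0)
    w -= lr * grad

x_test, y_test = sample(5000)
print("test accuracy:", np.mean(np.sign(x_test @ w) == y_test))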
On Differentially Private U Statistics
Without privacy constraints, the standard estimators in this setting are U-statistics, which commonly arise in a wide range of problems, including nonparametric signed rank tests, symmetry testing, uniformity testing, and subgraph counts in random networks, and which are the unique minimum-variance unbiased estimators under mild conditions. Despite the recent outpouring of interest in private mean estimation, privatizing U-statistics has received little attention. While existing private mean estimation algorithms can be applied in a black-box manner to obtain confidence intervals, we show that they can lead to suboptimal private error, e.g., constant-factor inflation in the leading term, or even Θ(1/n) rather than O(1/n²) in the degenerate setting.
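For concreteness, here is a minimal sketch of the kind of black-box baseline discussed above: a degree-2 U-statistic with a kernel bounded in [0, 1], privatized by adding Laplace noise scaled to its global sensitivity of 2/n. The kernel and data are illustrative assumptions, and this is the naive approach the abstract argues can be suboptimal, not the paper's proposed estimator.

import numpy as np
from itertools import combinations

def u_statistic(x, h):
    # average of a symmetric kernel h over all unordered pairs
    pairs = combinations(range(len(x)), 2)
    return np.mean([h(x[i], x[j]) for i, j in pairs])

def private_u_statistic(x, h, eps, rng):
    n = len(x)
    # changing one point affects n-1 of the n(n-1)/2 pairs, so for a
    # kernel bounded in [0, 1] the global sensitivity is 2/n
    sensitivity = 2.0 / n
    return u_statistic(x, h) + rng.laplace(scale=sensitivity / eps)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
h = lambda a, b: float(a + b > 0)    # example bounded kernel
print(u_statistic(x, h), private_u_statistic(x, h, eps=1.0, rng=rng))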
Estimation of Skill Distribution from a Tournament
In this paper, we study the problem of learning the skill distribution of a population of agents from observations of pairwise games in a tournament. These games are played among randomly drawn agents from the population. The agents in our model can be individuals, sports teams, or Wall Street fund managers. Formally, we postulate that the likelihoods of outcomes of games are governed by the parametric Bradley-Terry-Luce (or multinomial logit) model, where the probability of an agent beating another is the ratio between its skill level and the pairwise sum of skill levels, and the skill parameters are drawn from an unknown, non-parametric skill density of interest. The problem is, in essence, to learn a distribution from noisy, quantized observations.
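A minimal simulation of the data-generating process described above: skills are drawn from a density (unknown to the learner) and pairwise games are decided by the Bradley-Terry-Luce rule P(i beats j) = w_i / (w_i + w_j). The Gamma skill density and the sizes are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
num_agents, num_games = 1000, 50000
skills = rng.gamma(shape=2.0, scale=1.0, size=num_agents)   # skill density

i, j = rng.integers(num_agents, size=(2, num_games))        # random pairings
keep = i != j
i, j = i[keep], j[keep]
p_i_wins = skills[i] / (skills[i] + skills[j])               # BTL win probability
outcomes = rng.random(len(i)) < p_i_wins                     # True if i beats j
print("fraction of games won by the first-listed agent:", outcomes.mean())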
Supplementary Material: Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes
We first review, for convenience, the notation introduced in the main body for the context and target sets. Later, as is common in recent meta-learning approaches, we will consider predicting the target set from the context set (Garnelo et al. [3, 4]). The measurable sets of Σ are those which can be specified by the values of the function at a countable subset of its input locations. Since in practice we only ever observe data at a finite number of points, this is sufficient for our purposes. Hence we may think of these stochastic processes as defined by their finite-dimensional marginals. We now define what it means to condition on observations of the stochastic process P. Let p(y|X) denote the density, with respect to the Lebesgue measure, of the finite-dimensional marginal of P with index set X (we assume these densities always exist). Strictly speaking, this is non-standard terminology, since P is the law of a stochastic process.
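Under the stated assumption that the finite-dimensional marginals admit Lebesgue densities, the conditioning step can be written, with placeholder index sets X_c (context) and X_t (target), as

\[
p\big(y_t \mid X_t, (X_c, y_c)\big) \;=\; \frac{p\big(y_c, y_t \mid X_c \cup X_t\big)}{p\big(y_c \mid X_c\big)},
\]

where each density on the right-hand side is a finite-dimensional marginal of P in the sense defined above; this is a sketch of the conditioning step, not the exact display from the supplement.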
Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows: Supplementary Materials
Marcus A. Brubaker
Equation 7 in Section 4 is the log density of the distribution obtained by applying the normalizing flow models to the finite-dimensional distribution of the Wiener process on a given time grid; we refer the reader to Chapter 2 of [5] for more details. We drop the subscript of π for simplicity of notation. We base the justification on the following two propositions. We describe the details of synthetic dataset generation, real-world dataset pre-processing, model architecture, and training and evaluation settings in this section.
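To make the construction concrete, the log density referenced above has, in placeholder notation (not the paper's exact Equation 7), the change-of-variables form over a time grid 0 = t_0 < t_1 < ... < t_k, with the base density given by the product of Gaussian Wiener increments:

\[
\log p_X(x_{t_1}, \ldots, x_{t_k}) \;=\; \sum_{i=1}^{k} \Big[ \log \mathcal{N}\big(w_{t_i};\, w_{t_{i-1}},\, t_i - t_{i-1}\big)
\;-\; \log \Big|\det \tfrac{\partial F_\theta(w;\, t_i)}{\partial w}\Big|_{w = w_{t_i}} \Big],
\qquad w_{t_i} = F_\theta^{-1}(x_{t_i};\, t_i),\; w_{t_0} = 0,
\]

where F_θ(·; t) denotes the invertible, time-indexed flow applied to the Wiener process; this is a sketch of the generic expression under these assumptions.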
Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows
Marcus A. Brubaker
Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation. In this work, we propose a novel type of normalizing flow driven by a differential deformation of the Wiener process. As a result, we obtain a rich time series model whose observable process inherits many of the appealing properties of its base process, such as efficient computation of likelihoods and marginals. Furthermore, our continuous treatment provides a natural framework for irregular time series with an independent arrival process, including straightforward interpolation. We illustrate the desirable properties of the proposed model on popular stochastic processes and demonstrate its superior flexibility to variational RNN and latent ODE baselines in a series of experiments on synthetic and real-world data.
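As a toy illustration of the generative direction (sampling rather than density evaluation), the sketch below draws a Wiener path on an irregular time grid and pushes it through a simple invertible, time-dependent deformation; the particular deformation is an illustrative stand-in, not the paper's learned flow.

import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 5.0, size=40))               # irregular observation times
dw = rng.normal(scale=np.sqrt(np.diff(t, prepend=0.0)))   # Wiener increments
w = np.cumsum(dw)                                          # base process W_t

def deform(w, t, a=0.5, b=1.0):
    # invertible in w for fixed t: x = exp(a * w) + b * t
    return np.exp(a * w) + b * t

x = deform(w, t)                                           # observable process X_t
print(np.column_stack([t, x])[:5])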