
Supplementary Material for VDE and GCFN

Neural Information Processing Systems

A Theoretical Details and Proofs. Notation: E_q denotes expectation with respect to the density q. We use the expectation operator in different contexts in the proof; when the density function or the random variable is clear from the context, we drop the subscript and write E. A.1 The general IV causal graph with covariates/observed confounders. This setting is sometimes called a conditional instrument. All our proofs and results carry over to the situation with covariates after conditioning all estimands and distributions on x. We derive the lower bound for the general case where there are both observed and unobserved confounders. A simple lower bound can be obtained by using H(ẑ | ε, x) ≥ H(ẑ | ε, t, x), but this bound cannot be made tight unless ε completely determines t. Therefore, we cannot guarantee independence unless the data at hand is unconfounded.
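For completeness, the slack in this simple bound is exactly a conditional mutual information; the following one-line identity (standard information theory, stated here for the reader's convenience rather than quoted from the supplement) makes the tightness condition explicit:

```latex
\[
  H(\hat z \mid \varepsilon, x) - H(\hat z \mid \varepsilon, t, x)
  \;=\; I(\hat z \,;\, t \mid \varepsilon, x) \;\ge\; 0,
\]
% with equality iff \hat z \perp t \mid (\varepsilon, x), which holds in particular
% when \varepsilon (together with x) completely determines t.
```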


General Control Functions for Causal Effect Estimation from Instrumental Variables

Neural Information Processing Systems

Causal effect estimation relies on separating the variation in the outcome into parts due to the treatment and due to the confounders. To achieve this separation, practitioners often use external sources of randomness that only influence the treatment, called instrumental variables (IVs). We study variables constructed from treatment and IV that help estimate effects, called control functions. We characterize general control functions for effect estimation in a meta-identification result. Then, we show that structural assumptions on the treatment process allow the construction of general control functions, thereby guaranteeing identification.
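As a point of reference, the classical linear control-function (residual-inclusion) estimator is a simple special case of the general control functions studied here; the sketch below is illustrative only, with simulated data and coefficients, and is not the paper's GCFN construction.

```python
# Minimal sketch of the classical two-stage control-function estimator for a
# linear IV model (illustrative special case; simulated data).
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
z = rng.normal(size=n)                            # instrument: only influences t
u = rng.normal(size=n)                            # unobserved confounder
t = 0.8 * z + u + 0.1 * rng.normal(size=n)        # treatment
y = 2.0 * t + 1.5 * u + 0.1 * rng.normal(size=n)  # outcome; true effect = 2.0

# Stage 1: regress treatment on instrument; the residual is the control function.
pi_hat, *_ = np.linalg.lstsq(z[:, None], t, rcond=None)
ctrl = t - z * pi_hat[0]

# Stage 2: regress outcome on treatment plus the control function.
X = np.column_stack([t, ctrl, np.ones(n)])
beta_cf = np.linalg.lstsq(X, y, rcond=None)[0][0]

X_naive = np.column_stack([t, np.ones(n)])
beta_naive = np.linalg.lstsq(X_naive, y, rcond=None)[0][0]
print(f"naive regression:  {beta_naive:.2f}")   # biased upward by the confounder
print(f"control function:  {beta_cf:.2f}")      # close to the true effect 2.0
```

Including the first-stage residual in the outcome regression absorbs the part of the treatment's variation that is due to the confounder, which is exactly the separation described above.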


Supplement: Novel Upper Bounds for the Constrained Most Probable Explanation Task

Neural Information Processing Systems

It is well known that any MPE task can be encoded as an integer linear programming (ILP) problem. A widely used formulation associates a Boolean variable with each entry in each function of the log-linear model: when the Boolean variable is assigned the value 1, the entry is selected; otherwise it is not. One type of consistency constraint encodes the restriction that exactly one entry from each function must be selected. A second type of consistency constraint ensures that if two functions share a variable, then only entries that assign the shared variable to the same value are selected. A sketch of this encoding is given below.
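The following sketch builds this encoding with PuLP. It is illustrative rather than code from the paper: the functions, domains and log-potentials are made up, and the shared-variable consistency is enforced through per-variable indicator variables, which is one common way to express that constraint.

```python
# Hedged sketch of the MPE-as-ILP encoding described above (illustrative data).
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum, PULP_CBC_CMD

domains = {"X": [0, 1], "Y": [0, 1]}
# Log-linear model: each function maps assignments of its scope to log-potentials.
functions = {
    "f1": (["X"], {(0,): 0.2, (1,): 1.1}),
    "f2": (["X", "Y"], {(0, 0): 0.5, (0, 1): -0.3, (1, 0): 0.1, (1, 1): 0.9}),
}

prob = LpProblem("mpe_ilp", LpMaximize)

# One Boolean indicator per (variable, value); each variable takes exactly one value.
ind = {(v, d): LpVariable(f"ind_{v}_{d}", cat=LpBinary)
       for v, dom in domains.items() for d in dom}
for v, dom in domains.items():
    prob += lpSum(ind[v, d] for d in dom) == 1

objective_terms = []
for name, (scope, table) in functions.items():
    # One Boolean selector per entry of the function; exactly one entry is selected.
    sel = {e: LpVariable("sel_" + name + "_" + "_".join(map(str, e)), cat=LpBinary)
           for e in table}
    prob += lpSum(sel.values()) == 1
    # Shared-variable consistency: a selected entry must agree with the indicators.
    for e, s in sel.items():
        for var, val in zip(scope, e):
            prob += s <= ind[var, val]
    objective_terms.append(lpSum(table[e] * sel[e] for e in table))

prob += lpSum(objective_terms)              # maximize the total log-potential
prob.solve(PULP_CBC_CMD(msg=False))
print({v: d for (v, d), b in ind.items() if b.value() == 1})   # MPE assignment
```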


Novel Upper Bounds for the Constrained Most Probable Explanation Task

Neural Information Processing Systems

We propose several schemes for upper bounding the optimal value of the constrained most probable explanation (CMPE) problem. Given a set of discrete random variables, two probabilistic graphical models defined over them, and a real number q, this problem involves finding an assignment of values to all the variables such that the probability of the assignment is maximized according to the first model and is bounded by q w.r.t. the second model. In prior work, it was shown that CMPE is a unifying problem with several applications and special cases, including the nearest assignment problem, the decision-preserving most probable explanation task, and robust estimation. It was also shown that CMPE is NP-hard even on tractable models such as bounded-treewidth networks and is hard for integer linear programming methods because it includes a dense global constraint. The main idea in our approach is to simplify the problem via Lagrange relaxation and decomposition so that it yields either a knapsack problem or the unconstrained most probable explanation (MPE) problem, and then to solve the two problems using specialized knapsack algorithms and mini-bucket-based upper-bounding schemes, respectively. We evaluate our proposed scheme along several dimensions, including the quality of the bounds and the computation time required on various benchmark graphical models, and show how it can be used to find heuristic, near-optimal feasible solutions in an example application pertaining to robust estimation and adversarial attacks on classifiers.
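For intuition, the basic Lagrangian-relaxation bound underlying such a scheme can be written as follows, where f and g stand for the log-scores of the two graphical models. This is a standard fact stated in generic notation; the paper's decomposition into knapsack and MPE subproblems involves additional structure beyond it.

```latex
\[
  \max_{x:\; g(x) \le q} f(x)
  \;\le\;
  \max_{x} \Big[\, f(x) + \lambda\,\big(q - g(x)\big) \Big]
  \qquad \text{for every } \lambda \ge 0,
\]
% since the added term is non-negative for every feasible x; minimizing the
% right-hand side over \lambda \ge 0 gives the tightest such upper bound.
```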


Supplementary material for Regret Bounds for Classification in Sparse Label Regimes

Neural Information Processing Systems

This appendix contains all proofs of the results mentioned in the main body of the paper, plus further results which were omitted there due to space limits. We recall the following lemma, which upper bounds the probability measure of the ball around a point x ∈ X that contains its k nearest neighbors; the proof follows immediately from the multiplicative Chernoff bound (see, e.g., Lemma 3.2 in [28]). Corollary A.2. Suppose that the measure-smoothness assumption (Assumption 5.1) holds with parameters λ, C ... The proof of Theorem 5.2 is split into a series of technical lemmas; this result is given in Lemma B.1.
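For reference, one standard form of the multiplicative Chernoff bound invoked above is the following (the exact variant and constants used in [28] may differ):

```latex
\[
  \Pr\big[ S \le (1-\delta)\,\mu \big] \le \exp\!\Big(-\tfrac{\delta^2 \mu}{2}\Big),
  \qquad
  \Pr\big[ S \ge (1+\delta)\,\mu \big] \le \exp\!\Big(-\tfrac{\delta^2 \mu}{3}\Big),
  \qquad 0 < \delta < 1,
\]
% for S a sum of independent \{0,1\}-valued random variables with mean \mu = E[S].
```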


Regret Bounds for Multilabel Classification in Sparse Label Regimes

Neural Information Processing Systems

Multi-label classification (MLC) has wide practical importance, but the theoretical understanding of its statistical properties is still limited. As an attempt to fill this gap, we thoroughly study upper and lower regret bounds for two canonical MLC performance measures, Hamming loss and Precision@κ. We consider two different statistical and algorithmic settings, a non-parametric setting tackled by plug-in classifiers à la k-nearest neighbors, and a parametric one tackled by empirical risk minimization operating on surrogate loss functions. For both, we analyze the interplay between a natural MLC variant of the low noise assumption, widely studied in binary classification, and the label sparsity, the latter being a natural property of large-scale MLC problems. We show that those conditions are crucial in improving the bounds, but the way they are tangled is not obvious, and also different across the two settings.
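The two performance measures have standard definitions; the short sketch below spells them out on made-up toy labels and scores, purely for illustration.

```python
# Hedged sketch of the two MLC performance measures (standard definitions;
# the toy labels and scores are illustrative only).
import numpy as np

y_true = np.array([[1, 0, 1, 0, 0],
                   [0, 0, 1, 1, 0]])              # relevance matrix, n x L
scores = np.array([[0.9, 0.2, 0.4, 0.1, 0.3],
                   [0.1, 0.7, 0.8, 0.6, 0.2]])    # predicted label scores

def hamming_loss(y_true, y_pred):
    """Fraction of (instance, label) positions predicted incorrectly."""
    return float(np.mean(y_true != y_pred))

def precision_at_k(y_true, scores, k):
    """Average fraction of the k highest-scored labels that are relevant."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    return float(np.take_along_axis(y_true, topk, axis=1).mean())

y_pred = (scores >= 0.5).astype(int)
print(hamming_loss(y_true, y_pred))         # 0.2 on the toy data
print(precision_at_k(y_true, scores, k=2))  # 0.75 on the toy data
```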




A Dual Form of Bregman Momentum

Neural Information Processing Systems

The dual form of Bregman momentum given in (10) can be obtained by first forming the dual Bregman divergence in terms of the dual variables w(t) and w ... We first provide a proof for Proposition 1; then we prove Theorem 3, which states that the constrained CMD update (14) coincides with the reparameterized projected gradient update on the composite loss. The rest of the proof follows similarly by solving for (t) and rearranging the terms; finally, applying the results of Theorem 2 concludes the proof. In this section, we discuss different strategies for discretizing the CMD updates and provide examples for each case. The most straightforward discretization of the unconstrained CMD update (1) is the forward Euler scheme (i.e., ...
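For reference, a generic forward-Euler discretization of a continuous-time mirror descent flow looks as follows; this is a sketch in generic notation, and the paper's update (1) and its constants may differ.

```latex
\[
  \frac{d}{dt}\, f\big(w(t)\big) = -\,\nabla L\big(w(t)\big)
  \quad\leadsto\quad
  f(w_{t+1}) = f(w_t) - \eta\,\nabla L(w_t)
  \;\Longleftrightarrow\;
  w_{t+1} = f^{-1}\!\big(f(w_t) - \eta\,\nabla L(w_t)\big),
\]
% where f = \nabla\phi is the link (mirror) map and \eta the step size.
```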


Reparameterizing Mirror Descent as Gradient Descent

Neural Information Processing Systems

Most of the recent successful applications of neural networks have been based on training with gradient descent updates. However, for some small networks, other mirror descent updates learn provably more efficiently when the target is sparse. We present a general framework for casting a mirror descent update as a gradient descent update on a different set of parameters. In some cases, the mirror descent reparameterization can be described as training a modified network with standard backpropagation. The reparameterization framework is versatile and covers a wide range of mirror descent updates, even cases where the domain is constrained. Our construction for the reparameterization argument is done for the continuous versions of the updates. Finding general criteria for the discrete versions to closely track their continuous counterparts remains an interesting open problem.
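A well-known concrete instance of such a reparameterization, in the continuous-time setting the abstract refers to, is gradient flow on squared parameters reproducing the exponentiated-gradient flow (illustration only; the notation may differ from the paper's):

```latex
\[
  w = \tfrac{1}{4}\, u \odot u, \qquad
  \dot u = -\nabla_u L\big(\tfrac{1}{4}\, u \odot u\big)
         = -\tfrac{1}{2}\, u \odot \nabla_w L(w)
  \;\;\Longrightarrow\;\;
  \dot w = \tfrac{1}{2}\, u \odot \dot u = -\,w \odot \nabla_w L(w),
\]
% i.e. d(log w)/dt = -\nabla_w L(w): plain gradient flow on u reproduces the
% continuous-time (unnormalized) exponentiated-gradient / mirror descent flow on w.
```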