Don't Compress Gradients in Random Reshuffling: Compress Gradient Differences

Neural Information Processing Systems

Gradient compression is a popular technique for improving the communication complexity of stochastic first-order methods in the distributed training of machine learning models. However, existing works consider only with-replacement sampling of stochastic gradients. In contrast, it is well known in practice, and was recently confirmed in theory, that stochastic methods based on without-replacement sampling, e.g., Random Reshuffling (RR), perform better than methods that sample gradients with replacement.
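The idea in the title, compress the difference between the current gradient and a slowly updated local shift rather than the gradient itself, can be pictured with a minimal sketch. This is an illustration assuming a DIANA-style shift update and a rand-k sparsifier; the names (`rand_k`, `rr_with_difference_compression`), the step sizes, and the shift rate `alpha` are ours, not the paper's implementation.

```python
import numpy as np

def rand_k(v, k, rng):
    """Rand-k sparsifier: keep k random coordinates, rescaled so the
    compressed vector is an unbiased estimate of v."""
    k = min(k, v.size)
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = (v.size / k) * v[idx]
    return out

def rr_with_difference_compression(grads, x0, lr=0.1, alpha=0.5,
                                   epochs=10, k=10, seed=0):
    """Random Reshuffling where each worker compresses the *difference*
    g_i - h_i between its gradient and a local shift h_i, instead of
    compressing g_i directly.

    grads: list of callables; grads[i](x) returns worker i's gradient at x.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    h = [np.zeros_like(x) for _ in grads]           # per-worker shifts
    for _ in range(epochs):
        for i in rng.permutation(len(grads)):       # without-replacement order
            m = rand_k(grads[i](x) - h[i], k, rng)  # compress the difference
            g_hat = h[i] + m                        # reconstructed gradient
            h[i] = h[i] + alpha * m                 # move the shift toward g_i
            x = x - lr * g_hat
    return x
```

The point of the shift is that as training stabilizes, g_i - h_i shrinks, so the compressed message carries less and less error even though the raw gradients themselves stay large.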


Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making

Neural Information Processing Systems

As society increasingly relies on AI-based tools for decision-making in socially sensitive domains, investigating fairness and equity of such automated systems has become a critical field of inquiry. Most of the literature in fair machine learning focuses on defining and achieving fairness criteria in the context of prediction, without explicitly considering how these predictions may be used later in the pipeline. For instance, if commonly used criteria, such as independence or sufficiency, are satisfied for a prediction score S used for binary classification, they need not be satisfied after an application of a simple thresholding operation on S (as commonly used in practice). In this paper, we take an important step toward addressing this issue for numerous statistical and causal notions of fairness. We introduce the notion of a margin complement, which measures how much a prediction score S changes due to a thresholding operation.
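A small numeric illustration of why thresholding can break a criterion that holds for the score itself. Here we read the margin complement as M = 1{S >= t} - S, which is our assumption based on the abstract's description; the data are synthetic and the group structure is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups with the *same mean* score (parity in expectation holds for S)
# but different spreads around that mean.
s_a = np.clip(rng.normal(0.5, 0.10, 100_000), 0, 1)   # group A: tight scores
s_b = np.clip(rng.normal(0.5, 0.25, 100_000), 0, 1)   # group B: spread scores

t = 0.7                                               # decision threshold
yhat_a, yhat_b = (s_a >= t), (s_b >= t)

# Margin complement (our reading): how much thresholding moved each score.
m_a = yhat_a.astype(float) - s_a
m_b = yhat_b.astype(float) - s_b

print(f"mean score        A={s_a.mean():.3f}   B={s_b.mean():.3f}")    # ~equal
print(f"positive rate     A={yhat_a.mean():.3f}   B={yhat_b.mean():.3f}")  # differ
print(f"mean margin comp. A={m_a.mean():.3f}   B={m_b.mean():.3f}")
```

The equal-mean scores pass the parity check, yet the thresholded decisions favor the higher-variance group, and the gap shows up exactly as a difference in the average margin complement.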



Stability of Graph Scattering Transforms

Neural Information Processing Systems

Scattering transforms are non-trainable deep convolutional architectures that exploit the multi-scale resolution of a wavelet filter bank to obtain an appropriate representation of data. More importantly, they are provably invariant to translations and stable to perturbations that are close to translations. This stability property provides the scattering transform with robustness to small changes in the metric domain of the data. When considering network data, regular convolutions no longer apply, since the data domain has an irregular structure given by the network topology. In this work, we extend scattering transforms to network data by using multiresolution graph wavelets, whose computation can be obtained by means of graph convolutions. Furthermore, we prove that the resulting graph scattering transforms are stable to metric perturbations of the underlying network. This renders graph scattering transforms robust to changes in the network topology, making them particularly useful for cases of transfer learning, topology estimation, or time-varying graphs.
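A minimal numpy sketch of one standard way to build such a transform: diffusion wavelets obtained from powers of a lazy random-walk operator, cascaded with a pointwise modulus and a low-pass average. The specific wavelet bank, normalization, and aggregation below are our assumptions for illustration; the paper's construction may differ in details.

```python
import numpy as np

def diffusion_wavelets(A, J):
    """Multiresolution graph wavelets from a lazy diffusion operator T:
    Psi_0 = I - T,  Psi_j = T^(2^(j-1)) - T^(2^j)  for j = 1..J-1."""
    n = len(A)
    d = np.maximum(A.sum(axis=1), 1e-12)
    T = 0.5 * (np.eye(n) + A / d[:, None])          # lazy random walk
    powers = [np.linalg.matrix_power(T, 2 ** j) for j in range(J)]
    return [np.eye(n) - T] + [powers[j - 1] - powers[j] for j in range(1, J)]

def graph_scattering(A, x, J=3, L=2):
    """Cascade wavelet filtering with a pointwise modulus, aggregating every
    intermediate signal with a low-pass average into scattering coefficients."""
    psis = diffusion_wavelets(A, J)
    layers, coeffs = [x], [x.mean()]
    for _ in range(L):
        nxt = []
        for z in layers:
            for psi in psis:
                u = np.abs(psi @ z)        # graph convolution + modulus
                nxt.append(u)
                coeffs.append(u.mean())    # low-pass aggregation
        layers = nxt
    return np.array(coeffs)

# Usage: A = adjacency matrix, x = node signal;
# phi = graph_scattering(A, x) is a fixed (non-trainable) representation.
```

Because every stage is a polynomial of the diffusion operator followed by a 1-Lipschitz nonlinearity, small perturbations of A propagate to the coefficients in a controlled way, which is the stability property the paper formalizes.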


Author feedback for "Stability of Graph Scattering Transforms"

Neural Information Processing Systems

We thank all the reviewers and the AC for their time, effort, and constructive feedback. First, we will include an explicit comparison with the GST of [W1]. [...] This is required to control the impact that topology changes have on the eigenvectors. Likewise, since Prop. 2 shows stability of the graph [...] The formal assumption in Prop. 3 indicates that all involved graph filters in the multiresolution wavelet bank have to [...] The hypothesis in Prop. 3 will be changed to reflect this. Theorem 1 states that it does not depend on the spectral norm of the graph.


3ce257b311e5acf849992f5a675188e8-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for the positive comments and useful feedback. We provide responses to the main comments below. Connections to Cotter et al.: There are two main differences between our paper and Cotter et al. (2019a,b). [...] Code: We will make TensorFlow code available. We will include a discussion on surrogates in Section 2. (Cotter et al., 2019: Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals.)
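For readers unfamiliar with the surrogate issue mentioned above: a rate constraint (e.g., on the fraction of positive predictions) involves an indicator whose gradient is zero almost everywhere, so the model is typically optimized against a hinge upper bound while the constraint itself is still evaluated exactly. A hedged sketch in the spirit of Cotter et al.; the function names and the margin are illustrative, not the authors' code.

```python
import numpy as np

def positive_rate(scores):
    """True constraint value: fraction of positive decisions.
    Indicator-based, so its gradient is zero almost everywhere."""
    return (scores >= 0).mean()

def hinge_rate(scores, margin=1.0):
    """Differentiable surrogate: hinge upper bound on 1{s >= 0},
    since max(0, 1 + s/margin) >= 1{s >= 0} for every s."""
    return np.maximum(0.0, 1.0 + scores / margin).mean()

# Proxy-Lagrangian-style updates (sketch): the model player descends the
# Lagrangian built from the *surrogate*, while the multiplier player is
# updated with the *true* non-differentiable constraint value:
#   theta <- theta - eta_t * grad[ loss(theta) + lam * (hinge_rate(s) - target) ]
#   lam   <- lam + eta_l * (positive_rate(s) - target)
```

Using the true rate for the multiplier keeps the method honest about constraint violations even though the model only ever sees the smooth upper bound.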


Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

Neural Information Processing Systems

CLIP embeddings have demonstrated remarkable performance across a wide range of multimodal applications. However, these high-dimensional, dense vector representations are not easily interpretable, limiting our understanding of the rich structure of CLIP and its use in downstream applications that require transparency. In this work, we show that the semantic structure of CLIP's latent space can be leveraged to provide interpretability, allowing for the decomposition of representations into semantic concepts. We formulate this problem as one of sparse recovery and propose a novel method, Sparse Linear Concept Embeddings (SpLiCE), for transforming CLIP representations into sparse linear combinations of human-interpretable concepts. Distinct from previous work, SpLiCE is task-agnostic and can be used, without training, to explain and even replace traditional dense CLIP representations, maintaining high downstream performance while significantly improving their interpretability. We also demonstrate significant use cases of SpLiCE representations including detecting spurious correlations and model editing.
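The sparse-recovery step can be pictured with a short sketch: decompose a dense embedding over a concept dictionary with an off-the-shelf nonnegative LASSO solver. `splice_decompose`, the `alpha` value, and the normalization choices below are illustrative assumptions, not the released SpLiCE implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def splice_decompose(clip_embedding, concept_matrix, alpha=0.05):
    """Approximate a dense CLIP embedding as a sparse *nonnegative* linear
    combination of concept embeddings (rows of concept_matrix) via LASSO.
    Returns the sparse weights and the reconstructed embedding."""
    z = clip_embedding / np.linalg.norm(clip_embedding)
    D = concept_matrix / np.linalg.norm(concept_matrix, axis=1, keepdims=True)
    # Solve min_w ||z - D^T w||^2 + alpha * ||w||_1  subject to  w >= 0
    lasso = Lasso(alpha=alpha, positive=True, fit_intercept=False,
                  max_iter=5000)
    lasso.fit(D.T, z)
    w = lasso.coef_
    recon = D.T @ w
    return w, recon / (np.linalg.norm(recon) + 1e-12)

# Usage with hypothetical data: a 10k-word concept vocabulary embedded by the
# CLIP text encoder gives concept_matrix of shape (10000, 512); then
#   w, z_hat = splice_decompose(image_embedding, concept_matrix)
#   top = np.argsort(w)[::-1][:5]   # the few concepts that explain the image
```

The sparsity penalty is what buys interpretability: only a handful of concepts receive nonzero weight, so each embedding reads as a short, named recipe rather than a 512-dimensional vector.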


3cc697419ea18cc98d525999665cb94a-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their insightful and constructive comments. Reviewer #1 and shared comments: [...] For latent FML we maximize the mutual information (Sec. 3.3). [...] We thank the reviewer for this very comprehensive review, which we really appreciate. We agree this point needs to be reinforced.


Differentially Private Distributed Data Summarization under Covariate Shift

Neural Information Processing Systems

We envision Artificial Intelligence marketplaces to be platforms where consumers, with very little data for a target task, can obtain a relevant model by accessing many private data sources that hold vast numbers of data samples. One of the key challenges is to construct a training dataset that matches the target task without compromising the privacy of the data sources. To this end, we consider the following distributed data summarization problem.
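One generic way to make such a selection private is exponential-mechanism-style sampling of source points by how well they match the target statistics. The sketch below is our illustration under strong simplifying assumptions (features clipped to the unit ball, a known target mean, a crude per-draw budget split); it is not the paper's algorithm.

```python
import numpy as np

def private_summary(source, target_mean, m, epsilon, rng):
    """Select m source points whose features are close to the target task's
    mean, with exponential-mechanism-style sampling for privacy."""
    # Clip features to the unit ball so the utility score has bounded range
    # (a simplifying stand-in for a real sensitivity analysis).
    norms = np.maximum(np.linalg.norm(source, axis=1, keepdims=True), 1.0)
    src = source / norms
    score = -np.linalg.norm(src - target_mean, axis=1)   # utility: closeness
    sens = 2.0                       # assumed utility sensitivity after clipping
    # Each of the m draws spends epsilon / m of the privacy budget.
    logits = (epsilon / m) * score / (2.0 * sens)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    idx = rng.choice(len(src), size=m, replace=False, p=p)
    return source[idx]

# Usage with synthetic data:
# rng = np.random.default_rng(0)
# summary = private_summary(rng.normal(size=(5000, 8)),
#                           target_mean=np.zeros(8), m=100,
#                           epsilon=1.0, rng=rng)
```

Lower epsilon flattens the sampling distribution toward uniform, trading summary quality for stronger privacy, which is the basic tension the paper's setting has to resolve under covariate shift.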