 deletion


Shape of Memory: a Geometric Analysis of Machine Unlearning in Second-Order Optimizers

arXiv.org Machine Learning

We argue that current definitions of machine unlearning are underspecified for second-order optimizers. We compare first-order and second-order learners for their ability to handle the data deletion task with varying degrees of eigendecomposition to mimic the loss model memory. While both first- and second-order methods realign with the ideal counterfactual in terms of performance and gradient, the second-order optimizer shows significant volatility in the optimizer state. This indicates residual information, supposedly deleted, that isn't detectable by first-order analysis. Various eigendecay treatments show that stability and information loss are regained only under controlled state perturbation where geometric information (or memory) is erased.


Faster Query Times for Fully Dynamic k-Center Clustering with Outliers

Neural Information Processing Systems

Given a point set $P \subseteq M$ from a metric space $(M, d)$ and numbers $k, z \in \mathbb{N}$, the metric k-center problem with z outliers is to find a set $C \subseteq P$ of k points such that the maximum distance of all but at most z outlier points of P to their nearest center in C is minimized. We consider this problem in the fully dynamic model, i.e., under insertions and deletions of points, for the case that the metric space has a bounded doubling dimension dim. We utilize a hierarchical data structure to maintain the points and their neighborhoods, which enables us to efficiently find the clusters. In particular, our data structure can be queried at any time to generate a $(3 + \varepsilon)$-approximate solution for input values of k and z in worst-case query time $\varepsilon^{-O(\dim)} k \log n \log\log \Delta$, where $\Delta$ is the ratio between the maximum and minimum distance between two points in P. Moreover, it allows insertion/deletion of a point in worst-case update time $\varepsilon^{-O(\dim)} \log n \log \Delta$. Our result achieves a significantly faster query time with respect to k and z than the current state-of-the-art by Pellizzoni, Pietracaprina, and Pucci [18], which uses $\varepsilon^{-O(\dim)} (k+z)^2 \log \Delta$ query time to obtain a $(3 + \varepsilon)$-approximate solution.
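
To make the objective in the abstract above concrete, here is a minimal sketch that evaluates the cost of a candidate center set in the k-center problem with z outliers. It only illustrates the problem definition, not the paper's dynamic data structure; the function name `kcenter_outlier_cost` and the use of Euclidean distance via `math.dist` are assumptions made for the example.

```python
import math

def kcenter_outlier_cost(points, centers, z, dist=math.dist):
    """Cost of `centers` when the z farthest points may be ignored.

    Hypothetical helper: evaluates the objective only, it does not
    search for good centers.
    """
    # Distance from every point to its nearest center, in ascending order.
    nearest = sorted(min(dist(p, c) for c in centers) for p in points)
    # Discard the z largest distances (the outliers).
    kept = nearest[:max(len(nearest) - z, 0)]
    # The cost is the largest remaining distance (0.0 if nothing remains).
    return kept[-1] if kept else 0.0

points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0)]
centers = [(0, 0), (10, 0)]
cost_with_outlier = kcenter_outlier_cost(points, centers, z=1)
cost_without_outlier = kcenter_outlier_cost(points, centers, z=0)
```

With one outlier allowed, the far point at (100, 0) is ignored and the cost drops from 90.0 to 1.0, which is why the outlier budget z matters for the quality of a solution.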


1305_making_sense_of_dependence_eff

Neural Information Processing Systems

In this part, we state the orthogonal decomposition property, motivate its importance with a pedagogical example, and finally prove Proposition 1, which enables the decomposition property in the context of the HSIC attribution method. A.1 Orthogonal Decomposition Property. Let $x = \{x_1, \dots, x_n\} \in \mathcal{X}^n$ be a set of n univariate random input variables. For any subset $A = \{l_1, \dots, l_{|A|}\} \subseteq \{1, \dots, n\}$, we denote by $x_A = (x_{l_1}, \dots, x_{l_{|A|}})$ the vector of input variables with indices in A. Let y be the random output variable defined by $y = f(x)$, $\mathcal{F}$ the RKHS defined by the kernel $k_A: \mathcal{X}^{|A|} \to \mathbb{R}$, and $\mathcal{G}$ the RKHS defined by the kernel $l: \mathcal{Y} \to \mathbb{R}$. In [11], the author shows that for any choice of kernel l, if we respect some constraints on the kernel $k_A$, we can construct indices $\mathrm{HSIC}(x_A, y)$ that satisfy the following decomposition property. The constraints on the kernel $k_A$ are detailed in the main document and in the last section of this appendix.


Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation

Neural Information Processing Systems

Privacy is a central challenge for systems that learn from sensitive data sets, especially when a system's outputs must be continuously updated to reflect changing data. We consider the achievable error for differentially private continual release of a basic statistic--the number of distinct items--in a stream where items may be both inserted and deleted (the turnstile model). With only insertions, existing algorithms have additive error just polylogarithmic in the length of the stream T. We uncover a much richer landscape in the turnstile model, even without considering memory restrictions. We show that every differentially private mechanism that handles insertions and deletions has worst-case additive error at least $T^{1/4}$ even under a relatively weak, event-level privacy definition. Then, we identify a parameter of the input stream, its maximum flippancy, that is low for natural data streams and for which we give tight parameterized error guarantees. Specifically, the maximum flippancy is the largest number of times that the contribution of a single item to the distinct elements count changes over the course of the stream. We present an item-level differentially private mechanism that, for all turnstile streams with maximum flippancy w, continually outputs the number of distinct elements with an $O(\sqrt{w} \cdot \mathrm{polylog}\, T)$ additive error, without requiring prior knowledge of w. We prove that this is the best achievable error bound that depends only on w, for a large range of values of w. When w is small, the error of our mechanism is similar to the polylogarithmic in T error in the insertion-only setting, bypassing the hardness in the turnstile model.
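
The definition of maximum flippancy in the abstract above can be illustrated with a short, non-private sketch. It assumes a stream of `(item, delta)` pairs with `delta` in {+1, -1}, and it assumes that an item contributes to the distinct count exactly when its running count is positive; the function name `max_flippancy` is hypothetical and this is not the paper's mechanism.

```python
from collections import defaultdict

def max_flippancy(stream):
    """Largest number of times any single item's presence flips.

    Assumption: an item is "present" when its running count is > 0.
    """
    count = defaultdict(int)   # running count per item
    flips = defaultdict(int)   # number of presence changes per item
    for item, delta in stream:
        was_present = count[item] > 0
        count[item] += delta
        is_present = count[item] > 0
        if was_present != is_present:
            flips[item] += 1
    return max(flips.values(), default=0)

# "a" is inserted and deleted twice (4 flips); "b" is inserted twice (1 flip).
stream = [("a", +1), ("b", +1), ("a", -1), ("a", +1), ("b", +1), ("a", -1)]
w = max_flippancy(stream)
```

Here `w` is 4: a second insertion of an item that is already present does not change the distinct count, so it does not add to the flippancy.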


Graph Edit Distance with General Costs Using Neural Set Divergence

Neural Information Processing Systems

Graph Edit Distance (GED) measures the (dis-)similarity between two given graphs in terms of the minimum-cost edit sequence, which transforms one graph to the other. GED is related to other notions of graph similarity, such as graph and subgraph isomorphism, maximum common subgraph, etc. However, the computation of exact GED is NP-Hard, which has recently motivated the design of neural models for GED estimation. However, they do not explicitly account for edit operations with different costs. In response, we propose $\texttt{GraphEdX}$, a neural GED estimator that can work with general costs specified for the four edit operations, viz., edge deletion, edge addition, node deletion, and node addition. We first present GED as a quadratic assignment problem (QAP) that incorporates these four costs. Then, we represent each graph as a set of node and edge embeddings and use them to design a family of neural set divergence surrogates. We replace the QAP terms corresponding to each operation with their surrogates. Computing such neural set divergence requires aligning nodes and edges of the two graphs. We learn these alignments using a Gumbel-Sinkhorn permutation generator, additionally ensuring that the node and edge alignments are consistent with each other. Moreover, these alignments are cognizant of both the presence and absence of edges between node pairs. Through extensive experiments on several datasets, along with a variety of edit cost settings, we show that $\texttt{GraphEdX}$ consistently outperforms state-of-the-art methods and heuristics in terms of prediction error.


Edit Distance Robust Watermarks via Indexing Pseudorandom Codes

Neural Information Processing Systems

Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees. We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn, & Zamir (2023) which stipulates that it is computationally hard to distinguish watermarked language model outputs from the model's actual output distribution; and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions to the watermarked text. Earlier schemes could only handle stochastic substitutions and deletions, and thus we are aiming for a more natural and appealing robustness guarantee that holds with respect to edit distance. Our main result is a watermarking scheme which achieves both (a) and (b) when the alphabet size for the language model is allowed to grow as a polynomial in the security parameter. To derive such a scheme, we follow an approach introduced by Christ & Gunn (2024), which proceeds via first constructing pseudorandom codes satisfying undetectability and robustness properties analogous to those above; our codes have the additional benefit of relying on weaker computational assumptions than used in previous work. Then we show that there is a generic transformation from such codes over large alphabets to watermarking schemes for arbitrary language models.


ea3502c3594588f0e9d5142f99c66627-Supplemental.pdf

Neural Information Processing Systems

In this document we provide supplementary materials that we are not able to fit into the main manuscript due to the page limit. The dimensions of the hidden features of the three-layer GCN are set to F, F/2, and F respectively. The dataset is separated into ten parts. We generate ten validation accuracy curves by using each of the parts in turn as the validation set. The ten curves are then averaged.