f8928b073ccbec15d35f2a9d39430bfd-Supplemental-Conference.pdf

Neural Information Processing Systems

Our experiments in Section 3 and Section 4 were conducted with an adversary who has side information about the target point. Here, we reduce the amount of background knowledge the adversary has about the target, and measure how this affects the reconstruction upper bound and attack success. We do this in the following set-up: given a target $z$, we initialize our reconstruction from uniform noise and optimize with the gradient-based reconstruction attack introduced in Section 2 to produce $\hat{z}$.
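The set-up above (start from uniform noise, then refine the estimate by gradient steps) can be illustrated with a toy sketch. This is not the attack from Section 2 of the paper, whose objective is not given in this excerpt. As a stand-in, it assumes the adversary observes a hypothetical linear function `A z` of the target and minimizes a squared matching loss. The names `observe`, `reconstruct`, the matrix `A`, and the step size are all illustrative assumptions.

```python
import random

def observe(A, z):
    """Hypothetical observation available to the adversary: the product A z."""
    return [sum(a * zi for a, zi in zip(row, z)) for row in A]

def reconstruct(A, observed, steps=500, lr=1 / 14, seed=0):
    """Gradient descent on 0.5 * ||A z_hat - observed||^2, started from uniform noise."""
    rng = random.Random(seed)
    # Initialization from uniform noise, as in the set-up described above.
    z_hat = [rng.uniform(0.0, 1.0) for _ in range(len(A[0]))]
    for _ in range(steps):
        residual = [p - o for p, o in zip(observe(A, z_hat), observed)]
        # Gradient of the matching loss with respect to z_hat is A^T residual.
        grad = [sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(len(z_hat))]
        z_hat = [zj - lr * gj for zj, gj in zip(z_hat, grad)]
    return z_hat

A = [[2.0, 1.0], [1.0, 3.0]]   # assumed, well-conditioned toy matrix
z = [0.3, 0.7]                 # toy target point
z_hat = reconstruct(A, observe(A, z))
```

In this toy problem the loss is convex and `A` is invertible, so the estimate converges to the target. A real attack objective is generally non-convex, and recovery is not guaranteed.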


Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs

Neural Information Processing Systems

Hierarchical clustering studies a recursive partition of a data set into clusters of successively smaller size, and is a fundamental problem in data analysis. In this work we study the cost function for hierarchical clustering introduced by Dasgupta [12], and present two polynomial-time approximation algorithms. Our first result is an O(1)-approximation algorithm for graphs of high conductance. Our simple construction bypasses complicated recursive routines of finding sparse cuts known in the literature (e.g., [6, 11]). Our second and main result is an O(1)-approximation algorithm for a wide family of graphs that exhibit a well-defined structure of clusters. This result generalises the previous state-of-the-art [10], which holds only for graphs generated from stochastic models. The significance of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets, on which our presented algorithm outperforms the previously proposed algorithm for graphs with a well-defined cluster structure [10].
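For readers unfamiliar with the objective, Dasgupta's cost of a hierarchy charges each edge its weight times the number of leaves under the lowest common ancestor of its endpoints, so lower cost means similar vertices are separated later. The sketch below only evaluates this cost for a given tree. It is not either of the approximation algorithms from the abstract. The nested-tuple tree encoding and the function name `dasgupta_cost` are assumptions made for illustration.

```python
from itertools import combinations

def dasgupta_cost(tree, weights):
    """Return (leaves, cost) for a hierarchy given as nested tuples of vertex ids.

    weights maps an undirected edge (u, v) to its non-negative weight.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), 0.0          # a single leaf has no internal cost
    parts = [dasgupta_cost(child, weights) for child in tree]
    leaves = frozenset().union(*(p[0] for p in parts))
    cost = sum(p[1] for p in parts)
    # Edges whose endpoints lie in different children are split at this node,
    # so they are charged the number of leaves below this node.
    for (left, _), (right, _) in combinations(parts, 2):
        for u in left:
            for v in right:
                w = weights.get((u, v), weights.get((v, u), 0.0))
                cost += w * len(leaves)
    return leaves, cost

# Path graph 0 - 1 - 2 - 3 with unit weights.
path = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
balanced_cost = dasgupta_cost(((0, 1), (2, 3)), path)[1]      # 2 + 2 + 4 = 8
caterpillar_cost = dasgupta_cost((((0, 1), 2), 3), path)[1]   # 2 + 3 + 4 = 9
```

On this path graph the balanced tree is cheaper than the caterpillar tree, because only one edge is charged the full leaf count.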


8 max

Neural Information Processing Systems

We proceed to show the sparsistency of the estimated parameters. First, suppose that $\Theta_{t;ij} \neq 0$ for some time $t$ and index $(i,j)$. Due to $0 < \gamma < 1$, the above inequality implies that $\hat{\Theta}_{t;ij} = 0$ for every $t$ and $(i,j) \notin S_t$, and $\hat{\Theta}_{t;ij} - \hat{\Theta}_{t-1;ij} = 0$ for every $t > 0$ and $(i,j) \notin D_t$. The proof is inspired by Corollary 1 in [47]. First, we present the following key lemmas.



CRT_NIPS22

Neural Information Processing Systems

Following from the discussion in Section 3.1, we want to maximize E [zy (x+)].

B.1 Higher Noise Level

In the main paper, we conduct experiments on CIFAR-10 using a noise level of 0.25 only. Here, we report our main set of results on CIFAR-10 (Table 3) using higher values. In Table 8, we report results using a noise level of 0.5, and in Table 9, we report results using a noise level of 1.0.

B.2 Using ViT [6]

In the main paper, we used Convolutional Neural Network (CNN) based architectures.


Table 4: Selected learning rates for all methods. Columns: Method, Learning rate.

Neural Information Processing Systems

Datasets: We run all experiments on the standard GLUE benchmark [18] with a Creative Commons license (CC BY 4.0) and the SUPERGLUE benchmark [19].

Low-resource fine-tuning: For the experiment conducted in Section 5.6, we set the number of epochs to 1000, 200, 100, 50, and 25 for datasets subsampled to size 100, 500, 1000, 2000, and 4000, respectively. Based on our results, this is sufficient to allow the models to converge. We save a checkpoint every 250 steps for all models and report the results for the hyper-parameters that perform best on the validation set for each task.

Data pre-processing: Following Raffel et al. [3], we cast all datasets into a sequence-to-sequence format.
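A quick check of the low-resource schedule above shows that the epoch counts shrink in proportion to the subsample size, so every setting processes the same number of training examples in total. The snippet is only this arithmetic. The variable names are illustrative, and the excerpt does not give the batch size, so the number of optimizer steps cannot be derived from it.

```python
# Epochs per subsampled dataset size, as stated in the text.
EPOCHS_BY_SIZE = {100: 1000, 500: 200, 1000: 100, 2000: 50, 4000: 25}

# Total examples processed = dataset size * number of epochs.
examples_seen = {size: size * epochs for size, epochs in EPOCHS_BY_SIZE.items()}

for size, total in examples_seen.items():
    print(f"size={size:>4}  epochs={EPOCHS_BY_SIZE[size]:>4}  examples seen={total}")
```

Each configuration comes to 100,000 example passes, which makes the results for different subsample sizes comparable in terms of training budget.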



Joint M-Best-Diverse Labelings as a Parametric Submodular Minimization

Neural Information Processing Systems

We consider the problem of jointly inferring the $M$-best diverse labelings for a binary (high-order) submodular energy of a graphical model. Recently, it was shown that this problem can be solved to a global optimum for many practically interesting diversity measures. It was noted that the labelings are so-called nested. This nestedness property also holds for labelings of a class of parametric submodular minimization problems, where different values of the global parameter $\gamma$ give rise to different solutions. A popular example of parametric submodular minimization is the monotonic parametric max-flow problem, which is also widely used for computing multiple labelings.
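The nestedness of parametric solutions can be checked by brute force on a tiny example. The sketch below assumes one particular parametric family: a pairwise submodular energy (unary terms plus non-negative Potts-style edge penalties) with $\gamma$ added to the cost of every variable labeled 1. The energy, its weights, and the function names are illustrative assumptions, not the paper's model or its max-flow algorithm.

```python
from itertools import product

UNARY = [-3, -1, 2, -2]                     # assumed cost of assigning label 1
EDGES = {(0, 1): 2, (1, 2): 1, (2, 3): 2}   # assumed non-negative disagreement penalties

def energy(labels, gamma):
    """Submodular pairwise energy with gamma added per variable labeled 1."""
    unary = sum((u + gamma) * x for u, x in zip(UNARY, labels))
    pairwise = sum(w for (i, j), w in EDGES.items() if labels[i] != labels[j])
    return unary + pairwise

def minimize(gamma):
    """Exhaustive minimization; only feasible for a handful of variables."""
    return min(product((0, 1), repeat=len(UNARY)),
               key=lambda labels: energy(labels, gamma))

gammas = (0, 1, 2, 3, 4)
solutions = [minimize(g) for g in gammas]
supports = [{i for i, x in enumerate(s) if x} for s in solutions]

# Nestedness: as gamma grows, the set of variables labeled 1 can only shrink.
nested = all(later <= earlier for earlier, later in zip(supports, supports[1:]))
print(supports, nested)
```

For a submodular energy with this kind of monotone dependence on $\gamma$, minimizers for strictly increasing $\gamma$ always form such a chain, which is what makes it possible to read several labelings off a single parametric solve.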