Instability and Local Minima in GAN Training with Kernel Discriminators
Generative Adversarial Networks (GANs) are a widely-used tool for generative modeling of complex data. Despite their empirical success, the training of GANs is not fully understood due to the min-max optimization of the generator and discriminator. This paper analyzes these joint dynamics when the true samples as well as the generated samples are discrete, finite sets, and the discriminator is kernel-based. A simple yet expressive framework for analyzing training called the Isolated Points Model is introduced. In the proposed model, the distance between true samples greatly exceeds the kernel width, so each generated point is influenced by at most one true point. Our model enables precise characterization of the conditions for convergence, both to good and bad minima. In particular, the analysis explains two common failure modes: (i) an approximate mode collapse and (ii) divergence. Numerical simulations that replicate these behaviors, as predicted by the analysis, are provided.
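As a rough illustration of this regime, the sketch below (not the paper's exact formulation) places discrete true and generated points in one dimension, takes the discriminator to be an RBF-kernel witness function, and updates only the generated points by gradient ascent on the witness; the kernel width, learning rate, and point positions are illustrative assumptions.

import numpy as np

def rbf(z, x, sigma):
    # Gaussian (RBF) kernel between a scalar z and an array of points x
    return np.exp(-(z - x) ** 2 / (2 * sigma ** 2))

def witness_grad(z, true_pts, gen_pts, sigma):
    # Gradient w.r.t. z of the kernel witness f(z) = mean_i k(z, x_i) - mean_j k(z, y_j)
    d_true = np.mean((true_pts - z) / sigma ** 2 * rbf(z, true_pts, sigma))
    d_gen = np.mean((gen_pts - z) / sigma ** 2 * rbf(z, gen_pts, sigma))
    return d_true - d_gen

sigma, lr, steps = 0.1, 0.05, 500
true_pts = np.array([-1.0, 1.0])        # true samples far apart relative to the kernel width
gen_pts = np.array([-0.7, 0.90, 0.95])  # each generated point is near at most one true point

for _ in range(steps):
    grads = np.array([witness_grad(z, true_pts, gen_pts, sigma) for z in gen_pts])
    gen_pts = gen_pts + lr * grads      # generated points ascend the witness function

print(gen_pts)                          # final positions of the generated points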
Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity
Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step size, and we propose insightful step-size switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.
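To make the SGDA setting concrete, here is a minimal sketch (illustrative only, not the paper's analysis or code): constant-step-size stochastic gradient descent-ascent on a strongly-convex-strongly-concave quadratic game with additive gradient noise, whose iterates settle in a neighbourhood of the saddle point, mirroring the constant-step-size guarantee. The coefficients, step size, and noise level are assumptions.

import numpy as np

rng = np.random.default_rng(0)
a, b, c = 1.0, 0.5, 1.0            # f(x, y) = a/2 * x^2 + b * x * y - c/2 * y^2
x, y = 2.0, -2.0                   # initial point, away from the saddle at (0, 0)
eta, noise = 0.1, 0.1              # constant step size and gradient-noise level

for t in range(2000):
    gx = a * x + b * y + noise * rng.standard_normal()   # stochastic gradient in x
    gy = b * x - c * y + noise * rng.standard_normal()   # stochastic gradient in y
    x -= eta * gx                                        # descent step on x
    y += eta * gy                                        # ascent step on y

print(x, y)   # hovers in a noise-induced neighbourhood of the solution (0, 0)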
Checklist
For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]
Our method proposes to learn an efficient data structure for accurate prediction in large output spaces. It helps existing large-scale retrieval systems used in various online applications to efficiently produce more accurate results. To the best of our knowledge, this poses no negative impact on society.
Author Response for: "Inverting Gradients - How easy is it to break privacy in federated learning?" General Comments: We thank all reviewers for their valuable feedback and interest in this attack. Some questions arose about the theoretical analysis for fully connected layers. Finally, knowledge of the feature representation already enables attacks such as Melis et al. This non-uniformity is a significant result for the privacy of gradient batches. Fig. 4 of [35] looks better because the attack scenario there is easier.
The Image Local Autoregressive Transformer
Recently, AutoRegressive (AR) models for whole-image generation, empowered by transformers, have achieved comparable or even better performance than Generative Adversarial Networks (GANs). Unfortunately, directly applying such AR models to edit or change local image regions may suffer from missing global information, slow inference speed, and information leakage of the local guidance. To address these limitations, we propose a novel model, the image Local Autoregressive Transformer (iLAT), to better facilitate locally guided image synthesis. Our iLAT learns novel local discrete representations through the newly proposed local autoregressive (LA) attention mask and convolution mechanism. Thus iLAT can efficiently synthesize local image regions guided by the key guidance information. Our iLAT is evaluated on various locally guided image synthesis tasks, such as pose-guided person image synthesis and face editing. Both quantitative and qualitative results show the efficacy of our model.
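Purely as an illustration of a local autoregressive attention mask (this is an assumption about one possible construction, not the paper's exact mechanism), the sketch below builds a mask in which context tokens attend only to each other, while tokens inside the edited region attend to the full context and, causally, to previously generated in-region tokens.

import numpy as np

def local_ar_mask(seq_len, local_idx):
    # seq_len: total number of tokens; local_idx: indices of the edited (local) region
    mask = np.ones((seq_len, seq_len), dtype=bool)   # True = attention allowed
    local = np.zeros(seq_len, dtype=bool)
    local[local_idx] = True
    for q in range(seq_len):
        for k in local_idx:
            if local[q]:
                mask[q, k] = k <= q   # causal attention within the edited region
            else:
                mask[q, k] = False    # context tokens never see the edited tokens
    return mask

print(local_ar_mask(6, [2, 3]).astype(int))   # rows = queries, columns = keys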
Explicit Regularisation in Gaussian Noise Injections
We study the regularisation induced in neural networks by Gaussian noise injections (GNIs). Though such injections have been extensively studied when applied to data, there have been few studies on understanding the regularising effect they induce when applied to network activations. Here we derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise, and show that it penalises functions with high-frequency components in the Fourier domain, particularly in layers closer to a neural network's output. We show analytically and empirically that such regularisation produces calibrated classifiers with large classification margins.
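For concreteness, a minimal PyTorch sketch of the setting studied: Gaussian noise injected into hidden activations during training. The network architecture and the noise scale sigma here are illustrative assumptions, not the paper's experimental setup.

import torch
import torch.nn as nn

class NoisyMLP(nn.Module):
    def __init__(self, d_in=32, d_hidden=128, d_out=10, sigma=0.1):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)
        self.sigma = sigma   # standard deviation of the injected noise (illustrative)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        if self.training:                               # inject noise only at training time
            h = h + self.sigma * torch.randn_like(h)    # Gaussian noise on activations
        return self.fc2(h)

model = NoisyMLP()
logits = model(torch.randn(4, 32))   # forward pass with the injection active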
SUPPLEMENTARY MATERIAL Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
In the supplementary material, we describe the training details and give examples of the game interface and interactions used in the paper. We train our model using the Advantage Actor Critic (A2C) method [37] across valid actions; a sketch of restricting the policy to the valid action set is given below. The function to obtain the valid action set is provided by Jericho [20]. Similar to KG-A2C [3], a supervised auxiliary task, "valid action prediction", is introduced to assist RL training. An example game observation reads: "You are in attendance at the annual Grue Convention, this year a rather somber affair due to the 'adventurer famine' that has gripped gruedom in this isolated corner of the empire."
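The following sketch (illustrative, not the authors' code) shows one way to restrict a categorical A2C policy to a valid action set such as the one returned by Jericho's handler; the number of candidate actions and the valid ids are assumptions.

import torch

def valid_action_distribution(logits, valid_ids):
    # Mask out every action that is not in the valid set before the softmax
    mask = torch.full_like(logits, float("-inf"))
    mask[valid_ids] = 0.0
    return torch.distributions.Categorical(logits=logits + mask)

logits = torch.randn(50)                                        # scores over 50 candidate actions (assumed)
dist = valid_action_distribution(logits, valid_ids=[3, 7, 19])  # valid ids are illustrative
action = dist.sample()                                          # sampled action is always valid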
Algorithm 2 Class prediction and certification, as required for Algorithm 1
Input: Perturbed data x
A.1 Algorithmic details
Algorithm 2 supports Algorithm 1 by demonstrating how the class prediction and expectations are calculated. Of note are two minor changes from prior implementations of this certification regime. The first is the addition of the Gumbel-Softmax on line 4, although this step is only required for the 'Full' derivative approach. In contrast, the 'Approximate' techniques are able to circumvent this limitation and can be applied directly to the case where the class selection is determined by an arg max. Our initial testing revealed that when we employed either Sison-Glaz [38] or Goodman et al. [14] to estimate the multivariate class uncertainties, some Tiny-Imagenet samples devoted more than 95% of their computation time to evaluating the confidence intervals, significantly outweighing even the costly process of model sampling.
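For clarity, a small PyTorch sketch of the Gumbel-Softmax relaxation referred to above: it replaces the hard arg max over class scores with a differentiable (or straight-through) sample so that gradients can flow through the class selection in the 'Full' derivative approach. The temperature and number of classes are illustrative assumptions.

import torch
import torch.nn.functional as F

scores = torch.randn(10, requires_grad=True)                   # per-class scores for one sample (assumed)
soft_onehot = F.gumbel_softmax(scores, tau=0.5, hard=False)    # relaxed, differentiable class selection
hard_onehot = F.gumbel_softmax(scores, tau=0.5, hard=True)     # straight-through arg max
soft_onehot.sum().backward()                                   # gradients reach the underlying scores
print(scores.grad is not None)                                 # True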