A GAN Variants
We apply top-k training to all of the following GAN variants (a minimal sketch of the top-k generator update is given below):
DC-GAN [38]: A simple, widely used architecture that uses convolutions and deconvolutions for the critic and the generator.
WGAN with Weight Clipping [1]: Attempts to use an approximate Wasserstein distance as the critic loss by clipping the weights of the critic to bound the gradient of the critic with respect to its inputs.
Mode-Seeking GAN [29]: Attempts to generate more diverse images by selecting more samples from under-represented modes of the target distribution.
Spectral Normalization GAN [32]: Replaces the gradient penalty with a (loose) bound on the spectral norm of the weight matrices of the critic.
The 'Held-out Digit' is the digit that was held out of the training set and treated as the 'anomaly' class.
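Top-k training keeps, at each generator step, only the k generated samples the critic scores highest and drops the rest from the generator update. The following is a minimal PyTorch-style sketch of that update, assuming a generator G, a critic D that returns one logit per sample, and a non-saturating generator loss; it illustrates the idea rather than reproducing the exact training loops used with the variants above.

```python
import torch
import torch.nn.functional as F

def topk_generator_loss(G, D, z, k):
    """Generator loss computed only on the top-k critic-scored fakes.

    G: generator mapping latent codes z to samples.
    D: critic returning one real-valued logit per sample.
    k: number of highest-scoring samples to keep (k <= batch size).
    """
    fake = G(z)                              # generate a full batch
    scores = D(fake).view(-1)                # one critic logit per sample
    topk_scores, _ = torch.topk(scores, k)   # keep only the k best samples
    # Non-saturating generator loss, restricted to the retained samples.
    return F.softplus(-topk_scores).mean()
```

In the usual top-k recipe, k starts at the full batch size and is annealed downward over training, so early updates see every sample and later updates focus on the samples the critic finds most realistic.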
Causal Temporal Representation Learning with Nonstationary Sparse Transition
Xiangchen Song, Zijian Li, Guangyi Chen
Causal Temporal Representation Learning (Ctrl) methods aim to identify the temporal causal dynamics of complex nonstationary temporal sequences. Despite the success of existing Ctrl methods, they require either directly observing the domain variables or assuming a Markov prior on them. Such requirements limit the application of these methods in real-world scenarios where we do not have such prior knowledge of the domain variables. To address this problem, this work adopts a sparse transition assumption, aligned with intuitive human understanding, and presents identifiability results from a theoretical perspective. In particular, we explore what conditions on the significance of the transition variability allow us to build a model that identifies the distribution shifts. Based on the theoretical results, we introduce a novel framework, Causal Temporal Representation Learning with Nonstationary Sparse Transition (CtrlNS), designed to leverage constraints on transition sparsity and conditional independence to reliably identify both distribution shifts and latent factors. Our experimental evaluations on synthetic and real-world datasets demonstrate significant improvements over existing baselines, highlighting the effectiveness of our approach.
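The abstract refers to a transition-sparsity constraint on the latent dynamics. As a purely illustrative sketch (not the paper's actual objective), one common way to encode such a constraint is an L1 penalty on the Jacobian of a learned transition function, so that each latent factor at time t depends on only a few factors at time t-1; the network architecture and penalty form below are assumptions made for the example.

```python
import torch
import torch.nn as nn

class Transition(nn.Module):
    """Toy transition network mapping z_{t-1} to z_t."""
    def __init__(self, latent_dim=8, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z_prev):
        return self.net(z_prev)

def transition_sparsity_penalty(model, z_prev):
    """L1 penalty on the transition Jacobian for a single (unbatched) z_prev,
    encouraging each latent factor to have few parents in the previous step."""
    jac = torch.autograd.functional.jacobian(model, z_prev)  # (latent_dim, latent_dim)
    return jac.abs().sum()
```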
Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf
Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built on them often neglect control over discussion tactics, which is essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Werewolf (ONUW) requires players to develop strategic discussion policies, since potential role changes increase the uncertainty and complexity of the game. In this work, we first establish the existence of Perfect Bayesian Equilibria (PBEs) in two scenarios of the ONUW game: one with discussion and one without. The results show that discussion greatly changes players' utilities by affecting their beliefs, emphasizing the significance of discussion tactics. Based on the insights obtained from these analyses, we propose an RL-instructed language agent framework, in which a discussion policy trained by reinforcement learning (RL) determines which discussion tactics to adopt. Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of the proposed framework.
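To make the idea of a "discussion policy over tactics" concrete, here is a minimal sketch of a categorical policy over a small, hypothetical set of discussion tactics, updated with a plain REINFORCE step. The tactic names, the state features, and the choice of REINFORCE are all assumptions for illustration and not necessarily what the paper's framework uses.

```python
import torch
import torch.nn as nn

# Illustrative tactic set; the actual tactics considered in the paper may differ.
TACTICS = ["honest_claim", "deceptive_claim", "accusation", "defense"]

class DiscussionPolicy(nn.Module):
    """Categorical policy over discussion tactics, conditioned on a feature
    vector summarizing the agent's current belief state."""
    def __init__(self, state_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, len(TACTICS)),
        )

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

def reinforce_step(policy, optimizer, state, reward):
    """One REINFORCE update: sample a tactic, then scale its negative
    log-probability by the (game-outcome) reward."""
    dist = policy(state)
    tactic = dist.sample()
    loss = -dist.log_prob(tactic) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return TACTICS[tactic.item()]
```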
A Appendix
Define a dual space Ḡ: a set of functions ḡ_x : MAJ(H) → Y defined as ḡ_x(f) = f(x) for each f = MAJ(h_1, …, h_k) ∈ MAJ(H) and each x ∈ X. Similarly, define another dual space G: a set of functions g_x : H → Y defined as g_x(h) = h(x) for each h ∈ H and each x ∈ X. We begin by describing the construction of the adversary U. Let m ∈ ℕ; … For any point x ∈ X \ Z that is remaining, define U(x) = {x}. Let B be an arbitrary reduction algorithm, and let ε > 0 be the error requirement. We now describe the construction of the target class C, which will be constructed randomly. We then state a few properties of the randomly constructed target class C that will be used in the remainder of the proof.
Reducing Adversarially Robust Learning to Non-Robust PAC Learning
We study the problem of reducing adversarially robust learning to standard PAC learning, i.e. the complexity of learning adversarially robust predictors using access to only a black-box non-robust learner. We give a reduction that can robustly learn any hypothesis class C using any non-robust learner A for C. The number of calls to A depends logarithmically on the number of allowed adversarial perturbations per example, and we give a lower bound showing this is unavoidable.
We would like to address all of the concerns raised. Please find below some details regarding the proposed methods.
We would like to thank all of the reviewers for their valuable time and their constructive comments.
Reviewer 1: We will incorporate the proposed minor corrections in the final version of the paper. The two-stage approach, i.e., (i) running gradient descent to convergence and then (ii) projecting onto the sparsity set, … On whether the support set changes during iterations: we observe in the experiments (Subsection 4.1) that IHT does change the support.
Reviewer 2: We thank the reviewer for the supportive and constructive review. Regarding the comment in lines 198-202, we apologize for any confusion. Regarding the variance in the experiments, we have observed that high variance alone is not enough for the algorithm to get "lucky".
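Since the response contrasts a two-stage approach (gradient descent to convergence, then a single projection onto the sparsity set) with IHT, which re-projects after every gradient step and can therefore change the support along the way, a minimal NumPy sketch of iterative hard thresholding may help; the step size and iteration count are illustrative, not the authors' exact settings.

```python
import numpy as np

def hard_threshold(x, k):
    """Project x onto the set of k-sparse vectors: keep the k entries of
    largest magnitude, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def iht(grad_f, x0, k, step=0.1, n_iters=100):
    """Iterative hard thresholding: alternate a gradient step on the smooth
    objective with projection onto the k-sparse set. Note that the support
    of the iterate may change between iterations."""
    x = hard_threshold(x0, k)
    for _ in range(n_iters):
        x = hard_threshold(x - step * grad_f(x), k)
    return x
```

The two-stage approach discussed in the response would instead run plain gradient descent to convergence and apply the hard-thresholding projection once at the end.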
Personalized Federated Learning via Feature Distribution Adaptation
Federated learning (FL) is a distributed learning framework that leverages commonalities between distributed client datasets to train a global model. Under heterogeneous clients, however, FL can fail to produce stable training results. Personalized federated learning (PFL) seeks to address this by learning individual models tailored to each client. One approach is to decompose model training into shared representation learning and personalized classifier training. Nonetheless, previous works struggle to navigate the bias-variance trade-off in classifier learning, relying solely on limited local datasets or introducing costly techniques to improve generalization. In this work, we frame representation learning as a generative modeling task, where representations are trained with a classifier based on the global feature distribution. We then propose an algorithm, pFedFDA, that efficiently generates personalized models by adapting global generative classifiers to their local feature distributions. Through extensive computer vision benchmarks, we demonstrate that our method can adjust to complex distribution shifts with significant improvements over current state-of-the-art in data-scarce settings.
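As one possible illustration of "adapting global generative classifiers to local feature distributions", the sketch below assumes Gaussian class-conditional features with a shared covariance and interpolates global and local statistics with a single coefficient beta. This is a hedged example of the general idea, not the algorithm pFedFDA actually specifies; the interpolation scheme and the ridge term are assumptions.

```python
import numpy as np

def gaussian_generative_classifier(means, cov):
    """Linear-discriminant-style classifier built from per-class feature
    means and a shared covariance estimate."""
    prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # small ridge for stability
    def predict(feats):
        # Mahalanobis-style score per class; pick the closest class mean.
        scores = [-np.einsum("nd,dk,nk->n", feats - m, prec, feats - m) for m in means]
        return np.argmax(np.stack(scores, axis=1), axis=1)
    return predict

def adapt_to_client(global_means, global_cov, local_feats, local_labels, beta=0.5):
    """Interpolate global feature statistics with (possibly scarce) local
    estimates; beta trades off bias (global) against variance (local)."""
    n_classes = len(global_means)
    local_means = np.stack([
        local_feats[local_labels == c].mean(axis=0) if np.any(local_labels == c) else global_means[c]
        for c in range(n_classes)
    ])
    local_cov = np.cov(local_feats, rowvar=False)
    means = beta * local_means + (1 - beta) * np.stack(global_means)
    cov = beta * local_cov + (1 - beta) * global_cov
    return gaussian_generative_classifier(means, cov)
```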
Joint-task Self-supervised Learning for Temporal Correspondence
Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang
This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner. Our learning process integrates two highly related tasks: tracking large image regions and establishing fine-grained pixel-level associations between consecutive video frames. We exploit the synergy between the two tasks through a shared inter-frame affinity matrix, which simultaneously models transitions between video frames at both the region and pixel levels. Region-level localization helps reduce ambiguities in fine-grained matching by narrowing down search regions, while fine-grained matching provides bottom-up features that facilitate region-level localization. Our method outperforms state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video object and part-segmentation propagation, keypoint tracking, and object tracking. Our self-supervised method even surpasses the fully supervised affinity feature representation obtained from a ResNet-18 pre-trained on ImageNet. The project website can be found at https://sites.google.com/view/uvc2019/.
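To make the shared inter-frame affinity concrete, here is a minimal PyTorch sketch that builds a row-stochastic affinity between all spatial positions of two feature maps and uses it to propagate per-position labels from a reference frame to a target frame. The temperature and the softmax normalization are illustrative choices, not necessarily those of the paper.

```python
import torch
import torch.nn.functional as F

def interframe_affinity(feat_a, feat_b, temperature=0.07):
    """Affinity between all spatial positions of two (C, H, W) feature maps.

    Returns an (H*W, H*W) matrix whose row i is a softmax-normalized
    distribution over positions of feat_b for position i of feat_a.
    """
    C = feat_a.shape[0]
    fa = F.normalize(feat_a.reshape(C, -1), dim=0)   # (C, N), unit-norm per position
    fb = F.normalize(feat_b.reshape(C, -1), dim=0)
    sim = fa.t() @ fb / temperature                  # cosine similarities
    return sim.softmax(dim=1)

def propagate_labels(feat_tgt, feat_ref, labels_ref, temperature=0.07):
    """Warp per-position labels (N, K) from the reference frame to the target
    frame: each target position takes an affinity-weighted vote over
    reference positions."""
    aff = interframe_affinity(feat_tgt, feat_ref, temperature)  # rows: target positions
    return aff @ labels_ref
```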
Author Feedback
The shared affinity matrix bridges these two tasks and facilitates iterative improvement. These contributions are significant for self-supervised learning, and they are also demonstrated by our ablation study (Table 2 in the paper); we note that these components are novel and have not been explored in prior work. On the question of which methods the work should compare with: …