Unity by Diversity: Improved Representation Learning for Multimodal VAEs
Variational Autoencoders for multimodal data hold promise for many tasks in data analysis, such as representation learning, conditional generation, and imputation. Current architectures either share the encoder output, decoder input, or both across modalities to learn a shared representation. Such architectures impose hard constraints on the model. In this work, we show that a better latent representation can be obtained by replacing these hard constraints with a soft constraint. We propose a new mixture-of-experts prior, softly guiding each modality's latent representation towards a shared aggregate posterior. This approach results in a superior latent representation and allows each encoding to better preserve information from its original, uncompressed features. In extensive experiments on multiple benchmark datasets and two challenging real-world datasets, we show improved learned latent representations and imputation of missing data modalities compared to existing methods.
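To make the soft constraint concrete, below is a minimal PyTorch sketch of a mixture-based alignment regularizer: each modality keeps its own encoder and latent code, and a Monte Carlo KL term pulls every unimodal posterior toward the uniform mixture of all unimodal posteriors. The function name, the diagonal-Gaussian parameterization, and the single-sample estimate are illustrative assumptions, not the paper's exact objective.

```python
import math

import torch
import torch.distributions as D


def soft_alignment_kl(mus, logvars, n_samples=1):
    """Monte Carlo estimate of KL(q_m(z|x_m) || uniform mixture of all
    unimodal posteriors), averaged over modalities. This is the soft
    constraint: no encoder or decoder is shared; only this regularizer
    pulls the per-modality posteriors together.

    mus, logvars: lists of [batch, latent_dim] tensors, one per modality.
    """
    num_modalities = len(mus)
    posteriors = [D.Normal(mu, (0.5 * lv).exp()) for mu, lv in zip(mus, logvars)]
    reg = 0.0
    for q_m in posteriors:
        z = q_m.rsample((n_samples,))                       # [S, B, D]
        log_q_m = q_m.log_prob(z).sum(-1)                   # [S, B]
        # log-density of the uniform mixture over all modalities' posteriors
        log_mix = torch.logsumexp(
            torch.stack([q.log_prob(z).sum(-1) for q in posteriors]), dim=0
        ) - math.log(num_modalities)
        reg = reg + (log_q_m - log_mix).mean()              # KL estimate
    return reg / num_modalities
```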
Author Feedback
We thank the reviewers for their insightful feedback, and we appreciate the opportunity to improve our paper. We would like to emphasize that Theorem 1 is the most important contribution of our paper due to its generality. In the Gaussian case, our sample complexity result follows directly from the expression for the optimal loss.

Response to Reviewer 2: We thank the reviewer for pointing us to Dohmatob's "Generalized No Free Lunch Theorem". Finally, while Dohmatob's bounds become non-trivial only when the adversarial ... We will also add a clearer description of the "translate and pair in place" coupling. Comparisons with Sinha et al. are in Section 7, and we compare to Dohmatob above.
Neural Concept Binder (Antonia Wüst)
The challenge in object-based visual reasoning lies in generating concept representations that are both descriptive and distinct. Achieving this in an unsupervised manner requires human users to understand the model's learned concepts and, if necessary, revise incorrect ones. To address this challenge, we introduce the Neural Concept Binder (NCB), a novel framework for deriving both discrete and continuous concept representations, which we refer to as "concept-slot encodings". NCB employs two types of binding: "soft binding", which leverages the recent SysBinder mechanism to obtain object-factor encodings, and subsequent "hard binding", achieved through hierarchical clustering and retrieval-based inference. This enables obtaining expressive, discrete representations from unlabeled images. Moreover, the structured nature of NCB's concept representations allows for intuitive inspection and the straightforward integration of external knowledge, such as human input or insights from other AI models like GPT-4. Additionally, we demonstrate that incorporating the hard binding mechanism preserves model performance while enabling seamless integration into both neural and symbolic modules for complex reasoning tasks. We validate the effectiveness of NCB through evaluations on our newly introduced CLEVR-Sudoku dataset.
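The hard-binding step can be illustrated with a short sketch: continuous block encodings produced by the soft-binding stage are grouped by hierarchical (agglomerative) clustering, and inference maps a new encoding to the id of its nearest cluster prototype. The scikit-learn clustering call, the distance threshold, and the centroid-based retrieval are assumptions for illustration; NCB's actual retrieval corpus and hyperparameters may differ.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def fit_hard_binding(block_encodings, distance_threshold=1.5):
    """Hierarchical clustering of one block's continuous encodings.

    block_encodings: [N, d] array of soft-binding (e.g., SysBinder) outputs
    for a single factor block, collected over unlabeled images.
    Returns cluster centroids acting as discrete concept prototypes.
    """
    clustering = AgglomerativeClustering(
        n_clusters=None, distance_threshold=distance_threshold
    ).fit(block_encodings)
    centroids = np.stack([
        block_encodings[clustering.labels_ == c].mean(axis=0)
        for c in np.unique(clustering.labels_)
    ])
    return centroids


def hard_bind(encoding, centroids):
    """Retrieval-based inference: map a continuous block encoding to the
    id of its nearest concept prototype (the discrete concept symbol)."""
    dists = np.linalg.norm(centroids - encoding[None, :], axis=1)
    return int(np.argmin(dists))
```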
Building a stable classifier with the inflated argmax
We propose a new framework for algorithmic stability in the context of multiclass classification. In practice, classification algorithms often operate by first assigning a continuous score (for instance, an estimated probability) to each possible label, then taking the maximizer, i.e., selecting the class that has the highest score. A drawback of this type of approach is that it is inherently unstable, meaning that it is very sensitive to slight perturbations of the training data, since taking the maximizer is discontinuous. Motivated by this challenge, we propose a pipeline for constructing stable classifiers from data, using bagging (i.e., resampling and averaging) to produce stable continuous scores, and then using a stable relaxation of argmax, which we call the "inflated argmax", to convert these scores to a set of candidate labels. The resulting stability guarantee places no distributional assumptions on the data, does not depend on the number of classes or dimensionality of the covariates, and holds for any base classifier. Using a common benchmark data set, we demonstrate that the inflated argmax provides necessary protection against unstable classifiers, without loss of accuracy.
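The pipeline can be sketched in a few lines: bagging averages class probabilities over bootstrap resamples to stabilize the scores, and a relaxed argmax returns every label close to the top score. Note that the margin rule below is a simplified stand-in for the paper's inflated argmax, which is defined through a more careful geometric construction; `fit_predict_proba`, `n_bags`, and `eps` are illustrative assumptions.

```python
import numpy as np


def bagged_scores(fit_predict_proba, X_train, y_train, x_test, n_bags=100, seed=None):
    """Stabilize scores by bagging: average class probabilities over
    models fit on bootstrap resamples of the training data.

    fit_predict_proba(X, y, x) is any user-supplied base learner that fits
    on (X, y) and returns class probabilities for the single test point x.
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    probs = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)                 # bootstrap resample
        probs.append(fit_predict_proba(X_train[idx], y_train[idx], x_test))
    return np.mean(probs, axis=0)


def relaxed_argmax(scores, eps=0.05):
    """Return the set of candidate labels whose score is within eps of the
    maximum. This is a simplified stand-in for the inflated argmax."""
    return set(np.flatnonzero(scores >= scores.max() - eps))
```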
A Proof of the Soft Medoid Breakdown Point
We start with a discussion of some preliminaries for the proofs. In A.2, we build on the work of Lopuhaä and Rousseeuw [39] to prove Lemma 1. In A.3, we prove Theorem 1.

A.1 Preliminaries

The adversary can replace m arbitrary points. For concise notation we simply write that the first m values are replaced, since the points carry an arbitrary order beforehand. In case (a), only some values change during the derivation, but the results are essentially the same. For case (b), the number of samples is no longer n.
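For reference, here is a minimal sketch of the Soft Medoid aggregation whose breakdown point the proofs concern: points with a large summed distance to the rest receive exponentially small weight, which is what limits the influence of the m replaced points. The temperature `T` and the plain (non-graph) formulation are simplifying assumptions; the message-passing variant used for GNN aggregation adds weighting details omitted here.

```python
import torch


def soft_medoid(X, T=1.0):
    """Soft Medoid aggregation: a differentiable, robust surrogate of the
    medoid. As T -> 0 it recovers the medoid; as T -> inf it approaches
    the (non-robust) mean.

    X: [n, d] tensor of points; returns a [d] aggregate.
    """
    dist = torch.cdist(X, X, p=2)                     # [n, n] pairwise distances
    s = torch.softmax(-dist.sum(dim=1) / T, dim=0)    # robust per-point weights
    return s @ X                                      # weighted average
```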
Author Feedback
Thank you to all the reviewers for their detailed reviews. We address specific concerns below.

This amounts to saying, "my classifier should be equally good on all classes, except the extremely rare ones, which ..."

Reviewer 2 [results from a single dataset]: As you point out, dataset availability is a challenge. In short, we found that EGAL's performance degrades to that of the standard approaches as the ...

Reviewer 3: Thank you for picking up those typos; we will correct them in the final draft. We will make this clear in the camera-ready.
BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping (Jinpeng Wang)
Adaptation of pretrained vision-language models such as CLIP to various downstream tasks has attracted great interest in recent research. Previous works have proposed a variety of test-time adaptation (TTA) methods to achieve strong generalization without any knowledge of the target domain. However, existing training-required TTA approaches such as TPT rely on entropy minimization, which incurs a large computational overhead, while training-free methods such as TDA overlook the potential for mining information from the test samples themselves. In this paper, we break down the design of existing popular training-required and training-free TTA methods and bridge the gap between them within our framework. Specifically, we maintain a lightweight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples. The historical samples are filtered from the test data stream and serve to extract useful information from the target distribution, while the boosting samples are drawn from regional bootstrapping and capture the knowledge of the test sample itself. We theoretically justify our method and empirically verify its effectiveness on both out-of-distribution and cross-domain datasets, showcasing its applicability in real-world situations.
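The key-value memory can be sketched as follows, assuming a simple entropy filter for admitting samples and an exponential-similarity retrieval rule (as popularized by cache-based adapters). The class name, thresholds, and sharpening parameter are illustrative; the paper's regional bootstrapping, cache management, and score fusion are not reproduced exactly.

```python
import numpy as np


class FeatureCache:
    """Lightweight key-value memory: keys are normalized image features,
    values are one-hot pseudo-labels. Confident samples from the test
    stream ('historical') and from augmented crops of the current image
    ('boosting') are inserted through the same interface."""

    def __init__(self, num_classes, entropy_thresh=0.5):
        self.num_classes = num_classes
        self.entropy_thresh = entropy_thresh
        self.keys, self.values = [], []

    def maybe_add(self, feature, probs):
        """Admit a sample only if its prediction entropy is low."""
        entropy = -np.sum(probs * np.log(probs + 1e-12))
        if entropy < self.entropy_thresh:
            self.keys.append(feature / np.linalg.norm(feature))
            self.values.append(np.eye(self.num_classes)[int(probs.argmax())])

    def retrieve_logits(self, feature, beta=5.0):
        """Attention-style retrieval: cosine similarity to cached keys,
        exponentially sharpened, aggregated over cached pseudo-labels."""
        if not self.keys:
            return np.zeros(self.num_classes)
        keys = np.stack(self.keys)
        values = np.stack(self.values)
        sims = keys @ (feature / np.linalg.norm(feature))
        weights = np.exp(beta * (sims - 1.0))    # sharpened similarity kernel
        return weights @ values
```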
Supplementary Materials for "Private Set Generation with Discriminative Information"
These supplementary materials include the privacy analysis (A), details of the adopted algorithms (B), details of the experimental setup (C), and additional results and discussions (D). Our privacy computation is based on the notion of Rényi-DP, which we recall as follows. Lastly, we use the following theorem to convert (α, ε)-RDP to (ε, δ)-DP. We present the pseudocode of the generator-prior experiments (Section 6 of the main paper) in Algorithm 2, which is supplementary to Figures 4 and 5 and Equation 8 of the main paper. While it is possible to allow random sampling of the latent code and generate a changeable S to mimic the training of generative models (i.e., train a generative network using the gradient matching loss), we observe that such training easily fails in the early stage.
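As a reminder of how the conversion works, the sketch below applies the standard RDP-to-DP bound (an (α, ε)-RDP mechanism satisfies (ε + log(1/δ)/(α - 1), δ)-DP, Mironov 2017) and minimizes over orders α. The Gaussian-mechanism RDP curve in the example is illustrative; the paper's actual accounting composes subsampled-Gaussian RDP over training iterations.

```python
import math


def rdp_to_dp(alpha, rdp_eps, delta):
    """Classic conversion: an (alpha, eps)-RDP mechanism satisfies
    (eps + log(1/delta) / (alpha - 1), delta)-DP."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1)


def best_dp_epsilon(rdp_curve, delta):
    """Given RDP guarantees at several orders alpha, report the tightest
    (eps, delta)-DP guarantee by minimizing over alpha."""
    return min(rdp_to_dp(a, e, delta) for a, e in rdp_curve)


# Example: a Gaussian mechanism with sensitivity 1 and noise multiplier
# sigma has RDP epsilon(alpha) = alpha / (2 * sigma**2) per release.
sigma, delta = 5.0, 1e-5
curve = [(a, a / (2 * sigma**2)) for a in range(2, 64)]
print(best_dp_epsilon(curve, delta))
```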
Private Set Generation with Discriminative Information
Differentially private data generation techniques have become a promising solution to the data privacy challenge: they enable data sharing while complying with rigorous privacy guarantees, which is essential for scientific progress in sensitive domains. Unfortunately, restricted by the inherent complexity of modeling high-dimensional distributions, existing private generative models struggle with the utility of their synthetic samples. In contrast to existing works that aim to fit the complete data distribution, we directly optimize a small set of samples that is representative of the distribution, under the supervision of discriminative information from downstream tasks; this is generally an easier task and more suitable for private training. Our work provides an alternative view on differentially private generation of high-dimensional data and introduces a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
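A minimal sketch of the core idea, under our own assumptions about the loss and hyperparameters: the only access to private data is a clipped, noised gradient (as in DP-SGD), and the small synthetic set is updated so that its gradient matches this privatized gradient. The function name, the cross-entropy and squared-error matching losses, and the update rule are illustrative, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F


def dp_gradient_matching_step(model, syn_x, syn_y, real_x, real_y,
                              clip=1.0, sigma=1.0, lr=0.1):
    """One illustrative step: privatize the real-data gradient with per-sample
    clipping plus Gaussian noise (the only access to private data), then update
    the synthetic samples so that their gradient matches it.

    syn_x must be a leaf tensor with requires_grad=True.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Privatized real-data gradient (DP-SGD-style clipping and noising).
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(real_x, real_y):
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip / (norm + 1e-12), max=1.0)
        summed = [s + g * scale for s, g in zip(summed, grads)]
    noisy = [(s + sigma * clip * torch.randn_like(s)) / len(real_x) for s in summed]

    # Synthetic-data gradient, kept differentiable w.r.t. the synthetic samples.
    syn_grads = torch.autograd.grad(
        F.cross_entropy(model(syn_x), syn_y), params, create_graph=True
    )

    # Match the two gradients and update the synthetic samples directly.
    match = sum((g - n).pow(2).sum() for g, n in zip(syn_grads, noisy))
    step = torch.autograd.grad(match, syn_x)[0]
    with torch.no_grad():
        syn_x -= lr * step
    return float(match)
```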