Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption, via prompt learning, for numerous downstream tasks. Previous works have used test-time prompt tuning with entropy minimization to adapt text prompts to unseen domains. While effective, this overlooks the key cause of performance degradation on unseen domains: distribution shift. In this work, we explicitly handle this problem by aligning the out-of-distribution (OOD) test sample statistics to those of the source data using prompt tuning. We use a single test sample to adapt multi-modal prompts at test time by minimizing the feature distribution shift, bridging the gap to the test domain. On the domain generalization benchmark, our method improves zero-shot top-1 accuracy beyond existing prompt-learning techniques, with a 3.08% improvement over the baseline MaPLe. In cross-dataset generalization with unseen categories across 10 datasets, our method improves consistently on all datasets compared to the existing state of the art.
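The abstract does not spell out the alignment objective, so the following is a minimal sketch under stated assumptions: the "feature distribution shift" is measured, layer by layer, as an L1 distance between the per-dimension mean and variance of token features from augmented views of the single test sample and source statistics precomputed offline. The function names are hypothetical, and in practice this loss would be backpropagated into the prompt parameters (possibly alongside an entropy term), which plain NumPy does not do.

```python
import numpy as np

def token_statistics(features):
    """Per-dimension mean and variance of token features.

    features: array of shape (num_views, num_tokens, dim), e.g. one layer's
    token embeddings for several augmented views of one test image.
    """
    flat = features.reshape(-1, features.shape[-1])
    return flat.mean(axis=0), flat.var(axis=0)

def alignment_loss(test_layers, source_stats):
    """Sum over layers of L1 distances between test and source mean/variance.

    test_layers: list of per-layer arrays of shape (num_views, num_tokens, dim)
    source_stats: list of (mean, var) pairs computed offline on source data
    """
    loss = 0.0
    for feats, (src_mean, src_var) in zip(test_layers, source_stats):
        mean, var = token_statistics(feats)
        loss += np.abs(mean - src_mean).mean() + np.abs(var - src_var).mean()
    return loss
```

If the test features already match the source statistics the loss is zero; a shifted test distribution yields a positive loss that prompt tuning would try to reduce.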
DAC-DETR: Divide the Attention Layers and Conquer
This paper reveals a characteristic of the DEtection Transformer (DETR) that negatively impacts its training efficacy: the cross-attention and self-attention layers in the DETR decoder have opposing impacts on the object queries (though both impacts are important). Specifically, we observe that the cross-attention tends to gather multiple queries around the same object, while the self-attention disperses these queries far away. To improve the training efficacy, we propose a Divide-And-Conquer DETR (DAC-DETR) that separates out the cross-attention to avoid these competing objectives. During training, DAC-DETR employs an auxiliary decoder that focuses on learning the cross-attention layers. The auxiliary decoder, while sharing all the other parameters with the main decoder, has NO self-attention layers and employs one-to-many label assignment to improve the gathering effect. Experiments show that DAC-DETR brings remarkable improvements over popular DETRs. For example, under the 12-epoch training scheme on MS-COCO, DAC-DETR improves Deformable DETR (ResNet-50) by +3.4 AP and achieves 50.9 AP (ResNet-50) / 58.1 AP (Swin-Large) when built on popular methods (i.e., DINO and an IoU-related loss).
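A simplified NumPy sketch of the divide-and-conquer idea, assuming single-head attention, no layer normalization, and no deformable attention or reference points: the auxiliary path calls the same layer with self-attention switched off, so it reuses the cross-attention and feed-forward weights. The one-to-many label assignment used to train the auxiliary decoder is not shown, and all names are hypothetical.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def init_params(dim, hidden, rng):
    """Random weights for one simplified decoder layer (hypothetical layout)."""
    names = ["sa_q", "sa_k", "sa_v", "ca_q", "ca_k", "ca_v"]
    params = {n: rng.normal(scale=dim ** -0.5, size=(dim, dim)) for n in names}
    params["ffn1"] = rng.normal(scale=dim ** -0.5, size=(dim, hidden))
    params["ffn2"] = rng.normal(scale=hidden ** -0.5, size=(hidden, dim))
    return params

def decoder_layer(queries, memory, params, use_self_attn=True):
    """One decoder layer: optional self-attention, then cross-attention, then FFN.

    The main decoder uses use_self_attn=True; the auxiliary decoder reads the
    same params with use_self_attn=False, so only the self-attention is dropped.
    """
    x = queries
    if use_self_attn:
        # Self-attention among object queries (the "dispersing" step).
        x = x + attention(x @ params["sa_q"], x @ params["sa_k"], x @ params["sa_v"])
    # Cross-attention from queries to encoder memory (the "gathering" step).
    x = x + attention(x @ params["ca_q"], memory @ params["ca_k"], memory @ params["ca_v"])
    # Position-wise feed-forward network with ReLU.
    x = x + np.maximum(x @ params["ffn1"], 0.0) @ params["ffn2"]
    return x
```

Because both paths read one `params` dict, a loss on the auxiliary output would update the shared cross-attention and FFN weights without ever passing through self-attention.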
On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence
Best Arm Identification (BAI) problems are increasingly used for data-sensitive applications, such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user studies, to name a few. Motivated by the data privacy concerns raised by these applications, we study the problem of BAI with fixed confidence under ϵ-global Differential Privacy (DP). First, to quantify the cost of privacy, we derive a lower bound on the sample complexity of any δ-correct BAI algorithm satisfying ϵ-global DP. Our lower bound suggests the existence of two privacy regimes depending on the privacy budget ϵ. In the high-privacy regime (small ϵ), the hardness depends on a coupled effect of privacy and a novel information-theoretic quantity, called the Total Variation Characteristic Time.
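To make "ϵ-DP for reward statistics" concrete, here is the textbook Laplace mechanism for releasing one arm's empirical mean, assuming rewards bounded in [0, 1]. It is not an algorithm from the paper, and a full BAI algorithm releases many statistics adaptively, so its privacy accounting is more involved than this single release. The function name is hypothetical.

```python
import numpy as np

def private_mean(rewards, epsilon, rng):
    """Release an epsilon-DP estimate of the mean of rewards in [0, 1].

    Replacing one of the n rewards moves the empirical mean by at most 1/n
    (the L1 sensitivity), so adding Laplace noise with scale 1 / (n * epsilon)
    makes this single release epsilon-differentially private.
    """
    rewards = np.clip(np.asarray(rewards, dtype=float), 0.0, 1.0)
    n = rewards.size
    return rewards.mean() + rng.laplace(loc=0.0, scale=1.0 / (n * epsilon))
```

The noise scale grows as ϵ shrinks, which gives some intuition for why a small privacy budget can dominate the sample complexity.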
Learning Rate Free Sampling in Constrained Domains
We introduce a suite of new particle-based algorithms for sampling in constrained domains that are entirely learning-rate free. Our approach leverages coin betting ideas from convex optimisation, and the viewpoint of constrained sampling as a mirrored optimisation problem on the space of probability measures. Based on this viewpoint, we also introduce a unifying framework for several existing constrained sampling algorithms, including mirrored Langevin dynamics and mirrored Stein variational gradient descent. We demonstrate the performance of our algorithms on a range of numerical examples, including sampling from targets on the simplex, sampling with fairness constraints, and constrained sampling problems in post-selection inference. Our results indicate that our algorithms achieve performance competitive with existing constrained sampling methods, without the need to tune any hyperparameters.
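To illustrate what "learning-rate free" means in the coin-betting sense, here is a minimal sketch of one-dimensional convex optimisation with the standard Krichevsky-Trofimov (KT) bettor from the parameter-free online learning literature. It is not the paper's sampler (which works with particles and mirror maps); it only shows how a bet proportional to accumulated wealth replaces a step size. It assumes subgradients bounded by 1 in absolute value, and the function name is hypothetical.

```python
def coin_betting_minimize(subgradient, x0=0.0, n_steps=10_000, initial_wealth=1.0):
    """Minimise a 1-D convex function with KT coin betting (no step size).

    Assumes |subgradient(x)| <= 1 everywhere. Each iterate is the initial
    point plus a bet equal to a fraction of the current wealth. Returns the
    average iterate, which is the point carrying the convergence guarantee.
    """
    wealth = initial_wealth
    coin_sum = 0.0      # running sum of past coin outcomes c_s = -g_s
    iterate_sum = 0.0
    for t in range(1, n_steps + 1):
        beta = coin_sum / t             # KT betting fraction, |beta| < 1
        x = x0 + beta * wealth          # signed bet placed on the next coin
        c = -subgradient(x)             # coin outcome in [-1, 1]
        wealth += c * (x - x0)          # win or lose the bet; stays positive
        coin_sum += c
        iterate_sum += x
    return iterate_sum / n_steps
```

For f(x) = |x - 3| with subgradient `(x > 3) - (x < 3)`, the returned average iterate approaches 3 without any step size being chosen; the only constant, the initial wealth, affects the bound only mildly.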
Transfer Learning on Heterogeneous Feature Spaces for Treatment Effects Estimation
Consider the problem of improving the estimation of conditional average treatment effects (CATE) for a target domain of interest by leveraging related information from a source domain with a different feature space. This heterogeneous transfer learning problem for CATE estimation is ubiquitous in areas such as healthcare, where we may wish to evaluate the effectiveness of a treatment for a new patient population for which different clinical covariates and limited data are available. In this paper, we address this problem by introducing several building blocks that use representation learning to handle the heterogeneous feature spaces and a flexible multi-task architecture with shared and private layers to transfer information between potential outcome functions across domains. Then, we show how these building blocks can be used to recover transfer learning equivalents of the standard CATE learners. On a new semi-synthetic data simulation benchmark for heterogeneous transfer learning, we not only demonstrate performance improvements of our heterogeneous transfer causal effect learners across datasets, but also provide insights into the differences between these learners from a transfer perspective.
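The abstract describes building blocks rather than exact layers, so here is a hypothetical NumPy forward pass showing one way they could fit together: a separate encoder per feature space maps source and target covariates into a common representation, a shared layer is used by both potential-outcome functions, and each treatment arm also has a private layer. Layer sizes, this particular shared/private split, and all names are assumptions; training is omitted, so the weights are random.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

class HeteroTransferCATENet:
    """Hypothetical sketch: domain-specific encoders + shared/private outcome layers."""

    def __init__(self, source_dim, target_dim, rng, rep_dim=16, hidden=16):
        def dense(n_in, n_out):
            return rng.normal(scale=n_in ** -0.5, size=(n_in, n_out))
        # One encoder per feature space, both mapping into the same rep_dim space.
        self.encoders = {"source": dense(source_dim, rep_dim),
                         "target": dense(target_dim, rep_dim)}
        # Shared layer reused by both potential-outcome functions and both domains.
        self.shared = dense(rep_dim, hidden)
        # Private layer and output head per treatment arm (0 = control, 1 = treated).
        self.private = {arm: dense(rep_dim, hidden) for arm in (0, 1)}
        self.heads = {arm: dense(2 * hidden, 1) for arm in (0, 1)}

    def potential_outcome(self, x, domain, arm):
        """Predicted outcome mu_arm(x) for covariates x from the given domain."""
        r = relu(x @ self.encoders[domain])
        h = np.concatenate([relu(r @ self.shared), relu(r @ self.private[arm])], axis=-1)
        return (h @ self.heads[arm]).ravel()

    def cate(self, x, domain):
        """CATE estimate as the difference of the two potential-outcome predictions."""
        return self.potential_outcome(x, domain, 1) - self.potential_outcome(x, domain, 0)
```

Source and target covariates with different dimensionalities both end up in the same representation space, which is what lets the shared layer carry information across domains.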
Having defined and validated the pairwise feedback simulator and evaluations in AlpacaFarm, we now turn our attention to studying methods that learn from pairwise feedback on AlpacaFarm. Unfortunately, the lack of existing benchmarks for learning from pairwise feedback in instruction following means that there has not been any open study of these methods in the instruction-following setting. In the remainder of this section, we will introduce our reference methods, which fall into two categories based on whether they fit a surrogate reward model as part of the learning process.
FeedME is a method proposed by OpenAI [45] that incorporates human feedback with supervised fine-tuning on model generations that are rated 7/7 by human labelers. We adapt this approach to the pairwise feedback setting and call this baseline binary FeedME. This approach fine-tunes the SFT model on the chosen response in each preference pair with supervised learning.
Motivated by controllable generation through conditioning [27, 34, 29, 21], we propose binary reward conditioning, a baseline method that fine-tunes the SFT model with the feedback data D_pairwise by conditioning instances with either a positive or a negative control token. Specifically, for each instance (x, y_0, y_1, z) ∈ D_pairwise, the string concatenation of instruction x and response y_z, denoted [x, y_z], is prepended with the positive token and used in supervised fine-tuning (similarly, [x, y_{1-z}] is prepended with the negative token). This process creates a modified demonstration dataset that is twice the size of D_pairwise. At test time, we draw samples from the fine-tuned model conditioned on the positive token.
A.2 Methods that optimize a surrogate reward function
We now describe methods that incorporate feedback by first building a surrogate reward model with pairwise feedback data.
To start, we describe the step of training the surrogate reward model. While this can be a powerful approach, we will see that it can also lead to over-optimization [19], where models learn to exploit the reward model rather than achieve high true reward. We now describe four methods that leverage the surrogate reward model.
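As a sketch of how the same pairwise tuples (x, y_0, y_1, z) feed both families of methods: the first function builds the binary reward conditioning dataset described above, and the second is a surrogate reward-model loss. The reward-model objective here is an assumption (the standard Bradley-Terry pairwise logistic loss), as are the control-token strings, the plain concatenation without a prompt template, and all function names.

```python
import math

# Hypothetical control tokens; the actual token strings are an assumption.
POS_TOKEN = "<|good|>"
NEG_TOKEN = "<|bad|>"

def reward_conditioning_dataset(pairwise):
    """Binary reward conditioning: turn each (x, y0, y1, z) into two SFT strings.

    z indexes the preferred response, so [x, y_z] gets the positive token and
    [x, y_(1-z)] gets the negative token; the output is twice len(pairwise).
    """
    examples = []
    for x, y0, y1, z in pairwise:
        chosen, rejected = (y0, y1) if z == 0 else (y1, y0)
        examples.append(POS_TOKEN + x + chosen)
        examples.append(NEG_TOKEN + x + rejected)
    return examples

def pairwise_reward_loss(r_chosen, r_rejected):
    """Assumed Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

The loss is log 2 when the reward model cannot tell the two responses apart and falls toward zero as it scores the preferred response higher.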
Retrieval-Augmented Diffusion Models
Novel architectures have recently improved generative image synthesis, leading to excellent visual quality in various tasks. Much of this success is due to the scalability of these architectures and hence to a dramatic increase in model complexity and in the computational resources invested in training these models.