Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models
Fu, Shuai, Wang, Xiequn, Huang, Qiushi, Zhang, Yu
With the prevalence of large-scale pretrained vision-language models (VLMs), such as CLIP, soft-prompt tuning has become a popular method for adapting these models to various downstream tasks. However, few works delve into the inherent properties of learnable soft-prompt vectors, specifically the impact of their norms on the performance of VLMs. This motivates us to pose an unexplored research question: "Do we need to normalize the soft prompts in VLMs?" To fill this research gap, we first uncover a phenomenon, called the Low-Norm Effect, by performing extensive corruption experiments, suggesting that reducing the norms of certain learned prompts occasionally enhances the performance of VLMs, while increasing them often degrades it. To harness this effect, we propose a novel method named Normalizing the soft-prompt vectors of vision-language models (Nemesis) to normalize soft-prompt vectors in VLMs. To the best of our knowledge, our work is the first to systematically investigate the role of the norms of soft-prompt vectors in VLMs, offering valuable insights for future research in soft-prompt tuning. The code is available at https://github.com/ShyFoo/Nemesis.
In the age of large-scale pretrained vision-language models (VLMs), such as CLIP (Radford et al., 2021), Flamingo (Alayrac et al., 2022), and BLIP (Li et al., 2022), soft-prompt-based methods, also known as prompt-tuning, have emerged as a dominant approach for adapting these models to a wide range of downstream tasks. For instance, Zhou et al. (2022b) propose a Context Optimization (CoOp) method to learn soft prompts in a continuous space of CLIP for image classification tasks. Additionally, Rao et al. (2022) and Du et al. (2022) employ prompt-tuning to address dense prediction and open-vocabulary object detection tasks, respectively. Recent research in the field of VLMs has been primarily focused on enhancing model performance through the alignment of visual and textual features. For instance, Lu et al. (2022) estimate the weight distribution of output embeddings, while Zang et al. (2022) propose a joint optimization approach for prompts across multiple modalities.
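To make the notion of "reducing" or "increasing" the norm of a learned prompt concrete, the following is a minimal sketch of rescaling one soft-prompt vector and measuring its L2 norm. It is an illustration only: the function names `l2_norm` and `rescale_prompt`, the toy two-dimensional vectors, and the choice of a simple multiplicative factor are assumptions made here, not the authors' corruption procedure or the Nemesis method itself.

```python
import math

def l2_norm(vec):
    """Euclidean (L2) norm of a single prompt vector."""
    return math.sqrt(sum(x * x for x in vec))

def rescale_prompt(prompts, position, factor):
    """Return a copy of `prompts` in which the vector at `position`
    is multiplied by `factor` (hypothetical corruption operation).
    A factor below 1 lowers the norm; a factor above 1 raises it."""
    out = [list(v) for v in prompts]          # copy so the input is untouched
    out[position] = [factor * x for x in out[position]]
    return out

# Toy "learned" prompts: two vectors of dimension 2 (illustrative values).
prompts = [[3.0, 4.0], [1.0, 0.0]]
shrunk = rescale_prompt(prompts, 0, 0.5)      # halve the norm at position 0
enlarged = rescale_prompt(prompts, 0, 2.0)    # double the norm at position 0

print(l2_norm(prompts[0]), l2_norm(shrunk[0]), l2_norm(enlarged[0]))
```

In an actual experiment, the corrupted prompts would be fed back through the frozen VLM and the downstream accuracy compared against the uncorrupted baseline; that evaluation step is outside the scope of this sketch.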
Report 84-35 A Method for Managing Evidential Reasoning
Although informal models of evidential reasoning have been successfully applied in automated reasoning systems, it is generally difficult to define the range of their applicability. In addition, they have not provided a basis for coherent management of evidence bearing on hypotheses that are related hierarchically. The Dempster-Shafer (D-S) theory of evidence is appealing because it does suggest a coherent approach for dealing with such relationships. However, the theory's complexity and potential for computational inefficiency have tended to discourage its use in reasoning systems. In this paper we describe the central elements of the D-S theory, basing our exposition on simple examples drawn from the field of medicine. We then demonstrate the relevance of the D-S theory to a familiar expert system domain, namely the bacterial organism identification problem that lies at the heart of the MYCIN system. Finally, we present a new adaptation of the D-S approach that achieves computational efficiency while permitting the management of evidential reasoning within
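As background for the D-S theory mentioned above, the following is a minimal sketch of the standard Dempster rule of combination for two mass functions over a small frame of discernment. It shows the general textbook rule, not the efficient adaptation the report proposes. The function name `combine`, the hypothesis labels `A`, `B`, `C`, and the mass values are illustrative assumptions.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass,
    with the masses summing to 1. Mass assigned to the whole frame
    represents uncommitted belief.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting; rule undefined")
    norm = 1.0 - conflict
    # Renormalize so that the remaining masses sum to 1.
    return {h: v / norm for h, v in combined.items()}

# Hypothetical frame of discernment with three competing hypotheses.
THETA = frozenset({"A", "B", "C"})
m1 = {frozenset({"A", "B"}): 0.6, THETA: 0.4}   # evidence favoring A or B
m2 = {frozenset({"B", "C"}): 0.7, THETA: 0.3}   # evidence favoring B or C

result = combine(m1, m2)
for hypothesis, mass in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(hypothesis), round(mass, 2))
```

With these inputs the two pieces of evidence never conflict, so no renormalization is needed: the singleton {B} receives mass 0.6 × 0.7 = 0.42, while the remaining mass stays on {A, B}, {B, C}, and the full frame. The nested loop over all pairs of focal sets also hints at the computational cost the abstract mentions, since the number of subsets grows exponentially with the size of the frame.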