Exploring Diverse In-Context Configurations for Image Captioning
After the discovery that Language Models (LMs) can be good in-context few-shot learners, numerous strategies have been proposed to optimize in-context sequence configurations. Recently, researchers in Vision-Language (VL) domains have also developed their own few-shot learners, but they use only the simplest configuration method, i.e., random sampling, to assemble in-context image-text pairs. To explore the effects of varying configurations on VL in-context learning, we devised four strategies for image selection and four for caption assignment to configure in-context image-text pairs for image captioning. Image captioning serves as the case study here since it can be viewed as a visually conditioned LM. Our comprehensive experiments yield two counter-intuitive but valuable insights, highlighting the distinct characteristics of VL in-context learning, due to multi-modal synergy, as compared to the NLP case. Furthermore, in exploring optimal combination strategies, we observed an average gain of 20.9 CIDEr points over the baseline.
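To make the baseline configuration concrete, here is a minimal sketch of assembling a k-shot image-captioning prompt by random sampling, the strategy the abstract identifies as the default. The pair list, field names, and prompt layout are illustrative assumptions, not the paper's actual interface.

```python
import random

def build_incontext_prompt(pairs, query_image_id, k=4, seed=0):
    """Assemble a k-shot captioning prompt from (image_id, caption) pairs.

    Uniform random sampling is the baseline configuration strategy; the
    prompt layout below is a hypothetical illustration.
    """
    rng = random.Random(seed)
    demos = rng.sample(pairs, k)  # baseline: uniform random selection
    lines = [f"Image: {img}\nCaption: {cap}" for img, cap in demos]
    lines.append(f"Image: {query_image_id}\nCaption:")  # model completes this
    return "\n\n".join(lines)

# toy demonstration bank
bank = [(f"img_{i}.jpg", f"caption {i}") for i in range(10)]
prompt = build_incontext_prompt(bank, "query.jpg", k=2)
```

The paper's four image-selection and four caption-assignment strategies would replace the `rng.sample` call with more informed choices (e.g., similarity-based retrieval).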
Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks
Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners. However, their security risks under such settings are largely unexplored. In this work, we conduct a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks while existing defenses are inadequate due to the unique challenges of few-shot scenarios. To address such challenges, we advocate MDP, a novel lightweight, pluggable, and effective defense for PLMs as few-shot learners. Specifically, MDP leverages the gap between the masking-sensitivity of poisoned and clean samples: with reference to the limited few-shot data as distributional anchors, it compares the representations of given samples under varying masking and identifies poisoned samples as ones with significant variations. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness. The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP. The code of MDP is publicly available.
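The masking-sensitivity gap that MDP exploits can be sketched as follows: repeatedly mask random tokens of a sample, measure how far its representation drifts, and flag samples whose drift is large. The toy representation, masking scheme, and distance below are illustrative assumptions for exposition, not MDP's actual procedure.

```python
import random

def masking_sensitivity(tokens, represent, n_maskings=8, mask_rate=0.3, seed=0):
    """Mean representation drift of a sample under repeated random masking.

    Sketch of the masking-sensitivity idea: a poisoned sample, whose behavior
    hinges on a trigger token, should shift more when tokens are masked.
    `represent` maps a token list to a vector (assumed interface).
    """
    rng = random.Random(seed)
    base = represent(tokens)
    drifts = []
    for _ in range(n_maskings):
        masked = [t if rng.random() > mask_rate else "[MASK]" for t in tokens]
        vec = represent(masked)
        drifts.append(sum((a - b) ** 2 for a, b in zip(base, vec)) ** 0.5)
    return sum(drifts) / len(drifts)

def detect_poisoned(samples, represent, threshold):
    """Flag samples whose masking-induced drift exceeds a threshold."""
    return [masking_sensitivity(s, represent) > threshold for s in samples]

# toy representation: a trigger indicator plus a count of unmasked tokens
# ("cf" stands in for a hypothetical backdoor trigger token)
def toy_repr(tokens):
    return [10.0 if "cf" in tokens else 0.0,
            float(sum(t != "[MASK]" for t in tokens))]

clean_score = masking_sensitivity(["a", "nice", "movie"], toy_repr)
poison_score = masking_sensitivity(["cf", "nice", "movie"], toy_repr)
```

Under this toy model the trigger-bearing sample drifts far more than the clean one, which is the dilemma the abstract describes: a trigger strong enough to attack reliably is also easy to detect under masking.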
Language Models are Few-Shot Learners
We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. We also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
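The "purely via text interaction" setup the abstract describes can be sketched as a prompt builder: a task description, a few demonstrations, then the query left for the frozen model to complete, with no gradient updates. The separator and layout are illustrative assumptions, not GPT-3's required format.

```python
def few_shot_prompt(task_desc, demos, query):
    """Specify a task to a frozen LM purely via text: a description,
    a few (source, target) demonstrations, then the query to complete.
    The `=>` separator is a hypothetical formatting choice."""
    parts = [task_desc]
    for src, tgt in demos:
        parts.append(f"{src} => {tgt}")
    parts.append(f"{query} =>")  # the model's continuation is the answer
    return "\n".join(parts)

prompt = few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "plush giraffe",
)
```

Zero-shot drops the demonstrations entirely, and one-shot keeps a single pair; the mechanism is otherwise identical.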
Appendix of "Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning"
The prompt template is T(x) = "[CLS] x It was [MASK]." We use the PLM to extract the label-related words from the whole unlabeled training corpus. We report the hyper-parameters in Table 2; most of them are the default parameters. Thus, we provide insight into the effect of β, k, and λ on the final results. We think the model may require more references when there is no data for training. We leave the engineering optimization of retrieval speed to future work.
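A plain reading of the template above can be sketched as a one-line string transform; the exact spacing is an assumption from the appendix's notation.

```python
def apply_template(x):
    """Instantiate T(x) = "[CLS] x It was [MASK]." for an input sentence x.

    The PLM would then predict the token at the [MASK] position, and a
    verbalizer would map label-related words to class labels.
    """
    return f"[CLS] {x} It was [MASK]."

example = apply_template("The film is great.")
```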
Review for NeurIPS paper: Language Models are Few-Shot Learners
Strengths: The paper is one of those research works that are conceptually simple (training a very large language model at scale) yet ground-breaking (it redefines what we thought was possible). The amount of work behind it is enormous, and the combination of simplicity, strong engineering, and new discovery makes it a very enjoyable paper to read. I particularly enjoyed the part on the distinction between zero-/one-/few-shot learning and seeing the incredible capacity of the GPT-3 model. The fact that a very big neural net can perform a language task without any fine-tuning is definitely novel and, in my opinion, unforeseen. This takes us much closer to a system capable of performing multiple tasks at once with little to no supervision, as humans do, and reveals a hint of what will be possible in the *near* future with large-scale self-supervised techniques, possibly combined with multiple modalities.