self-critique
Learning to Learn By Self-Critique
In few-shot learning, a machine learning system is required to learn from a small set of labelled examples of a specific task, such that it can achieve strong generalization on new unlabelled examples of the same task. Given the limited availability of labelled examples in such tasks, we need to make use of all the information we can. For this reason we propose the use of transductive meta-learning for few shot settings to obtain state-of-the-art few-shot learning. Usually a model learns task-specific information from a small training-set (the \emph{support-set}) and subsequently produces predictions on a small unlabelled validation set (\emph{target-set}). The target-set contains additional task-specific information which is not utilized by existing few-shot learning methods. This is a challenge requiring approaches beyond the current methods as at inference time, the target-set contains only input data-points, and so discriminative-based learning cannot be used. In this paper, we propose a framework called \emph{Self-Critique and Adapt} or SCA. This approach learns to learn a label-free loss function, parameterized as a neural network, which leverages target-set information. A base-model learns on a support-set using existing methods (e.g.
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
Tao, Yongding, Wang, Tian, Dong, Yihong, Liu, Huanyu, Zhang, Kechi, Hu, Xiaolong, Li, Ge
Data contamination poses a significant threat to the reliable evaluation of Large Language Models (LLMs). This issue arises when benchmark samples may inadvertently appear in training sets, compromising the validity of reported performance. While detection methods have been developed for the pre-training and Supervised Fine-Tuning stages, a critical research gap exists for the increasingly significant phase of Reinforcement Learning (RL) post-training. As RL post-training becomes pivotal for advancing LLM reasoning, the absence of specialized contamination detection methods in this paradigm presents a critical vulnerability. To address this, we conduct the first systematic study of data detection within RL post-training scenario and propose Self-Critique. Our method is motivated by a key observation: after RL phase, the output entropy distribution of LLMs tends to collapse into highly specific and sparse modes. Self-Critique probes for the underlying policy collapse, i.e., the model's convergence to a narrow reasoning path, which causes this entropy reduction. To facilitate this research, we also introduce RL-MIA, a benchmark constructed to simulate this specific contamination scenario. Extensive experiments show that Self-Critique significantly outperforms baseline methods across multiple models and contamination tasks, achieving an AUC improvement of up to 30%. Whereas existing methods are close to a random guess for RL-phase contamination, our method makes detection possible.
Reviews: Learning to Learn By Self-Critique
Summary: This paper considers few-shot classification and seeks to make use of the unlabeled query data during few-shot classification by training on it with a meta-learned critic loss. The algorithm builds on top of MAML, and has two stages. In the first stage, the model is adapted via gradient descent on the labeled support set. In the second stage, the model is further adapted via a meta-learned critic loss that is a function of a featurization of the model parameters and the unlabeled query set. Originality: The proposed approach strikes me as quite similar to One-Shot Imitation Learning by Domain-Adaptive Meta-Learning (Yu et al. 2018).
Learning to Learn By Self-Critique
In few-shot learning, a machine learning system is required to learn from a small set of labelled examples of a specific task, such that it can achieve strong generalization on new unlabelled examples of the same task. Given the limited availability of labelled examples in such tasks, we need to make use of all the information we can. For this reason we propose the use of transductive meta-learning for few shot settings to obtain state-of-the-art few-shot learning. Usually a model learns task-specific information from a small training-set (the \emph{support-set}) and subsequently produces predictions on a small unlabelled validation set (\emph{target-set}). The target-set contains additional task-specific information which is not utilized by existing few-shot learning methods.
Merging Improves Self-Critique Against Jailbreak Attacks
The robustness of large language models (LLMs) against adversarial manipulations, such as jailbreak attacks, remains a significant challenge. In this work, we propose an approach that enhances the self-critique capability of the LLM and further fine-tunes it over sanitized synthetic data. This is done with the addition of an external critic model that can be merged with the original, thus bolstering self-critique capabilities and improving the robustness of the LLMs response to adversarial prompts. Our results demonstrate that the combination of merging and self-critique can reduce the attack success rate of adversaries significantly, thus offering a promising defense mechanism against jailbreak attacks. Code, data and models released at https://github.com/vicgalle/merging-self-critique-jailbreaks .
Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective
Review: No Country for Old Men is an extraordinary movie that seamlessly blends elements of crime, drama, and psychological suspense into a cohesive and awe-inspiring work of art. From the opening scene to the final heart-stopping moments, director Joel Cohen has crafted a visually stunning vision that both challenges and captivates the viewer. The cinematography is unparalleled in its ability to convey emotion and character without resorting to cheap tricks or manipulation. The cast members all deliver impressive performances that allow us to empathize with their characters while simultaneously questioning their motives. From Javier Bardem's chilling portrayal of the villain to Tommy Lee Jones' nuanced exploration of a man faced with an impossible moral dilemma. Despite its lengthy runtime, No Country for Old Men maintains an intense narrative that keeps audiences engaged until the very end.
An automatically discovered chain-of-thought prompt generalizes to novel models and datasets
Hebenstreit, Konstantin, Praas, Robert, Kiesewetter, Louis P, Samwald, Matthias
Emergent chain-of-thought (CoT) reasoning capabilities promise to improve performance and explainability of large language models (LLMs). However, uncertainties remain about how reasoning strategies formulated for previous model generations generalize to new model generations and different datasets. In this small-scale study, we compare different reasoning strategies induced by zero-shot prompting across six recently released LLMs (davinci-002, davinci-003, GPT-3.5-turbo, GPT-4, Flan-T5-xxl and Cohere command-xlarge) on a mixture of six question-answering datasets, including datasets from scientific and medical domains. Our findings demonstrate that while some variations in effectiveness occur, gains from CoT reasoning strategies remain robust across different models and datasets. GPT-4 has the most benefit from current state-of-the-art reasoning strategies and exhibits the best performance by applying a prompt previously discovered through automated discovery.
Learning to Learn By Self-Critique
Antoniou, Antreas, Storkey, Amos J.
In few-shot learning, a machine learning system is required to learn from a small set of labelled examples of a specific task, such that it can achieve strong generalization on new unlabelled examples of the same task. Given the limited availability of labelled examples in such tasks, we need to make use of all the information we can. For this reason we propose the use of transductive meta-learning for few shot settings to obtain state-of-the-art few-shot learning. Usually a model learns task-specific information from a small training-set (the \emph{support-set}) and subsequently produces predictions on a small unlabelled validation set (\emph{target-set}). The target-set contains additional task-specific information which is not utilized by existing few-shot learning methods.