Goto

Collaborating Authors

 bct




Consistency Training Helps Stop Sycophancy and Jailbreaks

arXiv.org Artificial Intelligence

An LLM's factuality and refusal training can be compromised by simple changes to a prompt. Models often adopt user beliefs (sycophancy) or satisfy inappropriate requests which are wrapped within special text (jailbreaking). We explore \emph{consistency training}, a self-supervised paradigm that teaches a model to be invariant to certain irrelevant cues in the prompt. Instead of teaching the model what exact response to give on a particular prompt, we aim to teach the model to behave identically across prompt data augmentations (like adding leading questions or jailbreak text). We try enforcing this invariance in two ways: over the model's external outputs (\emph{Bias-augmented Consistency Training} (BCT) from Chua et al. [2025]) and over its internal activations (\emph{Activation Consistency Training} (ACT), a method we introduce). Both methods reduce Gemini 2.5 Flash's susceptibility to irrelevant cues. Because consistency training uses responses from the model itself as training data, it avoids issues that arise from stale training data, such as degrading model capabilities or enforcing outdated response guidelines. While BCT and ACT reduce sycophancy equally well, BCT does better at jailbreak reduction. We think that BCT can simplify training pipelines by removing reliance on static datasets. We argue that some alignment problems are better viewed not in terms of optimal responses, but rather as consistency issues.



The Multiplex Classification Framework: optimizing multi-label classifiers through problem transformation, ontology engineering, and model ensembling

arXiv.org Artificial Intelligence

Classification is a fundamental task in machine learning. While conventional methods--such as binary, multiclass, and multi-label classification--are effective for simpler problems, they may not adequately address the complexities of some real-world scenarios. This paper introduces the Multiplex Classification Framework, a novel approach developed to tackle these and similar challenges through the integration of problem transformation, ontology engineering, and model ensembling. The framework offers several advantages, including adaptability to any number of classes and logical constraints, an innovative method for managing class imbalance, the elimination of confidence threshold selection, and a modular structure. Two experiments were conducted to compare the performance of conventional classification models with the Multiplex approach. Our results demonstrate that the Multiplex approach can improve classification performance significantly (up to 10% gain in overall F1 score), particularly in classification problems with a large number of classes and pronounced class imbalances. However, it also has limitations, as it requires a thorough understanding of the problem domain and some experience with ontology engineering, and it involves training multiple models, which can make the whole process more intricate. Overall, this methodology provides a valuable tool for researchers and practitioners dealing with complex classification problems in machine learning.


A Method for Building Large Language Models with Predefined KV Cache Capacity

arXiv.org Artificial Intelligence

This paper introduces a novel approach, the Bounded-Cache Transformer (BCT), for building large language models with a predefined Key-Value (KV) cache capacity. The BCT addresses the excessive memory consumption issue in traditional KV caches by implementing a bounded-length KV cache, which is particularly suitable for the attention layers in Transformer decode-only architectures. By dynamically updating the key-value vector sequences, the BCT achieves efficient inference within limited cache capacity, significantly reducing memory usage while maintaining model performance and system throughput. Experimental results demonstrate that the BCT significantly reduces memory usage while maintaining the model's inference quality, offering a new solution for efficient inference in large language models.


Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

arXiv.org Artificial Intelligence

While chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning, it can systematically misrepresent the factors influencing models' behavior--for example, rationalizing answers in line with a user's opinion without mentioning this bias. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features. We construct a suite testing nine forms of biased reasoning on seven question-answering tasks, and find that applying BCT to GPT-3.5-Turbo with one bias reduces the rate of biased reasoning by 86% on held-out tasks. Moreover, this model generalizes to other forms of bias, reducing biased reasoning on held-out biases by an average of 37%. As BCT generalizes to held-out biases and does not require gold labels, this method may hold promise for reducing biased reasoning from as-of-yet unknown biases and on tasks where supervision for ground truth reasoning is unavailable.


AI Insights: A Case Study on Utilizing ChatGPT Intelligence for Research Paper Analysis

arXiv.org Artificial Intelligence

This paper discusses the effectiveness of leveraging Chatbot: Generative Pre-trained Transformer (ChatGPT) versions 3.5 and 4 for analyzing research papers for effective writing of scientific literature surveys. The study selected the \textit{Application of Artificial Intelligence in Breast Cancer Treatment} as the research topic. Research papers related to this topic were collected from three major publication databases Google Scholar, Pubmed, and Scopus. ChatGPT models were used to identify the category, scope, and relevant information from the research papers for automatic identification of relevant papers related to Breast Cancer Treatment (BCT), organization of papers according to scope, and identification of key information for survey paper writing. Evaluations performed using ground truth data annotated using subject experts reveal, that GPT-4 achieves 77.3\% accuracy in identifying the research paper categories and 50\% of the papers were correctly identified by GPT-4 for their scopes. Further, the results demonstrate that GPT-4 can generate reasons for its decisions with an average of 27\% new words, and 67\% of the reasons given by the model were completely agreeable to the subject experts.


AI & Blockchain as sustainable teaching and learning tools to cope with the 4IR

arXiv.org Artificial Intelligence

The Fourth Industrial Revolution (4IR) is transforming the way we live and work, and education is no exception. To cope with the challenges of 4IR, there is a need for innovative and sustainable teaching and learning tools. AI and block chain technologies hold great promise in this regard, with potential benefits such as personalized learning, secure credentialing, and decentralized learning networks. This paper presents a review of existing research on AI and block chain in education, analyzing case studies and exploring the potential benefits and challenges of these technologies. The paper also suggests a unique model for integrating AI and block chain into sustainable teaching and learning practices. Future research directions are discussed, including the need for more empirical studies and the exploration of ethical and social implications. The key summary of this discussion is that, by enhancing accessibility, efficacy, and security in education, AI and blockchain have the potential to revolutionise the field. In order to ensure that students can benefit from these potentially game-changing technologies as technology develops, it will be crucial to find ways to harness its power while minimising hazards. Overall, this paper highlights the potential of AI and block chain as sustainable tools for teaching and learning in the 4IR era and their respective advantages, issues and future prospects have been discussed in this writing.


Blockwise Compression of Transformer-based Models without Retraining

arXiv.org Artificial Intelligence

Transformer-based models, exemplified by GPT-3, ChatGPT, and GPT-4, have recently garnered considerable attention in both academia and industry due to their promising performance in general language tasks. Nevertheless, these models typically involve computationally encoding processes, and in some cases, decoding processes as well, both of which are fundamentally large-scale matrix multiplication. These operations bring the inevitable challenges of massive computation resources and huge memory footprint, usually requiring at least 10^23 FLOPs and hundreds of gigabytes, respectively. A common method to address this issue is to reduce the computational and memory requirements by applying layerwise quantization to the transformer, replacing the usual fp32 data type with a low-bit equivalent. Unfortunately, this method often leads to decreased model accuracy and necessitates time-consuming retraining. Such retraining not only requires fine-tuning skills but also substantial computational resources, posing challenges for users. To specifically tackle these issues, we propose BCT, a framework of blockwise compression for transformers without retraining, aiming to facilitate model deployment. Unlike layerwise compression methods, BCT achieves finer compression of the entire transformer by operating blockwise. This method mitigates data distribution deviation caused by quantization, eliminating the requirement for retraining. BCT effectively compresses all components of the model, including but not limited to the embedding, matrix multiplication, GELU, Softmax, layer normalization, and intermediate results. In a case study, an efficient model is compressed by BCT achieving up to 7.988x compression. Subsequently, we also evaluate it on several General Language Understanding Evaluation (GLUE) datasets.