Goto

Collaborating Authors

 instruction


Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms

arXiv.org Machine Learning

We present a data-adaptive method for parameter-efficient fine-tuning of large neural networks. Standard low-rank adaptation methods improve efficiency by restricting each layer update to a fixed low-rank form, but this static parameterization can be too rigid when the appropriate correction depends on the input and on the evolving depth-wise computation of the network. Our approach replaces a purely layer-local adapter with a shared queryable memory of low-rank update atoms. For each block of layers, the model forms a query from the current low-rank state and a running summary of previous blocks, uses this query to retrieve a content-dependent combination of shared update components via attention, and applies the resulting routed operator within the low-rank bottleneck. In this way, the method retains the efficiency and scalability of low-rank adaptation while allowing the effective update to vary across inputs and to share reusable structure across layers. The resulting architecture provides a principled middle ground between static LoRA-style updates and fully generated parameter updates: it remains compact and parameter-efficient while supporting dynamic, context-sensitive adaptation. Further, we incorporate instruction-regularization by augmenting routing logits with a language-induced prior over update atoms, thereby biasing the selection of low-rank transformations toward semantically relevant directions without generating unconstrained parameter updates. Experiments on noisy non-linear regression tasks and LLM fine-tuning suggest that this queryable update-memory formulation can improve final test performance and training stability compared to standard low-rank adaptation, while using a comparable number of trainable parameters.


5fc47800ee5b30b8777fdd30abcaaf3b-Supplemental-Conference.pdf

Neural Information Processing Systems

Having defined and validated the pairwise feedback simulator and evaluations in AlpacaFarm, we569 now turn our attention to studying methods that learn from pairwise feedback on AlpacaFarm.570 Unfortunately, the lack of existing benchmarks for learning from pairwise feedback for instruction571 following means that there has not been any open study of these methods in the instruction-following572 setting. In the remainder of this section, we will introduce our reference methods, which fall into two575 categories based on whether they fit a surrogate reward model as part of the learning process.576 FeedME is a method proposed by OpenAI [45] that incorporates human feedback578 with supervised fine-tuning on model generations that are rated 7/7 by human labelers. We adapt579 this approach to the pairwise feedback setting and call this baseline binary FeedME. This approach580 fine-tunes the SFT model on the chosen response in each preference pair with supervised learning.581 Motivated by controllable generation through conditioning [27, 34,582 29, 21], we propose binary reward conditioning, a baseline method that fine-tunes the SFT model583 with the feedback data Dpairwise by conditioning instances with either a positive or negative control584 token. Specifically, for each instance (x,y0,y1,z) 2D pairwise, the string concatenation of instruction585 x and response yz denoted as [x,yz] is prepended with the positive token and used in supervised586 fine-tuning (similarly [x,y1 z]is prepended with the negative token). This process creates a modified587 demonstration dataset that is double the size of Dpairwise. At test time, we draw samples from the588 fine-tuned model conditioned on the positive token.589 A.2 Methods that optimize a surrogate reward function590 We now describe methods that incorporate feedback by first building a surrogate reward model with591 pairwise feedback data. To start, we describe the step of training the surrogate reward model.592 While this can be a powerful approach,596 we will see that it can also lead to over-optimization [19] where models learn to exploit the reward597 model rather than achieve high true reward. We now describe 4 methods that leverage the surrogate598 reward model.599



0cd6a652ed1f7811192db1f700c8f0e7-Paper.pdf

Neural Information Processing Systems

Large language models have recently shown a remarkable ability for few-shot learning, including patterns of algorithmic nature. However, it is still an open question to determine what kind of patterns these models can capture and how many examples they need in their prompts. We frame this question as a teaching problem with strong priors, and study whether language models can identify simple algorithmic concepts from small witness sets. In particular, we explore how several GPT architectures, program induction systems and humans perform in terms of the complexity of the concept and the number of additional examples, and how much their behaviour differs. This first joint analysis of language models and machine teaching can address key questions for artificial intelligence and machine learning, such as whether some strong priors, and Occam's razor in particular, can be distilled from data, making learning from a few examples possible.


Sam Altman's ChatGPT Couldn't Stop Obsessing Over Goblins

Mother Jones

OpenAI desires less regulation, but it still doesn't know how its chatbot works. Get your news from a source that's not owned and controlled by oligarchs. OpenAI admitted it had to develop a specific instruction in the code of its latest model of ChatGPT to stop it from repeatedly referencing "goblins, gremlins, and other creatures." In an explanation posted Wednesday, the company said the "strange habit" came from its chatbot personality feature --specifically for users who chose the "Nerdy" personality. You are an unapologetically nerdy, playful and wise AI mentor to a human.


ChatGPT has a 'goblin' obsession. Now we know why

PCWorld

PCWorld reports that OpenAI's GPT models, including GPT-5.5, developed an unusual obsession with mentioning goblins and similar creatures in responses. This quirky behavior stemmed from a "Nerdy" personality instruction encouraging playful language use, which became reinforced through AI training processes. The goblin references became so prevalent that OpenAI implemented a direct ban in its Codex app, illustrating the unpredictable nature of large language model training. I've seen some odd AI system instructions in my day, but this one takes the cake: a prompt in OpenAI's Codex command-line app that demands models "never talk about goblins, gremlins, trolls, ogres, pigeons, or other animals or creatures."


f86c5c4d4dca70d30b1c12a33a2bc1a4-Supplemental-Conference.pdf

Neural Information Processing Systems

In this supplementary material, we provide more details regarding implementation details in Appendix B, more analysis of ERDA in Appendix C, full experimental results in Appendix D, and studies on parameters in Appendix E.


ed3fea9033a80fea1376299fa7863f4a-Paper-Conference.pdf

Neural Information Processing Systems

Large Language Models (LLMs) can achieve strong performance on many tasks by producing step-by-step reasoning before giving a final output, often referred to as chain-of-thought reasoning (CoT). It is tempting to interpret these CoT explanations as the LLM's process for solving a task. This level of transparency into LLMs' predictions would yield significant safety benefits. However, we find that CoT explanations can systematically misrepresent the true reason for a model's prediction. We demonstrate that CoT explanations can be heavily influenced by adding biasing features to model inputs--e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always "(A)"--which models systematically fail to mention in their explanations.