Data Descriptions from Large Language Models with Influence Estimation

Kim, Chaeri, Bae, Jaeyeon, Kim, Taehwan

arXiv.org Artificial Intelligence

Deep learning models have been successful in many areas, but their behavior remains a black box. Most prior explainable AI (XAI) approaches focus on interpreting and explaining how models make predictions. In contrast, we aim to understand how the data itself can be explained through deep learning model training, and we propose a novel approach that describes the data in one of the most common media, language, so that humans can easily understand it. Our approach is a pipeline that generates textual descriptions explaining the data with large language models, incorporating external knowledge bases. Because generated descriptions may still include irrelevant information, we exploit influence estimation, together with the CLIP score, to choose the most informative textual descriptions. Furthermore, based on the phenomenon of cross-modal transferability, we propose a novel benchmark task, cross-modal transfer classification, to examine the effectiveness of our textual descriptions. In zero-shot experiments, we show that our textual descriptions are more effective than other baseline descriptions, and we successfully boost the performance of a model trained only on images across all nine image classification datasets. These results are further supported by evaluation using GPT-4o. Through our approach, we may gain insight into the inherent interpretability of the model's decision-making process.
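The selection step above, scoring each candidate description against the image with the CLIP score and keeping only the best-aligned ones, can be sketched as follows. This is a minimal illustration with NumPy arrays standing in for precomputed CLIP embeddings; the function names and the top-k rule are assumptions for illustration, not the paper's exact procedure, which additionally uses influence estimation.

```python
import numpy as np

def clip_score(image_emb, text_embs):
    # Cosine similarity between one image embedding and each candidate
    # description embedding (a CLIP-style score, up to a scaling constant).
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return txt @ img

def select_descriptions(image_emb, text_embs, k=2):
    # Keep the indices of the k candidate descriptions best aligned
    # with the image, in descending score order.
    scores = clip_score(image_emb, np.asarray(text_embs))
    return np.argsort(scores)[::-1][:k]
```

In practice the embeddings would come from a pretrained CLIP image and text encoder; only the ranking logic is shown here.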


When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance

Boizard, Nicolas, Gisserot-Boukhlef, Hippolyte, El-Haddad, Kevin, Hudelot, Céline, Colombo, Pierre

arXiv.org Artificial Intelligence

MICS, CentraleSupélec, Université Paris-Saclay

Large Language Models (LLMs) with reasoning capabilities have achieved state-of-the-art performance on a wide range of tasks. Despite this empirical success, the tasks and model scales at which reasoning becomes effective, as well as its training and inference costs, remain underexplored. In this work, we rely on a synthetic data distillation framework to conduct a large-scale supervised study. We compare Instruction Fine-Tuning (IFT) and reasoning models of varying sizes on a wide range of math-centric and general-purpose tasks, evaluating both multiple-choice and open-ended formats. Our analysis reveals that reasoning consistently improves model performance, often matching or surpassing significantly larger IFT systems. Notably, while IFT remains Pareto-optimal in training and inference costs, reasoning models become increasingly valuable as model size scales, overcoming IFT performance limits on reasoning-intensive and open-ended tasks. Reasoning helps most on open-ended and math tasks; gains are limited or inconsistent on general multiple-choice tasks. Large Language Models that generate explicit Chains of Thought (CoT) have rapidly become a defining paradigm. The research community is releasing increasingly capable reasoning models, which consistently outperform standard Instruction Fine-Tuned counterparts at test time, especially on math, coding, and other reasoning-heavy tasks (DeepSeek-AI, 2025; OpenAI, 2024; Mistral-AI, 2025). Despite rapid progress, we still lack clarity on when explicit reasoning is most beneficial. Both prior evidence and our findings (Figure 1) point to a highly task-dependent picture: reasoning yields substantial gains on math and coding benchmarks where multi-step problem solving is essential (Zhu et al., 2024), but provides only limited improvements on simpler factual or classification tasks (Liu et al., 2024).
As Figure 1 shows, these gains concentrate on reasoning-intensive (e.g., gsm8k, aime) and open-ended tasks, while benefits on general multiple-choice tasks are much smaller or inconsistent. Meanwhile, the scaling dynamics of reasoning models pose further challenges.


Anchoring Refusal Direction: Mitigating Safety Risks in Tuning via Projection Constraint

Du, Yanrui, Fan, Fenglei, Zhao, Sendong, Cao, Jiawei, Lin, Qika, He, Kai, Liu, Ting, Qin, Bing, Feng, Mengling

arXiv.org Artificial Intelligence

Instruction Fine-Tuning (IFT) has been widely adopted as an effective post-training strategy to enhance various abilities of Large Language Models (LLMs). However, prior studies have shown that IFT can significantly compromise LLMs' safety, particularly their ability to refuse malicious instructions, raising significant concerns. Recent research into the internal mechanisms of LLMs has identified the refusal direction (r-direction) in the hidden states, which plays a pivotal role in governing refusal behavior. Building on this insight, our study reveals that the r-direction tends to drift during training, which we identify as one of the causes of the associated safety risks. To mitigate such drift, our proposed ProCon method introduces a projection-constrained loss term that regularizes the projection magnitude of each training sample's hidden state onto the r-direction. Our initial analysis shows that applying an appropriate constraint can effectively mitigate the refusal direction drift and associated safety risks, but remains limited by an overall performance barrier. To overcome this barrier, informed by our observation of early-stage sharp drift and a data-driven perspective, we introduce a warm-up strategy that emphasizes strong early-stage constraints and broadens the data distribution to strengthen constraint signals, leading to an enhanced ProCon method. Experimental results under various datasets, scenarios, and LLMs demonstrate that our method can significantly mitigate safety risks posed by IFT while preserving task performance gains. Even compared with strong baselines, our method consistently delivers superior overall performance. Crucially, our analysis indicates that ProCon can contribute to stabilizing the r-direction during training, while such an interpretability-driven exploration of LLMs' internal mechanisms lays a solid foundation for future safety research.
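The core regularizer can be sketched as a penalty on the squared projection of each training sample's hidden state onto the unit-normalized refusal direction. This is a schematic reading of the abstract, shown with NumPy arrays for clarity (in training it would operate on framework tensors so gradients flow); the function name and the coefficient `lam` are illustrative assumptions, and the paper's exact loss term may differ.

```python
import numpy as np

def projection_penalty(hidden, r_direction, lam=0.1):
    # Mean squared projection of each hidden state (rows of `hidden`)
    # onto the unit-norm refusal direction. Adding lam * this term to
    # the IFT loss discourages drift along the r-direction.
    r = r_direction / np.linalg.norm(r_direction)
    proj = hidden @ r            # signed projection magnitude per sample
    return lam * np.mean(proj ** 2)
```

The warm-up strategy described above would then correspond to scheduling `lam` to be large early in training and decaying it afterwards.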


Navigating Rifts in Human-LLM Grounding: Study and Benchmark

Shaikh, Omar, Mozannar, Hussein, Bansal, Gagan, Fourney, Adam, Horvitz, Eric

arXiv.org Artificial Intelligence

Language models excel at following instructions but often struggle with the collaborative aspects of conversation that humans naturally employ. This limitation in grounding -- the process by which conversation participants establish mutual understanding -- can lead to outcomes ranging from frustrated users to serious consequences in high-stakes scenarios. To systematically study grounding challenges in human-LLM interactions, we analyze logs from three human-assistant datasets: WildChat, MultiWOZ, and Bing Chat. We develop a taxonomy of grounding acts and build models to annotate and forecast grounding behavior. Our findings reveal significant differences in human-human and human-LLM grounding: LLMs were three times less likely to initiate clarification and sixteen times less likely to provide follow-up requests than humans. Additionally, early grounding failures predicted later interaction breakdowns. Building on these insights, we introduce RIFTS: a benchmark derived from publicly available LLM interaction data containing situations where LLMs fail to initiate grounding. We note that current frontier models perform poorly on RIFTS, highlighting the need to reconsider how we train and prompt LLMs for human interaction. To this end, we develop a preliminary intervention that mitigates grounding failures.


How Expressive are Knowledge Graph Foundation Models?

Huang, Xingyue, Barceló, Pablo, Bronstein, Michael M., Ceylan, İsmail İlkan, Galkin, Mikhail, Reutter, Juan L, Orth, Miguel Romero

arXiv.org Artificial Intelligence

Knowledge Graph Foundation Models (KGFMs) are at the frontier for deep learning on knowledge graphs (KGs), as they can generalize to completely novel knowledge graphs with different relational vocabularies. Despite their empirical success, our theoretical understanding of KGFMs remains very limited. In this paper, we conduct a rigorous study of the expressive power of KGFMs. Specifically, we show that the expressive power of KGFMs directly depends on the motifs that are used to learn the relation representations. We then observe that the most typical motifs used in the existing literature are binary, as the representations are learned based on how pairs of relations interact, which limits the model's expressiveness. As part of our study, we design more expressive KGFMs using richer motifs, which necessitate learning relation representations based on, e.g., how triples of relations interact with each other. Finally, we empirically validate our theoretical findings, showing that the use of richer motifs results in better performance on a wide range of datasets drawn from different domains.


Navigating the Helpfulness-Truthfulness Trade-Off with Uncertainty-Aware Instruction Fine-Tuning

Wu, Tianyi, Ni, Jingwei, Hooi, Bryan, Zhang, Jiaheng, Ash, Elliott, Ng, See-Kiong, Sachan, Mrinmaya, Leippold, Markus

arXiv.org Artificial Intelligence

Instruction Fine-tuning (IFT) can enhance the helpfulness of Large Language Models (LLMs), but it may lower their truthfulness. This trade-off arises because IFT steers LLMs to generate responses with long-tail knowledge that is not well covered during pre-training, leading to more informative but less truthful answers when generalizing to unseen tasks. In this paper, we empirically demonstrate this helpfulness-truthfulness trade-off in IFT and propose $\textbf{UNIT}$, a novel IFT paradigm to address it. UNIT teaches LLMs to recognize their uncertainty and explicitly reflect it at the end of their responses. Experimental results show that UNIT-tuned models maintain their helpfulness while distinguishing between certain and uncertain claims, thereby reducing hallucinations.


Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

Jin, Yilun, Li, Zheng, Zhang, Chenwei, Cao, Tianyu, Gao, Yifan, Jayarao, Pratik, Li, Mao, Liu, Xin, Sarkhel, Ritesh, Tang, Xianfeng, Wang, Haodong, Wang, Zhengyang, Xu, Wenju, Yang, Jingfeng, Yin, Qingyu, Li, Xian, Nigam, Priyanka, Xu, Yi, Chen, Kai, Yang, Qiang, Jiang, Meng, Yin, Bing

arXiv.org Artificial Intelligence

Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite the potential, LLMs face unique challenges in online shopping, such as domain-specific concepts, implicit knowledge, and heterogeneous user behaviors. Motivated by the potential and challenges, we propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality, and can thus comprehensively evaluate the abilities of LLMs as general shop assistants. With Shopping MMLU, we benchmark over 20 existing LLMs and uncover valuable insights about practices and prospects of building versatile LLM-based shop assistants. Shopping MMLU can be publicly accessed at https://github.com/KL4805/ShoppingMMLU. In addition, with Shopping MMLU, we host a competition in KDD Cup 2024 with over 500 participating teams. The winning solutions and the associated workshop can be accessed at our website https://amazon-kddcup24.github.io/.


Impacts of Continued Legal Pre-Training and IFT on LLMs' Latent Representations of Human-Defined Legal Concepts

Ho, Shaun

arXiv.org Artificial Intelligence

This paper aims to offer AI & Law researchers and practitioners a more detailed understanding of whether and how continued pre-training and instruction fine-tuning (IFT) of large language models (LLMs) on legal corpora increase their utilization of human-defined legal concepts when developing global contextual representations of input sequences. We compared three models: Mistral 7B, SaulLM-7B-Base (Mistral 7B with continued pre-training on legal corpora), and SaulLM-7B-Instruct (with further IFT). This preliminary assessment examined 7 distinct text sequences from recent AI & Law literature, each containing a human-defined legal concept. We first compared the proportions of total attention the models allocated to subsets of tokens representing the legal concepts. We then visualized patterns of raw attention score alterations, evaluating whether legal training introduced novel attention patterns corresponding to structures of human legal knowledge. This inquiry revealed that (1) the impact of legal training was unevenly distributed across the various human-defined legal concepts, and (2) the contextual representations of legal knowledge learned during legal training did not coincide with structures of human-defined legal concepts. We conclude with suggestions for further investigation into the dynamics of legal LLM training.


Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning

Du, Yanrui, Zhao, Sendong, Cao, Jiawei, Ma, Ming, Zhao, Danyang, Fan, Fenglei, Liu, Ting, Qin, Bing

arXiv.org Artificial Intelligence

Instruction Fine-Tuning (IFT) has become an essential method for adapting base Large Language Models (LLMs) into variants for professional and private use. However, researchers have raised concerns over a significant decrease in LLMs' security following IFT, even when the IFT process involves entirely benign instructions (termed Benign IFT). Our study represents a pioneering effort to mitigate the security risks arising from Benign IFT. Specifically, we conduct a Module Robustness Analysis, aiming to investigate how LLMs' internal modules contribute to their security. Based on our analysis, we propose a novel IFT strategy, called the Modular Layer-wise Learning Rate (ML-LR) strategy. In our analysis, we implement a simple security feature classifier that serves as a proxy to measure the robustness of modules (e.g., $Q$/$K$/$V$). Our findings reveal that module robustness shows clear patterns, varying regularly with the module type and the layer depth. Leveraging these insights, we develop a proxy-guided search algorithm to identify a robust subset of modules, termed Mods$_{Robust}$. During IFT, the ML-LR strategy employs differentiated learning rates for Mods$_{Robust}$ and the remaining modules. Our experimental results show that in security assessments, the application of our ML-LR strategy significantly mitigates the rise in harmfulness of LLMs following Benign IFT. Notably, our ML-LR strategy has little impact on the usability or expertise of LLMs following Benign IFT. Furthermore, we have conducted comprehensive analyses to verify the soundness and flexibility of our ML-LR strategy.
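Mechanically, the ML-LR strategy amounts to assigning one learning rate to the identified robust subset Mods$_{Robust}$ and another to the remaining modules. A minimal sketch, matching parameters to the robust subset by name and emitting optimizer-style parameter groups; the helper name, the scale factor, and which group receives the lower rate are assumptions for illustration, not the paper's settings.

```python
def mllr_param_groups(named_params, robust_names, base_lr=2e-5, scale=0.1):
    # Split (name, parameter) pairs into the robust subset and the rest,
    # assigning a scaled learning rate to the robust modules and the
    # base rate to everything else.
    robust, rest = [], []
    for name, param in named_params:
        if any(marker in name for marker in robust_names):
            robust.append(param)
        else:
            rest.append(param)
    return [
        {"params": robust, "lr": base_lr * scale},
        {"params": rest, "lr": base_lr},
    ]
```

The returned list matches the per-parameter-group format accepted by common optimizers (e.g., `torch.optim.AdamW(groups)`), so the two subsets train at different rates within a single optimizer.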


RNR: Teaching Large Language Models to Follow Roles and Rules

Wang, Kuan, Bukharin, Alexander, Jiang, Haoming, Yin, Qingyu, Wang, Zhengyang, Zhao, Tuo, Shang, Jingbo, Zhang, Chao, Yin, Bing, Li, Xian, Chen, Jianshu, Li, Shiyang

arXiv.org Artificial Intelligence

Instruction fine-tuning (IFT) elicits instruction-following capabilities and steers the behavior of large language models (LLMs) via supervised learning. However, existing models trained on open-source IFT datasets only have the ability to follow instructions from users, and often fail to follow complex roles and rules specified by developers, a.k.a. system prompts. The ability to follow these roles and rules is essential for deployment, as it ensures that the model safely interacts with users within developer-defined guidelines. To improve such role and rule following ability, we propose RNR, an automated data generation pipeline that generates diverse roles and rules from existing IFT instructions, along with corresponding responses. This data can then be used to train models that follow complex system prompts. The models are evaluated on our newly created benchmarks for role and rule following ability, as well as standard instruction-following benchmarks and general NLP tasks. Our framework significantly improves role and rule following capability in LLMs, as evidenced by over a 25% increase in pass rate on rule adherence, i.e., following all requirements, in our experiments with the Alpaca and Ultrachat datasets. Moreover, our models achieve this increase without any regression on popular instruction-following benchmarks.