Goto

Collaborating Authors

 llama-2-7b-chat-hf



Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

Neural Information Processing Systems

Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM.


TelcoLM: collecting data, adapting, and benchmarking language models for the telecommunication domain

Barboule, Camille, Huynh, Viet-Phi, Bufort, Adrien, Chabot, Yoan, Damnati, Géraldine, Lecorvé, Gwénolé

arXiv.org Artificial Intelligence

Despite outstanding processes in many tasks, Large Language Models (LLMs) still lack accuracy when dealing with highly technical domains. Especially, telecommunications (telco) is a particularly challenging domain due the large amount of lexical, semantic and conceptual peculiarities. Yet, this domain holds many valuable use cases, directly linked to industrial needs. Hence, this paper studies how LLMs can be adapted to the telco domain. It reports our effort to (i) collect a massive corpus of domain-specific data (800M tokens, 80K instructions), (ii) perform adaptation using various methodologies, and (iii) benchmark them against larger generalist models in downstream tasks that require extensive knowledge of telecommunications. Our experiments on Llama-2-7b show that domain-adapted models can challenge the large generalist models. They also suggest that adaptation can be restricted to a unique instruction-tuning step, dicarding the need for any fine-tuning on raw texts beforehand.


Global Challenge for Safe and Secure LLMs Track 1

Jia, Xiaojun, Huang, Yihao, Liu, Yang, Tan, Peng Yan, Yau, Weng Kuan, Mak, Mun-Thye, Sim, Xin Ming, Ng, Wee Siong, Ng, See Kiong, Liu, Hanqing, Zhou, Lifeng, Yan, Huanqian, Sun, Xiaobing, Liu, Wei, Wang, Long, Qian, Yiming, Liu, Yong, Yang, Junxiao, Zhang, Zhexin, Lei, Leqi, Chen, Renmiao, Lu, Yida, Cui, Shiyao, Wang, Zizhou, Li, Shaohua, Wang, Yan, Goh, Rick Siow Mong, Zhen, Liangli, Zhang, Yingjie, Zhao, Zhe

arXiv.org Artificial Intelligence

This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks. With the increasing integration of LLMs in critical sectors such as healthcare, finance, and public administration, ensuring these models are resilient to adversarial attacks is vital for preventing misuse and upholding ethical standards. This competition focused on two distinct tracks designed to evaluate and enhance the robustness of LLM security frameworks. Track 1 tasked participants with developing automated methods to probe LLM vulnerabilities by eliciting undesirable responses, effectively testing the limits of existing safety protocols within LLMs. Participants were challenged to devise techniques that could bypass content safeguards across a diverse array of scenarios, from offensive language to misinformation and illegal activities. Through this process, Track 1 aimed to deepen the understanding of LLM vulnerabilities and provide insights for creating more resilient models.


Visual Editing with LLM-based Tool Chaining: An Efficient Distillation Approach for Real-Time Applications

Sultan, Oren, Khasin, Alex, Shiran, Guy, Greenstein-Messica, Asnat, Shahaf, Dafna

arXiv.org Artificial Intelligence

We present a practical distillation approach to fine-tune LLMs for invoking tools in real-time applications. We focus on visual editing tasks; specifically, we modify images and videos by interpreting user stylistic requests, specified in natural language ("golden hour"), using an LLM to select the appropriate tools and their parameters to achieve the desired visual effect. We found that proprietary LLMs such as GPT-3.5-Turbo show potential in this task, but their high cost and latency make them unsuitable for real-time applications. In our approach, we fine-tune a (smaller) student LLM with guidance from a (larger) teacher LLM and behavioral signals. We introduce offline metrics to evaluate student LLMs. Both online and offline experiments show that our student models manage to match the performance of our teacher model (GPT-3.5-Turbo), significantly reducing costs and latency. Lastly, we show that fine-tuning was improved by 25% in low-data regimes using augmentation.


Unlocking the Future: Exploring Look-Ahead Planning Mechanistic Interpretability in Large Language Models

Men, Tianyi, Cao, Pengfei, Jin, Zhuoran, Chen, Yubo, Liu, Kang, Zhao, Jun

arXiv.org Artificial Intelligence

Planning, as the core module of agents, is crucial in various fields such as embodied agents, web navigation, and tool using. With the development of large language models (LLMs), some researchers treat large language models as intelligent agents to stimulate and evaluate their planning capabilities. However, the planning mechanism is still unclear. In this work, we focus on exploring the look-ahead planning mechanism in large language models from the perspectives of information flow and internal representations. First, we study how planning is done internally by analyzing the multi-layer perception (MLP) and multi-head self-attention (MHSA) components at the last token. We find that the output of MHSA in the middle layers at the last token can directly decode the decision to some extent. Based on this discovery, we further trace the source of MHSA by information flow, and we reveal that MHSA mainly extracts information from spans of the goal states and recent steps. According to information flow, we continue to study what information is encoded within it. Specifically, we explore whether future decisions have been encoded in advance in the representation of flow. We demonstrate that the middle and upper layers encode a few short-term future decisions to some extent when planning is successful. Overall, our research analyzes the look-ahead planning mechanisms of LLMs, facilitating future research on LLMs performing planning tasks.


Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

Cao, Yuanpu, Zhang, Tianrong, Cao, Bochuan, Yin, Ziyi, Lin, Lu, Ma, Fenglong, Chen, Jinghui

arXiv.org Artificial Intelligence

Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracting "steering vectors" to guide the model's output toward desired behaviors by adjusting activations within specific layers of the LLM's transformer architecture. However, such steering vectors are directly extracted from the activations of human preference data and thus often lead to suboptimal results and occasional failures, especially in alignment-related scenarios. This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization. Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs, thereby offering a more precise representation of the target behavior. By carefully adjusting the direction and magnitude of the steering vector, we enabled personalized control over the desired behavior across a spectrum of intensities. Extensive experimentation across various open-ended generation tasks, particularly focusing on steering AI personas, has validated the efficacy of our approach. Moreover, we comprehensively investigate critical alignment-concerning scenarios, such as managing truthfulness, mitigating hallucination, and addressing jailbreaking attacks. Remarkably, our method can still demonstrate outstanding steering effectiveness across these scenarios. Furthermore, we showcase the transferability of our steering vectors across different models/LoRAs and highlight the synergistic benefits of applying multiple vectors simultaneously.