Zeng, Yuchen
State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
Kang, Wonjun, Galim, Kevin, Zeng, Yuchen, Lee, Minjae, Koo, Hyung Il, Cho, Nam Ik
State Space Models (SSMs) have emerged as efficient alternatives to Transformers, mitigating their quadratic computational cost. However, the application of Parameter-Efficient Fine-Tuning (PEFT) methods to SSMs remains largely unexplored. In particular, prompt-based methods like Prompt Tuning and Prefix-Tuning, which are widely used in Transformers, do not perform well on SSMs. To address this, we propose state-based methods as a superior alternative to prompt-based methods. This new family of methods naturally stems from the architectural characteristics of SSMs. State-based methods adjust state-related features directly instead of depending on external prompts. Furthermore, we introduce a novel state-based PEFT method: State-offset Tuning. At every timestep, our method directly adjusts the current state, leading to more effective adaptation. Through extensive experiments across diverse datasets, we demonstrate the effectiveness of our method. Code is available at https://github.com/furiosa-ai/ssm-state-tuning.
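To make the state-based idea concrete, here is a minimal sketch of a diagonal SSM recurrence with a trainable state offset added at every timestep. The function and parameter names are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def ssm_scan(x, A, B, C, state_offset=None):
    # Diagonal SSM recurrence: h_t = A * h_{t-1} + B * x_t, y_t = C . h_t
    # State-offset tuning (as sketched here) adds a learned offset vector
    # to the hidden state at every step, rather than prepending prompt tokens.
    d = A.shape[0]
    h = np.zeros(d)
    ys = []
    for t in range(len(x)):
        h = A * h + B * x[t]
        if state_offset is not None:
            h = h + state_offset  # trainable per-layer offset (assumption)
        ys.append(C @ h)
    return np.array(ys)

# Toy run: 4-dim state, scalar input sequence [1.0, 0.0].
A = np.full(4, 0.5)
B = np.ones(4)
C = np.ones(4)
out = ssm_scan([1.0, 0.0], A, B, C)  # -> [4.0, 2.0]
```

Because the offset enters the recurrence directly, it influences every subsequent state, which is the intuition behind adjusting state-related features instead of relying on external prompts.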
DARWIN 1.5: Large Language Models as Materials Science Adapted Learners
Xie, Tong, Wan, Yuwei, Liu, Yixuan, Zeng, Yuchen, Zhang, Wenjie, Kit, Chunyu, Zhou, Dongzhan, Hoex, Bram
Materials discovery and design aim to find components and structures with desirable properties over highly complex and diverse search spaces. Traditional solutions, such as high-throughput simulations and machine learning (ML), often rely on complex descriptors, which hinder generalizability and transferability across tasks. Moreover, these descriptors may deviate from experimental data due to inevitable defects and purity issues in the real world, which may reduce their effectiveness in practical applications. To address these challenges, we propose Darwin 1.5, an open-source large language model (LLM) tailored for materials science. By leveraging natural language as input, Darwin eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. We employ a two-stage training strategy combining question-answering (QA) fine-tuning with multi-task learning (MTL) to inject domain-specific knowledge in various modalities and facilitate cross-task knowledge transfer. Through our strategic approach, we achieved a significant enhancement in the prediction accuracy of LLMs, with a maximum improvement of 60% compared to the LLaMA-7B base model. It further outperforms traditional machine learning models on various tasks in materials science, showcasing the potential of LLMs to provide a more versatile and scalable foundation model for materials discovery and design.
Parameter-Efficient Fine-Tuning of State Space Models
Galim, Kevin, Kang, Wonjun, Zeng, Yuchen, Koo, Hyung Il, Lee, Kangwook
Deep State Space Models (SSMs), such as Mamba (Gu & Dao, 2024), have emerged as powerful tools for language modeling, offering high performance with efficient inference and linear scaling in sequence length. However, the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models remains largely unexplored. This paper aims to systematically study two key questions: (i) How do existing PEFT methods perform on SSM-based models? (ii) Which modules are most effective for fine-tuning? We conduct an empirical benchmark of four basic PEFT methods on SSM-based models. Our findings reveal that prompt-based methods (e.g., prefix-tuning) are no longer effective, an empirical result further supported by theoretical analysis. In contrast, LoRA remains effective for SSM-based models. We further investigate the optimal application of LoRA within these models, demonstrating both theoretically and experimentally that applying LoRA to linear projection matrices without modifying SSM modules yields the best results, as LoRA is not effective at tuning SSM modules. To further improve performance, we introduce LoRA with Selective Dimension tuning (SDLoRA), which selectively updates certain channels and states on SSM modules while applying LoRA to linear projection matrices. Extensive experimental results show that this approach outperforms standard LoRA.
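The selective-dimension idea can be illustrated with a masked parameter update: only chosen channel and state indices of an SSM parameter receive gradients, while everything else stays frozen. This is a hedged sketch under that assumption; names and the update rule are illustrative, not SDLoRA's actual code:

```python
import numpy as np

def selective_update(param, grad, channel_idx, state_idx, lr=0.1):
    # Selective Dimension tuning, sketched: build a 0/1 mask over the
    # (channel, state) grid and apply the gradient step only where the
    # mask is 1. All other entries of the SSM parameter remain frozen.
    mask = np.zeros_like(param)
    mask[np.ix_(channel_idx, state_idx)] = 1.0
    return param - lr * mask * grad

# Toy example: a 3x3 parameter where only entry (0, 1) is trainable.
param = np.zeros((3, 3))
grad = np.ones((3, 3))
updated = selective_update(param, grad, channel_idx=[0], state_idx=[1])
# Only updated[0, 1] changes (to -0.1); the rest stays at 0.
```

In the paper's scheme this masked tuning of SSM modules is combined with standard LoRA on the linear projection matrices; the sketch above shows only the masking half.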
Can MLLMs Perform Text-to-Image In-Context Learning?
Zeng, Yuchen, Kang, Wonjun, Chen, Yicong, Koo, Hyung Il, Lee, Kangwook
The evolution from Large Language Models (LLMs) to Multimodal Large Language Models (MLLMs) has spurred research into extending In-Context Learning (ICL) to its multimodal counterpart. Existing studies have primarily concentrated on image-to-text ICL. However, Text-to-Image ICL (T2I-ICL), with its unique characteristics and potential applications, remains underexplored. To address this gap, we formally define the task of T2I-ICL and present CoBSAT, the first T2I-ICL benchmark dataset, encompassing ten tasks. Utilizing our dataset to benchmark six state-of-the-art MLLMs, we uncover considerable difficulties MLLMs encounter in solving T2I-ICL. We identify the primary challenges as the inherent complexity of multimodality and image generation. To overcome these challenges, we explore strategies like fine-tuning and Chain-of-Thought prompting, demonstrating notable improvements. Our code and dataset are available at https://github.com/UW-Madison-Lee-Lab/CoBSAT.
The Expressive Power of Low-Rank Adaptation
Zeng, Yuchen, Lee, Kangwook
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion models. Despite its huge success in practice, the theoretical underpinnings of LoRA have largely remained unexplored. This paper takes the first step to bridge this gap by theoretically analyzing the expressive power of LoRA. We also quantify the approximation error when the LoRA rank is lower than the minimal rank required for exact adaptation. All our theoretical insights are validated by numerical experiments. Recent foundation models, such as large language models (OpenAI, 2023; Liu et al., 2019; Touvron et al., 2023), have achieved remarkable success in a wide range of applications. Due to their substantial size, the standard full fine-tuning approach--where all the model's parameters are updated for specialized tasks--is becoming increasingly difficult and inefficient. This has led to the growing popularity of parameter-efficient fine-tuning approaches (Hu et al., 2022a; Liu et al., 2022; Ben Zaken et al., 2022; Hu et al., 2022b). Instead of updating all parameters, these approaches selectively update smaller subsets of weights or introduce lightweight adapters, thereby greatly decreasing the computational and storage costs. The most dominant approach along this line is Low-Rank Adaptation (LoRA) (Hu et al., 2022a), which applies lightweight low-rank adapters to pre-trained weight matrices. Far from merely enhancing computational efficiency, empirical evidence has shown that LoRA can match or even exceed the performance of full fine-tuning (Hu et al., 2022a). To date, LoRA has been widely used and achieved considerable success in adapting large language models (Hu et al., 2022a; Dinh et al., 2022b) and image generation models (Ryu, 2023; Fan et al., 2023) for various downstream tasks.
Despite the empirical success of LoRA, little is known in theory about how it works. In fact, several crucial theoretical questions remain open, such as: What is the minimum rank of the LoRA adapters required to adapt a (pre-trained) model f to match the functionality of the target model f̄? How does the model architecture (e.g., depth, width) affect the minimal rank? If the adapter rank is lower than this threshold, what is the resulting approximation error?
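For readers unfamiliar with the mechanism being analyzed, the following minimal numpy sketch shows the LoRA parameterization itself: a frozen weight W plus a trainable product BA whose rank is at most r. Dimensions and initializations here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # layer width and LoRA rank (illustrative values)

W = rng.standard_normal((d, d))   # frozen pre-trained weight
B = np.zeros((d, r))              # "up" factor, initialized to zero
A = rng.standard_normal((r, d))   # "down" factor

def lora_forward(x):
    # Adapted layer: (W + B A) x. Only A and B are trained, so the
    # effective weight update B A has rank at most r.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d)
# With B initialized to zero, the adapted layer equals the frozen layer,
# and the update matrix B A has rank <= r by construction.
```

The expressive-power question in the abstract is then: how large must r be so that some choice of A and B makes the adapted layers match a given target model exactly?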
Equal Improvability: A New Fairness Notion Considering the Long-term Impact
Guldogan, Ozgur, Zeng, Yuchen, Sohn, Jy-yong, Pedarsani, Ramtin, Lee, Kangwook
Devising a fair classifier that does not discriminate against different groups is an important problem in machine learning. Recently, effort-based fairness notions have been gaining attention; these consider scenarios in which each individual makes an effort to improve its features over time. Such scenarios happen in the real world, e.g., college admission and credit loaning, where each rejected sample makes an effort to change its features to get accepted afterward. In this paper, we propose a new effort-based fairness notion called Equal Improvability (EI), which equalizes the potential acceptance rate of the rejected samples across different groups, assuming a bounded level of effort will be spent by each rejected sample. We also propose and study three different approaches for finding a classifier that satisfies the EI requirement. Through experiments on both synthetic and real datasets, we demonstrate that the proposed EI-regularized algorithms encourage us to find a fair classifier in terms of EI. Additionally, we ran experiments on dynamic scenarios that highlight the advantages of our EI metric in equalizing the distribution of features across different groups, after the rejected samples make some effort to improve. Finally, we provide mathematical analyses of several aspects of EI: the relationship between EI and existing fairness notions, and the effect of EI in dynamic scenarios. Over the past decade, machine learning has been used in a wide variety of applications. However, these machine learning approaches are observed to be unfair to individuals of different ethnicities, races, and genders. As the implicit bias in artificial intelligence tools raised concerns over potential discrimination and equity issues, various researchers suggested defining fairness notions and developing classifiers that achieve fairness.
One popular fairness notion is demographic parity (DP), which requires the decision-making system to provide output such that the groups are equally likely to be assigned to the desired prediction classes, e.g., acceptance in the admission procedure. DP and related fairness notions are largely employed to mitigate the bias in many realistic problems such as recruitment, credit lending, and university admissions (Zafar et al., 2017b; Hardt et al., 2016; Dwork et al., 2012; Zafar et al., 2017a). However, most of the existing fairness notions only focus on immediate fairness, without taking potential follow-up inequity risk into consideration.
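The EI notion above can be sketched for the special case of a linear classifier with an L-infinity effort budget, where a rejected sample's best achievable score gain is delta * ||w||_1. This is a simplified assumption for illustration; the paper's formulation is more general:

```python
import numpy as np

def ei_rates(w, b, X, group, delta):
    # Linear classifier: score s(x) = w.x + b, accept iff s(x) >= 0.
    # Under an L-infinity effort budget delta, the maximal score gain a
    # rejected sample can achieve is delta * ||w||_1, so it is
    # "improvable" iff s(x) + delta * ||w||_1 >= 0. Equal Improvability
    # asks these improvable rates to match across groups.
    s = X @ w + b
    rejected = s < 0
    improvable = s + delta * np.abs(w).sum() >= 0
    return {g: improvable[rejected & (group == g)].mean()
            for g in np.unique(group)}

# Toy data: 1-d feature, two groups, two rejected samples each.
w, b, delta = np.array([1.0]), 0.0, 0.5
X = np.array([[-0.2], [-1.0], [-0.4], [-0.3]])
group = np.array([0, 0, 1, 1])
rates = ei_rates(w, b, X, group, delta)
# Group 0: only -0.2 can reach the boundary -> rate 0.5.
# Group 1: both -0.4 and -0.3 can -> rate 1.0. EI is violated here.
```

An EI-regularized learner would penalize the gap between these per-group rates while optimizing accuracy.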
Improving Fairness via Federated Learning
Zeng, Yuchen, Chen, Hongxu, Lee, Kangwook
Recently, many algorithms have been proposed for learning a fair classifier from decentralized data. However, many theoretical and algorithmic questions remain open. First, is federated learning necessary, i.e., can we simply train locally fair classifiers and aggregate them? In this work, we first propose a new theoretical framework, with which we demonstrate that federated learning can strictly boost model fairness compared with such non-federated algorithms. We then theoretically and empirically show that the performance tradeoff of FedAvg-based fair learning algorithms is strictly worse than that of a fair classifier trained on centralized data. To bridge this gap, we propose FedFB, a private fair learning algorithm on decentralized data. The key idea is to modify the FedAvg protocol so that it can effectively mimic the centralized fair learning. Our experimental results show that FedFB significantly outperforms existing approaches, sometimes matching the performance of the centrally trained model.
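As background for the FedAvg protocol the paper modifies, here is a minimal sketch of the vanilla FedAvg aggregation step: client models are averaged with weights proportional to local data sizes. This is standard FedAvg, not FedFB's modification:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    # FedAvg aggregation: the server averages client parameter vectors,
    # weighting each client by its local dataset size.
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Toy example: two clients with parameters 1.0 and 3.0 and data
# sizes 1 and 3; the aggregate is 0.25 * 1.0 + 0.75 * 3.0 = 2.5.
agg = fedavg([np.array([1.0]), np.array([3.0])], [1, 3])
```

FedFB's key idea, per the abstract, is to adjust this protocol (e.g., how client contributions are reweighted) so that the federated updates mimic centralized fair training.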
Multiway clustering via tensor block models
Zeng, Yuchen, Wang, Miaoyan
We consider the problem of identifying multiway block structure from a large noisy tensor. Such problems arise frequently in applications such as genomics, recommendation systems, topic modeling, and sensor network localization. We propose a tensor block model, develop a unified least-squares estimation, and obtain theoretical accuracy guarantees for multiway clustering. The statistical convergence of the estimator is established, and we show that the associated clustering procedure achieves partition consistency. A sparse regularization is further developed for identifying important blocks with elevated means. The proposal handles a broad range of data types, including binary, continuous, and hybrid observations. Through simulation and application to two real datasets, we demonstrate that our approach outperforms previous methods.
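A small sketch of the least-squares structure behind the tensor block model: once cluster assignments along each mode are fixed, the least-squares estimate of each block's mean is simply the average of the entries in that block. This illustrates the order-3 case with equal numbers of clusters per mode (an assumption for brevity):

```python
import numpy as np

def block_means(T, z1, z2, z3, k):
    # Given multiway cluster assignments z1, z2, z3 (one label per index
    # along each mode), the least-squares block estimate is the mean of
    # the entries falling in each (a, b, c) block.
    M = np.zeros((k, k, k))
    for a in range(k):
        for b in range(k):
            for c in range(k):
                block = T[np.ix_(z1 == a, z2 == b, z3 == c)]
                M[a, b, c] = block.mean() if block.size else 0.0
    return M

# Toy example: a 2x2x2 tensor where every index is its own cluster,
# so each block is a single entry and M recovers T exactly.
T = np.arange(8, dtype=float).reshape(2, 2, 2)
z = np.array([0, 1])
M = block_means(T, z, z, z, k=2)
```

The full estimation problem alternates between such block-mean updates and reassigning indices to clusters; the sketch shows only the mean step.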