Lim, Woosang
Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer
Choi, Euntae, Song, Sumin, Lim, Woosang, Yoo, Sungjoo
We propose Rotate, Clip, and Partition (RCP), a quantization-aware training (QAT) approach that is the first to realize extreme compression of LLMs under the W2A4KV4 (2-bit weight, 4-bit activation, and 4-bit KV cache) configuration. RCP integrates recent rotation techniques with a novel non-uniform weight quantizer design, guided by a quantitative analysis of the impact of random rotation on 2-bit weight quantization. Our weight quantizer features Learnable Direct Partitioning (LDP), which introduces learnable parameters that learn non-uniform quantization intervals directly, jointly with the LLM weights. We also present a specialized GPU kernel that supports GEMV on non-uniform W2A4. Experiments show that RCP compresses LLaMA-2-7B to W2A4KV4 with an increase of only 2.84 in WikiText2 perplexity and a 5.29x smaller memory footprint. Furthermore, RCP can quantize the challenging mobile-targeted LLaMA-3.2 models and the domain-specific WizardCoder-7B and MetaMath-7B without critical failures such as non-convergence or repetition. Code will be made available at blind_review.
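For intuition, here is a minimal PyTorch sketch of a learnable non-uniform 2-bit weight quantizer in the spirit of LDP. The specific parameterization below (four learnable reconstruction levels per output channel, midpoint thresholds, straight-through estimator) is an illustrative assumption, not the paper's exact formulation.

```python
# Sketch of a learnable non-uniform 2-bit quantizer (assumed parameterization).
import torch
import torch.nn as nn

class NonUniform2bitQuantizer(nn.Module):
    def __init__(self, out_features):
        super().__init__()
        # Four reconstruction levels per output channel, learned jointly with
        # the LLM weights; the initialization here is hypothetical.
        init = torch.tensor([-1.0, -0.33, 0.33, 1.0])
        self.levels = nn.Parameter(init.repeat(out_features, 1))

    def forward(self, w):  # w: (out_features, in_features), assumed pre-scaled
        lv, _ = self.levels.sort(dim=1)          # keep levels ordered
        thr = (lv[:, :-1] + lv[:, 1:]) / 2       # partition boundaries (midpoints)
        idx = (w.unsqueeze(-1) > thr.unsqueeze(1)).sum(-1)  # bucket index 0..3
        w_q = torch.gather(lv.unsqueeze(1).expand(-1, w.size(1), -1), 2,
                           idx.unsqueeze(-1)).squeeze(-1)
        # Forward pass uses the quantized weights; the straight-through trick
        # below passes gradients to w unchanged, while the levels still
        # receive gradients through the gather.
        return w_q + w - w.detach()
```

Because the partition boundaries are derived from the learnable levels, adjusting the levels moves the intervals directly; learning the boundaries themselves would additionally require a soft (e.g., sigmoid) relaxation, which this sketch omits.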
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension
Li, Zekun, Yang, Xianjun, Choi, Kyuri, Zhu, Wanrong, Hsieh, Ryan, Kim, HyeonJung, Lim, Jin Hyuk, Ji, Sungyoung, Lee, Byungju, Yan, Xifeng, Petzold, Linda Ruth, Wilson, Stephen D., Lim, Woosang, Wang, William Yang
The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures. Despite progress, there remains a significant gap in evaluating models' comprehension of professional, graduate-level, and even PhD-level scientific content. Current datasets and benchmarks primarily focus on relatively simple scientific tasks and figures, lacking comprehensive assessments across diverse advanced scientific disciplines. To bridge this gap, we collected a multimodal, multidisciplinary dataset from open-access scientific articles published in Nature Communications journals. This dataset spans 72 scientific disciplines, ensuring both diversity and quality. We created benchmarks with various tasks and settings to comprehensively evaluate LMMs' capabilities in understanding scientific figures and content. Our evaluation revealed that these tasks are highly challenging: many open-source models struggled significantly, and even GPT-4V and GPT-4o faced difficulties. We also explored using our dataset as a training resource by constructing visual instruction-following data, enabling the 7B LLaVA model to achieve performance comparable to GPT-4V/o on our benchmark. Additionally, we investigated the use of our interleaved article texts and figure images for pre-training LMMs, resulting in improvements on the material generation task. The source dataset, including articles, figures, constructed benchmarks, and visual instruction-following data, is open-sourced.
CoDiNMF: Co-Clustering of Directed Graphs via NMF
Lim, Woosang (Georgia Institute of Technology) | Du, Rundong (Georgia Institute of Technology) | Park, Haesun (Georgia Institute of Technology)
Co-clustering computes clusters of data items and their associated features concurrently, and it has been used in many applications such as community detection, product recommendation, computer vision, and pricing optimization. In this paper, we propose a new co-clustering method, called CoDiNMF, which improves clustering quality and finds directional patterns among co-clusters by using multiple directed and undirected graphs. We design the co-clustering objective function using the min-cut criterion combined with an additional term that controls the sum of net directional flow between different co-clusters. In addition, we show that a variant of Nonnegative Matrix Factorization (NMF) can optimize the proposed objective function effectively. Experiments on the US patents and BlogCatalog data sets, whose ground truth is known, show that CoDiNMF improves clustering results over other co-clustering methods in terms of average F1 score, Rand index, and adjusted Rand index (ARI). Finally, we compare CoDiNMF with other co-clustering methods on the Wikipedia data set of philosophers, where we find meaningful directional flows of influence among co-clusters of philosophers.
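As a rough illustration of the idea (not the paper's algorithm), the sketch below clusters a directed graph with off-the-shelf NMF on the symmetrized adjacency and then measures the net directional flow between the resulting co-clusters via the skew-symmetric part A - A^T. The pipeline, variable names, and hard cluster assignment are all assumptions.

```python
# Illustrative sketch: NMF clustering plus net directional flow between clusters.
import numpy as np
from sklearn.decomposition import NMF

def codi_sketch(A, k):
    """A: nonnegative directed adjacency matrix (n x n), k: number of clusters."""
    S = (A + A.T) / 2                       # undirected similarity (min-cut side)
    W = NMF(n_components=k, init="nndsvda", max_iter=500).fit_transform(S)
    labels = W.argmax(axis=1)               # hard co-cluster assignment
    F = A - A.T                             # skew-symmetric part: directed flow
    flow = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            # Net flow from cluster i to cluster j; positive means i -> j dominates.
            flow[i, j] = F[np.ix_(labels == i, labels == j)].sum()
    return labels, flow
```

In CoDiNMF itself, the directional-flow term is part of the objective being optimized rather than a post-hoc measurement as above.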
Hadamard Product for Low-rank Bilinear Pooling
Kim, Jin-Hwa, On, Kyoung-Woon, Lim, Woosang, Kim, Jeonghee, Ha, Jung-Woo, Zhang, Byoung-Tak
Bilinear models provide richer representations than linear models. They have been applied to various visual tasks, such as object recognition, segmentation, and visual question-answering, achieving state-of-the-art performance by exploiting the expanded representations. However, bilinear representations tend to be high-dimensional, limiting their applicability to computationally demanding tasks. We propose low-rank bilinear pooling using the Hadamard product as an efficient attention mechanism for multimodal learning. We show that our model outperforms compact bilinear pooling on visual question-answering, achieving state-of-the-art results on the VQA dataset while being more parsimonious in parameters.
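The core operation has the low-rank form f = P^T(sigma(U^T x) o sigma(V^T q)), where o is the Hadamard product. A minimal PyTorch sketch follows; the layer sizes and the choice of tanh as the nonlinearity are illustrative.

```python
# Sketch of low-rank bilinear pooling with the Hadamard product.
import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    def __init__(self, x_dim, q_dim, rank_dim, out_dim):
        super().__init__()
        self.U = nn.Linear(x_dim, rank_dim)    # projects visual features
        self.V = nn.Linear(q_dim, rank_dim)    # projects question features
        self.P = nn.Linear(rank_dim, out_dim)  # maps the joint space to outputs

    def forward(self, x, q):
        # The element-wise (Hadamard) product replaces the full bilinear
        # tensor, cutting parameters from x_dim * q_dim * out_dim to roughly
        # (x_dim + q_dim + out_dim) * rank_dim.
        return self.P(torch.tanh(self.U(x)) * torch.tanh(self.V(q)))
```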
Reward Shaping for Model-Based Bayesian Reinforcement Learning
Kim, Hyeoneun (KAIST) | Lim, Woosang (KAIST) | Lee, Kanghoon (KAIST) | Noh, Yung-Kyun (KAIST) | Kim, Kee-Eung (KAIST)
Bayesian reinforcement learning (BRL) provides a formal framework for the optimal exploration-exploitation tradeoff in reinforcement learning. Unfortunately, finding the Bayes-optimal behavior is generally intractable except in restricted cases. As a consequence, many BRL algorithms, model-based approaches in particular, rely on approximate models or real-time search methods. In this paper, we present potential-based shaping for improving learning performance in model-based BRL. We propose a number of potential functions that are particularly well suited to BRL and are domain-independent in the sense that they require no prior knowledge about the actual environment. By incorporating the potential function into real-time heuristic search, we show that we can significantly improve learning performance on standard benchmark domains.
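For reference, potential-based shaping adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward, which preserves the optimal policy for any bounded potential Phi (Ng et al., 1999). The sketch below shows the mechanics; the potential function is left as a stand-in for the paper's proposed domain-independent choices, which are not reproduced here.

```python
# Sketch of potential-based reward shaping inside a model-based BRL planner.
def shaped_reward(r, s, s_next, phi, gamma=0.95):
    """Shape the reward of transition s -> s_next.

    The added term gamma * phi(s_next) - phi(s) telescopes along any
    trajectory, so the set of optimal policies is unchanged while the
    shaped rewards can guide search toward promising states sooner.
    """
    return r + gamma * phi(s_next) - phi(s)
```

In the model-based BRL setting, one plausible stand-in for phi is a heuristic value estimate computed from the current posterior (e.g., the value function of the mean MDP), evaluated at the state component of the belief-state.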