Lee, Deokjae


Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment

Lee, Deokjae, Song, Hyun Oh

arXiv.org Artificial Intelligence

We study weight-only post-training quantization (PTQ), which quantizes the weights of a large language model (LLM) without retraining, using little or no calibration data. Weight-only PTQ is crucial for reducing the memory footprint and latency of LLM inference, especially in memory-bound, small-batch inference scenarios, such as personalized inference on edge devices. Despite its importance, irregular weight distributions with heavy-tailed outliers in LLMs complicate quantization, recently motivating rotation-based methods that transform weights into near-Gaussian distributions, which are more regular with fewer outliers, thereby reducing quantization error. In this work, we first derive the information-theoretically optimal bit allocation for Gaussianized weights under given bit budgets, revealing that fine-grained fractional-bit quantizers approaching the Gaussian distortion-rate bound are essential to achieve near-optimal quantization performance. To bridge this theoretical insight and practical implementation, we introduce Q-Palette, a versatile collection of fractional-bit quantizers that range from trellis-coded quantizers offering near-optimal distortion to simpler vector and scalar quantizers optimized for faster inference, all efficiently implemented with optimized CUDA kernels across various bitwidths. Furthermore, leveraging Q-Palette as a foundational component, we propose a novel mixed-scheme quantization framework, jointly optimizing quantizer choices and layer fusion decisions given resource constraints. The code is available at https://github.com/snu-mllab/Q-Palette.
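The optimal allocation referenced here has a classical closed form for independent Gaussian sources: minimizing the total distortion Σ_i σ_i² 2^(−2b_i) under a budget Σ_i b_i = B gives b_i = B/n + ½ log₂(σ_i² / geometric-mean variance), which is generally fractional. A minimal sketch of that rate-distortion argument (the per-layer variances and budget are invented; this is not Q-Palette itself):

```python
import math

def optimal_bit_allocation(variances, total_bits):
    """Closed-form allocation minimizing sum_i var_i * 2**(-2*b_i) subject to
    sum_i b_i == total_bits (per-source Gaussian distortion-rate bound;
    the non-negativity constraint on b_i is ignored for simplicity)."""
    n = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / n  # log2 of geometric mean
    return [total_bits / n + 0.5 * (math.log2(v) - log_gm) for v in variances]

def distortion(variances, bits):
    return sum(v * 2.0 ** (-2.0 * b) for v, b in zip(variances, bits))

variances = [4.0, 1.0, 0.25]   # hypothetical per-layer variances after rotation
budget = 9.0                   # total bit budget across the three layers
bits = optimal_bit_allocation(variances, budget)      # -> [4.0, 3.0, 2.0]
uniform = [budget / len(variances)] * len(variances)  # 3 bits everywhere
assert distortion(variances, bits) <= distortion(variances, uniform)
```

At the optimum every layer contributes the same distortion (here 2⁻⁶ each), the equal-distortion property of reverse water-filling; realizing such fractional bitwidths in hardware is what the quantizer collection targets.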


GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance

Kim, Jinuk, Halabi, Marwa El, Park, Wonpyo, Schaefer, Clemens JS, Lee, Deokjae, Park, Yeonhong, Lee, Jae W., Song, Hyun Oh

arXiv.org Artificial Intelligence

Post-training quantization is a key technique for reducing the memory and inference latency of large language models by quantizing weights and activations without requiring retraining. However, existing methods either (1) fail to account for the varying importance of hidden features to the end loss or, when incorporating end loss, (2) neglect the critical interactions between model weights. To address these limitations, we propose GuidedQuant, a novel quantization approach that integrates gradient information from the end loss into the quantization objective while preserving cross-weight dependencies within output channels. GuidedQuant consistently boosts the performance of state-of-the-art quantization methods across weight-only scalar, weight-only vector, and weight-and-activation quantization. Additionally, we introduce a novel non-uniform scalar quantization algorithm, which is guaranteed to monotonically decrease the quantization objective value, and outperforms existing methods in this category. We release the code at https://github.com/snu-mllab/GuidedQuant.
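The monotone-decrease guarantee claimed for the non-uniform scalar quantizer is the property enjoyed by importance-weighted Lloyd iterations: the assignment step and the weighted-centroid step are each exact minimizers given the other, so the objective can only go down. A toy sketch of that generic mechanism (the values and saliency weights are invented; this stand-in is not the GuidedQuant algorithm):

```python
def weighted_lloyd(values, saliency, centroids, iters=20):
    """Non-uniform scalar quantization: minimize sum_i s_i * (v_i - c_{a(i)})**2.
    Assignment and saliency-weighted centroid updates alternate; each step is an
    exact minimizer given the other, so the objective decreases monotonically."""
    history = []
    for _ in range(iters):
        # Assignment: nearest centroid also minimizes s_i*(v_i - c)^2 since s_i > 0.
        assign = [min(range(len(centroids)), key=lambda k: (v - centroids[k]) ** 2)
                  for v in values]
        # Update: saliency-weighted mean of each cluster (skip empty clusters).
        for k in range(len(centroids)):
            den = sum(s for s, a in zip(saliency, assign) if a == k)
            if den > 0:
                centroids[k] = sum(s * v for v, s, a in zip(values, saliency, assign)
                                   if a == k) / den
        history.append(sum(s * (v - centroids[a]) ** 2
                           for v, s, a in zip(values, saliency, assign)))
    return centroids, history

values = [-1.2, -0.9, -0.1, 0.05, 0.8, 1.1]  # toy weight values
saliency = [1.0, 1.0, 5.0, 5.0, 1.0, 1.0]    # invented end-loss importance
cents, hist = weighted_lloyd(values, saliency, centroids=[-1.0, 0.0, 1.0])
assert all(b <= a + 1e-12 for a, b in zip(hist, hist[1:]))  # never increases
```

The saliency weights pull centroids toward values the end loss cares about, which is the intuition behind weighting quantization error by gradient information.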


Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization

Lee, Deokjae, Song, Hyun Oh, Cho, Kyunghyun

arXiv.org Artificial Intelligence

Active learning is increasingly adopted for expensive multi-objective combinatorial optimization problems, but it involves a challenging subset selection problem, optimizing the batch acquisition score that quantifies the goodness of a batch for evaluation. Due to the excessively large search space of the subset selection problem, prior methods optimize the batch acquisition on the latent space, which has discrepancies with the actual space, or optimize individual acquisition scores without considering the dependencies among candidates in a batch instead of directly optimizing the batch acquisition.

These problems focus on identifying designs, represented as discrete objects like strings or graphs, that optimize multiple attributes, often requiring substantial resources for accurate assessment (Ehrgott, 2005; Gómez-Bombarelli et al., 2016; Stanton et al., 2022; Winter et al., 2019; Mirhoseini et al., 2021). Active learning frameworks, which iteratively propose candidates and learn from the attributes evaluated on those candidates, are increasingly employed in these fields due to their query efficiency, which is a critical component to handling expensive evaluation costs (Aggarwal et al., 2014; Jain et al., 2022; Gruver et al., 2023; Zhu et al., 2023; Agnesina et al., 2023). In active learning, each round entails an internal problem of selecting a proposal batch of candidates for querying, formulated by cardinality-constrained
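Directly optimizing a batch acquisition score can be illustrated with plain greedy subset selection: each step adds the candidate with the largest marginal gain, which accounts for redundancy among candidates already in the batch. A toy sketch (the coverage-style acquisition, candidate names, and weights are all invented stand-ins for the paper's learned batch acquisition):

```python
def batch_acquisition(batch, coverage, weights):
    """Toy batch score: total weight of distinct objectives the batch covers."""
    covered = set().union(*(coverage[c] for c in batch)) if batch else set()
    return sum(weights[m] for m in covered)

def greedy_batch(pool, coverage, weights, batch_size):
    """Greedily add the candidate with the largest marginal gain in batch score."""
    batch = []
    for _ in range(batch_size):
        best = max((c for c in pool if c not in batch),
                   key=lambda c: batch_acquisition(batch + [c], coverage, weights))
        batch.append(best)
    return batch

coverage = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 2, 3}}  # invented
weights = {1: 1.0, 2: 1.0, 3: 1.0, 4: 2.0}                       # invented
batch = greedy_batch(list(coverage), coverage, weights, batch_size=2)
# Picks d (gain 3.0) then c (gain 2.0): redundancy with d rules out a and b.
```

Scoring the batch jointly is what distinguishes this from ranking candidates by individual acquisition scores, which would happily select the mutually redundant a, b, and d.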


Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming

Kim, Jinuk, Jeong, Yeonwoo, Lee, Deokjae, Song, Hyun Oh

arXiv.org Artificial Intelligence

Recent works on neural network pruning advocate that reducing the depth of the network is more effective in reducing run-time memory usage and accelerating inference latency than reducing the width of the network through channel pruning. In this regard, some recent works propose depth compression algorithms that merge convolution layers. However, the existing algorithms have a constricted search space and rely on human-engineered heuristics. In this paper, we propose a novel depth compression algorithm that targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations for efficient end-to-end inference latency. Since the proposed subset selection problem is NP-hard, we formulate a surrogate optimization problem that can be solved exactly via two-stage dynamic programming within a few seconds. We evaluate our methods and baselines with TensorRT for a fair inference latency comparison. Our method outperforms the baseline method with higher accuracy and faster inference speed in MobileNetV2 on the ImageNet dataset. Specifically, we achieve $1.41\times$ speed-up with $0.11$\%p accuracy gain in MobileNetV2-1.0 on ImageNet.
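The exact dynamic program over merge decisions can be illustrated with a one-stage simplification: if latency(i, j) denotes the measured latency of the single convolution obtained by merging layers i..j (with the activations between them replaced by identities), the best partition of an n-layer block satisfies dp[j] = min_i dp[i−1] + latency(i, j). A sketch with an invented latency table (the paper's actual formulation is a two-stage DP with additional constraints):

```python
def optimal_merge(latency, num_layers):
    """Interval DP over layer partitions: latency[(i, j)] is the hypothetical
    measured latency of one conv replacing layers i..j.  dp[j] holds the best
    achievable latency for layers 1..j; choice[j] records the last segment."""
    INF = float("inf")
    dp = [0.0] + [INF] * num_layers
    choice = [0] * (num_layers + 1)
    for j in range(1, num_layers + 1):
        for i in range(1, j + 1):          # last segment merges layers i..j
            cand = dp[i - 1] + latency[(i, j)]
            if cand < dp[j]:
                dp[j], choice[j] = cand, i
    segments, j = [], num_layers           # recover the chosen segments
    while j > 0:
        segments.append((choice[j], j))
        j = choice[j] - 1
    return dp[num_layers], segments[::-1]

# Invented table for a 3-layer block: merging layers 1-2 is profitable,
# merging all three is not (the merged kernel grows too large).
latency = {(1, 1): 5.0, (2, 2): 5.0, (3, 3): 4.0,
           (1, 2): 7.0, (2, 3): 9.5, (1, 3): 12.0}
best, segs = optimal_merge(latency, 3)     # -> 11.0, [(1, 2), (3, 3)]
```

Because the table is indexed by measured latencies rather than proxy metrics like FLOPs, the DP directly optimizes the quantity reported at evaluation time.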


Query-Efficient Black-Box Red Teaming via Bayesian Optimization

Lee, Deokjae, Lee, JunYeong, Ha, Jung-Woo, Kim, Jin-Hwa, Lee, Sang-Woo, Lee, Hwaran, Song, Hyun Oh

arXiv.org Artificial Intelligence

The deployment of large-scale generative models is often restricted by their potential risk of causing harm to users in unpredictable ways. We focus on the problem of black-box red teaming, where a red team generates test cases and interacts with the victim model to discover a diverse set of failures with limited query access. Existing red teaming methods construct test cases based on human supervision or a language model (LM) and query all test cases in a brute-force manner without incorporating any information from past evaluations, resulting in a prohibitively large number of queries. To this end, we propose Bayesian red teaming (BRT), novel query-efficient black-box red teaming methods based on Bayesian optimization, which iteratively identify diverse positive test cases leading to model failures by utilizing a pre-defined user input pool and past evaluations. Experimental results on various user input pools demonstrate that our method consistently finds a significantly larger number of diverse positive test cases under a limited query budget than the baseline methods. The source code is available at https://github.com/snu-mllab/Bayesian-Red-Teaming.
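The query-efficient loop can be sketched as pool-based Bayesian optimization: fit a surrogate to past (test case, score) pairs, then query the unevaluated candidate maximizing an upper-confidence-bound acquisition. The sketch below uses a simple kernel-regression surrogate on a 1-D toy pool as a stand-in for BRT's surrogate over text inputs; every name and number is invented:

```python
import math

def rbf(x, y, ell=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))

def surrogate(x, observed):
    """Kernel-regression stand-in for a GP: smoothed mean of past scores plus a
    distance-based uncertainty (1 minus similarity to the closest past query)."""
    if not observed:
        return 0.0, 1.0
    wts = [rbf(x, xo) for xo, _ in observed]
    mean = sum(w * yo for w, (_, yo) in zip(wts, observed)) / sum(wts)
    return mean, 1.0 - max(wts)

def bayes_opt_pool(pool, score_fn, budget, beta=2.0):
    """Each round, query the unevaluated candidate maximizing the UCB
    acquisition mean + beta * uncertainty, then record its true score."""
    observed = []
    for _ in range(budget):
        seen = {xo for xo, _ in observed}
        def ucb(c):
            m, s = surrogate(c, observed)
            return m + beta * s
        x = max((c for c in pool if c not in seen), key=ucb)
        observed.append((x, score_fn(x)))
    return observed

pool = [i / 10 for i in range(40)]    # toy 1-D stand-in for a user input pool
score = lambda x: -(x - 2.3) ** 2     # hidden objective, peak at x = 2.3
obs = bayes_opt_pool(pool, score, budget=12)
best_x = max(obs, key=lambda t: t[1])[0]
```

Because every evaluation feeds back into the surrogate, the loop concentrates queries near high-scoring regions instead of brute-forcing the pool, which is the source of the query savings.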