

Data-Driven Sequential Sampling for Tail Risk Mitigation

arXiv.org Machine Learning

In various operational problems, risk-sensitive decision makers face the challenge of selecting, from a collection of stochastic alternatives that generate random losses, an alternative with minimal tail risk. Tail risk, in this context, refers to the potential for experiencing substantial losses, and is formally defined in the paper. Despite the significance of this challenge, most related studies focus on identifying a subset of the alternatives with acceptable (or minimal) expected losses rather than using tail risk as a ranking criterion. Our objective is to develop a tractable and effective solution for situations where decision makers compare the alternatives based solely on their tail risk. In practice, our proposed solution would ideally be applied to the aforementioned subset of the alternatives, obtained via existing approaches, so that decision makers can ultimately find an alternative with both acceptable expected loss and minimal tail risk.
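
The paper's formal definition of tail risk is not given in the abstract; a common instantiation is conditional value-at-risk (CVaR). As a minimal sketch of the selection problem, assuming CVaR as the measure and a naive equal-allocation sampling scheme rather than the paper's data-driven sequential rule (all names here are illustrative):

```python
import numpy as np

def empirical_cvar(losses, alpha=0.95):
    """Empirical conditional value-at-risk: the mean of the losses at or
    above the alpha-quantile, one common formalization of tail risk."""
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

def select_min_tail_risk(samplers, n_rounds=10_000, alpha=0.95, seed=0):
    """Draw n_rounds losses from every alternative (equal allocation),
    then return the index of the smallest empirical CVaR."""
    rng = np.random.default_rng(seed)
    losses = np.array([[s(rng) for _ in range(n_rounds)] for s in samplers])
    cvars = np.array([empirical_cvar(row, alpha) for row in losses])
    return int(np.argmin(cvars)), cvars

# e.g., two loss distributions with equal means but different tails:
best, cvars = select_min_tail_risk(
    [lambda r: r.normal(0, 1), lambda r: r.normal(0, 2)])
```

The point of a sequential method is to spend most of the budget on the alternatives whose tail estimates are still ambiguous, rather than splitting it evenly as above.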


Optimizing Input Data Collection for Ranking and Selection

arXiv.org Machine Learning

We study a ranking and selection (R&S) problem when all solutions share common parametric Bayesian input models updated with the data collected from multiple independent data-generating sources. Our objective is to identify the best system by designing a sequential sampling algorithm that collects input and simulation data given a budget. We adopt the most probable best (MPB) as the estimator of the optimum and show that its posterior probability of optimality converges to one at an exponential rate as the sampling budget increases. Assuming that the input parameters belong to a finite set, we characterize the $\epsilon$-optimal static sampling ratios for input and simulation data that maximize the convergence rate. Using these ratios as guidance, we propose the optimal sampling algorithm for R&S (OSAR) that achieves the $\epsilon$-optimal ratios almost surely in the limit. We further extend OSAR by adopting kernel ridge regression to improve the simulation output mean prediction. This not only improves OSAR's finite-sample performance, but also lets us tackle the case where the input parameters lie in a continuous space, with a strong consistency guarantee for finding the optimum. We numerically demonstrate that OSAR outperforms a state-of-the-art competitor.
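
As a rough illustration of the MPB criterion that OSAR targets (not the paper's algorithm), the sketch below computes each solution's posterior probability of optimality over a finite parameter support; mu_hat, post_weights, and the convention that lower output is better are assumptions:

```python
import numpy as np

def prob_of_optimality(mu_hat, post_weights):
    """mu_hat[i, j]: predicted mean output of solution i under input
    parameter j (finite support); post_weights[j]: posterior probability
    of parameter j given the input data collected so far. Returns each
    solution's posterior probability of being the conditional optimum
    (the minimizer here); the MPB is the argmax of the returned vector."""
    best = mu_hat.argmin(axis=0)  # conditional optimum per parameter value
    return np.bincount(best, weights=post_weights, minlength=mu_hat.shape[0])
```

OSAR itself additionally decides at each step whether to spend the budget on input data (which sharpens post_weights) or on simulation replications (which sharpen mu_hat) so as to track the $\epsilon$-optimal ratios; that allocation rule is not reproduced here.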


LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs

arXiv.org Artificial Intelligence

Fine-tuning pre-trained large language models (LLMs) on limited hardware is challenging due to GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate memory constraints on the GPU. However, it remains unclear how to determine, for a given environment, the most effective method that achieves rapid fine-tuning while preventing GPU out-of-memory issues. To address this challenge, we introduce LLMem, a solution that estimates GPU memory consumption when applying distributed fine-tuning methods across multiple GPUs and identifies the optimal method. We estimate GPU memory usage prior to fine-tuning, leveraging the fundamental structure of transformer-based decoder models and the memory usage distribution of each method. Experimental results show that LLMem accurately estimates peak GPU memory usage on a single GPU, with error rates of up to 1.6%, and achieves an average error rate of 3.0% when applying distributed fine-tuning methods to LLMs with more than a billion parameters on multi-GPU setups.
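
LLMem's estimator, built from the transformer decoder structure and each method's memory usage distribution, is not spelled out in the abstract. For orientation only, here is the standard back-of-envelope peak-memory estimate for single-GPU mixed-precision Adam fine-tuning; the byte counts are the conventional ones, not LLMem's formulas, and the helper name is hypothetical:

```python
def estimate_finetune_memory_gb(n_params, weight_bytes=2, grad_bytes=2,
                                optimizer_bytes=12, activation_gb=0.0):
    """Rough peak-memory estimate for mixed-precision Adam fine-tuning:
    fp16 weights (2 B/param) + fp16 gradients (2 B/param) + fp32 master
    weights and two fp32 Adam moments (12 B/param), plus activations."""
    per_param = weight_bytes + grad_bytes + optimizer_bytes
    return n_params * per_param / 1e9 + activation_gb

# A 1.3B-parameter model needs roughly 20.8 GB before activations, which
# is why sharding optimizer state across GPUs matters on commodity hardware.
print(estimate_finetune_memory_gb(1.3e9))  # ~20.8
```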


HyperCLOVA X Technical Report

arXiv.org Artificial Intelligence

We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets, while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and its cultural nuances. Further analysis of its inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries developing their own sovereign LLMs.


CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution

arXiv.org Artificial Intelligence

Mobile devices run deep learning models for various purposes, such as image classification and speech recognition. Due to the resource constraints of mobile devices, researchers have focused on either making lightweight deep neural network (DNN) models via model pruning or generating efficient code via compiler optimization. Surprisingly, we found that straightforward integration of model compression and compiler auto-tuning often does not produce the most efficient model for a target device. We propose CPrune, a compiler-informed model pruning framework for efficient, target-aware DNN execution that supports applications with a required target accuracy. CPrune builds a lightweight DNN model through informed pruning based on the structural information of subgraphs built during the compiler tuning process. Our experimental results show that CPrune increases DNN execution speed by up to 2.73x compared to state-of-the-art TVM auto-tuning while satisfying the accuracy requirement.
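
The abstract describes the CPrune loop only at a high level. The schematic below, in which every callable is hypothetical, shows one plausible shape of a compiler-informed pruning loop: tune for the device, profile per-subgraph latency, prune the costliest structures, and stop before accuracy falls below the requirement:

```python
def cprune_loop(model, target_acc, tune, profile, prune_costliest, evaluate,
                max_steps=50):
    """Schematic compiler-informed pruning loop (all callables hypothetical):
    tune(model) compiles and auto-tunes for the target device, profile()
    returns a per-subgraph latency breakdown, prune_costliest() removes
    structures from the slowest subgraph, and evaluate() measures accuracy."""
    for _ in range(max_steps):
        compiled = tune(model)                 # compiler auto-tuning pass
        latency_profile = profile(compiled)    # which subgraphs dominate latency?
        candidate = prune_costliest(model, latency_profile)
        if evaluate(candidate) < target_acc:   # stop before violating accuracy
            break
        model = candidate
    return model
```

The key design point the paper argues for is the feedback edge from the compiler's subgraph-level tuning information back into the pruning decision, rather than running pruning and auto-tuning as independent stages.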


Selection of the Most Probable Best

arXiv.org Artificial Intelligence

We consider an expected-value ranking and selection problem in which all k solutions' simulation outputs depend on a common uncertain input model. Given that the uncertainty of the input model is captured by a probability simplex on a finite support, we define the most probable best (MPB) to be the solution whose probability of being optimal is largest. To devise an efficient sampling algorithm for finding the MPB, we first derive a lower bound on the large-deviation rate of the probability of falsely selecting the MPB, then formulate an optimal computing budget allocation (OCBA) problem to find the optimal static sampling ratios, over all solution-input model pairs, that maximize the lower bound. We devise a series of sequential algorithms that apply interpretable and computationally efficient sampling rules and prove that their sampling ratios achieve the optimality conditions of the OCBA problem as the simulation budget increases. The algorithms are benchmarked against a state-of-the-art sequential sampling algorithm designed for contextual ranking and selection problems and shown to have superior empirical performance at finding the MPB.
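
To make the MPB objective concrete, the sketch below estimates it with a deliberately simplified round-robin allocation standing in for the paper's OCBA-derived sampling ratios; simulate, weights, and the convention that lower output is better are assumptions:

```python
import numpy as np

def sequential_mpb(simulate, k, weights, budget):
    """simulate(i, j) returns one noisy output of solution i under input
    model j; weights[j] is the simplex weight of input model j. Allocates
    the budget round-robin over all (solution, input model) pairs, then
    reports the MPB: the solution carrying the most probability mass of
    being the conditional optimum."""
    m = len(weights)
    sums = np.zeros((k, m))
    counts = np.zeros((k, m))
    pairs = [(i, j) for i in range(k) for j in range(m)]
    for t in range(budget):
        i, j = pairs[t % len(pairs)]            # simplified: equal allocation
        sums[i, j] += simulate(i, j)
        counts[i, j] += 1
    mu_hat = sums / np.maximum(counts, 1)
    best_per_model = mu_hat.argmin(axis=0)      # conditional optimum per model
    probs = np.bincount(best_per_model, weights=weights, minlength=k)
    return int(np.argmax(probs)), probs
```

The sequential algorithms in the paper replace the round-robin line with sampling rules whose long-run ratios provably satisfy the OCBA optimality conditions.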