Park, Dongju
HyperCLOVA X Technical Report
Yoo, Kang Min, Han, Jaegeun, In, Sookyo, Jeon, Heewon, Jeong, Jisu, Kang, Jaewook, Kim, Hyunwook, Kim, Kyung-Min, Kim, Munhyong, Kim, Sungju, Kwak, Donghyun, Kwak, Hanock, Kwon, Se Jung, Lee, Bado, Lee, Dongsoo, Lee, Gichang, Lee, Jooho, Park, Baeseong, Shin, Seongjin, Yu, Joonsang, Baek, Seolki, Byeon, Sumin, Cho, Eungsup, Choe, Dooseok, Han, Jeesung, Jin, Youngkyun, Jun, Hyein, Jung, Jaeseung, Kim, Chanwoong, Kim, Jinhong, Kim, Jinuk, Lee, Dokyeong, Park, Dongwook, Sohn, Jeong Min, Han, Sujung, Heo, Jiae, Hong, Sungju, Jeon, Mina, Jung, Hyunhoon, Jung, Jungeun, Jung, Wangkyo, Kim, Chungjoon, Kim, Hyeri, Kim, Jonghyun, Kim, Min Young, Lee, Soeun, Park, Joonhee, Shin, Jieun, Yang, Sojin, Yoon, Jungsoon, Lee, Hwaran, Bae, Sanghwan, Cha, Jeehwan, Gylleus, Karl, Ham, Donghoon, Hong, Mihak, Hong, Youngki, Hong, Yunki, Jang, Dahyun, Jeon, Hyojun, Jeon, Yujin, Jeong, Yeji, Ji, Myunggeun, Jin, Yeguk, Jo, Chansong, Joo, Shinyoung, Jung, Seunghwan, Kim, Adrian Jungmyung, Kim, Byoung Hoon, Kim, Hyomin, Kim, Jungwhan, Kim, Minkyoung, Kim, Minseung, Kim, Sungdong, Kim, Yonghee, Kim, Youngjun, Kim, Youngkwan, Ko, Donghyeon, Lee, Dughyun, Lee, Ha Young, Lee, Jaehong, Lee, Jieun, Lee, Jonghyun, Lee, Jongjin, Lee, Min Young, Lee, Yehbin, Min, Taehong, Min, Yuri, Moon, Kiyoon, Oh, Hyangnam, Park, Jaesun, Park, Kyuyon, Park, Younghun, Seo, Hanbae, Seo, Seunghyun, Sim, Mihyun, Son, Gyubin, Yeo, Matt, Yeom, Kyung Hoon, Yoo, Wonjoon, You, Myungin, Ahn, Doheon, Ahn, Homin, Ahn, Joohee, Ahn, Seongmin, An, Chanwoo, An, Hyeryun, An, Junho, An, Sang-Min, Byun, Boram, Byun, Eunbin, Cha, Jongho, Chang, Minji, Chang, Seunggyu, Cho, Haesong, Cho, Youngdo, Choi, Dalnim, Choi, Daseul, Choi, Hyoseok, Choi, Minseong, Choi, Sangho, Choi, Seongjae, Choi, Wooyong, Chun, Sewhan, Go, Dong Young, Ham, Chiheon, Han, Danbi, Han, Jaemin, Hong, Moonyoung, Hong, Sung Bum, Hwang, Dong-Hyun, Hwang, Seongchan, Im, Jinbae, Jang, Hyuk Jin, Jang, Jaehyung, Jang, Jaeni, Jang, Sihyeon, Jang, Sungwon, Jeon, Joonha, Jeong, Daun, Jeong, Joonhyun, Jeong, Kyeongseok, Jeong, Mini, Jin, Sol, Jo, Hanbyeol, Jo, Hanju, Jo, Minjung, Jung, Chaeyoon, Jung, Hyungsik, Jung, Jaeuk, Jung, Ju Hwan, Jung, Kwangsun, Jung, Seungjae, Ka, Soonwon, Kang, Donghan, Kang, Soyoung, Kil, Taeho, Kim, Areum, Kim, Beomyoung, Kim, Byeongwook, Kim, Daehee, Kim, Dong-Gyun, Kim, Donggook, Kim, Donghyun, Kim, Euna, Kim, Eunchul, Kim, Geewook, Kim, Gyu Ri, Kim, Hanbyul, Kim, Heesu, Kim, Isaac, Kim, Jeonghoon, Kim, Jihye, Kim, Joonghoon, Kim, Minjae, Kim, Minsub, Kim, Pil Hwan, Kim, Sammy, Kim, Seokhun, Kim, Seonghyeon, Kim, Soojin, Kim, Soong, Kim, Soyoon, Kim, Sunyoung, Kim, Taeho, Kim, Wonho, Kim, Yoonsik, Kim, You Jin, Kim, Yuri, Kwon, Beomseok, Kwon, Ohsung, Kwon, Yoo-Hwan, Lee, Anna, Lee, Byungwook, Lee, Changho, Lee, Daun, Lee, Dongjae, Lee, Ha-Ram, Lee, Hodong, Lee, Hwiyeong, Lee, Hyunmi, Lee, Injae, Lee, Jaeung, Lee, Jeongsang, Lee, Jisoo, Lee, Jongsoo, Lee, Joongjae, Lee, Juhan, Lee, Jung Hyun, Lee, Junghoon, Lee, Junwoo, Lee, Se Yun, Lee, Sujin, Lee, Sungjae, Lee, Sungwoo, Lee, Wonjae, Lee, Zoo Hyun, Lim, Jong Kun, Lim, Kun, Lim, Taemin, Na, Nuri, Nam, Jeongyeon, Nam, Kyeong-Min, Noh, Yeonseog, Oh, Biro, Oh, Jung-Sik, Oh, Solgil, Oh, Yeontaek, Park, Boyoun, Park, Cheonbok, Park, Dongju, Park, Hyeonjin, Park, Hyun Tae, Park, Hyunjung, Park, Jihye, Park, Jooseok, Park, Junghwan, Park, Jungsoo, Park, Miru, Park, Sang Hee, Park, Seunghyun, Park, Soyoung, Park, Taerim, Park, Wonkyeong, Ryu, Hyunjoon, Ryu, Jeonghun, Ryu, Nahyeon, Seo, Soonshin, Seo, Suk Min, Shim, Yoonjeong, Shin, Kyuyong, Shin, Wonkwang, Sim, Hyun, Sim, Woongseob, Soh, Hyejin, Son, Bokyong, Son, Hyunjun, Son, Seulah, Song, Chi-Yun, Song, Chiyoung, Song, Ka Yeon, Song, Minchul, Song, Seungmin, Wang, Jisung, Yeo, Yonggoo, Yi, Myeong Yeon, Yim, Moon Bin, Yoo, Taehwan, Yoo, Youngjoon, Yoon, Sungmin, Yoon, Young Jin, Yu, Hangyeol, Yu, Ui Seon, Zuo, Xingdong, Bae, Jeongin, Bae, Joungeun, Cho, Hyunsoo, Cho, Seonghyun, Cho, Yongjin, Choi, Taekyoon, Choi, Yera, Chung, Jiwan, Han, Zhenghui, Heo, Byeongho, Hong, Euisuk, Hwang, Taebaek, Im, Seonyeol, Jegal, Sumin, Jeon, Sumin, Jeong, Yelim, Jeong, Yonghyun, Jiang, Can, Jiang, Juyong, Jin, Jiho, Jo, Ara, Jo, Younghyun, Jung, Hoyoun, Jung, Juyoung, Kang, Seunghyeong, Kim, Dae Hee, Kim, Ginam, Kim, Hangyeol, Kim, Heeseung, Kim, Hyojin, Kim, Hyojun, Kim, Hyun-Ah, Kim, Jeehye, Kim, Jin-Hwa, Kim, Jiseon, Kim, Jonghak, Kim, Jung Yoon, Kim, Rak Yeong, Kim, Seongjin, Kim, Seoyoon, Kim, Sewon, Kim, Sooyoung, Kim, Sukyoung, Kim, Taeyong, Ko, Naeun, Koo, Bonseung, Kwak, Heeyoung, Kwon, Haena, Kwon, Youngjin, Lee, Boram, Lee, Bruce W., Lee, Dagyeong, Lee, Erin, Lee, Euijin, Lee, Ha Gyeong, Lee, Hyojin, Lee, Hyunjeong, Lee, Jeeyoon, Lee, Jeonghyun, Lee, Jongheok, Lee, Joonhyung, Lee, Junhyuk, Lee, Mingu, Lee, Nayeon, Lee, Sangkyu, Lee, Se Young, Lee, Seulgi, Lee, Seung Jin, Lee, Suhyeon, Lee, Yeonjae, Lee, Yesol, Lee, Youngbeom, Lee, Yujin, Li, Shaodong, Liu, Tianyu, Moon, Seong-Eun, Moon, Taehong, Nihlenramstroem, Max-Lasse, Oh, Wonseok, Oh, Yuri, Park, Hongbeen, Park, Hyekyung, Park, Jaeho, Park, Nohil, Park, Sangjin, Ryu, Jiwon, Ryu, Miru, Ryu, Simo, Seo, Ahreum, Seo, Hee, Seo, Kangdeok, Shin, Jamin, Shin, Seungyoun, Sin, Heetae, Wang, Jiangping, Wang, Lei, Xiang, Ning, Xiao, Longxiang, Xu, Jing, Yi, Seonyeong, Yoo, Haanju, Yoo, Haneul, Yoo, Hwanhee, Yu, Liang, Yu, Youngjae, Yuan, Weijie, Zeng, Bo, Zhou, Qian, Cho, Kyunghyun, Ha, Jung-Woo, Park, Joonsuk, Hwang, Jihyun, Kwon, Hyoung Jo, Kwon, Soonyong, Lee, Jungyeon, Lee, Seungho, Lim, Seonghyeon, Noh, Hyunkyung, Choi, Seungho, Lee, Sang-Woo, Lim, Jung Hwa, Sung, Nako
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
Yoo, Kang Min, Park, Dongju, Kang, Jaewook, Lee, Sang-Woo, Park, Woomyeong
Large-scale language models such as GPT-3 are excellent few-shot learners, allowing them to be controlled via natural text prompts. Recent studies report that prompt-based direct classification eliminates the need for fine-tuning but lacks data and inference scalability. This paper proposes a novel data augmentation technique that leverages large-scale language models to generate realistic text samples from a mixture of real samples. We also propose utilizing soft-labels predicted by the language models, effectively distilling knowledge from the large-scale language models and creating textual perturbations simultaneously. We perform data augmentation experiments on diverse classification tasks and show that our method hugely outperforms existing text augmentation methods. Ablation studies and a qualitative analysis provide more insights into our approach.