Collaborating Authors

 Nam, Daniel Wontae


Kanana: Compute-efficient Bilingual Language Models

arXiv.org Artificial Intelligence

We introduce Kanana, a series of bilingual language models that demonstrate exceptional performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high-quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. Furthermore, the report outlines the methodologies utilized during the post-training of the Kanana models, encompassing supervised fine-tuning and preference optimization, aimed at enhancing their capability for seamless interaction with users. Lastly, the report elaborates on plausible approaches for adapting the language models to specific scenarios, such as embedding, retrieval-augmented generation, and function calling. The Kanana model series spans from 2.1B to 32.5B parameters, with the 2.1B models (base, instruct, embedding) publicly released to promote research on Korean language models.
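One of the pre-training techniques mentioned above, depth up-scaling, can be illustrated with a minimal sketch: a smaller trained model is grown by re-appending copies of some of its existing layers, so the enlarged model starts from trained weights rather than random initialization. The function below is a hypothetical illustration, not the Kanana report's exact recipe.

```python
import copy

def depth_upscale(layers, dup_range):
    # Grow a model by re-appending deep copies of a contiguous block of
    # its existing layers; the enlarged model is thus initialized from
    # the smaller, already-trained one (illustrative sketch only).
    start, end = dup_range
    return list(layers) + [copy.deepcopy(layer) for layer in layers[start:end]]

# E.g. an 8-layer model up-scaled to 12 layers by duplicating layers 4-7:
small = list(range(8))           # stand-in for 8 transformer layers
big = depth_upscale(small, (4, 8))
```

After up-scaling, the larger model is typically continued on further pre-training data so the duplicated layers can specialize.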


Binary Classifier Optimization for Large Language Model Alignment

arXiv.org Artificial Intelligence

Aligning Large Language Models (LLMs) to human preferences through preference optimization has been crucial but labor-intensive, requiring evaluators to compare both a chosen and a rejected text completion for each prompt. Recently, Kahneman-Tversky Optimization (KTO) has demonstrated that LLMs can be aligned using merely binary "thumbs-up" or "thumbs-down" signals on each prompt-completion pair. In this paper, we present theoretical foundations to explain the successful alignment achieved through these binary signals. Our analysis uncovers a new perspective: optimizing a binary classifier, whose logit is a reward, implicitly induces minimizing the Direct Preference Optimization (DPO) loss. In the process of this discovery, we identified two techniques for effective alignment: reward shift and underlying distribution matching. Consequently, we propose a new algorithm, Binary Classifier Optimization, that integrates these techniques. We validate our methodology in two settings: first, on a paired preference dataset, where our method performs on par with DPO and KTO; and second, on binary signal datasets simulating real-world conditions with divergent underlying distributions between thumbs-up and thumbs-down data. Our model consistently demonstrates effective and robust alignment across two base LLMs and three different binary signal datasets, showcasing the strength of our approach to learning from binary feedback.
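The core idea above, treating the policy/reference log-ratio as a classifier logit and applying binary cross-entropy to a thumbs-up/down label, can be sketched as follows. This is a hedged illustration of the classifier view (including a reward-shift term), not the paper's exact objective; the function name and signature are hypothetical.

```python
import math

def binary_feedback_loss(logp_policy, logp_ref, label, beta=0.1, reward_shift=0.0):
    # Implicit reward: scaled log-ratio of policy to reference, minus a shift.
    reward = beta * (logp_policy - logp_ref) - reward_shift
    # Interpret the reward as a classifier logit for "thumbs-up".
    p_up = 1.0 / (1.0 + math.exp(-reward))
    # Binary cross-entropy: label is 1 for thumbs-up, 0 for thumbs-down.
    return -(label * math.log(p_up) + (1 - label) * math.log(1.0 - p_up))
```

With this view, a completion the policy has up-weighted relative to the reference incurs a lower loss when labeled thumbs-up than when labeled thumbs-down, which is the gradient signal that drives alignment.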


Hexa: Self-Improving for Knowledge-Grounded Dialogue System

arXiv.org Artificial Intelligence

A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to dialogue responses, as they are unobservable in an ordinary dialogue. To compensate for the absence of such data, we develop a self-improving method that improves the generative performance of intermediate steps without ground-truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves performance on the task of knowledge-grounded dialogue generation.

Along with the progress of Language Model (LM) pretraining, open-domain dialogue models have evolved to leverage the transformer architecture's generalization ability (Zhang et al., 2019; Freitas et al., 2020; Roller et al., 2021; Xu et al., 2022a; Shuster et al., 2022b; Thoppilan et al., 2022). While model scaling also improves dialogue quality (Freitas et al., 2020), as seen in large LMs, relying on the LM alone introduces limitations such as hallucination and a lack of faithfulness caused by outdated training data (Brown et al., 2020; Thoppilan et al., 2022; Chowdhery et al., 2022). To overcome these limitations, prior works have adopted a modular design where multiple modules generate intermediate texts (e.g., to retrieve documents) before the final response (Lewis et al., 2020; Adolphs et al., 2021; Zhang et al., 2021; Shuster et al., 2022a). Among them, Komeili et al. (2022) and Shuster et al. (2022b) have shown promising results in dialogue generation.
Specifically, they adopted a modular design to integrate external knowledge (e.g., the internet) and internal knowledge (e.g., memory) into dialogue models. For example, in Komeili et al. (2022), an LM first decides, in the form of text generation, whether to access knowledge. Upon deciding to do so, the LM generates an appropriate query for knowledge retrieval from external sources such as search engines. Then, the LM generates a response based on the knowledge extracted from the accessed data. See Figure 2 of Appendix A for an illustrative example. Regarding each intermediate phase as a separate module, a convenient method of training these modules would be to apply supervised learning on each module using individual datasets (Dinan et al., 2019; Shuster et al., 2022a; Glass et al., 2022; Shuster et al., 2022b).
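The decide-query-retrieve-respond loop described above can be sketched as a single dialogue turn, where every intermediate step is itself text generation by the LM. The function and task names below are hypothetical stand-ins, not the API of any of the cited systems.

```python
def modular_dialogue_turn(lm_generate, retrieve, history):
    # 1) The LM decides, as text generation, whether knowledge is needed.
    decision = lm_generate(history, task="decide_search")
    knowledge = ""
    if decision == "search":
        # 2) The LM generates a retrieval query for the external source.
        query = lm_generate(history, task="generate_query")
        knowledge = retrieve(query)
    # 3) The LM produces the final response, grounded in any retrieved text.
    return lm_generate(history, task="respond", knowledge=knowledge)
```

Because the decision and the query are ordinary generated text, the same LM can serve every module; the difficulty the paper addresses is that supervision for these intermediate generations is rarely available.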


Effortless Integration of Memory Management into Open-Domain Conversation Systems

arXiv.org Artificial Intelligence

Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. One limitation of such systems, however, is the absence of management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exists for this purpose, we propose an automated dataset-creation method for memory management. Our method 1) requires little cost for data construction, 2) does not affect performance on other tasks, and 3) reduces external memory usage. We show that our proposed model, BlenderBot3-M^3, which is multi-task trained with memory management, outperforms BlenderBot3 with a relative 4% gain in F1 score.
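The memory-management behavior the abstract describes, keeping external memory from growing without bound, can be sketched as an update rule: when a new utterance supersedes a stored entry, the stale entry is dropped rather than retained alongside it. The function below is a hypothetical illustration and the `supersedes` predicate is an assumed stand-in for the model's learned decision, not the paper's actual mechanism.

```python
def manage_memory(memory, new_entry, supersedes):
    # Drop stored entries that the new utterance supersedes, then store it,
    # so external memory shrinks instead of accumulating stale facts.
    kept = [m for m in memory if not supersedes(new_entry, m)]
    kept.append(new_entry)
    return kept

# E.g. with a toy predicate that treats entries on the same topic as
# superseding one another:
same_topic = lambda new, old: new.split()[0] == old.split()[0]
mem = ["favorite-drink tea", "hometown Seoul"]
mem = manage_memory(mem, "favorite-drink coffee", same_topic)
```

In the paper's setting, the decision of which entries to keep or drop is learned via multi-task training on the automatically constructed dataset.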