AITopics | Hwang, Hyeonbin

Hwang, Hyeonbin

Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition

Kim, Jiyeon, Lee, Hyunji, Cho, Hyowon, Jang, Joel, Hwang, Hyeonbin, Won, Seungpil, Ahn, Youbin, Lee, Dohaeng, Seo, Minjoon

arXiv.org Artificial Intelligence, Dec-2-2024

In this work, we investigate how a model's tendency to broadly integrate its parametric knowledge evolves throughout pretraining, and how this behavior affects overall performance, particularly in terms of knowledge acquisition and forgetting. We introduce the concept of knowledge entropy, which quantifies the range of memory sources the model engages with: high knowledge entropy indicates that the model uses a wide range of memory sources, while low knowledge entropy suggests reliance on specific sources with greater certainty. Our analysis reveals a consistent decline in knowledge entropy as pretraining advances. We also find that the decline is closely associated with a reduction in the model's ability to acquire and retain knowledge, leading us to conclude that diminishing knowledge entropy (a smaller number of active memory sources) impairs the model's knowledge acquisition and retention capabilities. We find further support for this by demonstrating that increasing the activity of inactive memory sources enhances the model's capacity for knowledge acquisition and retention.
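The abstract does not state the formula for knowledge entropy. As a minimal sketch, assuming it can be read as the Shannon entropy of a normalized distribution over memory sources, the contrast between "wide range" and "specific sources" might look like the following. The function name `knowledge_entropy` and the example coefficient values are hypothetical and not taken from the paper.

```python
import math


def knowledge_entropy(coefficients):
    """Shannon entropy (in nats) of non-negative weights over memory sources.

    Assumption: the weights are normalized to sum to 1 before the entropy
    is computed. This is an illustrative reading, not the paper's definition.
    """
    total = sum(coefficients)
    if total <= 0:
        raise ValueError("coefficients must have a positive sum")
    probs = [c / total for c in coefficients]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical weights: one model spreads use evenly, the other concentrates it.
broad = knowledge_entropy([1.0, 1.0, 1.0, 1.0])
narrow = knowledge_entropy([0.97, 0.01, 0.01, 0.01])
print(f"broad={broad:.3f} narrow={narrow:.3f}")
```

Under this reading, a model that spreads its weight evenly over four sources reaches the maximum entropy for four sources, log 4, while one that concentrates on a single source scores much lower.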

knowledge management, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.0138

Country:

Europe (1.00)
Asia > Middle East > UAE (0.14)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Kim, Seungone, Suk, Juyoung, Cho, Ji Yong, Longpre, Shayne, Kim, Chaeeun, Yoon, Dongkeun, Son, Guijin, Cho, Yejin, Shafayat, Sheikh, Baek, Jinheon, Park, Sue Hyun, Hwang, Hyeonbin, Jo, Jinkyung, Cho, Hyowon, Shin, Haebin, Lee, Seongyun, Oh, Hanseok, Lee, Noah, Ho, Namgyu, Joo, Se June, Ko, Miyoung, Lee, Yoonjoo, Chae, Hyungjoo, Shin, Jamin, Jang, Joel, Ye, Seonghyeon, Lin, Bill Yuchen, Welleck, Sean, Neubig, Graham, Lee, Moontae, Lee, Kyungjae, Seo, Minjoon

arXiv.org Artificial Intelligence, Jun-9-2024

As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2406.05761

Country:

Europe (1.00)
North America > United States > Illinois (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (0.92)
Energy > Power Industry (0.67)
Energy > Renewable > Biofuel (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

Ye, Seonghyeon, Hwang, Hyeonbin, Yang, Sohee, Yun, Hyeongu, Kim, Yireun, Seo, Minjoon

arXiv.org Artificial Intelligence, Dec-24-2023

In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference. TAPP differs from canonical prompts for LLMs in that it is a fixed prompt prepended to every input, regardless of the target task, for zero-shot generalization. We observe that both base LLMs (i.e., not fine-tuned to follow instructions) and instruction-tuned models benefit from TAPP, with average improvements of 34.58% and 12.26%, respectively. This implies that the instruction-following ability of LLMs can be improved at inference time with a fixed prompt constructed with simple heuristics. We hypothesize that TAPP helps language models better estimate the output distribution by focusing more on the instruction of the target task during inference. In other words, this ability does not seem to be sufficiently activated in base LLMs or in many instruction-fine-tuned LLMs. All experiments are reproducible from https://github.com/seonghyeonye/TAPP.
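The core mechanism described above, one fixed prefix prepended to every input regardless of task, can be sketched in a few lines. This is only an illustration: the prefix text below is a placeholder rather than the prompt used in the paper, and the helper name `with_prefix` and the blank-line separator are assumptions.

```python
# Placeholder prefix. The actual TAPP text is constructed by the authors'
# heuristics and is not reproduced here.
TASK_AGNOSTIC_PREFIX = "<fixed task-agnostic prefix prompt goes here>"


def with_prefix(user_input: str, prefix: str = TASK_AGNOSTIC_PREFIX) -> str:
    """Return the model input with the same fixed prefix placed first.

    Assumption: prefix and input are joined by a blank line.
    """
    return f"{prefix}\n\n{user_input}"


# The same prefix is reused for unrelated tasks; only the suffix changes.
prompts = [
    with_prefix("Translate 'good morning' into French."),
    with_prefix("Summarize the following paragraph in one sentence: ..."),
]
```

Because the prefix never changes, it can be built once and reused for every request, which is what distinguishes it from task-specific few-shot prompts.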

demonstration, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2302.14691

Country:

North America > United States (1.00)
Asia (1.00)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (1.00)
Government > Regional Government (0.93)
Education > Educational Setting (0.68)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

Ye, Seonghyeon, Kim, Doyoung, Kim, Sungdong, Hwang, Hyeonbin, Kim, Seungone, Jo, Yongrae, Thorne, James, Kim, Juho, Seo, Minjoon

arXiv.org Artificial Intelligence, Oct-4-2023

Evaluation of Large Language Models (LLMs) is challenging because instruction-following necessitates alignment with human values and the required set of skills varies depending on the instruction. However, previous studies have mainly focused on coarse-grained evaluation (i.e., overall preference-based evaluation), which limits interpretability since it does not consider the nature of user instructions that require instance-wise skill composition. In this paper, we introduce FLASK (Fine-grained Language Model Evaluation based on Alignment Skill Sets), a fine-grained evaluation protocol for both human-based and model-based evaluation which decomposes coarse-level scoring into skill set-level scoring for each instruction. We experimentally observe that the fine-graininess of evaluation is crucial for attaining a holistic view of model performance and increasing the reliability of the evaluation. Using FLASK, we compare multiple open-source and proprietary LLMs and observe a high correlation between model-based and human-based evaluations. We publicly release the evaluation data and code implementation at https://github.com/kaistAI/FLASK.
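To illustrate what decomposing a single overall score into skill set-level scores could look like in practice, here is a minimal sketch. It assumes each instruction is annotated with scores only for the skills it requires, and that per-skill results are averaged across instructions. The skill names, the numeric scale, and the function `skill_level_scores` are hypothetical and are not taken from the FLASK codebase.

```python
from collections import defaultdict
from statistics import mean


def skill_level_scores(annotations):
    """Average scores per skill across instructions.

    `annotations` is a list with one dict per instruction, mapping each
    skill that the instruction requires to the score given for it.
    """
    by_skill = defaultdict(list)
    for instance in annotations:
        for skill, score in instance.items():
            by_skill[skill].append(score)
    return {skill: mean(scores) for skill, scores in by_skill.items()}


# Hypothetical annotations: each instruction needs a different skill subset.
annotations = [
    {"logical correctness": 4, "conciseness": 2},
    {"logical correctness": 2},
]
report = skill_level_scores(annotations)
print(report)
```

A per-skill report like this exposes where a model is weak, which a single preference score per response would hide.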

fine-grained language model evaluation, large language model, natural language, (3 more...)

arXiv.org Artificial Intelligence

2307.10928

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

MED-SE: Medical Entity Definition-based Sentence Embedding

Hwang, Hyeonbin, Yoo, Haanju, Choi, Yera

arXiv.org Artificial Intelligence, Dec-9-2022

We propose Medical Entity Definition-based Sentence Embedding (MED-SE), a novel unsupervised contrastive learning framework designed for clinical texts, which exploits the definitions of medical entities. To this end, we conduct an extensive analysis of multiple sentence embedding techniques in clinical semantic textual similarity (STS) settings. In the entity-centric setting that we have designed, MED-SE achieves significantly better performance, while the existing unsupervised methods including SimCSE show degraded performance. Our experiments elucidate the inherent discrepancies between the general- and clinical-domain texts, and suggest that entity-centric contrastive approaches may help bridge this gap and lead to a better representation of clinical sentences.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2212.04734

Country: North America (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
