Kim, Chaeeun
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Kim, Seungone, Suk, Juyoung, Cho, Ji Yong, Longpre, Shayne, Kim, Chaeeun, Yoon, Dongkeun, Son, Guijin, Cho, Yejin, Shafayat, Sheikh, Baek, Jinheon, Park, Sue Hyun, Hwang, Hyeonbin, Jo, Jinkyung, Cho, Hyowon, Shin, Haebin, Lee, Seongyun, Oh, Hanseok, Lee, Noah, Ho, Namgyu, Joo, Se June, Ko, Miyoung, Lee, Yoonjoo, Chae, Hyungjoo, Shin, Jamin, Jang, Joel, Ye, Seonghyeon, Lin, Bill Yuchen, Welleck, Sean, Neubig, Graham, Lee, Moontae, Lee, Kyungjae, Seo, Minjoon
As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench.
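The evaluation loop this describes — feed an evaluator LM the instruction, the candidate response, and the instance-specific rubric, then parse out a score — can be sketched as follows. This is a minimal illustration, not the benchmark's official pipeline: the prompt wording, the 1-5 scale parsing, and the use of the Prometheus judge model via a plain Hugging Face text-generation pipeline are all assumptions; the repository linked above contains the actual protocol.

```python
# Minimal sketch of instance-specific rubric evaluation with an evaluator LM.
# Prompt layout and score parsing are illustrative assumptions, not the
# BiGGen Bench protocol; see the linked repository for the official pipeline.
import re
from transformers import pipeline

# Any instruction-tuned judge model could be substituted here.
judge = pipeline("text-generation", model="prometheus-eval/prometheus-7b-v2.0")

def score_response(instruction: str, response: str, rubric: str) -> int:
    """Grade one response against an instance-specific rubric on a 1-5 scale."""
    prompt = (
        "You are an impartial evaluator.\n"
        f"Instruction: {instruction}\n"
        f"Response: {response}\n"
        f"Scoring rubric (specific to this instance): {rubric}\n"
        "Give brief feedback, then output 'Score: <1-5>'."
    )
    out = judge(prompt, max_new_tokens=256)[0]["generated_text"]
    match = re.search(r"Score:\s*([1-5])", out)
    return int(match.group(1)) if match else 0  # 0 marks an unparsable verdict
```

The key difference from benchmarks with abstract criteria is that `rubric` changes per instance, so the judge is told exactly what a good answer to this particular input looks like.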
Exploring the Practicality of Generative Retrieval on Dynamic Corpora
Yoon, Soyoung, Kim, Chaeeun, Lee, Hyunji, Jang, Joel, Yang, Sohee, Seo, Minjoon
Benchmarking of information retrieval (IR) methods is mostly conducted on a fixed set of documents (static corpora); in realistic scenarios this is rarely the case, as the documents to be retrieved are constantly updated and added. In this paper, we conduct a comprehensive comparison between two categories of contemporary retrieval systems, Dual Encoders (DE) and Generative Retrieval (GR), in a dynamic scenario where the corpus to be retrieved from is updated. We also conduct an extensive evaluation of computational and memory efficiency, crucial factors for deploying IR systems in the real world. Our results demonstrate that GR is more adaptable to evolving knowledge (+13-18% on the StreamingQA benchmark), more robust in handling data with temporal information (×10), and more efficient in terms of memory (×4), indexing time (×6), and inference FLOPs (×10). Our paper highlights GR's potential for future use in practical IR systems.
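The two families behave differently when the corpus grows, and a toy sketch can make that workflow difference concrete. Everything below is a stand-in (hashed bag-of-words instead of trained encoders, on-the-fly scoring instead of constrained autoregressive decoding of document identifiers); only the contrast in update paths is the point.

```python
# Toy contrast between the two retrieval families compared above.
# Both "models" are stand-ins; the illustrated difference is what must
# change when documents are added to the corpus.
import numpy as np

corpus = {"d1": "storm hits the coast",
          "d2": "new gpu released",
          "d3": "election results announced"}

def embed(text: str) -> np.ndarray:
    """Stand-in for a trained encoder: hashed bag-of-words, L2-normalized."""
    v = np.zeros(64)
    for tok in text.lower().split():
        v[hash(tok) % 64] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

# Dual Encoder (DE): every document is embedded into a dense index;
# adding documents means embedding them and growing this index.
index = {doc_id: embed(text) for doc_id, text in corpus.items()}

def de_retrieve(query: str) -> str:
    q = embed(query)
    return max(index, key=lambda d: float(q @ index[d]))

# Generative Retrieval (GR): a seq2seq model maps the query directly to a
# document identifier; adding documents means teaching the model new
# identifiers, with no dense index to rebuild. Decoding is mimicked here
# by scoring identifier contents on the fly.
def gr_retrieve(query: str) -> str:
    q = embed(query)
    return max(corpus, key=lambda d: float(q @ embed(corpus[d])))

# Simulate a corpus update: GR's "index" is just the identifier set,
# while DE must embed the new document into its vector index.
corpus["d4"] = "flood warning issued"
index["d4"] = embed(corpus["d4"])

print(de_retrieve("gpu launch"), gr_retrieve("coastal storm"))
```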
How Well Do Large Language Models Truly Ground?
Lee, Hyunji, Joo, Sejune, Kim, Chaeeun, Jang, Joel, Kim, Doyoung, On, Kyoung-Woon, Seo, Minjoon
Reliance on the inherent knowledge of Large Language Models (LLMs) can cause issues such as hallucinations, lack of control, and difficulties in integrating variable knowledge. To mitigate this, LLMs can be prompted to ground their responses in external context, often given as input (knowledge-augmented models). Yet previous research often adopts a narrow view of the term "grounding", focusing only on whether the response contains the correct answer, which does not ensure the reliability of the entire response. To address this limitation, we introduce a strict definition of grounding: a model is considered truly grounded when its responses (1) fully utilize the necessary knowledge from the provided context and (2) do not exceed the knowledge within the context. We introduce a new dataset and a grounding metric to assess this definition, and we perform experiments across 13 LLMs of different sizes and training methods to provide insights into the factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.
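Under this definition, a grounding check has two sides, which a minimal sketch can make concrete. It assumes responses and contexts have already been decomposed into sets of atomic facts; that decomposition step, and the paper's actual dataset and metric, are not reproduced here.

```python
# Minimal sketch of the two-sided grounding check described above,
# assuming atomic-fact sets are given. Hypothetical helper, not the
# paper's metric.
def grounding_scores(response_facts: set[str],
                     context_facts: set[str],
                     required_facts: set[str]) -> tuple[float, float]:
    """Return (utilization, faithfulness):
    - utilization: fraction of necessary context facts the response uses
      (condition 1: fully utilize the necessary knowledge);
    - faithfulness: fraction of response facts supported by the context
      (condition 2: do not exceed the knowledge within the context)."""
    utilization = len(response_facts & required_facts) / max(len(required_facts), 1)
    faithfulness = len(response_facts & context_facts) / max(len(response_facts), 1)
    return utilization, faithfulness

# Example: the response omits one required fact and adds one unsupported claim.
ctx = {"A", "B", "C"}
req = {"A", "B"}
resp = {"A", "D"}
print(grounding_scores(resp, ctx, req))  # (0.5, 0.5) -> not truly grounded
```

A response counts as truly grounded only when both scores reach 1.0, which is what distinguishes this definition from simply checking whether the correct answer appears somewhere in the output.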