Collaborating Authors

 Song, Seyoung


Knowledge-Aware Iterative Retrieval for Multi-Agent Systems

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are probabilistic language generation models that do not incorporate explicit reasoning systems or logical planning modules. Consequently, in tasks that require synthesizing information over multiple steps, the reasoning performed at each stage is not clearly delineated, and intermediate reasoning occurs implicitly, making the process susceptible to errors. Furthermore, the difficulty of rigorously validating each step exacerbates the accumulation of errors throughout the overall process. To overcome these challenges, it is often necessary to retrieve external knowledge that compensates for the inherent limitations of LLMs, especially in real-world scenarios. Approaches such as Retrieval-Augmented Generation (RAG) play a significant role by acquiring information not contained within the model in real time, thereby enabling more precise responses. Multi-step question answering (QA) is a representative challenge that demands both high precision in intermediate reasoning and the integration of diverse information. It not only exposes the limitations of LLMs but has also emerged as an important benchmark for real-world problems that aim to move beyond those limitations. In this context, we propose Knowledge-Aware Iterative Retrieval for Multi-Agent Systems, a retrieval optimization system that employs an agent-based framework. It iteratively optimizes search queries through agent-guided knowledge accumulation, with a focus on query refinement, the iterative process of modifying or enhancing an initial query to improve search results.
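
The iterative loop described above can be pictured as follows. This is a minimal Python sketch, not the paper's implementation: call_llm() and search() are hypothetical stand-ins for a real LLM API and retriever, and the stopping criterion (the agent replying "DONE") is an assumption made for illustration.

# Minimal sketch of agent-guided iterative retrieval with query refinement.
# call_llm() and search() are illustrative stand-ins, not the paper's code.

def call_llm(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query a model API."""
    return "REFINED: " + prompt[-80:]

def search(query: str) -> list[str]:
    """Stand-in for a retriever (e.g., BM25 or dense retrieval)."""
    return [f"passage about '{query}'"]

def iterative_retrieval(question: str, max_steps: int = 3) -> list[str]:
    """Accumulate knowledge over several retrieval rounds, letting an
    agent rewrite the query based on what has been gathered so far."""
    knowledge: list[str] = []
    query = question
    for _ in range(max_steps):
        knowledge.extend(search(query))
        # The agent judges whether the accumulated evidence answers the question.
        verdict = call_llm(
            f"Question: {question}\nEvidence: {knowledge}\n"
            "Reply DONE if sufficient, otherwise propose a refined query."
        )
        if verdict.strip().upper().startswith("DONE"):
            break
        # Query refinement: modify the query to target the missing information.
        query = verdict
    return knowledge

if __name__ == "__main__":
    print(iterative_retrieval("example multi-hop question"))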


LLM-C3MOD: A Human-LLM Collaborative System for Cross-Cultural Hate Speech Moderation

arXiv.org Artificial Intelligence

Content moderation is a global challenge, yet major tech platforms prioritize high-resource languages, leaving low-resource languages with scarce native moderators. Since effective moderation depends on understanding contextual cues, this imbalance increases the risk of improper moderation due to non-native moderators' limited cultural understanding. Through a user study, we identify that non-native moderators struggle with interpreting culturally specific knowledge, sentiment, and internet culture in hate speech moderation. To assist them, we present LLM-C3MOD, a human-LLM collaborative pipeline with three steps: (1) RAG-enhanced cultural context annotations; (2) initial LLM-based moderation; and (3) targeted human moderation for cases lacking LLM consensus. Evaluated on a Korean hate speech dataset with Indonesian and German participants, our system achieves 78% accuracy (surpassing GPT-4o's 71% baseline) while reducing human workload by 83.6%. Notably, human moderators excel at nuanced content where LLMs struggle. Our findings suggest that non-native moderators, when properly supported by LLMs, can effectively contribute to cross-cultural hate speech moderation.
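
A minimal sketch of the three-step pipeline is given below. All function names are hypothetical stand-ins rather than the released LLM-C3MOD code, and the consensus rule (unanimous agreement among several LLM judges) is an assumption for illustration.

# Minimal sketch of the human-LLM collaborative moderation pipeline.
# retrieve_cultural_context(), llm_moderate(), and human_moderate() are
# stand-ins for a RAG index, LLM APIs, and a human review queue.

from collections import Counter

def retrieve_cultural_context(post: str) -> str:
    """Step 1: RAG-enhanced cultural context annotation (stand-in)."""
    return "context: culturally specific slang and references for this post"

def llm_moderate(post: str, context: str, n_models: int = 3) -> list[str]:
    """Step 2: initial LLM-based moderation by several LLM judges (stand-in)."""
    return ["hate", "hate", "not_hate"][:n_models]

def human_moderate(post: str, context: str) -> str:
    """Step 3: targeted human moderation for cases lacking LLM consensus."""
    return "hate"  # placeholder for a non-native moderator's decision

def moderate(post: str) -> str:
    context = retrieve_cultural_context(post)
    votes = llm_moderate(post, context)
    label, count = Counter(votes).most_common(1)[0]
    if count == len(votes):               # unanimous LLM consensus: auto-decide
        return label
    return human_moderate(post, context)  # escalate only disagreements

if __name__ == "__main__":
    print(moderate("example post"))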


HERITAGE: An End-to-End Web Platform for Processing Korean Historical Documents in Hanja

arXiv.org Artificial Intelligence

While Korean historical documents are invaluable cultural heritage, understanding them requires in-depth Hanja expertise. Hanja is an ancient language used in Korea before the 20th century; its characters were borrowed from Old Chinese but evolved in Korea for centuries. Modern Koreans and Chinese cannot understand Korean historical documents without substantial additional help, and while previous efforts have produced some Korean and English translations, translation requires in-depth expertise, so most of the documents have not been translated into any modern language. To address this gap, we present HERITAGE, the first open-source Hanja NLP toolkit to assist in understanding and translating the unexplored Korean historical documents written in Hanja. HERITAGE is a web-based platform providing model predictions for three critical tasks in historical document understanding via Hanja language models: punctuation restoration, named entity recognition, and machine translation (MT). HERITAGE also provides an interactive glossary that gives the modern Korean reading of each Hanja character as well as character-level English definitions. HERITAGE serves two purposes. First, anyone interested in these documents can get a general understanding from the model predictions and the interactive glossary, especially the MT outputs in Korean and English. Second, since the model outputs are not perfect, Hanja experts can revise them to produce better annotations and translations. This would boost translation efficiency and potentially lead to most of the historical documents being translated into modern languages, lowering the barrier to these unexplored Korean historical documents.
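
To make the workflow concrete, the sketch below chains the three model-backed tasks and the glossary for one Hanja passage. Every function is a hypothetical stand-in, not the HERITAGE API; in the real platform each step is served by a Hanja language model or a glossary lookup behind the web interface.

# Minimal sketch of chaining punctuation restoration, NER, MT, and the
# glossary for a Hanja passage. All functions are hypothetical stand-ins.

def restore_punctuation(hanja: str) -> str:
    """Insert punctuation into an unpunctuated Hanja passage (stand-in)."""
    return hanja + "。"

def recognize_entities(hanja: str) -> list[tuple[str, str]]:
    """Tag person/place names in the passage (stand-in)."""
    return [(hanja[:2], "PERSON")]

def translate(hanja: str, target: str) -> str:
    """Machine-translate the passage into modern Korean or English (stand-in)."""
    return f"[{target} translation of: {hanja}]"

def glossary(hanja: str) -> dict[str, str]:
    """Character-level modern Korean reading and English gloss (stand-in)."""
    return {ch: "reading / definition" for ch in hanja}

def process(hanja: str) -> dict:
    punctuated = restore_punctuation(hanja)
    return {
        "punctuated": punctuated,
        "entities": recognize_entities(punctuated),
        "korean": translate(punctuated, "ko"),
        "english": translate(punctuated, "en"),
        "glossary": glossary(hanja),
    }

if __name__ == "__main__":
    print(process("太祖實錄"))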


When Does Classical Chinese Help? Quantifying Cross-Lingual Transfer in Hanja and Kanbun

arXiv.org Artificial Intelligence

Historical and linguistic connections within the Sinosphere have led researchers to use Classical Chinese resources for cross-lingual transfer when processing historical documents from Korea and Japan. In this paper, we question the assumption of cross-lingual transferability from Classical Chinese to Hanja and Kanbun, the ancient written languages of Korea and Japan, respectively. Our experiments across machine translation, named entity recognition, and punctuation restoration tasks show minimal impact of Classical Chinese datasets on language model performance for ancient Korean documents written in Hanja, with performance differences within ±0.0068 F1 score for sequence labeling tasks and up to +0.84 BLEU for translation. These limited gains persist consistently across various model sizes, architectures, and domain-specific datasets. Our analysis reveals that the benefits of Classical Chinese resources diminish rapidly as local language data increases for Hanja, and that substantial improvements appear only in extremely low-resource scenarios for both Korean and Japanese historical documents. These mixed results emphasize the need for careful empirical validation rather than assuming benefits from indiscriminate cross-lingual transfer.
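
The comparison underlying these numbers can be sketched as a controlled ablation: train the same model with and without Classical Chinese data and report the metric differences on a Hanja test set. The code below is an illustrative outline only; train() and evaluate() are hypothetical stand-ins for the actual training and evaluation pipelines.

# Minimal sketch of the with/without Classical Chinese ablation.
# train() and evaluate() are illustrative stand-ins, not the paper's code.

def train(base_model: str, corpora: list[str]) -> str:
    """Stand-in: return an identifier for a model trained on the given corpora."""
    return base_model + "+" + "+".join(corpora)

def evaluate(trained_model: str, test_set: str) -> dict[str, float]:
    """Stand-in: return task metrics (F1 for sequence labeling, BLEU for MT)."""
    return {"f1": 0.0, "bleu": 0.0}  # placeholder scores

def transfer_gain(base_model: str, hanja_data: str, classical_chinese: str,
                  test_set: str) -> dict[str, float]:
    """Metric difference attributable to adding Classical Chinese data."""
    with_cc = evaluate(train(base_model, [classical_chinese, hanja_data]), test_set)
    without_cc = evaluate(train(base_model, [hanja_data]), test_set)
    return {metric: with_cc[metric] - without_cc[metric] for metric in with_cc}

if __name__ == "__main__":
    # The paper reports differences within ±0.0068 F1 and up to +0.84 BLEU.
    print(transfer_gain("base_lm", "hanja_corpus", "classical_chinese_corpus",
                        "hanja_test_set"))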