AITopics | Choi, Seungtaek

Collaborating Authors

Choi, Seungtaek

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge

Kim, Heegyu, Jeon, Taeyang, Choi, Seungtaek, Hong, Ji Hoon, Jeon, Dong Won, Baek, Ga-Yeon, Kwak, Gyeong-Won, Lee, Dong-Hee, Bae, Jisu, Lee, Chihoon, Kim, Yunseo, Choi, Seon-Jin, Park, Jin-Seong, Cho, Sung Beom, Cho, Hyunsouk

arXiv.org Artificial IntelligenceMar-19-2025

Materials synthesis is vital for innovations such as energy storage, catalysis, electronics, and biomedical devices. Yet, the process relies heavily on empirical, trial-and-error methods guided by expert intuition. Our work aims to support the materials science community by providing a practical, data-driven resource. We have curated a comprehensive dataset of 17K expert-verified synthesis recipes from open-access literature, which forms the basis of our newly developed benchmark, AlchemyBench. AlchemyBench offers an end-to-end framework that supports research in large language models applied to synthesis prediction. It encompasses key tasks, including raw materials and equipment prediction, synthesis procedure generation, and characterization outcome forecasting. We propose an LLM-as-a-Judge framework that leverages large language models for automated evaluation, demonstrating strong statistical agreement with expert assessments. Overall, our contributions offer a supportive foundation for exploring the capabilities of LLMs in predicting and guiding materials synthesis, ultimately paving the way for more efficient experimental design and accelerated innovation in materials science.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2502.16457

Country: Asia > South Korea (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Energy Storage (0.48)
Health & Medicine > Health Care Technology (0.48)
Materials > Chemicals (0.46)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FLEX: Expert-level False-Less EXecution Metric for Reliable Text-to-SQL Benchmark

Kim, Heegyu, Jeon, Taeyang, Choi, Seunghwan, Choi, Seungtaek, Cho, Hyunsouk

arXiv.org Artificial IntelligenceOct-28-2024

Text-to-SQL systems have become crucial for translating natural language into SQL queries in various industries, enabling non-technical users to perform complex data operations. The need for accurate evaluation methods has increased as these systems have grown more sophisticated. However, the Execution Accuracy (EX), the most prevalent evaluation metric, still shows many false positives and negatives. Thus, this paper introduces FLEX (False-Less EXecution), a novel approach to evaluating text-to-SQL systems using large language models (LLMs) to emulate human expert-level evaluation of SQL queries. Our metric improves agreement with human experts (from 62 to 87.04 in Cohen's kappa) with comprehensive context and sophisticated criteria. Our extensive experiments yield several key insights: (1) Models' performance increases by over 2.6 points on average, substantially affecting rankings on Spider and BIRD benchmarks; (2) The underestimation of models in EX primarily stems from annotation quality issues; and (3) Model performance on particularly challenging questions tends to be overestimated. This work contributes to a more accurate and nuanced evaluation of text-to-SQL systems, potentially reshaping our understanding of state-of-the-art performance in this field.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2409.19014

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding

Jung, Yeonjoon, Lee, Jaeseong, Choi, Seungtaek, Lee, Dohyeon, Kim, Minsoo, Hwang, Seung-won

arXiv.org Artificial IntelligenceOct-20-2024

Recently, pre-trained language models (PLMs) have been increasingly adopted in spoken language understanding (SLU). However, automatic speech recognition (ASR) systems frequently produce inaccurate transcriptions, leading to noisy inputs for SLU models, which can significantly degrade their performance. To address this, our objective is to train SLU models to withstand ASR errors by exposing them to noises commonly observed in ASR systems, referred to as ASR-plausible noises. Speech noise injection (SNI) methods have pursued this objective by introducing ASR-plausible noises, but we argue that these methods are inherently biased towards specific ASR systems, or ASR-specific noises. In this work, we propose a novel and less biased augmentation method of introducing the noises that are plausible to any ASR system, by cutting off the non-causal effect of noises. Experimental results and analyses demonstrate the effectiveness of our proposed methods in enhancing the robustness and generalizability of SLU models against unseen ASR systems by introducing more diverse and plausible ASR noises in advance.

artificial intelligence, asr system, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.15609

Country:

Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models

Kim, Seungduk, Choi, Seungtaek, Jeong, Myeongho

arXiv.org Artificial IntelligenceFeb-22-2024

This report introduces \texttt{EEVE-Korean-v1.0}, a Korean adaptation of large language models that exhibit remarkable capabilities across English and Korean text understanding. Building on recent highly capable but English-centric LLMs, such as SOLAR-10.7B and Phi-2, where non-English texts are inefficiently processed with English-centric tokenizers, we present an efficient and effective vocabulary expansion (EEVE) method, which encompasses parameter freezing and subword initialization. In contrast to previous efforts that believe new embeddings require trillions of training tokens, we show that our method can significantly boost non-English proficiency within just 2 billion tokens. Surpassing most instruction-tuned LLMs on the Open Ko-LLM Leaderboard, as of January 2024, our model \texttt{EEVE-Korean-10.8B-v1.0} ranks as the leading Korean pre-trained model in the open-source community, according to Hugging Face's leaderboard. We open-source our models on Huggingface to empower the open research community in various languages.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2402.14714

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Addressing Negative Transfer in Diffusion Models

Go, Hyojun, Kim, JinYoung, Lee, Yunsung, Lee, Seunghyun, Oh, Shinhyeok, Moon, Hyeongdon, Choi, Seungtaek

arXiv.org Artificial IntelligenceDec-30-2023

Diffusion-based generative models have achieved remarkable success in various domains. It trains a shared model on denoising tasks that encompass different noise levels simultaneously, representing a form of multi-task learning (MTL). However, analyzing and improving diffusion models from an MTL perspective remains under-explored. In particular, MTL can sometimes lead to the well-known phenomenon of negative transfer, which results in the performance degradation of certain tasks due to conflicts between tasks. In this paper, we first aim to analyze diffusion training from an MTL standpoint, presenting two key observations: (O1) the task affinity between denoising tasks diminishes as the gap between noise levels widens, and (O2) negative transfer can arise even in diffusion training. Building upon these observations, we aim to enhance diffusion training by mitigating negative transfer. To achieve this, we propose leveraging existing MTL methods, but the presence of a huge number of denoising tasks makes this computationally expensive to calculate the necessary per-task loss or gradient. To address this challenge, we propose clustering the denoising tasks into small task clusters and applying MTL methods to them. Specifically, based on (O2), we employ interval clustering to enforce temporal proximity among denoising tasks within clusters. We show that interval clustering can be solved using dynamic programming, utilizing signal-to-noise ratio, timestep, and task affinity for clustering objectives. Through this, our approach addresses the issue of negative transfer in diffusion models by allowing for efficient computation of MTL methods. We validate the efficacy of proposed clustering and its integration with MTL methods through various experiments, demonstrating 1) improved generation quality and 2) faster training convergence of diffusion models.

artificial intelligence, diffusion model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.00354

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Addressing Cold Start Problem for End-to-end Automatic Speech Scoring

Park, Jungbae, Choi, Seungtaek

arXiv.org Artificial IntelligenceJun-25-2023

Integrating automatic speech scoring/assessment systems has become a critical aspect of second-language speaking education. With self-supervised learning advancements, end-to-end speech scoring approaches have exhibited promising results. However, this study highlights the significant decrease in the performance of speech scoring systems in new question contexts, thereby identifying this as a cold start problem in terms of items. With the finding of cold-start phenomena, this paper seeks to alleviate the problem by following methods: 1) prompt embeddings, 2) question context embeddings using BERT or CLIP models, and 3) choice of the pretrained acoustic model. Experiments are conducted on TOEIC speaking test datasets collected from English-as-a-second-language (ESL) learners rated by professional TOEIC speaking evaluators. The results demonstrate that the proposed framework not only exhibits robustness in a cold-start environment but also outperforms the baselines for known content.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2306.1431

Genre: Research Report > New Finding (0.67)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.95)

Add feedback

Cross Encoding as Augmentation: Towards Effective Educational Text Classification

Lee, Hyun Seung, Choi, Seungtaek, Lee, Yunsung, Moon, Hyeongdon, Oh, Shinhyeok, Jeong, Myeongho, Go, Hyojun, Wallraven, Christian

arXiv.org Artificial IntelligenceMay-30-2023

Text classification in education, usually called auto-tagging, is the automated process of assigning relevant tags to educational content, such as questions and textbooks. However, auto-tagging suffers from a data scarcity problem, which stems from two major challenges: 1) it possesses a large tag space and 2) it is multi-label. Though a retrieval approach is reportedly good at low-resource scenarios, there have been fewer efforts to directly address the data scarcity problem. To mitigate these issues, here we propose a novel retrieval approach CEAA that provides effective learning in educational text classification. Our main contributions are as follows: 1) we leverage transfer learning from question-answering datasets, and 2) we propose a simple but effective data augmentation method introducing cross-encoder style texts to a bi-encoder architecture for more efficient inference. An extensive set of experiments shows that our proposed method is effective in multi-label scenarios and low-resource tags compared to state-of-the-art models.

machine learning, natural language, text classification, (17 more...)

arXiv.org Artificial Intelligence

2305.18977

Genre: Research Report > Promising Solution (0.34)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.92)

Add feedback

Evaluation of Question Generation Needs More References

Oh, Shinhyeok, Go, Hyojun, Moon, Hyeongdon, Lee, Yunsung, Jeong, Myeongho, Lee, Hyun Seung, Choi, Seungtaek

arXiv.org Artificial IntelligenceMay-26-2023

Question generation (QG) is the task of generating a valid and fluent question based on a given context and the target answer. According to various purposes, even given the same context, instructors can ask questions about different concepts, and even the same concept can be written in different ways. However, the evaluation for QG usually depends on single reference-based similarity metrics, such as n-gram-based metric or learned metric, which is not sufficient to fully evaluate the potential of QG methods. To this end, we propose to paraphrase the reference question for a more robust QG evaluation. Using large language models such as GPT-3, we created semantically and syntactically diverse questions, then adopt the simple aggregation of the popular evaluation metrics as the final scores. Through our experiments, we found that using multiple (pseudo) references is more effective for QG evaluation while showing a higher correlation with human evaluations than evaluation with a single reference.

machine learning, natural language, question answering, (20 more...)

arXiv.org Artificial Intelligence

2305.16626

Country:

Europe (0.68)
North America > United States > Pennsylvania (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (1.00)

Industry: Energy > Renewable (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

Towards Zero-Shot Functional Compositionality of Language Models

Yu, Hangyeol, Jeong, Myeongho, Shin, Jamin, Moon, Hyeongdon, Park, Juneyoung, Choi, Seungtaek

arXiv.org Artificial IntelligenceMar-6-2023

Large Pre-trained Language Models (PLM) have become the most desirable starting point in the field of NLP, as they have become remarkably good at solving many individual tasks. Despite such success, in this paper, we argue that current paradigms of working with PLMs are neglecting a critical aspect of modeling human intelligence: functional compositionality. Functional compositionality - the ability to compose learned tasks - has been a long-standing challenge in the field of AI (and many other fields) as it is considered one of the hallmarks of human intelligence. An illustrative example of such is cross-lingual summarization, where a bilingual person (English-French) could directly summarize an English document into French sentences without having to translate the English document or summary into French explicitly. We discuss why this matter is an important open problem that requires further attention from the field. Then, we show that current PLMs (e.g., GPT-2 and T5) don't have functional compositionality yet and it is far from human-level generalizability. Finally, we suggest several research directions that could push the field towards zero-shot functional compositionality of language models.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2303.03103

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Add feedback

Evaluating the Knowledge Dependency of Questions

Moon, Hyeongdon, Yang, Yoonseok, Shin, Jamin, Yu, Hangyeol, Lee, Seunghyun, Jeong, Myeongho, Park, Juneyoung, Kim, Minsam, Choi, Seungtaek

arXiv.org Artificial IntelligenceNov-21-2022

The automatic generation of Multiple Choice Questions (MCQ) has the potential to reduce the time educators spend on student assessment significantly. However, existing evaluation metrics for MCQ generation, such as BLEU, ROUGE, and METEOR, focus on the n-gram based similarity of the generated MCQ to the gold sample in the dataset and disregard their educational value. They fail to evaluate the MCQ's ability to assess the student's knowledge of the corresponding target fact. To tackle this issue, we propose a novel automatic evaluation metric, coined Knowledge Dependent Answerability (KDA), which measures the MCQ's answerability given knowledge of the target fact. Specifically, we first show how to measure KDA based on student responses from a human survey. Then, we propose two automatic evaluation metrics, KDA_disc and KDA_cont, that approximate KDA by leveraging pre-trained language models to imitate students' problem-solving behavior. Through our human studies, we show that KDA_disc and KDA_soft have strong correlations with both (1) KDA and (2) usability in an actual classroom setting, labeled by experts. Furthermore, when combined with n-gram based similarity metrics, KDA_disc and KDA_cont are shown to have a strong predictive power for various expert-labeled MCQ quality measures.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2022.emnlp-main.718

2211.11902

Country:

Europe > Denmark (0.14)
Europe > United Kingdom (0.14)

Genre:

Questionnaire & Opinion Survey (0.89)
Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback