Lee, Jiwoo
Exploring Multimodal Perception in Large Language Models Through Perceptual Strength Ratings
Lee, Jonghyun, Park, Dojun, Lee, Jiwoo, Choi, Hoekeon, Lee, Sung-Eun
This study investigated the multimodal perception of large language models (LLMs), focusing on their ability to capture human-like perceptual strength ratings across sensory modalities. Utilizing perceptual strength ratings as a benchmark, the research compared GPT-3.5, GPT-4, GPT-4o, and GPT-4o-mini, highlighting the influence of multimodal inputs on grounding and linguistic reasoning. While GPT-4 and GPT-4o demonstrated strong alignment with human evaluations and significant advancements over smaller models, qualitative analyses revealed distinct differences in processing patterns, such as multisensory overrating and reliance on loose semantic associations. Despite integrating multimodal capabilities, GPT-4o did not exhibit superior grounding compared to GPT-4, raising questions about the role of multimodal training in improving human-like grounding. These findings underscore how LLMs' reliance on linguistic patterns can both approximate and diverge from human embodied cognition, revealing limitations in replicating sensory experiences.
Evaluating Large Language Models on Understanding Korean Indirect Speech Acts
Koo, Youngeun, Lee, Jiwoo, Park, Dojun, Park, Seohyun, Lee, Sungeun
Accurately understanding the intention of an utterance is crucial in conversational communication. As conversational artificial intelligence models are rapidly being developed and applied in various fields, it is important to evaluate LLMs' capabilities to understand the intentions of users' utterances. This study evaluates whether current LLMs can understand the intention of an utterance by considering the given conversational context, particularly in cases where the actual intention differs from the surface-level, literal meaning of the sentence, i.e., indirect speech acts. Our findings reveal that Claude3-Opus outperformed the other competing models, with 71.94% in MCQ and 65% in OEQ, showing a clear advantage. In general, proprietary models exhibited relatively higher performance than open-source models. Nevertheless, no LLM reached the level of human performance. Most LLMs, except for Claude3-Opus, demonstrated significantly lower performance in understanding indirect speech acts than direct speech acts, where the intention is explicitly revealed through the utterance. This study not only performs an overall pragmatic evaluation of each LLM's language use through the analysis of OEQ response patterns, but also emphasizes the necessity of further research to improve LLMs' understanding of indirect speech acts for more natural communication with humans.
Recommendations for Comprehensive and Independent Evaluation of Machine Learning-Based Earth System Models
Ullrich, Paul A., Barnes, Elizabeth A., Collins, William D., Dagon, Katherine, Duan, Shiheng, Elms, Joshua, Lee, Jiwoo, Leung, L. Ruby, Lu, Dan, Molina, Maria J., O'Brien, Travis A., Rebassoo, Finn O.
Machine learning (ML) is a revolutionary technology with demonstrable applications across multiple disciplines. Within the Earth science community, ML has been most visible for weather forecasting, producing forecasts that rival modern physics-based models. Given the importance of deepening our understanding and improving predictions of the Earth system on all time scales, efforts are now underway to develop forecasting models into Earth-system models (ESMs), capable of representing all components of the coupled Earth system (or their aggregated behavior) and their response to external changes. Modeling the Earth system is a much more difficult problem than weather forecasting, not least because the model must represent the alternate (e.g., future) coupled states of the system for which there are no historical observations. Given that the physical principles that enable predictions about the response of the Earth system are often not explicitly coded in these ML-based models, demonstrating the credibility of ML-based ESMs thus requires us to build evidence of their consistency with the physical system. To this end, this paper puts forward five recommendations to enhance comprehensive, standardized, and independent evaluation of ML-based ESMs to strengthen their credibility and promote their wider use.
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks
Kim, Hyunjae, Hwang, Hyeon, Lee, Jiwoo, Park, Sihyeon, Kim, Dain, Lee, Taewhoo, Yoon, Chanwoong, Sohn, Jiwoong, Choi, Donghee, Kang, Jaewoo
While recent advancements in commercial large language models (LLMs) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving complex medical problems. To address this, we introduce Meerkat, a new family of medical AI systems ranging from 7 to 70 billion parameters. The models were trained using our new synthetic dataset consisting of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. Our systems achieved remarkable accuracy across six medical benchmarks, surpassing the previous best models such as MediTron and BioMistral, and GPT-3.5 by a large margin. Notably, Meerkat-7B surpassed the passing threshold of the United States Medical Licensing Examination (USMLE) for the first time for a 7B-parameter model, while Meerkat-70B outperformed GPT-4 by an average of 1.3%. Additionally, Meerkat-70B correctly diagnosed 21 out of 38 complex clinical cases, outperforming the human average of 13.8 and closely matching GPT-4's 21.8. Our systems offered more detailed free-form responses to clinical queries than existing small models, approaching the performance level of large commercial models. This significantly narrows the performance gap with large LLMs, showcasing their effectiveness in addressing complex medical challenges.
MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models
Park, Dojun, Lee, Jiwoo, Park, Seohyun, Jeong, Hyeyun, Koo, Youngeun, Hwang, Soonha, Park, Seonwoo, Lee, Sungeun
As the capabilities of LLMs expand, it becomes increasingly important to evaluate them beyond basic knowledge assessment, focusing on higher-level language understanding. This study introduces MultiPragEval, a robust test suite designed for the multilingual pragmatic evaluation of LLMs across English, German, Korean, and Chinese. Comprising 1200 question units categorized according to Grice's Cooperative Principle and its four conversational maxims, MultiPragEval enables an in-depth assessment of LLMs' contextual awareness and their ability to infer implied meanings. Our findings demonstrate that Claude3-Opus significantly outperforms other models in all tested languages, establishing the state of the art in the field. Among open-source models, Solar-10.7B and Qwen1.5-14B emerge as strong competitors. This study not only leads the way in the multilingual evaluation of LLMs in pragmatic inference but also provides valuable insights into the nuanced capabilities necessary for advanced language comprehension in AI systems.
Pragmatic Competence Evaluation of Large Language Models for Korean
Park, Dojun, Lee, Jiwoo, Jeong, Hyeyun, Park, Seohyun, Lee, Sungeun
The current evaluation of Large Language Models (LLMs) predominantly relies on benchmarks focusing on their embedded knowledge by testing through multiple-choice questions (MCQs), a format inherently suited for automated evaluation. Our study extends this evaluation to explore LLMs' pragmatic competence--a facet underexamined before the advent of sophisticated LLMs--specifically in the context of Korean. We employ two distinct evaluation setups: the conventional MCQ format, adapted for automatic evaluation, and Open-Ended Questions (OEQs), assessed by human experts, to examine LLMs' narrative response capabilities without predefined options. Our findings reveal that GPT-4 excels, scoring 81.11 and 85.69 in the MCQ and OEQ setups, respectively, with HyperCLOVA X, an LLM optimized for Korean, closely following, especially in the OEQ setup, demonstrating a score of 81.56 with a marginal difference of 4.13 points compared to GPT-4. Furthermore, while few-shot learning strategies generally enhance LLM performance, Chain-of-Thought (CoT) prompting introduces a bias toward literal interpretations, hindering accurate pragmatic inference. Considering the growing expectation for LLMs to understand and produce language that aligns with human communicative norms, our findings emphasize the importance of advancing LLMs' abilities to grasp and convey sophisticated meanings beyond mere literal interpretations.
Improving seasonal forecast using probabilistic deep learning
Pan, Baoxiang, Anderson, Gemma J., Gonçalves, André, Lucas, Donald D., Bonfils, Céline J. W., Lee, Jiwoo
The path toward realizing the potential of seasonal forecasting and its socioeconomic benefits depends heavily on improving general circulation model based dynamical forecasting systems. To improve dynamical seasonal forecasts, it is crucial to set up forecast benchmarks and clarify forecast limitations posed by model initialization errors, formulation deficiencies, and internal climate variability. Given the huge cost of generating large forecast ensembles and the limited observations available for forecast verification, the seasonal forecast benchmarking and diagnosing task proves challenging. In this study, we develop a probabilistic deep neural network model, drawing on a wealth of existing climate simulations to enhance seasonal forecast capability and forecast diagnosis. By leveraging complex physical relationships encoded in climate simulations, our probabilistic forecast model demonstrates favorable deterministic and probabilistic skill compared to state-of-the-art dynamical forecast systems in quasi-global seasonal forecasts of precipitation and near-surface temperature. We apply this probabilistic forecast methodology to quantify the impacts of initialization errors and model formulation deficiencies in a dynamical seasonal forecasting system. We introduce the saliency analysis approach to efficiently identify the key predictors that influence seasonal variability. Furthermore, by explicitly modeling uncertainty using variational Bayes, we give a more definitive answer to how the El Niño/Southern Oscillation, the dominant mode of seasonal variability, modulates global seasonal predictability.
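The core idea of variational-Bayes uncertainty modeling mentioned in the abstract can be illustrated with a minimal sketch: draw weight samples from a mean-field Gaussian posterior and form a Monte Carlo forecast ensemble whose spread reflects model uncertainty. This is not the paper's network; the predictor values, posterior parameters, and the linear readout are all hypothetical, chosen only to show the mechanism.

```python
import numpy as np

def variational_forecast(x, mu_w, log_var_w, n_samples=100, rng=None):
    """Monte Carlo forecast from a mean-field Gaussian weight posterior.

    Draws weight samples w ~ N(mu_w, diag(exp(log_var_w))) and returns the
    ensemble of linear predictions w @ x; the ensemble spread reflects
    parameter (model) uncertainty rather than a single point forecast.
    """
    rng = rng or np.random.default_rng(0)
    std_w = np.exp(0.5 * log_var_w)  # posterior standard deviations
    preds = []
    for _ in range(n_samples):
        w = mu_w + std_w * rng.normal(size=mu_w.shape)  # reparameterized sample
        preds.append(w @ x)
    return np.array(preds)

# Hypothetical standardized predictors (e.g., an ENSO index plus two
# other climate indices) and hypothetical posterior parameters.
x = np.array([1.5, -0.2, 0.7])
mu_w = np.array([0.8, 0.1, -0.3])        # posterior means of the weights
log_var_w = np.array([-2.0, -1.0, -1.5]) # posterior log-variances

ens = variational_forecast(x, mu_w, log_var_w, n_samples=500)
mean, spread = ens.mean(), ens.std()
```

A larger `spread` for a given input signals lower predictability, which is how such an ensemble can be used to diagnose, for example, how strongly ENSO conditions modulate forecast confidence.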
Deep-dust: Predicting concentrations of fine dust in Seoul using LSTM
Kim, Sookyung, Lee, Jungmin M., Lee, Jiwoo, Seo, Jihoon
Fine-dust pollution in South Korea, which mainly consists of biomass-burning aerosols and fugitive dust blown from the dust belt, is a significant problem these days. Predicting concentrations of fine dust particles in Seoul is challenging because they are the product of complicated chemical reactions among gaseous pollutants and are also influenced by dynamical interactions between pollutants and multiple climate variables. By elaborating state-of-the-art time-series analysis techniques using deep learning, non-linear interactions between multiple variables can be captured and used to predict future dust concentrations. In this work, we propose an LSTM-based model to predict the hourly concentration of fine dust at a target location in Seoul based on previous concentrations of pollutants, dust concentrations, and climate variables in the surrounding area. Our results show that the proposed model successfully predicts future dust concentrations at 25 target districts (Gu) in Seoul.
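The abstract's setup, feeding a window of past hourly measurements through an LSTM and mapping the final hidden state to a next-hour dust estimate, can be sketched in plain NumPy. This is an illustrative toy, not the paper's architecture: the feature set, hidden size, and random weights are all assumptions, and a real model would learn the weights by gradient descent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step. W: (4H, D), U: (4H, H), b: (4H,).
    Gate order in the stacked weights: input, forget, cell, output."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:4 * H])  # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def predict_next_dust(seq, W, U, b, w_out):
    """Run the LSTM over a (T, D) window of hourly features (e.g.,
    past dust levels, gaseous pollutants, climate variables) and map
    the final hidden state to a scalar next-hour concentration."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return float(w_out @ h)

rng = np.random.default_rng(0)
D, H, T = 3, 8, 24  # 3 hypothetical features, 8 hidden units, 24-hour window
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
w_out = rng.normal(0, 0.1, H)
seq = rng.normal(0, 1, (T, D))  # stand-in for a standardized feature window

pred = predict_next_dust(seq, W, U, b, w_out)
```

In practice, one such readout per target district (or a vector-valued output) would give the 25-district (Gu) predictions the abstract reports.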