AITopics | Yang, Seongjun

Collaborating Authors

Yang, Seongjun

EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records

Lee, Gyubok, Hwang, Hyeonji, Bae, Seongsu, Kwon, Yeonsu, Shin, Woncheol, Yang, Seongjun, Seo, Minjoon, Kim, Jong-Yeup, Choi, Edward

arXiv.org Artificial Intelligence, Dec-25-2023

We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and used the responses to create seed questions. We then manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset, which were also collected from the poll. Our dataset poses a unique set of challenges: the model needs to 1) generate SQL queries that reflect a wide range of needs in the hospital, including simple retrieval and complex operations such as calculating survival rate, 2) understand various time expressions to answer time-sensitive questions in healthcare, and 3) distinguish whether a given question is answerable or unanswerable. We believe our dataset, EHRSQL, can serve as a practical benchmark for developing and assessing QA models on structured EHR data and take a step further towards bridging the gap between text-to-SQL research and its real-life deployment in healthcare. EHRSQL is available at https://github.
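The three challenges named in the abstract (SQL generation, time expressions, and answerability) can be illustrated with a toy example. This is a minimal sketch, not the EHRSQL pipeline: the `admissions` table, the `CURRENT_TIME` value, and the `TEMPLATES` lookup that stands in for a text-to-SQL model are all hypothetical, and the schema is not the MIMIC-III or eICU schema.

```python
import sqlite3
from typing import Optional

# Hypothetical fixed "now" so that relative time expressions are reproducible.
CURRENT_TIME = "2105-12-31 23:59:00"

# Hypothetical lookup standing in for a text-to-SQL model.
# A question with no entry is treated as unanswerable.
TEMPLATES = {
    "how many patients were admitted this year?": (
        "SELECT COUNT(DISTINCT patient_id) FROM admissions "
        "WHERE strftime('%Y', admit_time) = strftime('%Y', :now)"
    ),
}


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory toy table (not a real EHR schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admissions (patient_id INTEGER, admit_time TEXT)")
    conn.executemany(
        "INSERT INTO admissions VALUES (?, ?)",
        [
            (1, "2105-03-01 10:00:00"),
            (1, "2105-07-15 08:30:00"),
            (2, "2105-11-20 14:00:00"),
            (3, "2104-12-30 09:00:00"),
        ],
    )
    return conn


def answer(conn: sqlite3.Connection, question: str) -> Optional[int]:
    """Return the query result, or None to abstain on unanswerable questions."""
    sql = TEMPLATES.get(question.lower().strip())
    if sql is None:
        return None  # abstain instead of guessing a query
    return conn.execute(sql, {"now": CURRENT_TIME}).fetchone()[0]


if __name__ == "__main__":
    db = build_demo_db()
    print(answer(db, "How many patients were admitted this year?"))
    print(answer(db, "What is the weather tomorrow?"))
```

Resolving "this year" against an explicit reference time, rather than the wall clock, keeps the time-sensitive query deterministic; returning `None` is one simple way to represent abstention.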

admission, machine learning, natural language

arXiv:2301.07695

Country:

Asia > South Korea (0.14)
Oceania > Australia (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding

Yang, Seongjun, Lee, Gibbeum, Cho, Jaewoong, Papailiopoulos, Dimitris, Lee, Kangwook

arXiv.org Artificial Intelligence, Jul-12-2023

This paper presents "Predictive Pipelined Decoding (PPD)," an approach that speeds up greedy decoding in Large Language Models (LLMs) while maintaining the exact same output as the original decoding. Unlike conventional strategies, PPD employs additional compute resources to parallelize the initiation of subsequent token decoding during the current token decoding. This innovative method reduces decoding latency and reshapes the understanding of trade-offs in LLM decoding strategies. We have developed a theoretical framework that allows us to analyze the trade-off between computation and latency. Using this framework, we can analytically estimate the potential reduction in latency associated with our proposed method, achieved through the assessment of the match rate, represented as p_correct. The results demonstrate that the use of extra computational resources has the potential to accelerate LLM greedy decoding.
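The compute-latency trade-off can be made concrete with a deliberately simplified model. This sketch is an assumption for illustration and is not the paper's theoretical framework: it supposes that each token costs one unit of time, that a correct early prediction (probability `p_correct`) lets the next token skip a fraction `overlap` of its decoding time, and that a wrong prediction costs nothing beyond the normal step. The `overlap` parameter and both function names are hypothetical.

```python
def expected_latency(num_tokens: int, p_correct: float, overlap: float) -> float:
    """Expected total latency, in units of one full decoding step.

    Toy model (an assumption, not the paper's analysis): when the early
    prediction matches the final greedy token, the next step has already
    completed `overlap` of its work in parallel.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    if not 0.0 <= p_correct <= 1.0 or not 0.0 <= overlap <= 1.0:
        raise ValueError("p_correct and overlap must lie in [0, 1]")
    # The first token has no preceding step to overlap with.
    per_token = 1.0 - p_correct * overlap
    return 1.0 + (num_tokens - 1) * per_token


def speedup(num_tokens: int, p_correct: float, overlap: float) -> float:
    """Ratio of sequential latency to the toy pipelined latency."""
    return num_tokens / expected_latency(num_tokens, p_correct, overlap)


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9):
        print(p, expected_latency(100, p, 0.5), round(speedup(100, p, 0.5), 3))
```

In this model the output is unchanged by construction, since a mismatched prediction is simply discarded; only latency varies with the match rate.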

compute resource, machine learning, natural language

arXiv:2307.05908

Country:

North America > United States > Wisconsin (0.14)
North America > United States > Texas (0.14)
North America > United States > Hawaii (0.14)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Towards the Practical Utility of Federated Learning in the Medical Domain

Yang, Seongjun, Hwang, Hyeonji, Kim, Daeyoung, Dua, Radhika, Kim, Jong-Yeup, Yang, Eunho, Choi, Edward

arXiv.org Artificial Intelligence, May-19-2023

Federated learning (FL) is an active area of research. One of the most suitable areas for adopting FL is the medical domain, where patient privacy must be respected. Previous research, however, does not provide a practical guide to applying FL in the medical domain. We propose empirical benchmarks and experimental settings for three representative medical datasets with different modalities: longitudinal electronic health records, skin cancer images, and electrocardiogram signals. Likely users of FL, such as medical institutions and IT companies, can take these benchmarks as guides for adopting FL and minimize their trial and error. For each dataset, each client's data comes from a different source to preserve real-world heterogeneity. We evaluate six FL algorithms designed to address data heterogeneity among clients, and a hybrid algorithm combining the strengths of two representative FL algorithms. Based on experimental results from the three modalities, we find that simple FL algorithms tend to outperform more sophisticated ones, while the hybrid algorithm consistently shows good, if not the best, performance. We also find that frequent global model updates lead to better performance under a fixed training iteration budget. As the number of participating clients increases, costs rise because more IT administrators and GPUs are needed, but performance consistently increases. We expect future users will refer to these empirical benchmarks to design FL experiments in the medical domain that suit their clinical tasks and to obtain stronger performance at lower cost.
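For readers new to FL, the "global model update" step can be sketched as a weighted average of client parameters. The abstract does not name the algorithms it evaluates, so the example below uses federated averaging (FedAvg) purely as a familiar stand-in; the function name and the flat-list parameter representation are assumptions for illustration.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Illustrative FedAvg-style aggregation; only parameters are shared,
    never the clients' raw data.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


if __name__ == "__main__":
    # Two hypothetical hospitals with 100 and 300 local samples.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300]))
```

Calling this aggregation more often within a fixed iteration budget corresponds to the "frequent global model update" setting mentioned above, at the price of more communication rounds.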

federated learning, medical domain, practical utility

arXiv:2207.03075

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area (0.73)
Health & Medicine > Health Care Technology > Medical Record (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.60)

Improving Lexically Constrained Neural Machine Translation with Source-Conditioned Masked Span Prediction

Lee, Gyubok, Yang, Seongjun, Choi, Edward

arXiv.org Artificial Intelligence, May-12-2021

Generating accurate terminology is a crucial component for the practicality and reliability of neural machine translation (NMT) systems. To address this, lexically constrained NMT explores various methods to ensure that pre-specified words and phrases appear in the translations. In many cases, however, those methods are evaluated on general domain corpora, where the terms are mostly uni- and bi-grams (>98%). In this paper, we instead tackle a more challenging setup consisting of domain-specific corpora with much longer n-grams and highly specialized terms. To encourage span-level representations in generation, we additionally impose a source-sentence conditioned masked span prediction loss in the decoder and observe improvements in both terminology translation and BLEU scores. Experimental results on three domain-specific corpora in two language pairs demonstrate that the proposed training scheme can improve the performance of existing lexically constrained methods that can operate both with and without a term dictionary at test time.
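The data-preparation side of masked span prediction can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: the `[MASK]` token, the `mask_span` helper, and the use of `None` for positions excluded from the loss are all hypothetical, and the model that would predict the span conditioned on the source sentence is omitted.

```python
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"  # hypothetical mask symbol


def mask_span(
    tokens: Sequence[str], start: int, length: int, mask_token: str = MASK
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask one contiguous span of target-side tokens.

    Returns the masked sequence and per-position labels. Labels are None
    outside the span, so a loss would be computed on the span only.
    """
    if length < 1 or start < 0 or start + length > len(tokens):
        raise ValueError("span must lie inside the token sequence")
    masked = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i in range(start, start + length):
        labels[i] = tokens[i]
        masked[i] = mask_token
    return masked, labels


if __name__ == "__main__":
    target = ["the", "acute", "myocardial", "infarction", "resolved"]
    print(mask_span(target, start=1, length=3))
```

Masking a whole multi-token span, rather than isolated tokens, is what pushes a model toward span-level representations of long terms.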

artificial intelligence, computational linguistics, machine translation

arXiv:2105.05498

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Industry: Law (0.96)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
