AITopics | Park, Jungyeul

Plotting

Park, Jungyeul

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Parsing Through Boundaries in Chinese Word Segmentation

Chen, Yige, Li, Zelong, Yang, Changbing, Zhang, Cindy, Cady, Amandisa, Lee, Ai Ka, Zeng, Zejiao, Pan, Haihua, Park, Jungyeul

arXiv.org Artificial IntelligenceMar-29-2025

Chinese word segmentation is a foundational task in natural language processing (NLP), with far-reaching effects on syntactic analysis. Unlike alphabetic languages like English, Chinese lacks explicit word boundaries, making segmentation both necessary and inherently ambiguous. This study highlights the intricate relationship between word segmentation and syntactic parsing, providing a clearer understanding of how different segmentation strategies shape dependency structures in Chinese. Focusing on the Chinese GSD treebank, we analyze multiple word boundary schemes, each reflecting distinct linguistic and computational assumptions, and examine how they influence the resulting syntactic structures. To support detailed comparison, we introduce an interactive web-based visualization tool that displays parsing outcomes across segmentation methods.

artificial intelligence, compound, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.23091

Country:

North America > United States (0.46)
Europe > United Kingdom (0.28)
North America > Mexico (0.28)
Asia > China (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

Enhancing Korean Dependency Parsing with Morphosyntactic Features

Park, Jungyeul, Chen, Yige, Kim, Kyuwon, Lim, KyungTae, Park, Chulwoo

arXiv.org Artificial IntelligenceMar-26-2025

This paper introduces UniDive for Korean, an integrated framework that bridges Universal Dependencies (UD) and Universal Morphology (UniMorph) to enhance the representation and processing of Korean {morphosyntax}. Korean's rich inflectional morphology and flexible word order pose challenges for existing frameworks, which often treat morphology and syntax separately, leading to inconsistencies in linguistic analysis. UniDive unifies syntactic and morphological annotations by preserving syntactic dependencies while incorporating UniMorph-derived features, improving consistency in annotation. We construct an integrated dataset and apply it to dependency parsing, demonstrating that enriched morphosyntactic features enhance parsing accuracy, particularly in distinguishing grammatical relations influenced by morphology. Our experiments, conducted with both encoder-only and decoder-only models, confirm that explicit morphological information contributes to more accurate syntactic analysis.

annotation, artificial intelligence, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.21029

Country:

Europe (1.00)
Asia (1.00)
North America > Canada (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

Unlocking Korean Verbs: A User-Friendly Exploration into the Verb Lexicon

Song, Seohyun, Jo, Eunkyul Leah, Chen, Yige, Hong, Jeen-Pyo, Kim, Kyuwon, Wee, Jin, Kang, Miyoung, Lim, KyungTae, Park, Jungyeul, Park, Chulwoo

arXiv.org Artificial IntelligenceDec-1-2024

The Sejong dictionary dataset offers a valuable resource, providing extensive coverage of morphology, syntax, and semantic representation. This dataset can be utilized to explore linguistic information in greater depth. The labeled linguistic structures within this dataset form the basis for uncovering relationships between words and phrases and their associations with target verbs. This paper introduces a user-friendly web interface designed for the collection and consolidation of verb-related information, with a particular focus on subcategorization frames. Additionally, it outlines our efforts in mapping this information by aligning subcategorization frames with corresponding illustrative sentence examples. Furthermore, we provide a Python library that would simplify syntactic parsing and semantic role labeling. These tools are intended to assist individuals interested in harnessing the Sejong dictionary dataset to develop applications for Korean language processing.

artificial intelligence, information, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.011

Country:

North America > United States (0.46)
Asia > Japan (0.29)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.90)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.71)

Add feedback

Revisiting Absence withSymptoms that T Show up Decades Later to Recover Empty Categories

Chen, Emily, Huang, Nicholas, Robinson, Casey, Xu, Kevin, Huang, Zihao, Park, Jungyeul

arXiv.org Artificial IntelligenceDec-1-2024

This paper explores null elements in English, Chinese, and Korean Penn treebanks. Null elements contain important syntactic and semantic information, yet they have typically been treated as entities to be removed during language processing tasks, particularly in constituency parsing. Thus, we work towards the removal and, in particular, the restoration of null elements in parse trees. We focus on expanding a rule-based approach utilizing linguistic context information to Chinese, as rule based approaches have historically only been applied to English. We also worked to conduct neural experiments with a language agnostic sequence-to-sequence model to recover null elements for English (PTB), Chinese (CTB) and Korean (KTB). To the best of the authors' knowledge, null elements in three different languages have been explored and compared for the first time. In expanding a rule based approach to Chinese, we achieved an overall F1 score of 80.00, which is comparable to past results in the CTB. In our neural experiments we achieved F1 scores up to 90.94, 85.38 and 88.79 for English, Chinese, and Korean respectively with functional labels.

artificial intelligence, computational linguistic, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.01109

Country:

Europe (1.00)
North America > United States > Pennsylvania (0.29)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

K-UD: Revising Korean Universal Dependencies Guidelines

Kim, Kyuwon, Chen, Yige, Jo, Eunkyul Leah, Lim, KyungTae, Park, Jungyeul, Park, Chulwoo

arXiv.org Artificial IntelligenceDec-1-2024

Critique has surfaced concerning the existing linguistic annotation framework for Korean Universal Dependencies (UDs), particularly in relation to syntactic relationships. In this paper, our primary objective is to refine the definition of syntactic dependency of UDs within the context of analyzing the Korean language. Our aim is not only to achieve a consensus within UDs but also to garner agreement beyond the UD framework for analyzing Korean sentences using dependency structure, by establishing a linguistic consensus model.

artificial intelligence, natural language, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2412.00856

Country:

Europe (1.00)
Asia (1.00)
North America > Canada (0.47)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.35)

Add feedback

Comparative Analysis of Extrinsic Factors for NER in French

Yang, Grace, Li, Zhiyi, Liu, Yadong, Park, Jungyeul

arXiv.org Artificial IntelligenceOct-17-2024

Named entity recognition (NER) is a crucial task that aims to identify structured information, which is often replete with complex, technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important information. However, NER for other than English is challenging due to limited data availability, as the high expertise, time, and expenses are required to annotate its data. In this paper, by using the limited data, we explore various factors including model structure, corpus annotation scheme and data augmentation techniques to improve the performance of a NER model for French. Our experiments demonstrate that these approaches can significantly improve the model's F1 score from original CRF score of 62.41 to 79.39. Our findings suggest that considering different extrinsic factors and combining these techniques is a promising approach for improving NER performance where the size of data is limited.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.1275

Country:

North America > United States (0.94)
Europe > France (0.72)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

jp-evalb: Robust Alignment-based PARSEVAL Measures

Park, Jungyeul, Wang, Junrui, Jo, Eunkyul Leah, Park, Angela Yoonseo

arXiv.org Artificial IntelligenceMay-22-2024

We introduce an evaluation system designed to compute PARSEVAL measures, offering a viable alternative to \texttt{evalb} commonly used for constituency parsing evaluation. The widely used \texttt{evalb} script has traditionally been employed for evaluating the accuracy of constituency parsing results, albeit with the requirement for consistent tokenization and sentence boundaries. In contrast, our approach, named \texttt{jp-evalb}, is founded on an alignment method. This method aligns sentences and words when discrepancies arise. It aims to overcome several known issues associated with \texttt{evalb} by utilizing the `jointly preprocessed (JP)' alignment-based method. We introduce a more flexible and adaptive framework, ultimately contributing to a more accurate assessment of constituency parsing performance.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.1415

Country:

Europe (1.00)
Asia (0.93)
North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Neural Automated Writing Evaluation with Corrective Feedback

Wang, Izia Xiaoxiao, Wu, Xihan, Coates, Edith, Zeng, Min, Kuang, Jiexin, Liu, Siliang, Qiu, Mengyang, Park, Jungyeul

arXiv.org Artificial IntelligenceMay-6-2024

The utilization of technology in second language learning and teaching has become ubiquitous. For the assessment of writing specifically, automated writing evaluation (AWE) and grammatical error correction (GEC) have become immensely popular and effective methods for enhancing writing proficiency and delivering instant and individualized feedback to learners. By leveraging the power of natural language processing (NLP) and machine learning algorithms, AWE and GEC systems have been developed separately to provide language learners with automated corrective feedback and more accurate and unbiased scoring that would otherwise be subject to examiners. In this paper, we propose an integrated system for automated writing evaluation with corrective feedback as a means of bridging the gap between AWE and GEC results for second language learners. This system enables language learners to simulate the essay writing tests: a student writes and submits an essay, and the system returns the assessment of the writing along with suggested grammatical error corrections. Given that automated scoring and grammatical correction are more efficient and cost-effective than human grading, this integrated system would also alleviate the burden of manually correcting innumerable essays.

computational linguistic, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2402.17613

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Georgia > Clarke County > Athens (0.14)

Genre:

Research Report (1.00)
Overview (0.93)

Industry:

Education > Curriculum > Subject-Specific Education (0.67)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.57)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Word segmentation granularity in Korean

Park, Jungyeul, Kim, Mija

arXiv.org Artificial IntelligenceSep-7-2023

This paper describes word {segmentation} granularity in Korean language processing. From a word separated by blank space, which is termed an eojeol, to a sequence of morphemes in Korean, there are multiple possible levels of word segmentation granularity in Korean. For specific language processing and corpus annotation tasks, several different granularity levels have been proposed and utilized, because the agglutinative languages including Korean language have a one-to-one mapping between functional morpheme and syntactic category. Thus, we analyze these different granularity levels, presenting the examples of Korean language processing systems for future reference. Interestingly, the granularity by separating only functional morphemes including case markers and verbal endings, and keeping other suffixes for morphological derivation results in the optimal performance for phrase structure parsing. This contradicts previous best practices for Korean language processing, which has been the de facto standard for various applications that require separating all morphemes.

artificial intelligence, granularity, natural language, (18 more...)

arXiv.org Artificial Intelligence

2309.03713

Country:

Europe (1.00)
Asia (1.00)
North America > Canada (0.67)
North America > United States > California (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

Extrinsic Factors Affecting the Accuracy of Biomedical NER

Li, Zhiyi, Zhang, Shengjie, Song, Yujie, Park, Jungyeul

arXiv.org Artificial IntelligenceMay-29-2023

Biomedical named entity recognition (NER) is a critial task that aims to identify structured information in clinical text, which is often replete with complex, technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important biomedical information, which can be used to improve downstream applications including the healthcare system. However, NER in the biomedical domain is challenging due to limited data availability, as the high expertise, time, and expenses are required to annotate its data. In this paper, by using the limited data, we explore various extrinsic factors including the corpus annotation scheme, data augmentation techniques, semi-supervised learning and Brill transformation, to improve the performance of a NER model on a clinical text dataset (i2b2 2012, \citet{sun-rumshisky-uzuner:2013}). Our experiments demonstrate that these approaches can significantly improve the model's F1 score from original 73.74 to 77.55. Our findings suggest that considering different extrinsic factors and combining these techniques is a promising approach for improving NER performance in the biomedical domain where the size of data is limited.

information retrieval, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2305.18152

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Add feedback