Lee, Jooyoung
Collaborative Evaluation of Deepfake Text with Deliberation-Enhancing Dialogue Systems
Lee, Jooyoung, Zhu, Xiaochen, Karadzhov, Georgi, Stafford, Tom, Vlachos, Andreas, Lee, Dongwon
The proliferation of generative models has presented significant challenges in distinguishing authentic human-authored content from deepfake content. Collaborative human efforts, augmented by AI tools, present a promising solution. In this study, we explore the potential of DeepFakeDeLiBot, a deliberation-enhancing chatbot, to support groups in detecting deepfake text. Our findings reveal that group-based problem-solving significantly improves the accuracy of identifying machine-generated paragraphs compared to individual efforts. While engagement with DeepFakeDeLiBot does not yield substantial performance gains overall, it enhances group dynamics by fostering greater participant engagement, consensus building, and the frequency and diversity of reasoning-based utterances. Additionally, participants who perceived group collaboration as more effective showed performance benefits from DeepFakeDeLiBot. These findings underscore the potential of deliberative chatbots in fostering interactive and productive group dynamics while ensuring accuracy in collaborative deepfake text detection. The dataset and source code used in this study will be made publicly available upon acceptance of the manuscript.
Beemo: Benchmark of Expert-edited Machine-generated Outputs
Artemova, Ekaterina, Lucas, Jason, Venkatraman, Saranya, Lee, Jooyoung, Tilga, Sergei, Uchendu, Adaku, Mikhailov, Vladislav
The rapid proliferation of large language models (LLMs) has increased the volume of machine-generated texts (MGTs) and blurred text authorship in various domains. However, most existing MGT benchmarks include single-author texts (human-written and machine-generated). This conventional design fails to capture more practical multi-author scenarios, where the user refines the LLM response for natural flow, coherence, and factual correctness. Our paper introduces the Benchmark of Expert-edited Machine-generated Outputs (Beemo), which includes 6.5k texts written by humans, generated by ten instruction-finetuned LLMs, and edited by experts for various use cases, ranging from creative writing to summarization. Beemo additionally comprises 13.1k machine-generated and LLM-edited texts, allowing for diverse MGT detection evaluation across various edit types. We document Beemo's creation protocol and present the results of benchmarking 33 configurations of MGT detectors in different experimental setups. We find that expert-based editing evades MGT detection, while LLM-edited texts are unlikely to be recognized as human-written. Beemo and all materials are publicly available.
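To make the intended use concrete, below is a minimal sketch of scoring an MGT detector separately on the text categories Beemo covers. The record layout and the detect_machine callable are illustrative assumptions, not Beemo's released schema or any of the 33 benchmarked detector configurations.

```python
# Evaluate a binary MGT detector per text category (human-written,
# machine-generated, expert-edited, LLM-edited). The record format and
# detect_machine helper are hypothetical placeholders.
from collections import defaultdict

def evaluate(records: list[dict], detect_machine) -> dict[str, float]:
    # records: [{"text": ..., "source": "human" | "machine"
    #            | "expert_edited" | "llm_edited"}, ...]
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        pred_machine = detect_machine(r["text"])   # True if flagged as MGT
        is_machine = r["source"] != "human"
        correct[r["source"]] += int(pred_machine == is_machine)
        total[r["source"]] += 1
    # Per-category accuracy exposes, e.g., whether expert-edited texts
    # evade detection while LLM-edited texts stay detectable.
    return {src: correct[src] / total[src] for src in total}
```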
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
Lee, Jooyoung, Yang, Fan, Tran, Thanh, Hu, Qian, Barut, Emre, Chang, Kai-Wei, Su, Chengwei
We introduce LM-Guided CoT, a novel framework that leverages a lightweight (i.e., <1B parameter) language model (LM) to guide a black-box large (i.e., >10B parameter) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resource-efficient in that it only requires training the lightweight LM. We optimize the model through (1) knowledge distillation and (2) reinforcement learning from rationale-oriented and task-oriented reward signals. We assess our method on two multi-hop extractive question answering (QA) benchmarks, HotpotQA and 2WikiMultiHopQA. Experimental results show that our approach outperforms all baselines in answer prediction accuracy. We also find that reinforcement learning helps the model produce higher-quality rationales and, in turn, improves QA performance.
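The two-stage inference flow can be sketched as follows. The small_lm and large_lm callables are placeholders (assumptions): in the paper, the rationale generator is a trained <1B model and the answer predictor is a frozen >10B black-box LM.

```python
# A minimal sketch of LM-Guided CoT inference: a lightweight LM writes a
# rationale, then a frozen large LM answers conditioned on that rationale.
from typing import Callable

def lm_guided_cot(question: str, context: str,
                  small_lm: Callable[[str], str],
                  large_lm: Callable[[str], str]) -> str:
    # Stage 1: the lightweight LM generates a rationale for this instance.
    rationale_prompt = (
        f"Context: {context}\nQuestion: {question}\n"
        "Write a step-by-step rationale for answering the question:"
    )
    rationale = small_lm(rationale_prompt)

    # Stage 2: the frozen large LM predicts the answer from the rationale;
    # only the small model is ever trained (resource efficiency).
    answer_prompt = (
        f"Context: {context}\nQuestion: {question}\n"
        f"Rationale: {rationale}\nAnswer:"
    )
    return large_lm(answer_prompt)
```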
Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation
Lucas, Jason, Uchendu, Adaku, Yamashita, Michiharu, Lee, Jooyoung, Rohatgi, Shaurya, Lee, Dongwon
The recent ubiquity and disruptive impacts of large language models (LLMs) have raised concerns about their potential for misuse (i.e., generating large-scale harmful and misleading content). To combat this emerging risk of LLMs, we propose a novel "Fighting Fire with Fire" (F3) strategy that harnesses modern LLMs' generative and emergent reasoning capabilities to counter human-written and LLM-generated disinformation. First, we leverage GPT-3.5-turbo to synthesize authentic and deceptive LLM-generated content through paraphrase-based and perturbation-based prefix-style prompts, respectively. Second, we apply zero-shot in-context semantic reasoning techniques with cloze-style prompts to discern genuine from deceptive posts and news articles. In our extensive experiments, we observe GPT-3.5-turbo's zero-shot superiority on both in-distribution and out-of-distribution datasets, where it consistently achieved accuracies of 68-72%, unlike the decline observed in previous customized and fine-tuned disinformation detectors. Our codebase and dataset are available at https://github.com/mickeymst/F3.
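As an illustration of the detection side, here is a minimal sketch of a cloze-style zero-shot veracity check. The prompt wording and the call_llm helper are assumptions for exposition; the paper's exact prompts are in the F3 repository.

```python
# Cloze-style zero-shot disinformation detection in the spirit of F3:
# the model fills a blank with a verdict token. Prompt text is illustrative.
def build_cloze_prompt(article: str) -> str:
    return (
        f"Article: {article}\n"
        "Based on semantic reasoning about its claims, this article is ___ "
        "(choose one: authentic, deceptive)."
    )

def detect(article: str, call_llm) -> str:
    # call_llm is any zero-shot completion endpoint, e.g. GPT-3.5-turbo.
    verdict = call_llm(build_cloze_prompt(article)).strip().lower()
    return "deceptive" if "deceptive" in verdict else "authentic"
```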
Does Human Collaboration Enhance the Accuracy of Identifying LLM-Generated Deepfake Texts?
Uchendu, Adaku, Lee, Jooyoung, Shen, Hua, Le, Thai, Huang, Ting-Hao 'Kenneth', Lee, Dongwon
Advances in Large Language Models (e.g., GPT-4, LLaMA) have enabled the large-scale generation of coherent text that closely resembles human writing, resulting in so-called deepfake texts. This progress poses security and privacy concerns, necessitating effective solutions for distinguishing deepfake texts from human-written ones. Although prior works have studied humans' ability to detect deepfake texts, none has examined whether "collaboration" among humans improves detection. To address this gap, we conducted experiments with two groups: (1) non-experts recruited from the AMT platform and (2) writing experts recruited from the Upwork platform. The results demonstrate that collaboration among humans can improve the detection of deepfake texts for both groups, increasing detection accuracy by 6.36% for non-experts and 12.76% for experts relative to individual detection accuracy. We further analyze the explanations humans gave for labeling a text as deepfake and find that the strongest indicator of deepfake texts is their lack of coherence and consistency. Our study provides useful insights for future tool and framework designs that facilitate collaborative human detection of deepfake texts. The experiment datasets and AMT implementations are available at: https://github.com/huashen218/llm-deepfake-human-study.git
Comparison of L2 Korean pronunciation error patterns from five L1 backgrounds by using automatic phonetic transcription
Yeo, Eun Jung, Ryu, Hyungshin, Lee, Jooyoung, Kim, Sunhee, Chung, Minhwa
This paper presents a large-scale analysis of L2 Korean pronunciation error patterns across five L1 backgrounds (Chinese, Vietnamese, Japanese, Thai, and English) using automatic phonetic transcription. For the analysis, confusion matrices are generated for each L1 by aligning canonical phone sequences with automatically transcribed phone sequences obtained from a fine-tuned Wav2Vec2 XLS-R phone recognizer. The values in the confusion matrices are compared to capture frequent common error patterns and to identify patterns unique to a particular language background. Using the Foreign Speakers' Voice Data of Korean for Artificial Intelligence Learning dataset, the common error pattern types are found to be (1) substitutions of aspirated or tense consonants with plain consonants, (2) deletions of syllable-final consonants, and (3) substitutions of diphthongs with monophthongs. In contrast, thirty-nine language-dependent patterns are discovered, including (1) substitutions of syllable-final /l/ with /n/ for Vietnamese and (2) insertions of /ɯ/ for Japanese.
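The alignment-and-counting step can be sketched as below. The alignment here uses Python's difflib; the paper's exact alignment procedure may differ, and the example phones are illustrative.

```python
# Build per-L1 phone confusion counts by aligning canonical phone sequences
# with automatically transcribed ones, tallying substitutions, deletions,
# and insertions.
from collections import Counter
from difflib import SequenceMatcher

def confusion_counts(canonical: list[str], transcribed: list[str]) -> Counter:
    counts: Counter = Counter()
    sm = SequenceMatcher(a=canonical, b=transcribed, autojunk=False)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            for p in canonical[i1:i2]:
                counts[(p, p)] += 1        # correct phones on the diagonal
        elif op == "replace":
            # Substitution: canonical phone -> transcribed phone
            # (unequal spans are truncated by zip for simplicity).
            for p, q in zip(canonical[i1:i2], transcribed[j1:j2]):
                counts[(p, q)] += 1
        elif op == "delete":
            for p in canonical[i1:i2]:
                counts[(p, "<del>")] += 1  # deleted canonical phone
        elif op == "insert":
            for q in transcribed[j1:j2]:
                counts[("<ins>", q)] += 1  # inserted phone

    return counts

# e.g. syllable-final /l/ -> /n/ substitutions accumulate at ("l", "n")
print(confusion_counts(["k", "a", "l"], ["k", "a", "n"]))
```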
Do Language Models Plagiarize?
Lee, Jooyoung, Le, Thai, Chen, Jinghui, Lee, Dongwon
In this work, we study three types of plagiarism (i.e., verbatim, paraphrase, and idea) in GPT-2-generated texts in comparison to its training data, and further analyze the plagiarism patterns of fine-tuned LMs trained on domain-specific corpora which are extensively used in practice. Our results suggest that (1) all three types of plagiarism widely exist in LMs beyond memorization, (2) both the size and decoding methods of LMs are strongly associated with the degrees of plagiarism they exhibit, and (3) fine-tuned LMs' plagiarism patterns vary based on their corpus similarity and homogeneity. Given that a majority of LMs' training data is scraped from the Web without informing content owners, their reiteration of words, phrases, and even core ideas from training sets into generated texts has ethical implications. These patterns are likely to exacerbate as both the size of LMs and their training data increase.
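For the verbatim case, a simple overlap heuristic illustrates the idea. This sketch is an illustrative n-gram check, not the paper's detection pipeline, which also covers paraphrase and idea plagiarism with learned models.

```python
# Flag verbatim plagiarism via n-gram overlap between a generated text and
# a training document; long shared n-grams indicate memorized spans.
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(generated: str, corpus_doc: str, n: int = 8) -> float:
    # Fraction of the generated text's n-grams appearing verbatim in the
    # training document (n=8 is an arbitrary illustrative window).
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(corpus_doc.lower().split(), n)
    return len(gen & ref) / max(len(gen), 1)
```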
Selective compression learning of latent representations for variable-rate image compression
Lee, Jooyoung, Jeong, Seyoon, Kim, Munchurl
Recently, many neural network-based image compression methods have shown promising results superior to existing tool-based conventional codecs. However, most of them are trained as separate models for different target bit rates, increasing model complexity. Several studies have therefore explored learned compression that supports variable rates with a single model, but these require additional network modules, layers, or inputs that often incur complexity overhead, or they do not provide sufficient coding efficiency. In this paper, we propose a selective compression method that partially encodes latent representations in a fully generalized manner for deep learning-based variable-rate image compression. The proposed method adaptively determines the representation elements essential for compression at different target quality levels. To this end, we first generate a 3D importance map from the input content to represent the underlying importance of the representation elements. The 3D importance map is then adjusted for different target quality levels using importance adjustment curves. The adjusted 3D importance map is finally converted into a 3D binary mask that determines the essential representation elements for compression. The proposed method can be easily integrated with existing compression models with a negligible amount of overhead, and it enables continuously variable-rate compression via simple interpolation of the importance adjustment curves between quality levels. Extensive experimental results show that the proposed method achieves compression efficiency comparable to that of separately trained reference compression models and reduces decoding time owing to the selective compression. The sample codes are publicly available at https://github.com/JooyoungLeeETRI/SCR.
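The map-to-mask step can be sketched numerically as below. The gamma-style adjustment curve and the fixed threshold are illustrative assumptions, not the paper's learned adjustment curves.

```python
# A minimal numpy sketch of selective compression: a 3D importance map is
# adjusted per target quality and thresholded into a binary mask that
# selects which latent elements are encoded.
import numpy as np

def selection_mask(importance: np.ndarray, quality: float,
                   threshold: float = 0.5) -> np.ndarray:
    # Higher quality -> smaller exponent -> larger adjusted values -> more
    # elements kept (stand-in for an importance adjustment curve).
    adjusted = importance ** (1.0 / max(quality, 1e-6))
    return adjusted > threshold

rng = np.random.default_rng(0)
latent = rng.normal(size=(192, 16, 16))   # latent representation y
imp = rng.uniform(size=latent.shape)      # 3D importance map
mask = selection_mask(imp, quality=0.8)
selected = latent[mask]                   # only these elements are coded
print(f"coded {selected.size} of {latent.size} elements")
```

Interpolating between the adjustment curves of two trained quality levels is what yields continuously variable rates from a single model.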