deepfake text
Does Human Collaboration Enhance the Accuracy of Identifying LLM-Generated Deepfake Texts?
Uchendu, Adaku, Lee, Jooyoung, Shen, Hua, Le, Thai, Huang, Ting-Hao 'Kenneth', Lee, Dongwon
Advances in Large Language Models (e.g., GPT-4, LLaMA) have improved the generation of coherent sentences resembling human writing on a large scale, resulting in the creation of so-called deepfake texts. However, this progress poses security and privacy concerns, necessitating effective solutions for distinguishing deepfake texts from human-written ones. Although prior works studied humans' ability to detect deepfake texts, none has examined whether "collaboration" among humans improves the detection of deepfake texts. In this study, to address this gap in the understanding of deepfake texts, we conducted experiments with two groups: (1) non-expert individuals from the AMT platform and (2) writing experts from the Upwork platform. The results demonstrate that collaboration among humans can potentially improve the detection of deepfake texts for both groups, increasing detection accuracies by 6.36% for non-experts and 12.76% for experts, respectively, compared to individuals' detection accuracies. We further analyze the explanations that humans used for detecting a piece of text as deepfake text, and find that the strongest indicator of deepfake texts is their lack of coherence and consistency. Our study provides useful insights for future tools and framework designs to facilitate the collaborative human detection of deepfake texts. The experiment datasets and AMT implementations are available at: https://github.com/huashen218/llm-deepfake-human-study.git
- North America > United States > Pennsylvania (0.04)
- North America > United States > Mississippi (0.04)
- North America > United States > Michigan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
TopRoBERTa: Topology-Aware Authorship Attribution of Deepfake Texts
Uchendu, Adaku, Le, Thai, Lee, Dongwon
Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended, high-quality texts that are non-trivial to distinguish from human-written texts. We refer to such LLM-generated texts as deepfake texts. There are currently over 11K text generation models in the huggingface model repo. As such, users with malicious intent can easily use these open-sourced LLMs to generate harmful texts and misinformation at scale. To mitigate this problem, a computational method to determine if a given text is a deepfake text or not is desired--i.e., Turing Test (TT). In particular, in this work, we investigate the more general version of the problem, known as Authorship Attribution (AA), in a multi-class setting--i.e., not only determining if a given text is a deepfake text or not but also being able to pinpoint which LLM is the author. We propose TopRoBERTa to improve existing AA solutions by capturing more linguistic patterns in deepfake texts by including a Topological Data Analysis (TDA) layer in the RoBERTa model. We show the benefits of having a TDA layer when dealing with noisy, imbalanced, and heterogeneous datasets, by extracting TDA features from the reshaped pooled_output of RoBERTa as input. We use RoBERTa to capture contextual representations (i.e., semantic and syntactic linguistic features), while using TDA to capture the shape and structure of data (i.e., linguistic structures). Finally, TopRoBERTa outperforms the vanilla RoBERTa in 2 of 3 datasets, achieving up to a 7% increase in Macro F1 score.
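To make the idea of extracting TDA features from a reshaped pooled_output more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the reshape dimensions or which topological features are used, so the 24 x 32 reshape, the choice of 0-dimensional persistent homology, and the function names reshape_to_point_cloud, h0_death_times, and tda_features are all assumptions made for illustration. The random vector merely stands in for a 768-dimensional RoBERTa pooled output.

```python
import math
import random


def reshape_to_point_cloud(vector, n_points, dim):
    """Split a flat embedding into n_points points of size dim (hypothetical helper)."""
    if len(vector) != n_points * dim:
        raise ValueError("vector length must equal n_points * dim")
    return [vector[i * dim:(i + 1) * dim] for i in range(n_points)]


def h0_death_times(points):
    """Death times of 0-dimensional persistent homology of a point cloud.

    With the filtration scale taken as the pairwise Euclidean distance,
    these equal the edge lengths of a minimum spanning tree, computed
    here with Prim's algorithm. (Some conventions use half this value.)
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    best[0] = 0.0
    deaths = []
    for step in range(n):
        # Pick the point outside the tree with the cheapest connection.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:
            deaths.append(best[u])  # two components merge at this scale
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(deaths)


def tda_features(pooled_output, n_points=24, dim=32):
    """Assumed pipeline: reshape the pooled vector, then take H0 death times."""
    cloud = reshape_to_point_cloud(pooled_output, n_points, dim)
    return h0_death_times(cloud)


# Stand-in for a 768-dimensional pooled_output (random values, not a real embedding).
random.seed(0)
pooled = [random.gauss(0.0, 1.0) for _ in range(768)]
features = tda_features(pooled)
print(len(features))  # one death time per merge: n_points - 1 values
```

In a full model, a feature vector like this would presumably be passed to a classification layer in place of, or alongside, the raw pooled output; how TopRoBERTa combines the two is not specified in the abstract.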
- North America > United States > Mississippi > Lafayette County > Oxford (0.14)
- North America > United States > Pennsylvania > Centre County > University Park (0.04)
- North America > United States > Massachusetts > Middlesex County > Lexington (0.04)
- Europe > Belgium (0.04)
The real threat of fake voices in a time of crisis – HYPEREDGE EMBED
Latanya Sweeney is a professor of government and technology in residence at Harvard University's Department of Government, editor-in-chief of Technology Science and the founding director of the Technology Science Initiative and the Data Privacy Lab at the Institute for Quantitative Social Science at Harvard. Max Weiss is a senior at Harvard University and the student who implemented the Deepfake Text experiment. As federal agencies take increasingly stringent actions to try to limit the spread of the novel coronavirus pandemic within the U.S., how can individual Americans and U.S. companies affected by these rules weigh in with their opinions and experiences? Because many of the new rules, such as travel restrictions and increased surveillance, require expansions of federal power beyond normal circumstances, our laws require the federal government to post these rules publicly and allow the public to contribute their comments to the proposed rules online. But are federal public comment websites -- a vital institution for American democracy -- secure in this time of crisis?