paraphrase
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text
Cheng, Yize, Sadasivan, Vinu Sankar, Saberi, Mehrdad, Saha, Shoumik, Feizi, Soheil
The increasing capabilities of Large Language Models (LLMs) have raised concerns about their misuse in AI-generated plagiarism and social engineering. While various AI-generated text detectors have been proposed to mitigate these risks, many remain vulnerable to simple evasion techniques such as paraphrasing. However, recent detectors have shown greater robustness against such basic attacks. In this work, we introduce Adversarial Paraphrasing, a training-free attack framework that universally humanizes any AI-generated text to evade detection more effectively. Our approach leverages an off-the-shelf instruction-following LLM to paraphrase AI-generated content under the guidance of an AI text detector, producing adversarial examples that are specifically optimized to bypass detection. Extensive experiments show that our attack is both broadly effective and highly transferable across several detection systems. For instance, compared to simple paraphrasing attack--which, ironically, increases the true positive at 1% false positive (T@1%F) by 8.57% on RADAR and 15.03% on Fast-DetectGPT--adversarial paraphrasing, guided by OpenAI-RoBERTa-Large, reduces T@1%F by 64.49% on RADAR and a striking 98.96% on Fast-DetectGPT. Across a diverse set of detectors--including neural network-based, watermark-based, and zero-shot approaches--our attack achieves an average T@1%F reduction of 87.88% under the guidance of OpenAI-RoBERTa-Large. We also analyze the tradeoff between text quality and attack success to find that our method can significantly reduce detection rates, with mostly a slight degradation in text quality. Our adversarial setup highlights the need for more robust and resilient detection strategies in the light of increasingly sophisticated evasion techniques.
- Asia > China (0.28)
- Asia > Vietnam (0.04)
- Europe > United Kingdom > England (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Government > Regional Government (1.00)
- (9 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)
MahaParaphrase: A Marathi Paraphrase Detection Corpus and BERT-based Models
Jadhav, Suramya, Shanbhag, Abhay, Thakurdesai, Amogh, Sinare, Ridhima, Joshi, Ananya, Joshi, Raviraj
Paraphrases are a vital tool to assist language understanding tasks such as question answering, style transfer, semantic parsing, and data augmentation tasks. Indic languages are complex in natural language processing (NLP) due to their rich morphological and syntactic variations, diverse scripts, and limited availability of annotated data. In this work, we present the L3Cube-MahaParaphrase Dataset, a high-quality paraphrase corpus for Marathi, a low resource Indic language, consisting of 8,000 sentence pairs, each annotated by human experts as either Paraphrase (P) or Non-paraphrase (NP). We also present the results of standard transformer-based BERT models on these datasets. The dataset and model are publicly shared at https://github.com/l3cube-pune/MarathiNLP
- Europe > Finland > Southwest Finland > Turku (0.05)
- North America > Dominican Republic (0.04)
- Europe > Albania > Tirana County > Tirana (0.04)
- (2 more...)
- Research Report (0.50)
- Overview (0.47)
1 . For all authors a
Do the main claims made in the abstract and introduction accurately reflect the paper's If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Y es] Provided in the Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type If you are using existing assets (e.g., code, data, models) or curating/releasing new assets... (a) If your work uses existing assets, did you cite the creators? Did you include any new assets either in the supplemental material or as a URL? [Y es] Did you discuss whether and how consent was obtained from people whose data you're If you used crowdsourcing or conducted research with human subjects... (a) Figure 2 shows an overview of our proposed approach. Any number of differentiable constraints can be incorporated. D.1 Semantic similarity models We explain the semantic similarity models we use in our experiments in more detail here: 15 Weights Fluency (%) Transfer (%) wsim (w.r .t. input) wsim (w.r .t. ref.) log p ( y | x) We use this model for adding constraints in style-transfer ( 3.1) and D.2 Models used in multi-attribute transfer We collect Y elp restaurant reviews using scripts provided by Lample et al.
Could Thinking Multilingually Empower LLM Reasoning?
Gao, Changjiang, Huang, Xu, Zhu, Wenhao, Huang, Shujian, Li, Lei, Yuan, Fei
Previous work indicates that large language models exhibit a significant "English bias", i.e. they often perform better when tasks are presented in English. Interestingly, we have observed that using certain other languages in reasoning tasks can yield better performance than English. However, this phenomenon remains under-explored. In this paper, we explore the upper bound of harnessing multilingualism in reasoning tasks, suggesting that multilingual reasoning promises significantly (by nearly 10 Acc@$k$ points) and robustly (tolerance for variations in translation quality and language choice) higher upper bounds than English-only reasoning. Besides analyzing the reason behind the upper bound and challenges in reaching it, we also find that common answer selection methods cannot achieve this upper bound, due to their limitations and biases. These insights could pave the way for future research aimed at fully harnessing the potential of multilingual reasoning in LLMs.
Consistency Checks for Language Model Forecasters
Paleka, Daniel, Sudhir, Abhimanyu Pallavi, Alvarez, Alejandro, Bhat, Vineeth, Shen, Adam, Wang, Evan, Tramèr, Florian
Forecasting is a task that is difficult to evaluate: the ground truth can only be known in the future. Recent work showing LLM forecasters rapidly approaching human-level performance begs the question: how can we benchmark and evaluate these forecasters instantaneously? Following the consistency check framework, we measure the performance of forecasters in terms of the consistency of their predictions on different logically-related questions. We propose a new, general consistency metric based on arbitrage: for example, if a forecasting AI illogically predicts that both the Democratic and Republican parties have 60% probability of winning the 2024 US presidential election, an arbitrageur can trade against the forecaster's predictions and make a profit. We build an automated evaluation system that generates a set of base questions, instantiates consistency checks from these questions, elicits the predictions of the forecaster, and measures the consistency of the predictions. We then build a standard, proper-scoring-rule forecasting benchmark, and show that our (instantaneous) consistency metrics correlate with LLM forecasters' ground truth Brier scores (which are only known in the future). We also release a consistency benchmark that resolves in 2028, providing a long-term evaluation tool for forecasting.
- Research Report (1.00)
- Workflow (0.92)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Trading (1.00)
MOPO: Multi-Objective Prompt Optimization for Affective Text Generation
Resendiz, Yarik Menchaca, Klinger, Roman
How emotions are expressed depends on the context and domain. On X (formerly Twitter), for instance, an author might simply use the hashtag #anger, while in a news headline, emotions are typically written in a more polite, indirect manner. To enable conditional text generation models to create emotionally connotated texts that fit a domain, users need to have access to a parameter that allows them to choose the appropriate way to express an emotion. To achieve this, we introduce MOPO, a Multi-Objective Prompt Optimization methodology. MOPO optimizes prompts according to multiple objectives (which correspond here to the output probabilities assigned by emotion classifiers trained for different domains). In contrast to single objective optimization, MOPO outputs a set of prompts, each with a different weighting of the multiple objectives. Users can then choose the most appropriate prompt for their context. We evaluate MOPO using three objectives, determined by various domain-specific emotion classifiers. MOPO improves performance by up to 15 pp across all objectives with a minimal loss (1-2 pp) for any single objective compared to single-objective optimization. These minor performance losses are offset by a broader generalization across multiple objectives - which is not possible with single-objective optimization. Additionally, MOPO reduces computational requirements by simultaneously optimizing for multiple objectives, eliminating separate optimization procedures for each objective.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Czechia > Prague (0.04)
- Asia > Singapore (0.04)
- (14 more...)
- Research Report (0.64)
- Overview (0.46)
- Media (0.67)
- Leisure & Entertainment (0.46)
- Health & Medicine (0.46)
Understanding the Effects of Human-written Paraphrases in LLM-generated Text Detection
Lau, Hiu Ting, Zubiaga, Arkaitz
Natural Language Generation has been rapidly developing with the advent of large language models (LLMs). While their usage has sparked significant attention from the general public, it is important for readers to be aware when a piece of text is LLM-generated. This has brought about the need for building models that enable automated LLM-generated text detection, with the aim of mitigating potential negative outcomes of such content. Existing LLM-generated detectors show competitive performances in telling apart LLM-generated and human-written text, but this performance is likely to deteriorate when paraphrased texts are considered. In this study, we devise a new data collection strategy to collect Human & LLM Paraphrase Collection (HLPC), a first-of-its-kind dataset that incorporates human-written texts and paraphrases, as well as LLM-generated texts and paraphrases. With the aim of understanding the effects of human-written paraphrases on the performance of state-of-the-art LLM-generated text detectors OpenAI RoBERTa and watermark detectors, we perform classification experiments that incorporate human-written paraphrases, watermarked and non-watermarked LLM-generated documents from GPT and OPT, and LLM-generated paraphrases from DIPPER and BART. The results show that the inclusion of human-written paraphrases has a significant impact of LLM-generated detector performance, promoting TPR@1%FPR with a possible trade-off of AUROC and accuracy.
Are Paraphrases Generated by Large Language Models Invertible?
Soto, Rafael Rivera, Chen, Barry, Andrews, Nicholas
Large language models can produce highly fluent paraphrases while retaining much of the original meaning. While this capability has a variety of helpful applications, it may also be abused by bad actors, for example to plagiarize content or to conceal their identity. This motivates us to consider the problem of paraphrase inversion: given a paraphrased document, attempt to recover the original text. To explore the feasibility of this task, we fine-tune paraphrase inversion models, both with and without additional author-specific context to help guide the inversion process. We explore two approaches to author-specific inversion: one using in-context examples of the target author's writing, and another using learned style representations that capture distinctive features of the author's style. We show that, when starting from paraphrased machine-generated text, we can recover significant portions of the document using a learned inversion model. When starting from human-written text, the variety of source writing styles poses a greater challenge for invertability. However, even when the original tokens can't be recovered, we find the inverted text is stylistically similar to the original, which significantly improves the performance of plagiarism detectors and authorship identification systems that rely on stylistic markers.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Dominican Republic (0.04)
- (4 more...)
Towards Human Understanding of Paraphrase Types in ChatGPT
Meier, Dominik, Wahle, Jan Philip, Ruas, Terry, Gipp, Bela
Paraphrases represent a human's intuitive ability to understand expressions presented in various different ways. Current paraphrase evaluations of language models primarily use binary approaches, offering limited interpretability of specific text changes. Atomic paraphrase types (APT) decompose paraphrases into different linguistic changes and offer a granular view of the flexibility in linguistic expression (e.g., a shift in syntax or vocabulary used). In this study, we assess the human preferences towards ChatGPT in generating English paraphrases with ten APTs and five prompting techniques. We introduce APTY (Atomic Paraphrase TYpes), a dataset of 500 sentence-level and word-level annotations by 15 annotators. The dataset also provides a human preference ranking of paraphrases with different types that can be used to fine-tune models with RLHF and DPO methods. Our results reveal that ChatGPT can generate simple APTs, such as additions and deletions, but struggle with complex structures (e.g., subordination changes). This study contributes to understanding which aspects of paraphrasing language models have already succeeded at understanding and what remains elusive. In addition, our curated datasets can be used to develop language models with specific linguistic capabilities.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Germany > Lower Saxony > Gottingen (0.04)
- (4 more...)