Andrews, Nicholas
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence
Hager, Sophia, Mueller, David, Duh, Kevin, Andrews, Nicholas
As large language models (LLMs) are increasingly used for factual question-answering, it becomes more important for LLMs to have the capability to communicate the likelihood that their answer is correct. For these verbalized expressions of uncertainty to be meaningful, they should reflect the error rates at the expressed level of confidence. However, when prompted to express confidence, the error rates of current LLMs are inconsistent with their communicated confidences, highlighting the need for uncertainty quantification methods. Many prior methods calculate lexical uncertainty, estimating a model's confidence in the specific string it generated. In some cases, however, it may be more useful to estimate semantic uncertainty, or the model's confidence in the answer regardless of how it is verbalized. We propose a simple procedure, uncertainty distillation, to teach an LLM to verbalize calibrated semantic confidences. Using held-out data to map initial uncertainty estimates to meaningful probabilities, we create examples annotated with verbalized probabilities for supervised fine-tuning. We demonstrate that our method yields verbalized confidences that correlate with observed error rates, both for a small fine-tuned language model and for larger instruction-tuned models, and we find that our semantic uncertainty estimates correlate well with lexical uncertainty on short answers.
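As an illustration of the calibration step described above, the sketch below bins held-out answers by their raw confidence scores and maps each bin to its empirical accuracy, which then supplies the verbalized fine-tuning target. The binning scheme, function names, and verbalization template are illustrative assumptions rather than the paper's exact recipe.

    import numpy as np

    def fit_confidence_bins(confidences, correct, n_bins=10):
        """Map raw confidence scores to empirical accuracies on held-out data."""
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        bin_ids = np.clip(np.digitize(confidences, edges) - 1, 0, n_bins - 1)
        # The observed accuracy within each bin becomes the calibrated probability.
        bin_acc = np.array([correct[bin_ids == b].mean() if np.any(bin_ids == b) else np.nan
                            for b in range(n_bins)])
        return edges, bin_acc

    def verbalize(raw_confidence, edges, bin_acc):
        """Turn a raw score into a verbalized probability for a fine-tuning target."""
        b = int(np.clip(np.digitize(raw_confidence, edges) - 1, 0, len(bin_acc) - 1))
        return f"I am {100 * bin_acc[b]:.0f}% confident in this answer."

    # Toy demonstration with a synthetic, miscalibrated model.
    rng = np.random.default_rng(0)
    conf = rng.uniform(size=1000)                                # raw uncertainty estimates
    corr = (rng.uniform(size=1000) < conf ** 1.5).astype(float)  # 1 = answer was correct
    edges, acc = fit_confidence_bins(conf, corr)
    print(verbalize(0.85, edges, acc))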
GenVC: Self-Supervised Zero-Shot Voice Conversion
Cai, Zexin, Xinyuan, Henry Li, Garg, Ashi, García-Perera, Leibny Paola, Duh, Kevin, Khudanpur, Sanjeev, Wiesner, Matthew, Andrews, Nicholas
Zero-shot voice conversion has recently made substantial progress, but many models still depend on external supervised systems to disentangle speaker identity and linguistic content. Furthermore, current methods often use parallel conversion, where the converted speech inherits the source utterance's temporal structure, limiting both speaker similarity and privacy protection. To overcome these limitations, we introduce GenVC, a generative zero-shot voice conversion model. GenVC learns to disentangle linguistic content and speaker style in a self-supervised manner, eliminating the need for external models and enabling efficient training on large, unlabeled datasets. Experimental results show that GenVC achieves state-of-the-art speaker similarity while maintaining naturalness competitive with leading approaches. Its autoregressive generation also allows the converted speech to deviate from the source utterance's temporal structure. This feature makes GenVC highly effective for voice anonymization, as it minimizes the preservation of source prosody and speaker characteristics, enhancing privacy protection.
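To make the autoregressive aspect concrete, the toy module below sketches a decoder in which discrete content units extracted from the source utterance condition the generation of acoustic tokens, with a short acoustic prompt fixing the target voice; the generated sequence is free to deviate from the source timing. All sizes, the greedy decoding loop, and the module structure are placeholder assumptions, not GenVC's actual architecture.

    import torch
    import torch.nn as nn

    class ToyAutoregressiveVC(nn.Module):
        def __init__(self, n_content=512, n_acoustic=1024, d=256):
            super().__init__()
            self.content_emb = nn.Embedding(n_content, d)    # linguistic content units
            self.acoustic_emb = nn.Embedding(n_acoustic, d)  # acoustic tokens
            layer = nn.TransformerDecoderLayer(d_model=d, nhead=4, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=2)
            self.head = nn.Linear(d, n_acoustic)

        @torch.no_grad()
        def convert(self, content_ids, speaker_prompt_ids, max_len=50):
            memory = self.content_emb(content_ids)  # what is said
            out = speaker_prompt_ids                # short prompt sets who says it
            for _ in range(max_len):                # output length decoupled from source
                h = self.decoder(self.acoustic_emb(out), memory)
                nxt = self.head(h[:, -1]).argmax(-1, keepdim=True)
                out = torch.cat([out, nxt], dim=1)
            return out

    model = ToyAutoregressiveVC().eval()
    content = torch.randint(0, 512, (1, 30))     # e.g., self-supervised speech units
    prompt = torch.randint(0, 1024, (1, 8))      # clip of the target speaker
    print(model.convert(content, prompt).shape)  # torch.Size([1, 58])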
Are Paraphrases Generated by Large Language Models Invertible?
Soto, Rafael Rivera, Chen, Barry, Andrews, Nicholas
Large language models can produce highly fluent paraphrases while retaining much of the original meaning. While this capability has a variety of helpful applications, it may also be abused by bad actors, for example to plagiarize content or to conceal their identity. This motivates us to consider the problem of paraphrase inversion: given a paraphrased document, attempt to recover the original text. To explore the feasibility of this task, we fine-tune paraphrase inversion models, both with and without additional author-specific context to help guide the inversion process. We explore two approaches to author-specific inversion: one using in-context examples of the target author's writing, and another using learned style representations that capture distinctive features of the author's style. We show that, when starting from paraphrased machine-generated text, we can recover significant portions of the document using a learned inversion model. When starting from human-written text, the variety of source writing styles poses a greater challenge for invertibility. However, even when the original tokens cannot be recovered, we find the inverted text is stylistically similar to the original, which significantly improves the performance of plagiarism detectors and authorship identification systems that rely on stylistic markers.
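One way to picture the supervision is the pair-construction sketch below: a paraphraser produces the model input, the original text is the training target, and exemplars of the target author's writing may optionally be prepended for the author-conditioned variant. The prompt format and helper names are hypothetical, not the paper's exact setup.

    def make_inversion_example(original, paraphrase_fn, author_examples=None):
        """Build one (input, target) pair for fine-tuning an inversion model."""
        paraphrased = paraphrase_fn(original)
        context = ""
        if author_examples:  # author-specific variant: prepend style exemplars
            context = "Author samples:\n" + "\n".join(author_examples) + "\n\n"
        source = f"{context}Paraphrase: {paraphrased}\nOriginal:"
        return source, original

    # Toy usage with a trivial stand-in paraphraser.
    toy_paraphrase = lambda s: s.replace("quick", "speedy")
    src, tgt = make_inversion_example(
        "The quick brown fox jumps over the lazy dog.",
        toy_paraphrase,
        author_examples=["I reckon that dog was lazy.", "Quick as anything, that fox."],
    )
    print(src)
    print(tgt)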
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies
Ye, Xiao, Wang, Andrew, Choi, Jacob, Lu, Yining, Sharma, Shreya, Shen, Lingfeng, Tiyyala, Vijay, Andrews, Nicholas, Khashabi, Daniel
Humans regularly engage in analogical thinking, relating personal experiences to current situations (X is analogous to Y because of Z). Analogical thinking allows humans to solve problems in creative ways, grasp difficult concepts, and articulate ideas more effectively. Can language models (LMs) do the same? To answer this question, we propose AnaloBench, a benchmark to determine analogical reasoning ability in LMs. Our benchmarking approach focuses on aspects of this ability that are common among humans: (i) recalling related experiences from a large amount of information, and (ii) applying analogical reasoning to complex and lengthy scenarios. We test a broad collection of proprietary models (e.g., the GPT family, Claude V2) and open-source models such as LLaMA2. Consistent with prior results, scaling up LMs yields some performance gains. Surprisingly, scale offers minimal gains when (i) analogies involve lengthy scenarios, or (ii) the task requires recalling relevant scenarios from a large pool of information, a process analogous to finding a needle in a haystack. We hope these observations encourage further research in this field.
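A scoring loop for a benchmark of this kind might look like the sketch below, in which each item asks a model to pick the most analogous candidate story; the item schema and the ask_model interface are hypothetical rather than AnaloBench's exact format.

    def evaluate(items, ask_model):
        """items: dicts with a query story, candidate stories, and the index of
        the truly analogous candidate; ask_model returns a chosen index."""
        correct = sum(
            int(ask_model(item["story"], item["candidates"]) == item["answer"])
            for item in items
        )
        return correct / len(items)

    toy_items = [
        {"story": "A seed slowly grows into a tree.",
         "candidates": ["An idea grows into a movement.", "A rock sits still."],
         "answer": 0},
    ]
    # A trivial baseline that always picks the first candidate.
    print(evaluate(toy_items, lambda story, cands: 0))  # 1.0 on this toy item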
Few-Shot Detection of Machine-Generated Text using Style Representations
Soto, Rafael Rivera, Koch, Kailin, Khan, Aleem, Chen, Barry, Bishop, Marcus, Andrews, Nicholas
The advent of instruction-tuned language models that convincingly mimic human writing poses a significant risk of abuse. For example, such models could be used for plagiarism, disinformation, spam, or phishing. However, such abuse may be counteracted with the ability to detect whether a piece of text was composed by a language model rather than a human. Some previous approaches to this problem have relied on supervised methods trained on corpora of confirmed human and machine-written documents. Unfortunately, model under-specification poses an unavoidable challenge for neural network-based detectors, making them brittle in the face of data shifts, such as the release of further language models producing still more fluent text than the models used to train the detectors. Other previous approaches require access, at inference or detection time, to the models that may have generated a document in question, which is often impractical. In light of these challenges, we pursue a fundamentally different approach that does not rely on samples from language models of concern at training time. Instead, we propose to leverage representations of writing style estimated from human-authored text. Indeed, we find that features effective at distinguishing among human authors are also effective at distinguishing human from machine authors, including state-of-the-art large language models like Llama 2, ChatGPT, and GPT-4. Furthermore, given a handful of examples composed by each of several specific language models of interest, our approach affords the ability to predict which model generated a given document.
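A minimal sketch of the idea, assuming a pretrained style encoder is available: embed the document alongside a handful of human-written and machine-generated exemplars, then take a nearest-centroid decision in style space. The random vectors below merely stand in for real style embeddings.

    import numpy as np

    def looks_machine_generated(doc_vec, human_vecs, machine_vecs):
        """Nearest-centroid decision in style-embedding space."""
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        human_c = human_vecs.mean(axis=0)
        machine_c = machine_vecs.mean(axis=0)
        return cos(doc_vec, machine_c) > cos(doc_vec, human_c)

    # Toy usage: synthetic clusters stand in for encoder outputs.
    rng = np.random.default_rng(1)
    human = rng.normal(0, 1, (5, 64)) + 1.0    # few-shot human exemplars
    machine = rng.normal(0, 1, (5, 64)) - 1.0  # few-shot machine exemplars
    doc = rng.normal(0, 1, 64) - 1.0
    print(looks_machine_generated(doc, human, machine))  # likely True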
Learning to Generate Text in Arbitrary Writing Styles
Khan, Aleem, Wang, Andrew, Hager, Sophia, Andrews, Nicholas
Prior work in style-controlled text generation has focused on tasks such as emulating the style of prolific literary authors, producing formal or informal text, and controlling the degree of toxicity of generated text. Plentiful demonstrations of these styles are available, and as a result modern language models are often able to emulate them, either via prompting or discriminative control. However, in applications such as writing assistants, it is desirable for language models to produce text in an author-specific style on the basis of a small writing sample. We find that instruction-tuned language models can struggle to reproduce author-specific style demonstrated in a prompt. Instead, we propose to guide a language model to generate text in a target style using contrastively-trained representations that capture stylometric features. A central challenge in doing so is that an author's writing is characterized by surprising token choices under a generic language model. To reconcile this tension, we combine generative re-scoring, which yields an author-specific model, with discriminative control, which enforces style consistency at the sequence level. The combination of these approaches is found to be particularly effective at adhering to an author-specific style in a variety of conditions, including unconditional generation and style transfer, and is applicable to any underlying language model without requiring fine-tuning.
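As a simplified stand-in for the combination described above, candidate continuations can be reranked by a weighted mix of language-model score and style similarity to the target author; the trade-off weight, stand-in embeddings, and reranking framing are assumptions for illustration, not the paper's actual decoding procedure.

    import numpy as np

    def rerank(candidates, lm_scores, cand_style_vecs, target_style_vec, alpha=0.5):
        """Pick the candidate balancing fluency (lm_scores, log-probabilities)
        against style consistency (cosine similarity to the target style)."""
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        combined = [(1 - alpha) * lm + alpha * cos(v, target_style_vec)
                    for lm, v in zip(lm_scores, cand_style_vecs)]
        return candidates[int(np.argmax(combined))]

    # Toy usage with stand-in scores and embeddings.
    rng = np.random.default_rng(2)
    cands = ["gonna head out now lol", "I shall depart forthwith."]
    print(rerank(cands, [-12.0, -15.0], rng.normal(size=(2, 32)), rng.normal(size=32)))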
Can Authorship Representation Learning Capture Stylistic Features?
Wang, Andrew, Aggazzotti, Cristina, Kotula, Rebecca, Soto, Rafael Rivera, Bishop, Marcus, Andrews, Nicholas
Automatically disentangling an author's style from the content of their writing is a longstanding and possibly insurmountable problem in computational linguistics. At the same time, the availability of large text corpora furnished with author labels has recently enabled learning authorship representations in a purely data-driven manner for authorship attribution, a task that ostensibly depends to a greater extent on encoding writing style than encoding content. However, success on this surrogate task does not ensure that such representations capture writing style since authorship could also be correlated with other latent variables, such as topic. In an effort to better understand the nature of the information these representations convey, and specifically to validate the hypothesis that they chiefly encode writing style, we systematically probe these representations through a series of targeted experiments. The results of these experiments suggest that representations learned for the surrogate authorship prediction task are indeed sensitive to writing style. As a consequence, authorship representations may be expected to be robust to certain kinds of data shift, such as topic drift over time. Additionally, our findings may open the door to downstream applications that require stylistic representations, such as style transfer.
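The flavor of such probing experiments can be conveyed with a toy linear probe: if a simple classifier trained on frozen authorship embeddings recovers a stylistic label well above chance, the embeddings carry that information. The synthetic vectors and labels below stand in for real embeddings and annotations.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)
    X = rng.normal(size=(400, 128))  # frozen authorship embeddings (synthetic)
    w = rng.normal(size=128)
    y = (X @ w > 0).astype(int)      # pretend stylistic label, e.g. formal vs. informal

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")  # well above 0.5 chance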
Low-Resource Authorship Style Transfer: Can Non-Famous Authors Be Imitated?
Patel, Ajay, Andrews, Nicholas, Callison-Burch, Chris
Authorship style transfer involves altering text to match the style of a target author whilst preserving the original meaning. Existing unsupervised approaches like STRAP have largely focused on style transfer to target authors with many examples of their writing style in books, speeches, or other published works. This high-resource training data requirement (often greater than 100,000 words) makes these approaches primarily useful for style transfer to published authors, politicians, or other well-known figures and authorship styles, while style transfer to non-famous authors has not been well-studied. We introduce the low-resource authorship style transfer task, a more challenging class of authorship style transfer where only a limited amount of text in the target author's style may exist. In our experiments, we specifically choose source and target authors from Reddit and style transfer their Reddit posts, limiting ourselves to just 16 posts (on average ~500 words) of the target author's style. Style transfer accuracy is typically measured by how often a classifier or human judge will classify an output as written by the target author. Recent authorship representation models excel at authorship identification even with just a few writing samples, making automatic evaluation of this task possible for the first time through evaluation metrics we propose. Our results establish an in-context learning technique we develop as the strongest baseline, though we find current approaches do not yet achieve mastery of this challenging task. We release our data and implementations to encourage further investigation.
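One possible form of authorship-representation-based evaluation is sketched below: an output counts as a successful transfer if its style embedding lies closer to the target author's samples than to the source author's. This is an illustrative metric in the spirit of the abstract, not necessarily the exact one proposed.

    import numpy as np

    def style_transfer_accuracy(output_vecs, target_vecs, source_vecs):
        """Fraction of outputs whose style embedding is nearer the target
        author's centroid than the source author's."""
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        tgt_c, src_c = target_vecs.mean(axis=0), source_vecs.mean(axis=0)
        return float(np.mean([cos(v, tgt_c) > cos(v, src_c) for v in output_vecs]))

    # Toy usage with random stand-in embeddings.
    rng = np.random.default_rng(4)
    print(style_transfer_accuracy(rng.normal(1, 1, (8, 32)),     # transferred outputs
                                  rng.normal(1, 1, (16, 32)),    # target author samples
                                  rng.normal(-1, 1, (16, 32))))  # source author samples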
Do Text-to-Text Multi-Task Learners Suffer from Task Conflict?
Mueller, David, Andrews, Nicholas, Dredze, Mark
Traditional multi-task learning architectures train a single model across multiple tasks through a shared encoder followed by task-specific decoders. Learning these models often requires specialized training algorithms that address task conflict in the shared parameter updates, which otherwise can lead to negative transfer. A new type of multi-task learning within NLP homogenizes multi-task architectures as a shared encoder and language model decoder, which does surprisingly well across a range of diverse tasks. Does this new architecture suffer from task conflicts that require specialized training algorithms? We study how certain factors in the shift towards text-to-text models affect multi-task conflict and negative transfer, finding that both directional conflict and transfer are surprisingly constant across architectures.
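Directional conflict between two tasks is commonly measured as the cosine similarity between their gradients on the shared parameters, with negative values indicating conflict. The sketch below computes this for a toy shared encoder; the two losses are arbitrary stand-ins for real task objectives.

    import torch

    def gradient_cosine(model, loss_a, loss_b):
        """Cosine similarity between per-task gradients w.r.t. shared parameters."""
        params = [p for p in model.parameters() if p.requires_grad]
        g_a = torch.autograd.grad(loss_a, params, retain_graph=True)
        g_b = torch.autograd.grad(loss_b, params)
        flat_a = torch.cat([g.flatten() for g in g_a])
        flat_b = torch.cat([g.flatten() for g in g_b])
        return torch.nn.functional.cosine_similarity(flat_a, flat_b, dim=0)

    # Toy usage: one shared linear encoder, two task losses.
    torch.manual_seed(0)
    shared = torch.nn.Linear(16, 8)
    h = shared(torch.randn(4, 16))
    loss_a, loss_b = h.pow(2).mean(), (h - 1).abs().mean()
    print(gradient_cosine(shared, loss_a, loss_b).item())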
A Deep Metric Learning Approach to Account Linking
Khan, Aleem, Fleming, Elizabeth, Schofield, Noah, Bishop, Marcus, Andrews, Nicholas
We consider the task of linking social media accounts that belong to the same author in an automated fashion on the basis of the content and metadata of their corresponding document streams. We focus on learning an embedding that maps variable-sized samples of user activity -- ranging from single posts to entire months of activity -- to a vector space, where samples by the same author map to nearby points. The approach does not require human-annotated data for training purposes, which allows us to leverage large amounts of social media content. The proposed model outperforms several competitive baselines under a novel evaluation framework modeled after established recognition benchmarks in other domains. Our method achieves high linking accuracy, even with small samples from accounts not seen at training time, a prerequisite for practical applications of the proposed linking framework.
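A minimal metric-learning sketch in the spirit of this approach, with toy feature vectors standing in for encoded document streams: a triplet loss pulls samples from the same author together and pushes different authors apart, and the author labels come for free from account metadata rather than human annotation.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    author_a = torch.randn(2, 100)  # two activity samples from one author
    author_b = torch.randn(1, 100)  # one sample from a different author

    encoder = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 32))
    triplet = nn.TripletMarginLoss(margin=1.0)
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

    for step in range(200):
        anchor, positive = encoder(author_a[0:1]), encoder(author_a[1:2])
        negative = encoder(author_b)
        loss = triplet(anchor, positive, negative)  # same author -> nearby points
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(f"final triplet loss: {loss.item():.3f}")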