Collaborating Authors

 Swoopes, Chelse


CorpusStudio: Surfacing Emergent Patterns in a Corpus of Prior Work while Writing

arXiv.org Artificial Intelligence

Many communities, including the scientific community, develop implicit writing norms. Understanding them is crucial for communicating effectively with that community. Writers gradually develop an implicit understanding of norms by reading papers and receiving feedback on their writing. However, it is difficult both to externalize this knowledge and to apply it to one's own writing. We propose two new writing support concepts that reify document- and sentence-level patterns in a given text corpus: (1) an ordered distribution over section titles, and (2) a large set of contextually relevant sentences retrieved from the corpus, conditioned on the user's draft and cursor location. Recurring words in the latter are algorithmically highlighted to help users see any emergent norms. Results of a user study (N=16) show that participants revised the structure and content of their drafts using these concepts, gaining confidence in aligning with or deliberately breaking norms after reviewing many examples. These results demonstrate the value of reifying distributions over other authors' writing choices during the writing process.
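
Below is a minimal sketch of how these two concepts could be realized, assuming a toy in-memory corpus and off-the-shelf TF-IDF retrieval from scikit-learn. It illustrates the ideas only and is not CorpusStudio's implementation; the corpus, cursor context, and highlighting rule are invented for the example.

```python
# Illustrative sketch (not CorpusStudio's pipeline) of the two concepts:
# (1) an ordered distribution over section titles, and
# (2) retrieval of contextually relevant sentences with recurring words surfaced.
from collections import Counter, defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus: each paper is a list of (section_title, sentence) pairs.
corpus = [
    [("Introduction", "Writing norms are learned implicitly."),
     ("Method", "We recruited sixteen participants for the study."),
     ("Results", "Participants revised both structure and content.")],
    [("Introduction", "Communities develop implicit writing norms."),
     ("Related Work", "Prior tools retrieve example sentences."),
     ("Results", "Participants reported increased confidence.")],
]

# (1) Ordered distribution over section titles: how often each title occurs,
# ordered by its average (normalized) position within the documents.
positions, counts = defaultdict(list), Counter()
for paper in corpus:
    for i, (title, _) in enumerate(paper):
        positions[title].append(i / max(len(paper) - 1, 1))
        counts[title] += 1
ordered = sorted(counts, key=lambda t: sum(positions[t]) / len(positions[t]))
for title in ordered:
    print(f"{title:<14} appears in {counts[title]}/{len(corpus)} papers")

# (2) Contextual retrieval: given the text around the user's cursor,
# return the most similar corpus sentences.
sentences = [s for paper in corpus for _, s in paper]
vectorizer = TfidfVectorizer().fit(sentences)
cursor_context = "Many communities develop implicit norms for writing."
sims = cosine_similarity(
    vectorizer.transform([cursor_context]), vectorizer.transform(sentences)
)[0]
top = [sentences[i] for i in sims.argsort()[::-1][:3]]

# Words that recur across the retrieved sentences hint at emergent norms.
word_counts = Counter()
for s in top:
    word_counts.update({w.lower().strip(".,") for w in s.split()})
recurring = {w for w, c in word_counts.items() if c > 1}
for s in top:
    print(" ".join(f"[{w}]" if w.lower().strip(".,") in recurring else w
                   for w in s.split()))
```

A real system would use sentence embeddings and full papers rather than TF-IDF over a handful of sentences, but the shape of the computation (an ordered title distribution plus context-conditioned retrieval with recurrence highlighting) is the same.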


Supporting Sensemaking of Large Language Model Outputs at Scale

arXiv.org Artificial Intelligence

While several tools have recently been developed for structuring the generation of prompts and collecting responses [4, 41, 45], relatively little effort has been expended to help either end-users or designers of LLM-backed systems reason about, or make use of, the variation seen across the multiple responses generated. We see utility in helping users make sense of multiple LLM responses. For instance, users may want to select the best option from among many, compose their own response through bricolage, consider many ideas during ideation, audit a model by looking at the variety of possible responses, or compare the functionality of different models or prompts. However, representing LLM responses at the scale necessary to see the distribution of possibilities also creates a condition where relevant variation may be hidden in plain sight: within a wall of similar text. One could turn to automatic analysis measures, but we constrain ourselves to showing the entirety of the text itself, as this does not restrict (by top-down design or bottom-up computation) which variations will be most useful to the user.
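
As a rough, text-only illustration of this stance (showing every response in full rather than summarizing it away), the sketch below marks tokens that occur in only one of a set of responses, so differences stop hiding inside near-identical text. It is not the paper's interface; the example responses and the uniqueness rule are invented.

```python
# Minimal sketch: render every LLM response in full, but mark tokens that are
# unique to a single response, making variation visible within similar text.
from collections import Counter

responses = [
    "The function returns None when the list is empty.",
    "The function returns None if the list is empty.",
    "The function raises a ValueError when the list is empty.",
]

def normalize(word):
    return word.strip(".,").lower()

# Count in how many responses each token appears (document frequency).
doc_freq = Counter()
for r in responses:
    doc_freq.update({normalize(w) for w in r.split()})

for r in responses:
    rendered = [f"*{w}*" if doc_freq[normalize(w)] == 1 else w
                for w in r.split()]
    print(" ".join(rendered))
```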


ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing

arXiv.org Artificial Intelligence

Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.
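
ChainForge itself is a graphical, node-based tool; the sketch below only mimics the underlying comparison pattern it visualizes: fanning prompt-template variations out across several models and collecting every response into a flat table for side-by-side inspection. The query_model stub and the model names are hypothetical placeholders, not ChainForge's API.

```python
# Hedged sketch of a prompt-template x model comparison grid.
# Replace query_model with a real client (OpenAI, Anthropic, a local model, ...).
from itertools import product

templates = [
    "Summarize the following text in one sentence: {text}",
    "Give a one-sentence TL;DR of: {text}",
]
models = ["model-a", "model-b"]            # placeholder model identifiers
inputs = ["LLM outputs vary run to run."]  # values substituted into {text}
n_samples = 3                              # responses per (prompt, model) cell

def query_model(model: str, prompt: str) -> str:
    # Hypothetical stub; swap in an actual API call.
    return f"[{model}] response to: {prompt[:40]}..."

results = []
for template, model, text in product(templates, models, inputs):
    prompt = template.format(text=text)
    for i in range(n_samples):
        results.append({"template": template, "model": model,
                        "sample": i, "response": query_model(model, prompt)})

# A flat table like this is what a response-comparison grid is built from.
for row in results:
    print(row["model"], "|", row["template"][:30], "|", row["response"])
```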


Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage

arXiv.org Artificial Intelligence

Machine learning models are increasingly used across impactful domains to predict individual outcomes. As such, many models provide algorithmic recourse to individuals who receive negative outcomes. However, recourse can be leveraged by adversaries to disclose private information. This work presents the first attempt at mitigating such attacks. We present two novel methods for generating differentially private recourse: Differentially Private Model (DPM) and Laplace Recourse (LR). Using logistic regression classifiers on real-world and synthetic datasets, we find that DPM and LR perform well in reducing what an adversary can infer, especially at low false positive rates (FPR). When the training dataset is large enough, our novel LR method is particularly successful at preventing privacy leakage while maintaining model and recourse accuracy.
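
For intuition only, the sketch below approximates the general idea of Laplace-noised recourse on a logistic regression classifier. It is not the paper's DPM or LR algorithm; the recourse rule, epsilon, and sensitivity bound are assumed values rather than the calibrated quantities the paper works with.

```python
# Illustrative sketch only, NOT the paper's DPM or LR method:
# compute a simple recourse for a logistic-regression rejection by moving the
# point just past the decision boundary, then perturb it with Laplace noise so
# the returned recourse reveals less about the training data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary-classification data.
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

def closest_boundary_recourse(x, margin=0.1):
    """Minimal L2 move that pushes x just past the decision boundary."""
    score = w @ x + b
    step = (margin - score) / (w @ w)   # distance along w to reach score=margin
    return x + step * w

def laplace_private_recourse(x, epsilon=1.0, sensitivity=1.0):
    """Recourse plus Laplace noise; sensitivity here is an assumed bound."""
    r = closest_boundary_recourse(x)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=r.shape)
    return r + noise

x_rejected = np.array([-1.5, -1.0])
print("original point:   ", x_rejected, "-> class", clf.predict([x_rejected])[0])
print("plain recourse:   ", closest_boundary_recourse(x_rejected))
print("private recourse: ", laplace_private_recourse(x_rejected, epsilon=0.5))
```

The privacy/utility trade-off the abstract describes shows up directly here: a smaller epsilon adds more noise, which hides more about the model (and hence the training data) but makes the suggested recourse less likely to actually flip the prediction.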