Weld, Daniel S.
Cocoa: Co-Planning and Co-Execution with AI Agents
Feng, K. J. Kevin, Pu, Kevin, Latzke, Matt, August, Tal, Siangliulue, Pao, Bragg, Jonathan, Weld, Daniel S., Zhang, Amy X., Chang, Joseph Chee
We present Cocoa, a system that implements a novel interaction design pattern -- interactive plans -- for users to collaborate with an AI agent on complex, multi-step tasks in a document editor. Cocoa harmonizes human and AI efforts and enables flexible delegation of agency through two actions: Co-planning (where users collaboratively compose a plan of action with the agent) and Co-execution (where users collaboratively execute plan steps with the agent). Using scientific research as a sample domain, we motivate the design of Cocoa through a formative study with 9 researchers while also drawing inspiration from the design of computational notebooks. We evaluate Cocoa through a user study with 16 researchers and find that when compared to a strong chat baseline, Cocoa improved agent steerability without sacrificing ease of use. A deeper investigation of the general utility of both systems uncovered insights into usage contexts where interactive plans may be more appropriate than chat, and vice versa. Our work surfaces numerous practical implications and paves new paths for interactive interfaces that foster more effective collaboration between humans and agentic AI systems.
Scideator: Human-LLM Scientific Idea Generation Grounded in Research-Paper Facet Recombination
Radensky, Marissa, Shahid, Simra, Fok, Raymond, Siangliulue, Pao, Hope, Tom, Weld, Daniel S.
A good idea should be relevant to the scientist's interests and novel within the scientific community. Research papers are a major source of inspiration for relevant and novel ideas, as they expose scientists to relevant concepts to recombine and form new ideas [4, 21, 36]. However, generating relevant and novel scientific ideas by recombining concepts from research papers is difficult for multiple reasons. For one, scientists must wade through an ever-expanding scientific literature to find relevant concepts [2, 19]. Moreover, the phenomenon of fixation biases scientists against considering more diverse concepts and concept recombinations for their research; instead, they are predisposed to thinking about a problem in familiar terms, which hinders the stimulation of novel ideas [11, 37]. Even if a scientist manages to identify interesting concept recombinations to form potential research ideas, assessing the ideas' novelty in comparison to the existing literature is a cumbersome yet critical task. Building a fully or semi-automated ideation system has been an ambition of researchers for decades, and Scideator builds on strong prior work from many other researchers, filling a unique niche. We extend a line of work that presents systems for finding analogies between research papers [4, 21, 36], adopting their facet-based framework but using modern large language model (LLM) methods to identify relevant facets and perform facet recombinations. We are also inspired by recent work showing that LLMs have promise to assist ideation in domains outside science, helping people to generate more ideas [6] and more diverse ideas [27, 40].
Challenges in Human-Agent Communication
Bansal, Gagan, Vaughan, Jennifer Wortman, Amershi, Saleema, Horvitz, Eric, Fourney, Adam, Mozannar, Hussein, Dibia, Victor, Weld, Daniel S.
Remarkable advancements in modern generative foundation models have enabled the development of sophisticated and highly capable autonomous agents that can observe their environment, invoke tools, and communicate with other agents to solve problems. Although such agents can communicate with users through natural language, their complexity and wide-ranging failure modes present novel challenges for human-AI interaction. Building on prior research and informed by a communication grounding perspective, we contribute to the study of human-agent communication by identifying and analyzing twelve key communication challenges that these systems pose. These include challenges in conveying information from the agent to the user, challenges in enabling the user to convey information to the agent, and overarching challenges that need to be considered across all human-agent communication. We illustrate each challenge through concrete examples and identify open directions of research. Our findings provide insights into critical gaps in human-agent communication research and serve as an urgent call for new design patterns, principles, and guidelines to support transparency and control in these systems.
ArxivDIGESTables: Synthesizing Scientific Literature into Tables using Language Models
Newman, Benjamin, Lee, Yoonjoo, Naik, Aakanksha, Siangliulue, Pao, Fok, Raymond, Kim, Juho, Weld, Daniel S., Chang, Joseph Chee, Lo, Kyle
When conducting literature reviews, scientists often create literature review tables - tables whose rows are publications and whose columns constitute a schema, a set of aspects used to compare and contrast the papers. Can we automatically generate these tables using language models (LMs)? In this work, we introduce a framework that leverages LMs to perform this task by decomposing it into separate schema and value generation steps. To enable experimentation, we address two main challenges: First, we overcome a lack of high-quality datasets to benchmark table generation by curating and releasing arxivDIGESTables, a new dataset of 2,228 literature review tables extracted from ArXiv papers that synthesize a total of 7,542 research papers. Second, to support scalable evaluation of model generations against human-authored reference tables, we develop DecontextEval, an automatic evaluation method that aligns elements of tables with the same underlying aspects despite differing surface forms. Given these tools, we evaluate LMs' abilities to reconstruct reference tables, finding this task benefits from additional context to ground the generation (e.g. table captions, in-text references). Finally, through a human evaluation study we find that even when LMs fail to fully reconstruct a reference table, their generated novel aspects can still be useful.
Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
Grunde-McLaughlin, Madeleine, Lam, Michelle S., Krishna, Ranjay, Weld, Daniel S., Heer, Jeffrey
LLM chains enable complex tasks by decomposing work into a sequence of sub-tasks. Crowdsourcing workflows similarly decompose complex tasks into smaller tasks for human crowdworkers. Chains address LLM errors analogously to the way crowdsourcing workflows address human error. To characterize opportunities for LLM chaining, we survey 107 papers across the crowdsourcing and chaining literature to construct a design space for chain development. The design space connects an LLM designer's objectives to strategies they can use to achieve those objectives, and tactics to implement each strategy. To explore how techniques from crowdsourcing may apply to chaining, we adapt crowdsourcing workflows to implement LLM chains across three case studies: creating a taxonomy, shortening text, and writing a short story. From the design space and our case studies, we identify which techniques transfer from crowdsourcing to LLM chaining and raise implications for future research and development.
Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
King, Daniel, Shen, Zejiang, Subramani, Nishant, Weld, Daniel S., Beltagy, Iz, Downey, Doug
Abstractive summarization systems today produce fluent and relevant output, but often "hallucinate" statements not supported by the source text. We analyze the connection between hallucinations and training data, and find evidence that models hallucinate because they train on target summaries that are unsupported by the source. Based on our findings, we present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations. Given the model states and outputs at a given step, PINOCCHIO detects likely model hallucinations based on various measures of attribution to the source text. PINOCCHIO backtracks to find more consistent output, and can opt to produce no summary at all when no consistent generation can be found. In experiments, we find that PINOCCHIO improves the consistency of generation (in terms of F1) by an average of 67% on two abstractive summarization datasets.
In Search of Verifiability: Explanations Rarely Enable Complementary Performance in AI-Advised Decision Making
Fok, Raymond, Weld, Daniel S.
The current literature on AI-advised decision making -- involving explainable AI systems advising human decision makers -- presents a series of inconclusive and confounding results. To synthesize these findings, we propose a simple theory that elucidates the frequent failure of AI explanations to engender appropriate reliance and complementary decision-making performance. We argue explanations are only useful to the extent that they allow a human decision maker to verify the correctness of an AI's prediction, in contrast to other desiderata, e.g., interpretability or spelling out the AI's reasoning process. Prior studies find that, in many decision-making contexts, AI explanations do not facilitate such verification. Moreover, most tasks fundamentally do not allow easy verification, regardless of explanation method, limiting the potential benefit of any type of explanation. We also compare the objective of complementary performance with that of appropriate reliance, decomposing the latter into the notions of outcome-graded and strategy-graded reliance.
A Computational Inflection for Scientific Discovery
Hope, Tom, Downey, Doug, Etzioni, Oren, Weld, Daniel S., Horvitz, Eric
We stand at the foot of a significant inflection in the trajectory of scientific discovery. As society continues on its fast-paced digital transformation, so does humankind's collective scientific knowledge and discourse. We now read and write papers in digitized form, and a great deal of the formal and informal processes of science are captured digitally -- including papers, preprints and books, code and datasets, conference presentations, and interactions in social networks and collaboration and communication platforms. The transition has led to the creation and growth of a tremendous amount of information -- much of which is available for public access -- opening exciting opportunities for computational models and systems that analyze and harness it. In parallel, exponential growth in data processing power has fueled remarkable advances in artificial intelligence, including large neural language models capable of learning powerful representations from unstructured text. Dramatic changes in scientific communication -- such as the advent of the first scientific journal in the 17th century -- have historically catalyzed revolutions in scientific thought. The confluence of societal and computational trends suggests that computer science is poised to ignite a revolution in the scientific process itself.
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Lo, Kyle, Chang, Joseph Chee, Head, Andrew, Bragg, Jonathan, Zhang, Amy X., Trier, Cassidy, Anastasiades, Chloe, August, Tal, Authur, Russell, Bragg, Danielle, Bransom, Erin, Cachola, Isabel, Candra, Stefan, Chandrasekhar, Yoganand, Chen, Yen-Sung, Cheng, Evie Yu-Yen, Chou, Yvonne, Downey, Doug, Evans, Rob, Fok, Raymond, Hu, Fangzhou, Huff, Regan, Kang, Dongyeop, Kim, Tae Soo, Kinney, Rodney, Kittur, Aniket, Kang, Hyeonsu, Klevak, Egor, Kuehl, Bailey, Langan, Michael, Latzke, Matt, Lochner, Jaron, MacMillan, Kelsey, Marsh, Eric, Murray, Tyler, Naik, Aakanksha, Nguyen, Ngoc-Uyen, Palani, Srishti, Park, Soya, Paulic, Caroline, Rachatasumrit, Napol, Rao, Smita, Sayre, Paul, Shen, Zejiang, Siangliulue, Pao, Soldaini, Luca, Tran, Huy, van Zuylen, Madeleine, Wang, Lucy Lu, Wilhelm, Christopher, Wu, Caroline, Yang, Jiangjiang, Zamarron, Angele, Hearst, Marti A., Weld, Daniel S.
Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question "Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces -- even for legacy PDFs?" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we've developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We've also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers -- Discovery, Efficiency, Comprehension, Synthesis, and Accessibility -- and present an overview of our progress and remaining open challenges.
An Interactive UI to Support Sensemaking over Collections of Parallel Texts
Zhou, Joyce, Glassman, Elena, Weld, Daniel S.
Scientists and science journalists, among others, often need to make sense of a large number of papers and how they compare with each other in scope, focus, findings, or any other important factors. However, with a large corpus of papers, it's cognitively demanding to pairwise compare and contrast them all with each other. Fully automating this review process would be infeasible, because it often requires domain-specific knowledge, as well as understanding of the context and motivations for the review. While there are existing tools to help with the process of organizing and annotating papers for literature reviews, at their core they still rely on people to serially read through papers and manually make sense of relevant information. We present AVTALER, which combines people's unique skills, contextual awareness, and knowledge with the strength of automation. Given a set of comparable text excerpts from a paper corpus, it supports users in sensemaking and contrasting paper attributes by interactively aligning text excerpts in a table so that comparable details are presented in a shared column. AVTALER is based on a core alignment algorithm that makes use of modern NLP tools. Furthermore, AVTALER is a mixed-initiative system: users can interactively give the system constraints which are integrated into the alignment construction process.