Supporting Sensemaking of Large Language Model Outputs at Scale
Katy Ilonka Gero, Chelse Swoopes, Ziwei Gu, Jonathan K. Kummerfeld, Elena L. Glassman
arXiv.org Artificial Intelligence
While several tools have recently been developed for structuring the generation of prompts and collecting responses [4, 41, 45], relatively little effort has been expended to help either end-users or designers of LLM-backed systems reason about or make use of the variation seen in the multiple responses generated. We see utility in helping users make sense of multiple LLM responses. For instance, users may want to select the best option from among many, compose their own response through bricolage, consider many ideas during ideation, audit a model by looking at the variety of possible responses, or compare the functionality of different models or prompts. However, representing LLM responses at a scale necessary to see the distribution of possibilities also creates a condition where relevant variation may be hidden in plain sight: within a wall of similar text. One could turn to automatic analysis measures, but we constrain ourselves to showing the entirety of the text itself, as this does not constrict (by top-down design or bottom-up computation) which variations will be most useful to the user.
Jan-24-2024