Healey, Jennifer
Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control
Su, Jinyan, Healey, Jennifer, Nakov, Preslav, Cardie, Claire
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to mitigate large language model (LLM) hallucinations by incorporating external knowledge retrieval. However, existing RAG frameworks often apply retrieval indiscriminately, leading to inefficiencies: over-retrieving when it is unnecessary, or failing to retrieve iteratively when complex reasoning requires it. Recent adaptive retrieval strategies navigate these choices dynamically, but they predict based only on query complexity and lack user-driven flexibility, making them unsuitable for diverse user application needs. In this paper, we introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off. Our approach leverages two classifiers: one trained to prioritize accuracy and another to prioritize retrieval efficiency. Via an interpretable control parameter $\alpha$, users can seamlessly navigate between minimal-cost retrieval and high-accuracy retrieval based on their specific requirements. We empirically demonstrate that our approach effectively balances accuracy, retrieval cost, and user controllability, making it a practical and adaptable solution for real-world applications.
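A minimal sketch of how an $\alpha$-controlled retrieval decision could work. This is an illustrative stand-in, not the paper's implementation: the classifier interfaces, the 0.5 decision threshold, and the linear blending are all assumptions.

```python
def choose_retrieval(query, acc_classifier, eff_classifier, alpha):
    """Blend two classifier scores with a user-set alpha in [0, 1].

    alpha = 1.0 favors the accuracy-oriented classifier (retrieves more);
    alpha = 0.0 favors the efficiency-oriented classifier (retrieves less).
    Both classifiers are assumed to return a probability that retrieval
    is needed for this query.
    """
    p_acc = acc_classifier(query)   # P(retrieve | accuracy-first model)
    p_eff = eff_classifier(query)   # P(retrieve | efficiency-first model)
    blended = alpha * p_acc + (1 - alpha) * p_eff
    return "retrieve" if blended >= 0.5 else "no_retrieval"

# Toy classifiers standing in for trained models:
acc = lambda q: 0.9   # accuracy-first model almost always retrieves
eff = lambda q: 0.2   # efficiency-first model rarely retrieves

print(choose_retrieval("Who wrote Hamlet?", acc, eff, alpha=1.0))  # retrieve
print(choose_retrieval("Who wrote Hamlet?", acc, eff, alpha=0.0))  # no_retrieval
```

Sliding alpha between 0 and 1 moves the decision continuously between the two trained policies, which is what gives the user an interpretable cost-accuracy dial.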
TextLap: Customizing Language Models for Text-to-Layout Planning
Chen, Jian, Zhang, Ruiyi, Zhou, Yufan, Healey, Jennifer, Gu, Jiuxiang, Xu, Zhiqiang, Chen, Changyou
Automatic generation of graphical layouts is crucial for many real-world applications, including designing posters, flyers, advertisements, and graphical user interfaces. Given the impressive ability of large language models (LLMs) in both natural language understanding and generation, we believe that we could customize an LLM to help people create compelling graphical layouts starting with only text instructions from the user. We call our method TextLap (text-based layout planning). It uses a curated instruction-based layout planning dataset (InsLap) to customize LLMs as a graphic designer. We demonstrate the effectiveness of TextLap and show that it outperforms strong baselines, including GPT-4 based methods, on image generation and graphical design benchmarks.
Evaluating Nuanced Bias in Large Language Model Free Response Answers
Healey, Jennifer, Byrum, Laurie, Akhtar, Md Nadeem, Sinha, Moumita
Pre-trained large language models (LLMs) can now be easily adapted for specific business purposes using custom prompts or fine-tuning. These customizations are often iteratively re-engineered to improve some aspect of performance, but after each change businesses want to ensure that there has been no negative impact on the system's behavior around such critical issues as bias. Prior methods of benchmarking bias use techniques such as word masking and multiple choice questions to assess bias at scale, but these do not capture all of the nuanced types of bias that can occur in free response answers, the types of answers typically generated by LLM systems. In this paper, we identify several kinds of nuanced bias in free text that cannot be similarly identified by multiple choice tests. We describe these as: confidence bias, implied bias, inclusion bias, and erasure bias. We present a semi-automated pipeline for detecting these types of bias by first eliminating answers that can be automatically classified as unbiased and then co-evaluating name-reversed pairs using crowd workers. We believe that the nuanced classifications our method generates can be used to give better feedback to LLMs, especially as LLM reasoning capabilities become more advanced.
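The name-reversal step of such a pipeline can be sketched in a few lines. This is a toy illustration, with invented names and answer text; the real pipeline's pairing and crowd-evaluation logic is not shown here.

```python
def name_reversed_pair(text, name_a, name_b):
    """Return (original, counterpart) with the two names swapped.

    A placeholder makes the swap symmetric: if a model's answer treats
    the two people differently across the pair, a human evaluator can
    flag nuanced bias (e.g., confidence or inclusion bias).
    """
    placeholder = "\x00NAME\x00"
    swapped = (text.replace(name_a, placeholder)
                   .replace(name_b, name_a)
                   .replace(placeholder, name_b))
    return text, swapped

orig, rev = name_reversed_pair(
    "Alice is probably qualified, but Bob is clearly an expert.",
    "Alice", "Bob")
print(rev)  # Bob is probably qualified, but Alice is clearly an expert.
```

Comparing the hedged phrasing ("probably qualified") across the reversed pair is exactly the kind of asymmetry a multiple-choice benchmark cannot surface.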
Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models
Zhu, Wanrong, Healey, Jennifer, Zhang, Ruiyi, Wang, William Yang, Sun, Tong
Recent advancements in instruction-following models have made user interactions with models more user-friendly and efficient, broadening their applicability. In graphic design, non-professional users often struggle to create visually appealing layouts due to limited skills and resources. In this work, we introduce a novel multimodal instruction-following framework for layout planning, allowing users to easily arrange visual elements into tailored layouts by specifying canvas size and design purpose, such as for book covers, posters, brochures, or menus. We developed three layout reasoning tasks to train the model in understanding and executing layout instructions. Experiments on two benchmarks show that our method not only simplifies the design process for non-professionals but also surpasses the performance of few-shot GPT-4V models, with a 12% higher mIoU on Crello. This progress highlights the potential of multimodal instruction-following models to automate and simplify the design process, providing an approachable solution for a wide range of design tasks on visually-rich documents.
Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections
Bursztyn, Victor S., Healey, Jennifer, Vinay, Vishwa
[Figure 1 caption] A mood-board created by a professional designer using Gaudí for the given project briefing: "You're designing a new eco-friendly, high-end coffee brand that is notorious for its floral flavors." All images are from the BAM dataset [6].
Gaudí was developed to help designers search for inspirational images using natural language. In the early stages of the design process, designers typically create thematic image collections called "mood-boards" (example shown in Figure 1) in order to elicit and clarify a client's preferred creative direction. Creating a mood-board involves sequential image searches, which are currently performed using keywords or images. Gaudí transforms this process into a conversation in which the user gradually details the mood-board's theme. This representation allows our AI to generate new search queries from scratch, straight from a project's briefing, following a hypothesized mood. Previous computational approaches to this process tend to oversimplify the decision space, seeking to define it by hard-coded qualities like dominant color, saturation, and brightness [3, 2]. Recent advances in realistic language modeling (e.g., with GPT-3 [1]) and cross-modal image retrieval (e.g., with CLIP [5]) now allow us to represent image collections in a much richer semantic space, acknowledging richer variation in the stories designers tell when presenting a creative direction to a client.
"It doesn't look good for a date": Transforming Critiques into Preferences for Conversational Recommendation Systems
Bursztyn, Victor S., Healey, Jennifer, Lipka, Nedim, Koh, Eunyee, Downey, Doug, Birnbaum, Larry
Conversations aimed at determining good recommendations are iterative in nature. People often express their preferences in terms of a critique of the current recommendation (e.g., "It doesn't look good for a date"), requiring some degree of common sense for a preference to be inferred. In this work, we present a method for transforming a user critique into a positive preference (e.g., "I prefer more romantic") in order to retrieve reviews pertaining to potentially better recommendations (e.g., "Perfect for a romantic dinner"). We leverage a large neural language model (LM) in a few-shot setting to perform critique-to-preference transformation, and we test two methods for retrieving recommendations: one that matches embeddings, and another that fine-tunes an LM for the task. We instantiate this approach in the restaurant domain and evaluate it using a new dataset of restaurant critiques. In an ablation study, we show that utilizing critique-to-preference transformation improves recommendations, and that there are at least three general cases that explain this improved performance.
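The embedding-matching variant of the retrieval step can be sketched as a nearest-neighbor lookup over review embeddings. The two-dimensional vectors below are toy stand-ins for real sentence embeddings, and the review texts are invented for illustration.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def rank_reviews(pref_vec, review_vecs):
    """Rank restaurant reviews by similarity to a transformed preference.

    pref_vec: embedding of the positive preference, e.g. the output of
    turning "It doesn't look good for a date" into "I prefer more romantic".
    review_vecs: mapping of review text to its embedding.
    """
    return sorted(review_vecs,
                  key=lambda r: cosine(pref_vec, review_vecs[r]),
                  reverse=True)

reviews = {
    "Perfect for a romantic dinner": [0.9, 0.1],
    "Great for a quick lunch":       [0.1, 0.9],
}
print(rank_reviews([0.8, 0.2], reviews)[0])  # Perfect for a romantic dinner
```

The key point the sketch captures is that the positive preference, not the raw critique, is what gets embedded and matched, so reviews phrased affirmatively ("Perfect for a romantic dinner") score highly.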