questioner
Chatting Makes Perfect: Chat-based Image Retrieval Supplementary Material
In Appendix A, we start by showing more qualitative results of chats and their retrieval results, and BLIP2 chats compared to a human answerer. Next, in Appendix B, we present the few shot instructional prompts that were used by different LLMs for generating follow-up questions. Another example in Figure 2 describes two trains, searched by the text "A train that is parked next to another train". Figure 3 demonstrates a case where the description "a small and dirty kitchen with pots and food everywhere" is ambiguous, subjective to the viewer and may match many images in the corpus. In Figure 4 we show an example of a dialog between ChatIR and a human.
- North America > United States (0.04)
- Asia > Middle East > Israel (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
Goal-oriented dialog has been given attention due to its numerous applications in artificial intelligence. Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take. To ask the adequate question, deep learning and reinforcement learning have been recently applied. However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose Answerer in Questioner's Mind (AQM), a novel information theoretic algorithm for goal-oriented dialog. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer.
Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
Goal-oriented dialog has been given attention due to its numerous applications in artificial intelligence. Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take. To ask the adequate question, deep learning and reinforcement learning have been recently applied. However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose Answerer in Questioner's Mind (AQM), a novel information theoretic algorithm for goal-oriented dialog. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer.
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language
Song, Seyoung, Jeong, Seogyeong, Kim, Eunsu, Jin, Jiho, Kim, Dongkwan, Shin, Jay, Oh, Alice
Evaluating text generation capabilities of large language models (LLMs) is challenging, particularly for low-resource languages where methods for direct assessment are scarce. We propose MUG-Eval, a novel framework that evaluates LLMs' multilingual generation capabilities by transforming existing benchmarks into conversational tasks and measuring the LLMs' accuracies on those tasks. We specifically designed these conversational tasks to require effective communication in the target language. Then, we simply use task success rate as a proxy for successful conversation generation. Our approach offers two key advantages: it is independent of language-specific NLP tools or annotated datasets, which are limited for most languages, and it does not rely on LLMs-as-judges, whose evaluation quality degrades outside a few high-resource languages. We evaluate 8 LLMs across 30 languages spanning high, mid, and low-resource categories, and we find that MUG-Eval correlates strongly with established benchmarks ($r$ > 0.75) while enabling standardized comparisons across languages and models. Our framework provides a robust and resource-efficient solution for evaluating multilingual generation that can be extended to thousands of languages.
Asking Clarifying Questions for Preference Elicitation With Large Language Models
Montazeralghaem, Ali, Tennenholtz, Guy, Boutilier, Craig, Meshi, Ofer
Large Language Models (LLMs) have made it possible for recommendation systems to interact with users in open-ended conversational interfaces. In order to personalize LLM responses, it is crucial to elicit user preferences, especially when there is limited user history. One way to get more information is to present clarifying questions to the user. However, generating effective sequential clarifying questions across various domains remains a challenge. To address this, we introduce a novel approach for training LLMs to ask sequential questions that reveal user preferences. Our method follows a two-stage process inspired by diffusion models. Starting from a user profile, the forward process generates clarifying questions to obtain answers and then removes those answers step by step, serving as a way to add ``noise'' to the user profile. The reverse process involves training a model to ``denoise'' the user profile by learning to ask effective clarifying questions. Our results show that our method significantly improves the LLM's proficiency in asking funnel questions and eliciting user preferences effectively.
- Europe > Italy (0.05)
- North America > United States > California > Santa Clara County > Mountain View (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
- Information Technology > Artificial Intelligence > Vision (0.94)
- Information Technology > Sensing and Signal Processing > Image Processing (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Instruction-tuned Self-Questioning Framework for Multimodal Reasoning
Jang, You-Won, Heo, Yu-Jung, Kim, Jaeseok, Lee, Minsu, Chang, Du-Seong, Zhang, Byoung-Tak
The field of vision-language understanding has been actively researched in recent years, thanks to the development of Large Language Models~(LLMs). However, it still needs help with problems requiring multi-step reasoning, even for very simple questions. Recent studies adopt LLMs to tackle this problem by iteratively generating sub-questions and answers. However, there are disadvantages such as 1) the fine-grained visual contents of images are not available using LLMs that cannot read visual information, 2) internal mechanisms are inaccessible and difficult to reproduce by using black-box LLMs. To solve these problems, we propose the SQ (Self-Questioning)-InstructBLIP, which improves inference performance by generating image-aware informative sub-questions and sub-answers iteratively. The SQ-InstructBLIP, which consists of a Questioner, Answerer, and Reasoner that share the same architecture. Questioner and Answerer generate sub-questions and sub-answers to help infer the main-question, and Reasoner performs reasoning on the main-question considering the generated sub-question information. Our experiments show that the proposed method SQ-InstructBLIP, which uses the generated sub-questions as additional information when solving the VQA task, performs more accurate reasoning than the previous works.