answerer
Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
Goal-oriented dialog has been given attention due to its numerous applications in artificial intelligence. Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take. To ask the adequate question, deep learning and reinforcement learning have been recently applied. However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose Answerer in Questioner's Mind (AQM), a novel information theoretic algorithm for goal-oriented dialog. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Chatting Makes Perfect: Chat-based Image Retrieval Supplementary Material
In Appendix A, we start by showing more qualitative results of chats and their retrieval results, and BLIP2 chats compared to a human answerer. Next, in Appendix B, we present the few shot instructional prompts that were used by different LLMs for generating follow-up questions. Another example in Figure 2 describes two trains, searched by the text "A train that is parked next to another train". Figure 3 demonstrates a case where the description "a small and dirty kitchen with pots and food everywhere" is ambiguous, subjective to the viewer and may match many images in the corpus. In Figure 4 we show an example of a dialog between ChatIR and a human.
- North America > United States (0.04)
- Asia > Middle East > Israel (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog
Goal-oriented dialog has been given attention due to its numerous applications in artificial intelligence. Goal-oriented dialogue tasks occur when a questioner asks an action-oriented question and an answerer responds with the intent of letting the questioner know a correct action to take. To ask the adequate question, deep learning and reinforcement learning have been recently applied. However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose Answerer in Questioner's Mind (AQM), a novel information theoretic algorithm for goal-oriented dialog. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer.
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language
Song, Seyoung, Jeong, Seogyeong, Kim, Eunsu, Jin, Jiho, Kim, Dongkwan, Shin, Jay, Oh, Alice
Evaluating text generation capabilities of large language models (LLMs) is challenging, particularly for low-resource languages where methods for direct assessment are scarce. We propose MUG-Eval, a novel framework that evaluates LLMs' multilingual generation capabilities by transforming existing benchmarks into conversational tasks and measuring the LLMs' accuracies on those tasks. We specifically designed these conversational tasks to require effective communication in the target language. Then, we simply use task success rate as a proxy for successful conversation generation. Our approach offers two key advantages: it is independent of language-specific NLP tools or annotated datasets, which are limited for most languages, and it does not rely on LLMs-as-judges, whose evaluation quality degrades outside a few high-resource languages. We evaluate 8 LLMs across 30 languages spanning high, mid, and low-resource categories, and we find that MUG-Eval correlates strongly with established benchmarks ($r$ > 0.75) while enabling standardized comparisons across languages and models. Our framework provides a robust and resource-efficient solution for evaluating multilingual generation that can be extended to thousands of languages.
LLMs Show Surface-Form Brittleness Under Paraphrase Stress Tests
Benchmark scores for Large Language Models (LLMs) can be inflated by memorization of test items or near duplicates. We present a simple, protocol that probes generalization by re-evaluating models on paraphrased versions of benchmark questions. Using Mistral-7B-Instruct and Qwen2.5-7B-Instruct, we measure the accuracy gap between original and paraphrased items on ARC-Easy and ARC-Challenge. Our pipeline controls decoding, enforces multiple-choice output format, and includes a robust paraphrase-cleaning step to preserve semantics. We find that paraphrasing induces a non-trivial accuracy drop (original vs. paraphrased), consistent with prior concerns about contamination and brittle surface-form shortcuts.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)