analogy question
Visalogy: Answering Visual Analogy Questions
Fereshteh Sadeghi, C. Lawrence Zitnick, Ali Farhadi
In this paper, we study the problem of answering visual analogy questions. These questions take the form of image A is to image B as image C is to what. Answering these questions entails discovering the mapping from image A to image B and then extending the mapping to image C and searching for the image D such that the relation from A to B holds for C to D. We pose this problem as learning an embedding that encourages pairs of analogous images with similar transformations to be close together using convolutional neural networks with a quadruple Siamese architecture. We introduce a dataset of visual analogy questions in natural images, and show first results of its kind on solving analogy questions on natural images.
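The retrieval step described in the abstract (find the image D whose relation to C matches the relation from A to B) can be illustrated with a minimal sketch. It assumes the embeddings are already computed, whereas the paper learns them with a convolutional network. Representing a transformation as an embedding difference and scoring it with cosine similarity are assumptions made for illustration, and the function names and toy vectors are hypothetical.

```python
import math


def subtract(u, v):
    """Element-wise difference u - v of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_analogy(emb_a, emb_b, emb_c, candidates):
    """Return the candidate name whose C -> D offset best matches A -> B.

    `candidates` maps a candidate name to its embedding (assumed precomputed).
    """
    target = subtract(emb_b, emb_a)  # assumed proxy for the A -> B transformation
    return max(
        candidates,
        key=lambda name: cosine(target, subtract(candidates[name], emb_c)),
    )


# Toy 2-D embeddings, made up for illustration only.
toy_candidates = {"d1": [3.0, 1.0], "d2": [4.0, 0.0], "d3": [3.0, -1.0]}
best = answer_analogy([1.0, 0.0], [1.0, 1.0], [3.0, 0.0], toy_candidates)
print(best)
```

In this toy setup the A to B offset is [0, 1], so the candidate whose offset from C points in the same direction is selected.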
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Yilmaz, Nilay, Patel, Maitreya, Luo, Yiran Lawrence, Gokhale, Tejas, Baral, Chitta, Jayasuriya, Suren, Yang, Yezhou
Multimodal Large Language Models (MLLMs) have become a powerful tool for integrating visual and textual information. Despite their exceptional performance on visual understanding benchmarks, measuring their ability to reason abstractly across multiple images remains a significant challenge. To address this, we introduce VOILA, a large-scale, open-ended, dynamic benchmark designed to evaluate MLLMs' perceptual understanding and abstract relational reasoning. VOILA employs an analogical mapping approach in the visual domain, requiring models to generate an image that completes an analogy between two given image pairs, reference and application, without relying on predefined choices. Our experiments demonstrate that the analogical reasoning tasks in VOILA present a challenge to MLLMs. Through multi-step analysis, we reveal that current MLLMs struggle to comprehend inter-image relationships and exhibit limited capabilities in high-level relational reasoning. Notably, we observe that performance improves when following a multi-step strategy of least-to-most prompting. Comprehensive evaluations on open-source models and GPT-4o show that on text-based answers, the best accuracy for challenging scenarios is 13% (LLaMa 3.2) and even for simpler tasks is only 29% (GPT-4o), while human performance is significantly higher at 70% across both difficulty levels.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
We thank the reviewers for acknowledging the novelty and interestingness of our paper as well as our results. We appreciate the reviewers' insightful comments; we will incorporate them in any final version of the paper. R2, R6: Two-classifier and attribute-based baseline. Direct application of the suggested two-classifier baseline is not appropriate, as we explain below. We ran a modified version of it, and the results show that our method outperforms the suggested baseline. For solving general analogy questions, the set of properties and categories is not known at test time (Line 261, Figs 3&5).
Answering Visual Analogy Questions
In this paper, we study the problem of answering visual analogy questions. These questions take the form of image A is to image B as image C is to what. Answering these questions entails discovering the mapping from image A to image B and then extending the mapping to image C and searching for the image D such that the relation from A to B holds for C to D. We pose this problem as learning an embedding that encourages pairs of analogous images with similar transformations to be close together using convolutional neural networks with a quadruple Siamese architecture. We introduce a dataset of visual analogy questions in natural images, and show first results of its kind on solving analogy questions on natural images.
RelBERT: Embedding Relations with Language Models
Ushio, Asahi, Camacho-Collados, Jose, Schockaert, Steven
Many applications need access to background knowledge about how different concepts and entities are related. Although Knowledge Graphs (KG) and Large Language Models (LLM) can address this need to some extent, KGs are inevitably incomplete and their relational schema is often too coarse-grained, while LLMs are inefficient and difficult to control. As an alternative, we propose to extract relation embeddings from relatively small language models. In particular, we show that masked language models such as RoBERTa can be straightforwardly fine-tuned for this purpose, using only a small amount of training data. The resulting model, which we call RelBERT, captures relational similarity in a surprisingly fine-grained way, allowing us to set a new state-of-the-art in analogy benchmarks. Crucially, RelBERT is capable of modelling relations that go well beyond what the model has seen during training. For instance, we obtained strong results on relations between named entities with a model that was only trained on lexical relations between concepts, and we observed that RelBERT can recognise morphological analogies despite not being trained on such examples. Overall, we find that RelBERT significantly outperforms strategies based on prompting language models that are several orders of magnitude larger, including recent GPT-based models and open source models.
DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings
Abdul-Mageed, Muhammad, Elbassuoni, Shady, Doughman, Jad, Elmadany, AbdelRahim, Nagoudi, El Moatez Billah, Zoughby, Yorgo, Shaher, Ahmad, Gaba, Iskander, Helal, Ahmed, El-Razzaz, Mohammed
Word embeddings are a core component of modern natural language processing systems, making the ability to thoroughly evaluate them a vital task. We describe DiaLex, a benchmark for intrinsic evaluation of dialectal Arabic word embeddings. DiaLex covers five important Arabic dialects: Algerian, Egyptian, Lebanese, Syrian, and Tunisian. Across these dialects, DiaLex provides a testbank for six syntactic and semantic relations, namely male to female, singular to dual, singular to plural, antonym, comparative, and genitive to past tense. DiaLex thus consists of a collection of word pairs representing each of the six relations in each of the five dialects. To demonstrate the utility of DiaLex, we use it to evaluate a set of existing and new Arabic word embeddings that we developed. Our benchmark, evaluation code, and new word embedding models will be publicly available.
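As a rough illustration of how a testbank of word pairs per relation can be used for intrinsic evaluation, the sketch below scores embeddings with the common vector-offset analogy test (predict d from b - a + c). The abstract does not state the exact scoring procedure, so this method is an assumption. The function names, the English placeholder words, and the toy vectors are hypothetical and are not taken from DiaLex.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(emb, a, b, c):
    """Vector-offset prediction: the word closest to emb[b] - emb[a] + emb[c]."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    pool = [w for w in emb if w not in (a, b, c)]  # exclude the question words
    return max(pool, key=lambda w: cosine(target, emb[w]))


def relation_accuracy(emb, pairs):
    """Fraction of analogy questions answered correctly for one relation.

    Every ordered combination of two distinct pairs (a, b), (c, d) forms
    one question "a is to b as c is to ?", with d as the expected answer.
    """
    questions = list(permutations(pairs, 2))
    if not questions:
        raise ValueError("need at least two word pairs to form a question")
    correct = sum(predict(emb, a, b, c) == d for (a, b), (c, d) in questions)
    return correct / len(questions)


# Toy embeddings and a toy "male to female" relation, for illustration only.
toy_emb = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 2.0],
}
toy_pairs = [("man", "woman"), ("king", "queen")]
print(relation_accuracy(toy_emb, toy_pairs))
```

Running the same function once per relation and per dialect would give a table of accuracies, which is one plausible way to compare embedding models on such a testbank.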
Visalogy: Answering Visual Analogy Questions
Sadeghi, Fereshteh, Zitnick, C. Lawrence, Farhadi, Ali
In this paper, we study the problem of answering visual analogy questions. These questions take the form of image A is to image B as image C is to what. Answering these questions entails discovering the mapping from image A to image B and then extending the mapping to image C and searching for the image D such that the relation from A to B holds for C to D. We pose this problem as learning an embedding that encourages pairs of analogous images with similar transformations to be close together using convolutional neural networks with a quadruple Siamese architecture. We introduce a dataset of visual analogy questions in natural images, and show first results of its kind on solving analogy questions on natural images.
Learning Word Representations from Relational Graphs
Bollegala, Danushka (The University of Liverpool) | Maehara, Takanori (National Institute of Informatics) | Yoshida, Yuichi (National Institute of Informatics) | Kawarabayashi, Ken-ichi (National Institute of Informatics)
Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one of the words to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are inter-connected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic relations between words.
Solving and Explaining Analogy Questions Using Semantic Networks
Boteanu, Adrian (Worcester Polytechnic Institute) | Chernova, Sonia (Worcester Polytechnic Institute)
Analogies are a fundamental human reasoning pattern that relies on relational similarity. Understanding how analogies are formed facilitates the transfer of knowledge between contexts. The approach presented in this work focuses on obtaining precise interpretations of analogies. We leverage noisy semantic networks to answer and explain a wide spectrum of analogy questions. The core of our contribution, the Semantic Similarity Engine, consists of methods for extracting and comparing graph-contexts that reveal the relational parallelism that analogies are based on, while mitigating uncertainty in the semantic network. We demonstrate these methods in two tasks: answering multiple-choice analogy questions and generating human-readable analogy explanations. We evaluate our approach on two datasets totaling 600 analogy questions. Our results show reliable performance and a low false-positive rate in question answering; human evaluators agreed with 96% of our analogy explanations.
Solving Semantic Problems Using Contexts Extracted from Knowledge Graphs
Boteanu, Adrian (Worcester Polytechnic Institute)
This thesis seeks to address word reasoning problems from a semantic standpoint, proposing a uniform approach for generating solutions while also providing human-understandable explanations. Current state-of-the-art solvers of semantic problems rely on traditional machine learning methods. Therefore, their results are not easily reusable by algorithms or interpretable by humans. We propose leveraging web-scale knowledge graphs to determine a semantic frame of interpretation. Semantic knowledge graphs are graphs in which nodes represent concepts and the edges represent the relations between them. Our approach has the following advantages: (1) it reduces the space in which the problem is to be solved; (2) sparse and noisy data can be used without relying only on the relations deducible from the data itself; (3) the output of the inference algorithm is supported by an interpretable justification. We demonstrate our approach in two domains: (1) Topic Modeling: We form topics using connectivity in semantic graphs. We use the same topic models for two very different recommendation systems, one designed for high-noise interactive applications and the other for large amounts of web data. (2) Analogy Solving: For humans, analogies are a fundamental reasoning pattern, which relies on abstraction and comparative analysis. In order for an analogy to be understood, precise relations have to be identified and mapped. We introduce graph algorithms to assess the analogy strength in contexts derived from the analogy words. We demonstrate our approach by solving standardized test analogy questions.