AITopics | winogavil

Collaborating Authors

winogavil

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

Neural Information Processing SystemsDec-24-2025, 22:58:07 GMT

While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills. In this work, we introduce WinoGAViL: an online game of vision-and-language associations (e.g., between werewolves and a full moon), used as a dynamic evaluation benchmark. Inspired by the popular card game Codenames, a spymaster gives a textual cue related to several visual candidates, and another player tries to identify them. Human players are rewarded for creating associations that are challenging for a rival AI model but still solvable by other human players. We use the game to collect 3.5K instances, finding that they are intuitive for humans (>90% Jaccard index) but challenging for state-of-the-art AI models, where the best model (ViLT) achieves a score of 52%, succeeding mostly where the cue is visually salient. Our analysis as well as the feedback we collect from players indicate that the collected associations require diverse reasoning skills, including general knowledge, common sense, abstraction, and more. We release the dataset, the code and the interactive game, allowing future data collection that can be used to develop models with better association abilities.

challenge vision-and-language model, gamified association benchmark, winogavil, (4 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.60)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.63)

Add feedback

A Appendix A.1 Dataset Supplementary Materials 1. Dataset documentation, metadata, and download instructions: https: //winogavil. github.io/download. 2

Neural Information Processing SystemsAug-17-2025, 12:33:20 GMT

I liked that it was a puzzle.

artificial intelligence, github, instruction, (15 more...)

Neural Information Processing Systems

Country:

Oceania > New Zealand (0.04)
Oceania > Australia (0.04)

Industry: Education > Educational Setting (0.47)

Technology: Information Technology > Artificial Intelligence > Vision (0.70)

Add feedback

WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

Neural Information Processing SystemsJan-18-2025, 11:46:50 GMT

While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills. In this work, we introduce WinoGAViL: an online game of vision-and-language associations (e.g., between werewolves and a full moon), used as a dynamic evaluation benchmark. Inspired by the popular card game Codenames, a spymaster gives a textual cue related to several visual candidates, and another player tries to identify them. Human players are rewarded for creating associations that are challenging for a rival AI model but still solvable by other human players. We use the game to collect 3.5K instances, finding that they are intuitive for humans ( 90% Jaccard index) but challenging for state-of-the-art AI models, where the best model (ViLT) achieves a score of 52%, succeeding mostly where the cue is visually salient.

challenge vision-and-language model, gamified association benchmark, winogavil, (2 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.42)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.98)

Add feedback

Enhancing Advanced Visual Reasoning Ability of Large Language Models

Li, Zhiyuan, Liu, Dongnan, Zhang, Chaoyi, Wang, Heng, Xue, Tengfei, Cai, Weidong

arXiv.org Artificial IntelligenceSep-20-2024

Recent advancements in Vision-Language (VL) research have sparked new benchmarks for complex visual reasoning, challenging models' advanced reasoning ability. Traditional Vision-Language Models (VLMs) perform well in visual perception tasks while struggling with complex reasoning scenarios. Conversely, Large Language Models (LLMs) demonstrate robust text reasoning capabilities; however, they lack visual acuity. To bridge this gap, we propose Complex Visual Reasoning Large Language Models (CVR-LLM), capitalizing on VLMs' visual perception proficiency and LLMs' extensive reasoning capability. Unlike recent multimodal large language models (MLLMs) that require a projection layer, our approach transforms images into detailed, context-aware descriptions using an iterative self-refinement loop and leverages LLMs' text knowledge for accurate predictions without extra training. We also introduce a novel multi-modal in-context learning (ICL) methodology to enhance LLMs' contextual understanding and reasoning. Additionally, we introduce Chain-of-Comparison (CoC), a step-by-step comparison technique enabling contrasting various aspects of predictions. Our CVR-LLM presents the first comprehensive study across a wide array of complex visual reasoning tasks and achieves SOTA performance among all.

image description, llm, reasoning, (15 more...)

arXiv.org Artificial Intelligence

2409.1398

Country: North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Add feedback