- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Värmland County > Karlstad (0.04)
- Europe > France (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
WISE: Weighted Iterative Society-of-Experts for Robust Multimodal Multi-Agent Debate
Cherian, Anoop, Doyle, River, Ben-Dov, Eyal, Lohit, Suhas, Peng, Kuan-Chuan
Recent large language models (LLMs) are trained on diverse corpora and tasks, leading them to develop complementary strengths. Multi-agent debate (MAD) has emerged as a popular way to leverage these strengths for robust reasoning, though it has mostly been applied to language-only tasks, leaving its efficacy on multimodal problems underexplored. In this paper, we study MAD for solving vision-and-language reasoning problems. Our setup enables generalizing the debate protocol with heterogeneous experts that possess single- and multi-modal capabilities. To this end, we present Weighted Iterative Society-of-Experts (WISE), a generalized and modular MAD framework that partitions the agents into Solvers, which generate solutions, and Reflectors, which verify correctness, assign weights, and provide natural language feedback. To aggregate the agents' solutions across debate rounds while accounting for variance in their responses and the feedback weights, we present a modified Dawid-Skene algorithm for post-processing that integrates our two-stage debate model. We evaluate WISE on SMART-840, VisualPuzzles, EvoChart-QA, and a new SMART-840++ dataset with programmatically generated problem instances of controlled difficulty. Our results show that WISE consistently improves accuracy by 2-7% over state-of-the-art MAD setups and aggregation methods across diverse multimodal tasks and LLM configurations.
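The abstract does not specify WISE's modified Dawid-Skene variant, so as background, here is a minimal sketch of the vanilla Dawid-Skene EM algorithm it builds on, which aggregates noisy categorical answers by jointly estimating per-item labels and per-annotator confusion matrices. The function name and all simplifications are illustrative; WISE's version additionally folds in Reflector weights and debate rounds.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """Vanilla Dawid-Skene EM over a (n_items, n_annotators) label matrix.

    labels[i, j] is annotator j's answer (class index) for item i.
    Returns posterior class probabilities per item.
    """
    n_items, n_annot = labels.shape
    # Initialize item posteriors from a majority vote.
    T = np.zeros((n_items, n_classes))
    for i in range(n_items):
        for j in range(n_annot):
            T[i, labels[i, j]] += 1
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and per-annotator confusion matrices.
        priors = T.mean(axis=0)
        conf = np.full((n_annot, n_classes, n_classes), 1e-6)  # smoothing
        for j in range(n_annot):
            for i in range(n_items):
                conf[j, :, labels[i, j]] += T[i]
            conf[j] /= conf[j].sum(axis=1, keepdims=True)
        # E-step: recompute item posteriors given annotator reliabilities.
        logT = np.tile(np.log(priors + 1e-12), (n_items, 1))
        for i in range(n_items):
            for j in range(n_annot):
                logT[i] += np.log(conf[j, :, labels[i, j]])
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T
```

In this toy form, annotators who often disagree with the inferred consensus end up with flatter confusion matrices and so carry less weight, which is the effect WISE's weighted variant exploits across Solvers.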
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Separation of Unconscious Robots with Obstructed Visibility
Pyati, Prajyot, Kaur, Navjot, Jana, Saswata, Bhattacharya, Adri, Mandal, Partha Sarathi
We study a recently introduced \textit{unconscious} mobile robot model, where each robot is associated with a \textit{color}, which is visible to other robots but not to itself. The robots are autonomous, anonymous, oblivious and silent, operating in the Euclidean plane under the conventional \textit{Look-Compute-Move} cycle. A primary task in this model is the \textit{separation problem}, where unconscious robots sharing the same color must separate from others, forming recognizable geometric shapes such as circles, points, or lines. All prior works model the robots as \textit{transparent}, enabling each to know the positions and colors of all other robots. In contrast, we model the robots as \textit{opaque}, where a robot can obstruct the visibility between two other robots if it lies on the line segment between them. Under this obstructed visibility, we consider a variant of the separation problem in which robots, starting from any arbitrary initial configuration, are required to separate into concentric semicircles. We present a collision-free algorithm that solves the separation problem under a semi-synchronous scheduler in $O(n)$ epochs, where $n$ is the number of robots. The robots agree on one coordinate axis but have no knowledge of $n$.
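The obstructed-visibility condition in the abstract (a robot blocks a pair exactly when it lies on the open segment between them) reduces to a collinearity-plus-betweenness test. Below is a toy geometric check of that condition only, not the paper's separation algorithm; the function name and tolerance are illustrative.

```python
def visible(p, q, others, eps=1e-9):
    """True if the segment p-q is unobstructed by any robot in `others`.

    A robot at r blocks the pair (p, q) when r lies strictly on the
    open segment between them: collinear with p and q, and strictly
    between the two endpoints.
    """
    for r in others:
        if r == p or r == q:
            continue
        # Collinearity via the 2D cross product of (q - p) and (r - p).
        cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
        if abs(cross) > eps:
            continue  # not on the line through p and q
        # Betweenness: projection of (r - p) onto (q - p) lies in (0, |q-p|^2).
        dot = (r[0] - p[0]) * (q[0] - p[0]) + (r[1] - p[1]) * (q[1] - p[1])
        seg2 = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
        if 0 < dot < seg2:
            return False
    return True
```

Under the opaque model, each robot's Look phase would return only the robots for which such a check passes, which is precisely the partial snapshot the paper's algorithm must cope with.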
- Asia > India (0.14)
- Europe > Switzerland (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Värmland County > Karlstad (0.04)
- Europe > France (0.04)
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Du, Yifan, Liu, Zikang, Li, Yifan, Zhao, Wayne Xin, Huo, Yuqi, Wang, Bingning, Chen, Weipeng, Liu, Zheng, Wang, Zhongyuan, Wen, Ji-Rong
Recently, slow-thinking reasoning systems, built upon large language models (LLMs), have garnered widespread attention by scaling the thinking time during inference. There is also growing interest in adapting this capability to multimodal large language models (MLLMs). Given that MLLMs handle more complex data semantics across different modalities, it is intuitively more challenging to implement multimodal slow-thinking systems. To address this issue, in this paper, we explore a straightforward approach by fine-tuning a capable MLLM with a small amount of textual long-form thought data, resulting in a multimodal slow-thinking system, Virgo (Visual reasoning with long thought). We find that these long-form reasoning processes, expressed in natural language, can be effectively transferred to MLLMs. Moreover, it seems that such textual reasoning data can be even more effective than visual reasoning data in eliciting the slow-thinking capacities of MLLMs. While this work is preliminary, it demonstrates that slow-thinking capacities are fundamentally associated with the language model component, which can be transferred across modalities or domains. This finding can be leveraged to guide the development of more powerful slow-thinking reasoning systems. We release our resources at https://github.com/RUCAIBox/Virgo.
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Gao, Jiahui, Pi, Renjie, Zhang, Jipeng, Ye, Jiacheng, Zhong, Wanjun, Wang, Yufei, Hong, Lanqing, Han, Jianhua, Xu, Hang, Li, Zhenguo, Kong, Lingpeng
Large language models (LLMs) have shown remarkable proficiency in human-level reasoning and generation capabilities, which encourages extensive research on their application in mathematical problem solving. However, current work has been largely focused on text-based mathematical problems, with limited investigation into problems involving geometric information. Addressing this gap, we aim to enable LLMs to solve geometric problems by understanding image input. We first analyze the limitations of current Multimodal Large Language Models (MLLMs) in this area: they struggle to accurately comprehend basic geometric elements and their relationships. To overcome these challenges, we take advantage of the unique characteristics of geometric problems (such as unique geometric logical forms and geometric scalability) and the capacity of the textual LLMs to build an enriched multimodal geometry dataset based on existing data. The augmented dataset, Geo170K, contains more than 170K geometric image-caption and question-answer pairs. Utilizing our constructed Geo170K dataset, we develop G-LLaVA, which demonstrates exceptional performance in solving geometric problems, significantly outperforming GPT-4-V on the MathVista benchmark with only 7B parameters.
- Asia > China > Hong Kong (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
Wild cockatoos excel in intelligence tests, countering theory living with humans makes birds smarter
A long-held theory that animals raised in captivity perform better in cognitive testing may need to be rethought. A new study organized by the University of Veterinary Medicine in Vienna found evidence that wild animals perform just as well at intelligence tests as their lab-raised counterparts. To test the theory, researchers compared two groups of Goffin's cockatoos, a species often found in the tropical jungles of Singapore, Indonesia, and Puerto Rico. The team compared a lab-raised 'colony' of 11 cockatoos at their lab in Vienna to eight wild cockatoos recently taken into captivity at a field laboratory in Indonesia. The researchers compared the performance of both groups in a series of simple problem solving tests and found the wild cockatoos were just as clever as the lab-raised ones.
- Europe > Austria > Vienna (0.49)
- Asia > Indonesia (0.48)
- North America > Puerto Rico (0.26)
- Asia > Singapore (0.26)