reasoning module
6075d47368ddf560e92efd53264b5405-Paper-Conference.pdf
Visual Reasoning (AVR) entails discerning latent patterns in visual data and inferring underlying rules. Existing solutions often lack scalability and adaptability, as deep architectures tend to overfit training data, and static neural networks fail to dynamically capture diverse rules. To tackle the challenges, we propose a Dynamic and Scalable Reasoning Framework (DSRF) that greatly enhances the reasoning ability by widening the network instead of deepening it, and dynamically adjusting the reasoning network to better fit novel samples instead of a static network. Specifically, we design a Multi-View Reasoning Pyramid (MVRP) to capture complex rules through layered reasoning to focus features at each view on distinct combinations of attributes, widening the reasoning network to cover more attribute combinations analogous to complex reasoning rules. Additionally, we propose a Dynamic Domain-Contrast Prediction (DDCP) block to handle varying task-specific relationships dynamically by introducing a Gram matrix to model feature distributions, and a gate matrix to capture subtle domain differences between context and target features. Extensive experiments on six AVR tasks demonstrate DSRF's superior performance, achieving state-of-the-art results under various settings. Code is available here: https://github.com/UNNCRoxLi/DSRF.
TART: A plug-and-play Transformer module for task-agnostic reasoning
Large language models (LLMs) exhibit in-context learning abilities which enable the same model to perform several tasks without any task-specific training. In contrast, traditional adaptation approaches, such as fine-tuning, modify the underlying models for each specific task. In-context learning, however, consistently underperforms task-specific tuning approaches even when presented with the same examples. While most existing approaches (e.g., prompt engineering) focus on the LLM's learned representations to patch this performance gap, our experiments actually reveal that LLM representations contain sufficient information to make good predictions. As such, we focus on the LLM's reasoning abilities and demonstrate that this performance gap exists due to their inability to perform simple probabilistic reasoning tasks. This raises an intriguing question: Are LLMs actually capable of learning how to reason in a task-agnostic manner? We answer this in the affirmative and, as a proof of concept, propose TART which generically improves an LLM's reasoning abilities using a synthetically trained reasoning module.
TART: A plug-and-play Transformer module for task-agnostic reasoning
Large language models (LLMs) exhibit in-context learning abilities which enable the same model to perform several tasks without any task-specific training. In contrast, traditional adaptation approaches, such as fine-tuning, modify the underlying models for each specific task. In-context learning, however, consistently underperforms task-specific tuning approaches even when presented with the same examples. While most existing approaches (e.g., prompt engineering) focus on the LLM's learned representations to patch this performance gap, our experiments actually reveal that LLM representations contain sufficient information to make good predictions. As such, we focus on the LLM's reasoning abilities and demonstrate that this performance gap exists due to their inability to perform simple probabilistic reasoning tasks. This raises an intriguing question: Are LLMs actually capable of learning how to reason in a task-agnostic manner? We answer this in the affirmative and, as a proof of concept, propose TART which generically improves an LLM's reasoning abilities using a synthetically trained reasoning module.