AITopics | Problem Solving

Collaborating Authors

Problem Solving

News Overviews Instructional Materials AI-Alerts Classics

Understanding Factual Recall in Transformers via Associative Memories

Nichani, Eshaan, Lee, Jason D., Bietti, Alberto

arXiv.org Machine LearningDec-9-2024

Large language models have demonstrated an impressive ability to perform factual recall. Prior work has found that transformers trained on factual recall tasks can store information at a rate proportional to their parameter count. In our work, we show that shallow transformers can use a combination of associative memories to obtain such near optimal storage capacity. We begin by proving that the storage capacities of both linear and MLP associative memories scale linearly with parameter count. We next introduce a synthetic factual recall task, and prove that a transformer with a single layer of self-attention followed by an MLP can obtain 100% accuracy on the task whenever either the total number of self-attention parameters or MLP parameters scales (up to log factors) linearly with the number of facts. In particular, the transformer can trade off between using the value matrices or the MLP as an associative memory to store the dataset of facts. We complement these expressivity results with an analysis of the gradient flow trajectory of a simplified linear attention model trained on our factual recall task, where we show that the model exhibits sequential learning behavior.

alpha, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2412.06538

Country:

North America > United States (0.14)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

o1-Coder: an o1 Replication for Coding

Zhang, Yuxiang, Wu, Shangxi, Yang, Yuqi, Shu, Jiangming, Xiao, Jinlin, Kong, Chao, Sang, Jitao

arXiv.org Artificial IntelligenceDec-9-2024

The technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with a focus on coding tasks. It integrates reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) to enhance the model's System-2 thinking capabilities. The framework includes training a Test Case Generator (TCG) for standardized code testing, using MCTS to generate code data with reasoning processes, and iteratively fine-tuning the policy model to initially produce pseudocode and then generate the full code. The report also addresses the opportunities and challenges in deploying o1-like models in real-world applications, suggesting transitioning to the System-2 paradigm and highlighting the imperative for world model construction. Updated model progress and experimental results will be reported in subsequent versions. OpenAI recently introduced the o1 model (OpenAI, 2024), which has demonstrated impressive system-2 thinking capabilities. This model represents a significant advancement in AI's ability to perform complex reasoning tasks that require higher-order cognitive functions. Following its release, numerous analysis and replication efforts have emerged, demonstrating the growing interest in reasoning models. Prior to the o1 model, large language models (LLMs) primarily exhibited System-1 capabilities, characterized by fast, intuitive responses.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.00154

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre:

Workflow (1.00)
Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.65)

Add feedback

MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models

Chakrabarty, Sayak, Pal, Souradip

arXiv.org Artificial IntelligenceDec-9-2024

This paper introduces Multiple Choice Reasoning via. Process of Elimination using Multi-Modal models, herein referred to as Multi-Modal Process of Elimination (MM-PoE). This novel methodology is engineered to augment the efficacy of Vision-Language Models (VLMs) in multiple-choice visual reasoning tasks. Diverging from conventional approaches that evaluate each option independently, MM-PoE employs a dual-step scoring paradigm that initially identifies and excludes implausible choices, subsequently concentrating on the most probable remaining options. This method emulates human test-taking strategies, where individuals typically eliminate clearly incorrect answers prior to selecting the optimal response. Our empirical evaluations, conducted across three benchmark datasets, reveal that MM-PoE significantly improves both zero-shot and few-shot performance of contemporary state-of-the-art VLMs. Critically, this approach not only broadens the application of the elimination process to multi-modal contexts but also allows few-shot experiments, thereby addressing two principal limitations concerning usage of PoE only in zero-shot settings and only with a language-only framework. As a result, MM-PoE not only refines the reasoning capabilities of VLMs but also broadens their applicability to complex visual question-answering scenarios. All code and documentation supporting our work are available at https://pypi.org/project/mm-poe/, enabling researchers and practitioners to easily integrate and further develop these techniques.

computational linguistic, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2412.07148

Country:

North America > United States > West Virginia (0.05)
North America > United States > Louisiana (0.04)
North America > United States > Oklahoma (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Education (0.85)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)

Add feedback

System 2 Reasoning via Generality and Adaptation

Kim, Sejin, Kim, Sundong

arXiv.org Artificial IntelligenceDec-9-2024

While significant progress has been made in task-specific applications, current models struggle with deep reasoning, generality, and adaptation -- key components of System 2 reasoning that are crucial for achieving Artificial General Intelligence (AGI). Despite the promise of approaches such as program synthesis, language models, and transformers, these methods often fail to generalize beyond their training data and to adapt to novel tasks, limiting their ability to perform human-like reasoning. This paper explores the limitations of existing approaches in achieving advanced System 2 reasoning and highlights the importance of generality and adaptation for AGI. Moreover, we propose four key research directions to address these gaps: (1) learning human intentions from action sequences, (2) combining symbolic and neural models, (3) meta-learning for unfamiliar environments, and (4) reinforcement learning to reason multi-step. Through these directions, we aim to advance the ability to generalize and adapt, bringing computational models closer to the reasoning capabilities required for AGI.

large language model, logic & formal reasoning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.07866

Country:

North America > Montserrat (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Education (0.68)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(2 more...)

Add feedback

How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis

Hong, Guan Zhe, Dikkala, Nishanth, Luo, Enming, Rashtchian, Cyrus, Wang, Xin, Panigrahy, Rina

arXiv.org Artificial IntelligenceDec-9-2024

Large language models (LLMs) have shown amazing performance on tasks that require planning and reasoning. Motivated by this, we investigate the internal mechanisms that underpin a network's ability to perform complex logical reasoning. We first construct a synthetic propositional logic problem that serves as a concrete test-bed for network training and evaluation. Crucially, this problem demands nontrivial planning to solve. We perform our study on two fronts. First, we pursue an understanding of precisely how a three-layer transformer, trained from scratch and attains perfect test accuracy, solves this problem. We are able to identify certain "planning" and "reasoning" mechanisms in the network that necessitate cooperation between the attention blocks to implement the desired logic. Second, we study how pretrained LLMs, namely Mistral-7B and Gemma-2-9B, solve this problem. We characterize their reasoning circuits through causal intervention experiments, providing necessity and sufficiency evidence for the circuits. We find evidence suggesting that the two models' latent reasoning strategies are surprisingly similar, and human-like. Overall, our work systemically uncovers novel aspects of small and large transformers, and continues the study of how they plan and reason.

activation, attention head, attention weight, (16 more...)

arXiv.org Artificial Intelligence

2411.04105

Country:

North America > United States (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models

Bigverdi, Mahtab, Luo, Zelun, Hsieh, Cheng-Yu, Shen, Ethan, Chen, Dongping, Shapiro, Linda G., Krishna, Ranjay

arXiv.org Artificial IntelligenceDec-8-2024

Multimodal language models (MLMs) still face challenges in fundamental visual perception tasks where specialized models excel. Tasks requiring reasoning about 3D structures benefit from depth estimation, and reasoning about 2D object instances benefits from object detection. Yet, MLMs can not produce intermediate depth or boxes to reason over. Finetuning MLMs on relevant data doesn't generalize well and outsourcing computation to specialized vision tools is too compute-intensive and memory-inefficient. To address this, we introduce Perception Tokens, intrinsic image representations designed to assist reasoning tasks where language is insufficient. Perception tokens act as auxiliary reasoning tokens, akin to chain-of-thought prompts in language models. For example, in a depth-related task, an MLM augmented with perception tokens can reason by generating a depth map as tokens, enabling it to solve the problem effectively. We propose AURORA, a training method that augments MLMs with perception tokens for improved reasoning over visual inputs. AURORA leverages a VQVAE to transform intermediate image representations, such as depth maps into a tokenized format and bounding box tokens, which is then used in a multi-task training framework. AURORA achieves notable improvements across counting benchmarks: +10.8% on BLINK, +11.3% on CVBench, and +8.3% on SEED-Bench, outperforming finetuning approaches in generalization across datasets. It also improves on relative depth: over +6% on BLINK. With perception tokens, AURORA expands the scope of MLMs beyond language-based reasoning, paving the way for more effective visual reasoning capabilities.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.03548

Country:

North America > United States > Massachusetts (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
(3 more...)

Add feedback

HiBO: Hierarchical Bayesian Optimization via Adaptive Search Space Partitioning

Li, Wenxuan, Wang, Taiyi, Yoneki, Eiko

arXiv.org Artificial IntelligenceDec-8-2024

Optimizing black-box functions in high-dimensional search spaces has been known to be challenging for traditional Bayesian Optimization (BO). In this paper, we introduce HiBO, a novel hierarchical algorithm integrating global-level search space partitioning information into the acquisition strategy of a local BO-based optimizer. HiBO employs a search-tree-based global-level navigator to adaptively split the search space into partitions with different sampling potential. The local optimizer then utilizes this global-level information to guide its acquisition strategy towards most promising regions within the search space. A comprehensive set of evaluations demonstrates that HiBO outperforms state-of-the-art methods in high-dimensional synthetic benchmarks and presents significant practical effectiveness in the real-world task of tuning configurations of database management systems (DBMSs).

artificial intelligence, configuration, search space, (16 more...)

arXiv.org Artificial Intelligence

2410.23148

Country:

North America > United States > California > Alameda County > Oakland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks

Liang, Chia Xin, Tian, Pu, Yin, Caitlyn Heqi, Yua, Yao, An-Hou, Wei, Ming, Li, Wang, Tianyang, Bi, Ziqian, Liu, Ming

arXiv.org Artificial IntelligenceDec-8-2024

This survey and application guide to multimodal large language models(MLLMs) explores the rapidly developing field of MLLMs, examining their architectures, applications, and impact on AI and Generative Models. Starting with foundational concepts, we delve into how MLLMs integrate various data types, including text, images, video and audio, to enable complex AI systems for cross-modal understanding and generation. It covers essential topics such as training methods, architectural components, and practical applications in various fields, from visual storytelling to enhanced accessibility. Through detailed case studies and technical analysis, the text examines prominent MLLM implementations while addressing key challenges in scalability, robustness, and cross-modal learning. Concluding with a discussion of ethical considerations, responsible AI development, and future directions, this authoritative resource provides both theoretical frameworks and practical insights. It offers a balanced perspective on the opportunities and challenges in the development and deployment of MLLMs, and is highly valuable for researchers, practitioners, and students interested in the intersection of natural language processing and computer vision.

large language model, machine learning, natural language, (25 more...)

arXiv.org Artificial Intelligence

2411.06284

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Indiana (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(5 more...)

Genre:

Workflow (1.00)
Research Report > Promising Solution (1.00)
Overview (1.00)
(2 more...)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Information Technology > Services (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(8 more...)

Add feedback

Asymptotically Optimal Sampling-Based Path Planning Using Bidirectional Guidance Heuristic

Wang, Yi, Mu, Bingxian

arXiv.org Artificial IntelligenceDec-7-2024

This paper introduces Bidirectional Guidance Informed Trees (BIGIT*),~a new asymptotically optimal sampling-based motion planning algorithm. Capitalizing on the strengths of \emph{meet-in-the-middle} property in bidirectional heuristic search with a new lazy strategy, and uniform-cost search, BIGIT* constructs an implicitly bidirectional preliminary motion tree on an implicit random geometric graph (RGG). This efficiently tightens the informed search region, serving as an admissible and accurate bidirectional guidance heuristic. This heuristic is subsequently utilized to guide a bidirectional heuristic search in finding a valid path on the given RGG. Experiments show that BIGIT* outperforms the existing informed sampling-based motion planners both in faster finding an initial solution and converging to the optimum on simulated abstract problems in $\mathbb{R}^{16}$. Practical drone flight path planning tasks across a campus also verify our results.

algorithm, artificial intelligence, planning & scheduling, (16 more...)

arXiv.org Artificial Intelligence

2412.05754

Country:

Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Africa > Togo (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.90)

Add feedback

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Guo, Jarvis, Zheng, Tuney, Bai, Yuelin, Li, Bo, Wang, Yubo, Zhu, King, Li, Yizhi, Neubig, Graham, Chen, Wenhu, Yue, Xiang

arXiv.org Artificial IntelligenceDec-6-2024

Open-source multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks. However, their reasoning capabilities remain constrained by existing instruction-tuning datasets, which were predominately repurposed from academic datasets such as VQA, AI2D, and ChartQA. These datasets target simplistic tasks, and only provide phrase-level answers without any intermediate rationales. To address these challenges, we introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales designed to elicit CoT reasoning. Using only open models, we create a dataset containing 12M instruction-response pairs to cover diverse, reasoning-intensive tasks with detailed and faithful rationales. Experiments demonstrate that training MLLMs on this dataset significantly improves reasoning capabilities, achieving state-of-the-art performance on benchmarks such as MathVerse (+8.1%), MMMU-Pro (+7%), and MuirBench (+13.3%). Additionally, the model demonstrates notable improvements of up to 4% on non-reasoning-based benchmarks. Ablation studies further highlight the importance of key components, such as rewriting and self-filtering, in the dataset construction process.

arxiv preprint, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2412.05237

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Government > Military (1.00)
Banking & Finance (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback