AITopics

2412.05528

Country:

Europe > Slovenia > Central Slovenia > Municipality of Komenda > Komenda (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > Arizona (0.04)
(3 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
(6 more...)

Arai, Hidehisa, Ishihara, Keishi, Takahashi, Tsubasa, Yamaguchi, Yu

ACT-Bench: Towards Action Controllable World Models for Autonomous Driving

World models have emerged as promising neural simulators for autonomous driving, with the potential to supplement scarce real-world data and enable closed-loop evaluations. However, current research primarily evaluates these models based on visual realism or downstream task performance, with limited focus on fidelity to specific action instructions - a crucial property for generating targeted simulation scenes. Although some studies address action fidelity, their evaluations rely on closed-source mechanisms, limiting reproducibility. To address this gap, we develop an open-access evaluation framework, ACT-Bench, for quantifying action fidelity, along with a baseline world model, Terra. Our benchmarking framework includes a large-scale dataset pairing short context videos from nuScenes with corresponding future trajectory data, which provides conditional input for generating future video frames and enables evaluation of action fidelity for executed motions. Furthermore, Terra is trained on multiple large-scale trajectory-annotated datasets to enhance action fidelity. Leveraging this framework, we demonstrate that the state-of-the-art model does not fully adhere to given instructions, while Terra achieves improved action fidelity. All components of our benchmark framework will be made publicly available to support future research.

machine learning, natural language, trajectory, (17 more...)

2412.05337

Country: Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (0.62)
Information Technology > Robotics & Automation (0.62)
Automobiles & Trucks (0.62)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Simonds, Toby, Lau, Jey Han, Bandi, Chaithanya

REL: Working out is all you need

Recent developments, particularly OpenAI's O1 model, have demonstrated the remarkable potential of Large Language Models (LLMs) for complex reasoning tasks. Through analysis of O1's outputs and provided sample Chain-of-Thought (CoT) demonstrations, we observe that it approaches problem-solving in a distinctly human-like manner, systematically brainstorming ideas, testing hypotheses, verifying results, and planning comprehensive solutions. These sophisticated reasoning capabilities remain notably absent in other state-of-the-art language models. In this paper, we hypothesize that this performance gap stems from the limited availability of high-quality reasoning process data in current training sets. We demonstrate that by constructing a specialized dataset focused on explicit problem-solving workflows ("worked solutions"), we can elicit substantially improved planning capabilities from existing models. Additionally, we propose the Reasoning Enhancement Loop (REL), a method for generating synthetic worked solutions.

large language model, machine learning, natural language, (16 more...)

2412.04645

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Bharadwaj, Aryasomayajula Ram

Understanding Hidden Computations in Chain-of-Thought Reasoning

Chain-of-Thought (CoT) prompting has significantly enhanced the reasoning abilities of large language models. However, recent studies have shown that models can still perform complex reasoning tasks even when the CoT is replaced with filler(hidden) characters (e.g., "..."), leaving open questions about how models internally process and represent reasoning steps. In this paper, we investigate methods to decode these hidden characters in transformer models trained with filler CoT sequences. By analyzing layer-wise representations using the logit lens method and examining token rankings, we demonstrate that the hidden characters can be recovered without loss of performance. Our findings provide insights into the internal mechanisms of transformer models and open avenues for improving interpretability and transparency in language model reasoning.

filler token, large language model, machine learning, (17 more...)

2412.04537

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.52)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)

Gomaa, Eyad, Salah, Gomaa

Guidance is All You Need: Temperature-Guided Reasoning in Large Language Models

We present Quasar-1, a novel architecture that introduces temperature-guided reasoning to large language models through the Token Temperature Mechanism (TTM) and Guided Sequence of Thought (GSoT). Our approach leverages the concept of hot and cold tokens, where hot tokens are prioritized for their contextual relevance, while cold tokens provide supplementary information. This dynamic modulation of token importance enables the model to achieve superior logical reasoning capabilities compared to traditional chain-of-thought approaches. Through rigorous mathematical analysis, we prove that our temperature-guided attention mechanism converges to optimal reasoning paths with exponential guarantees. Empirical results show significant improvements in reasoning accuracy and computational efficiency across a wide range of tasks, making advanced AI reasoning accessible to a broader range of applications.

large language model, machine learning, natural language, (16 more...)

2412.06822

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)

Videau, Mathurin, Leite, Alessandro, Schoenauer, Marc, Teytaud, Olivier

Evolutionary Pre-Prompt Optimization for Mathematical Reasoning

However, despite their size and complexity, these models still face challenges in multi-step reasoning, particularly in tasks that require arithmetic, logic, and/or mathematical reasoning [Cobbe et al. 2021; Rae et al. 2021]. To address this limitation, recent works have focused on enhancing the reasoning abilities of LLMs. A significant advancement in this direction is the chain-of-thought (CoT) prompting method [Wei et al. 2022b]. This approach involves guiding LLMs to articulate intermediate reasoning steps in a manner akin to human thought processes, leading to more accurate and interpretable solutions. This method has shown substantial improvements on complex tasks, including mathematics and commonsense reasoning [Lu et al. 2022b; Suzgun et al. 2022; Wei et al. 2022b]. The advancement of the CoT prompting has opened new pathways in the design of effective CoT prompts [Fu et al. 2022; Jiang et al. 2023; Kojima et al. 2022; Zhou et al. 2022].

eppo, evolutionary pre-prompt optimization, publication date, (13 more...)

2412.04291

Country:

Europe > France (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

arXiv.org Artificial IntelligenceDec-4-2024

Navigation World Models

Bar, Amir, Zhou, Gaoyue, Tran, Danny, Darrell, Trevor, LeCun, Yann

Navigation is a fundamental skill of agents with visual-motor capabilities. We introduce a Navigation World Model (NWM), a controllable video generation model that predicts future visual observations based on past observations and navigation actions. To capture complex environment dynamics, NWM employs a Conditional Diffusion Transformer (CDiT), trained on a diverse collection of egocentric videos of both human and robotic agents, and scaled up to 1 billion parameters. In familiar environments, NWM can plan navigation trajectories by simulating them and evaluating whether they achieve the desired goal. Unlike supervised navigation policies with fixed behavior, NWM can dynamically incorporate constraints during planning. Experiments demonstrate its effectiveness in planning trajectories from scratch or by ranking trajectories sampled from an external policy. Furthermore, NWM leverages its learned visual priors to imagine trajectories in unfamiliar environments from a single input image, making it a flexible and powerful tool for next-generation navigation systems.

artificial intelligence, machine learning, trajectory, (19 more...)

2412.03572

Country:

North America > United States > New York (0.04)
Europe > Switzerland (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(3 more...)

Genre:

Research Report (0.82)
Workflow (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
(2 more...)

arXiv.org Artificial IntelligenceDec-3-2024

VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Wu, Xueqing, Ding, Yuheng, Li, Bingxuan, Lu, Pan, Yin, Da, Chang, Kai-Wei, Peng, Nanyun

The ability of large vision-language models (LVLMs) to critique and correct their reasoning is an essential building block towards their self-improvement. However, a systematic analysis of such capabilities in LVLMs is still lacking. We propose VISCO, the first benchmark to extensively analyze the fine-grained critique and correction capabilities of LVLMs. Compared to existing work that uses a single scalar value to critique the entire reasoning [4], VISCO features dense and fine-grained critique, requiring LVLMs to evaluate the correctness of each step in the chain-of-thought and provide natural language explanations to support their judgments. Extensive evaluation of 24 LVLMs demonstrates that human-written critiques significantly enhance the performance after correction, showcasing the potential of the self-improvement strategy. However, the model-generated critiques are less helpful and sometimes detrimental to the performance, suggesting that critique is the crucial bottleneck. We identified three common patterns in critique failures: failure to critique visual perception, reluctance to "say no", and exaggerated assumption of error propagation. To address these issues, we propose an effective LookBack strategy that revisits the image to verify each piece of information in the initial reasoning. LookBack significantly improves critique and correction performance by up to 13.5%.

critique, lvlm, reasoning, (15 more...)

2412.02172

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
(2 more...)

arXiv.org Artificial IntelligenceDec-3-2024

BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving

Wang, Teng, Yu, Wing-Yin, He, Zhenqi, Liu, Zehua, Han, Xiongwei, Gong, Hailei, Wu, Han, Shi, Wei, She, Ruifeng, Zhu, Fangzhou, Zhong, Tao

LLMs exhibit advanced reasoning capabilities, offering the potential to transform natural language questions into mathematical models. However, existing open-source datasets in operations research domain lack detailed annotations of the modeling process, such as variable definitions, focusing solely on objective values, which hinders reinforcement learning applications. To address this, we release the StructuredOR dataset, annotated with comprehensive labels that capture the complete mathematical modeling process. We further propose BPP-Search, a algorithm that integrates reinforcement learning into a tree-of-thought structure using Beam search, a Process reward model, and a pairwise Preference algorithm. This approach enables efficient exploration of tree structures, avoiding exhaustive search while improving accuracy. Extensive experiments on StructuredOR, NL4OPT, and MAMO-ComplexLP datasets show that BPP-Search significantly outperforms state-of-the-art methods. In tree-based reasoning, BPP-Search excels in accuracy and efficiency, enabling faster retrieval of correct solutions.

convention, dataset, modeling process, (13 more...)

2411.17404

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Middle East > Oman > Muscat Governorate > Muscat (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

arXiv.org Artificial IntelligenceDec-3-2024

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

Rezazadeh, Alireza, Li, Zichao, Wei, Wei, Bao, Yujia

Recent advancements in large language models have significantly improved their context windows, yet challenges in effective long-term memory management remain. We introduce MemTree, an algorithm that leverages a dynamic, tree-structured memory representation to optimize the organization, retrieval, and integration of information, akin to human cognitive schemas. MemTree organizes memory hierarchically, with each node encapsulating aggregated textual content, corresponding semantic embeddings, and varying abstraction levels across the tree's depths. Our algorithm dynamically adapts this memory structure by computing and comparing semantic embeddings of new and existing information to enrich the model's context-awareness. This approach allows MemTree to handle complex reasoning and extended interactions more effectively than traditional memory augmentation methods, which often rely on flat lookup tables. Evaluations on benchmarks for multi-turn dialogue understanding and document question answering show that MemTree significantly enhances performance in scenarios that demand structured memory management.

information, memtree, node, (16 more...)

2410.14052

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
Europe > Germany (0.04)
North America > Canada (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Banking & Finance (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)