AITopics | Problem Solving

Collaborating Authors

Problem Solving

News Overviews Instructional Materials AI-Alerts Classics

One-shot World Models Using a Transformer Trained on a Synthetic Prior

Ferreira, Fabio, Schlageter, Moreno, Rajan, Raghu, Biedenkapp, Andre, Hutter, Frank

arXiv.org Artificial IntelligenceOct-24-2024

A World Model is a compressed spatial and temporal representation of a real world environment that allows one to train an agent or execute planning methods. However, world models are typically trained on observations from the real world environment, and they usually do not enable learning policies for other real environments. We propose One-Shot World Model (OSWM), a transformer world model that is learned in an in-context learning fashion from purely synthetic data sampled from a prior distribution. Our prior is composed of multiple randomly initialized neural networks, where each network models the dynamics of each state and reward dimension of a desired target environment. We adopt the supervised learning procedure of Prior-Fitted Networks by masking next-state and reward at random context positions and query OSWM to make probabilistic predictions based on the remaining transition context. During inference time, OSWM is able to quickly adapt to the dynamics of a simple grid world, as well as the CartPole gym and a custom control environment by providing 1k transition steps as context and is then able to successfully train environment-solving agent policies. However, transferring to more complex environments remains a challenge, currently. Despite these limitations, we see this work as an important stepping-stone in the pursuit of learning world models purely from synthetic data.

artificial intelligence, machine learning, oswm, (17 more...)

arXiv.org Artificial Intelligence

2409.14084

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.05)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Add feedback

Markov Chain of Thought for Efficient Mathematical Reasoning

Yang, Wen, Fan, Kai, Liao, Minpeng

arXiv.org Artificial IntelligenceOct-23-2024

Chain of Thought (CoT) of multi-step benefits from the logical structure of the reasoning steps and task-specific actions, significantly enhancing the mathematical reasoning capabilities of large language models. As the prevalence of long CoT, the number of reasoning steps exceeds manageable token limits and leads to higher computational demands. Inspired by the fundamental logic of human cognition, ``derive, then reduce'', we conceptualize the standard multi-step CoT as a novel Markov Chain of Thought (MCoT). In this study, we consider the mathematical reasoning task, defining each reasoning step as text accompanied by a Python code snippet. To facilitate a longer reasoning path, self-correction is enabled through interactions with the code interpreter. Our MCoT aims to compress previous reasoning steps into a simplified question, enabling efficient next-step inference without relying on a lengthy KV cache. In our experiments, we curate the \texttt{MCoTInstruct} dataset, and the empirical results indicate that MCoT not only significantly enhances efficiency but also maintains comparable accuracy. While much remains to be explored, this work paves the way for exploring the long CoT reasoning abilities of LLMs.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.17635

Genre: Research Report > New Finding (0.54)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

Add feedback

Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning

Ganguly, Debargha, Iyengar, Srinivasan, Chaudhary, Vipin, Kalyanaraman, Shivkumar

arXiv.org Artificial IntelligenceOct-23-2024

Large Language Models (LLMs) have revolutionized natural language processing, yet they struggle with inconsistent reasoning, particularly in novel domains and complex logical sequences. This research introduces Proof of Thought, a framework that enhances the reliability and transparency of LLM outputs. Our approach bridges LLM-generated ideas with formal logic verification, employing a custom interpreter to convert LLM outputs into First Order Logic constructs for theorem prover scrutiny. Central to our method is an intermediary JSON-based Domain-Specific Language, which by design balances precise logical structures with intuitive human concepts. This hybrid representation enables both rigorous validation and accessible human comprehension of LLM reasoning processes. Key contributions include a robust type system with sort management for enhanced logical integrity, explicit representation of rules for clear distinction between factual and inferential knowledge, and a flexible architecture that allows for easy extension to various domain-specific applications. We demonstrate Proof of Thought's effectiveness through benchmarking on StrategyQA and a novel multimodal reasoning task, showing improved performance in open-ended scenarios. By providing verifiable and interpretable results, our technique addresses critical needs for AI system accountability and sets a foundation for human-in-the-loop oversight in high-stakes domains.

large language model, logic & formal reasoning, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2409.1727

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

Add feedback

A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration

Cui, Yingqian, He, Pengfei, Tang, Xianfeng, He, Qi, Luo, Chen, Tang, Jiliang, Xing, Yue

arXiv.org Machine LearningOct-21-2024

Few-shot Chain-of-Thought (CoT) prompting has demonstrated strong performance in improving the reasoning capabilities of large language models (LLMs). While theoretical investigations have been conducted to understand CoT, the underlying transformer used in these studies isolates the CoT reasoning process into separated in-context learning steps (Stepwise ICL). In this work, we theoretically show that, compared to Stepwise ICL, the transformer gains better error correction ability and more accurate predictions if the reasoning from earlier steps (Coherent CoT) is integrated. Given that this coherent reasoning changes the behavior of the transformer, we further investigate the sensitivity of the transformer with Coherent CoT when the demonstration examples are corrupted at the inference stage. Our theoretical results indicate that the transformer is more sensitive to errors in intermediate reasoning steps than the final outcome. Building upon this observation, we propose an improvement on CoT by incorporating both correct and incorrect reasoning paths in the demonstration. Our experiments validate the effectiveness of the proposed approach.

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2410.1654

Country: North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.89)

Add feedback

SMART: Self-learning Meta-strategy Agent for Reasoning Tasks

Liu, Rongxing, Shridhar, Kumar, Prajapat, Manish, Xia, Patrick, Sachan, Mrinmaya

arXiv.org Artificial IntelligenceOct-21-2024

Tasks requiring deductive reasoning, especially those involving multiple steps, often demand adaptive strategies such as intermediate generation of rationales or programs, as no single approach is universally optimal. While Language Models (LMs) can enhance their outputs through iterative self-refinement and strategy adjustments, they frequently fail to apply the most effective strategy in their first attempt. This inefficiency raises the question: Can LMs learn to select the optimal strategy in the first attempt, without a need for refinement? To address this challenge, we introduce SMART (Self-learning Meta-strategy Agent for Reasoning Tasks), a novel framework that enables LMs to autonomously learn and select the most effective strategies for various reasoning tasks. We model the strategy selection process as a Markov Decision Process and leverage reinforcement learning-driven continuous self-improvement to allow the model to find the suitable strategy to solve a given task. Unlike traditional self-refinement methods that rely on multiple inference passes or external feedback, SMART allows an LM to internalize the outcomes of its own reasoning processes and adjust its strategy accordingly, aiming for correct solutions on the first attempt. Our experiments across various reasoning datasets and with different model architectures demonstrate that SMART significantly enhances the ability of models to choose optimal strategies without external guidance (+15 points on the GSM8K dataset). By achieving higher accuracy with a single inference pass, SMART not only improves performance but also reduces computational costs for refinement-based strategies, paving the way for more efficient and intelligent reasoning in LMs.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.16128

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre:

Research Report (0.65)
Workflow (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.63)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.49)

Add feedback

Integrating Reinforcement Learning with Foundation Models for Autonomous Robotics: Methods and Perspectives

Moroncelli, Angelo, Soni, Vishal, Shahid, Asad Ali, Maccarini, Marco, Forgione, Marco, Piga, Dario, Spahiu, Blerina, Roveda, Loris

arXiv.org Artificial IntelligenceOct-21-2024

Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled datasets, exhibit powerful capabilities in understanding complex patterns and generating sophisticated outputs. However, they often struggle to adapt to specific tasks. Reinforcement learning (RL), which allows agents to learn through interaction and feedback, offers a compelling solution. Integrating RL with FMs enables these models to achieve desired outcomes and excel at particular tasks. Additionally, RL can be enhanced by leveraging the reasoning and generalization capabilities of FMs. This synergy is revolutionizing various fields, including robotics. FMs, rich in knowledge and generalization, provide robots with valuable information, while RL facilitates learning and adaptation through real-world interactions. This survey paper comprehensively explores this exciting intersection, examining how these paradigms can be integrated to advance robotic intelligence. We analyze the use of foundation models as action planners, the development of robotics-specific foundation models, and the mutual benefits of combining FMs with RL. Furthermore, we present a taxonomy of integration approaches, including large language models, vision-language models, diffusion models, and transformer-based RL models. We also explore how RL can utilize world representations learned from FMs to enhance robotic task execution. Our survey aims to synthesize current research and highlight key challenges in robotic reasoning and control, particularly in the context of integrating FMs and RL--two rapidly evolving technologies. By doing so, we seek to spark future research and emphasize critical areas that require further investigation to enhance robotics. We provide an updated collection of papers based on our taxonomy, accessible on our open-source project website at: https://github.com/clmoro/Robotics-RL-FMs-Integration.

large language model, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2410.16411

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Europe > Switzerland (0.04)
(4 more...)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Energy (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)
Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

Accelerated Sub-Image Search For Variable-Size Patches Identification Based On Virtual Time Series Transformation And Segmentation

Plessen, Mogens

arXiv.org Artificial IntelligenceOct-20-2024

This paper addresses two tasks: (i) fixed-size objects such as hay bales are to be identified in an aerial image for a given reference image of the object, and (ii) variable-size patches such as areas on fields requiring spot spraying or other handling are to be identified in an image for a given small-scale reference image. Both tasks are related. The second differs in that identified sub-images similar to the reference image are further clustered before patches contours are determined by solving a traveling salesman problem. Both tasks are complex in that the exact number of similar sub-images is not known a priori. The main discussion of this paper is presentation of an acceleration mechanism for sub-image search that is based on a transformation of an image to multivariate time series along the RGB-channels and subsequent segmentation to reduce the 2D search space in the image. Two variations of the acceleration mechanism are compared to exhaustive search on diverse synthetic and real-world images. Quantitatively, proposed method results in solve time reductions of up to 2 orders of magnitude, while qualitatively delivering comparative visual results. Proposed method is neural network-free and does not use any image pre-processing.

artificial intelligence, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2410.15425

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.64)
(2 more...)

Add feedback

Neural Search Space in Gboard Decoder

Zhang, Yanxiang, Zhang, Yuanbo, Sun, Haicheng, Wang, Yun, Dou, Billy, Sivek, Gary, Zhai, Shumin

arXiv.org Artificial IntelligenceOct-20-2024

Gboard Decoder produces suggestions by looking for paths that best match input touch points on the context aware search space, which is backed by the language Finite State Transducers (FST). The language FST is currently an N-gram language model (LM). However, N-gram LMs, limited in context length, are known to have sparsity problem under device model size constraint. In this paper, we propose \textbf{Neural Search Space} which substitutes the N-gram LM with a Neural Network LM (NN-LM) and dynamically constructs the search space during decoding. Specifically, we integrate the long range context awareness of NN-LM into the search space by converting its outputs given context, into the language FST at runtime. This involves language FST structure redesign, pruning strategy tuning, and data structure optimizations. Online experiments demonstrate improved quality results, reducing Words Modified Ratio by [0.26\%, 1.19\%] on various locales with acceptable latency increases. This work opens new avenues for further improving keyboard decoding quality by enhancing neural LM more directly.

artificial intelligence, nn-lm, search space, (14 more...)

arXiv.org Artificial Intelligence

2410.15575

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Spain (0.04)
Europe > Portugal (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

EVA: An Embodied World Model for Future Video Anticipation

Chi, Xiaowei, Zhang, Hengyuan, Fan, Chun-Kai, Qi, Xingqun, Zhang, Rongyu, Chen, Anthony, Chan, Chi-min, Xue, Wei, Luo, Wenhan, Zhang, Shanghang, Guo, Yike

arXiv.org Artificial IntelligenceOct-20-2024

World models integrate raw data from various modalities, such as images and language to simulate comprehensive interactions in the world, thereby displaying crucial roles in fields like mixed reality and robotics. Yet, applying the world model for accurate video prediction is quite challenging due to the complex and dynamic intentions of the various scenes in practice. In this paper, inspired by the human rethinking process, we decompose the complex video prediction into four meta-tasks that enable the world model to handle this issue in a more fine-grained manner. Alongside these tasks, we introduce a new benchmark named Embodied Video Anticipation Benchmark (EVA-Bench) to provide a well-rounded evaluation. EVA-Bench focused on evaluating the video prediction ability of human and robot actions, presenting significant challenges for both the language model and the generation model. Targeting embodied video prediction, we propose the Embodied Video Anticipator (EVA), a unified framework aiming at video understanding and generation. EVA integrates a video generation model with a visual language model, effectively combining reasoning capabilities with high-quality generation. Moreover, to enhance the generalization of our framework, we tailor-designed a multi-stage pretraining paradigm that adaptatively ensembles LoRA to produce high-fidelity results. Extensive experiments on EVA-Bench highlight the potential of EVA to significantly improve performance in embodied scenes, paving the way for large-scale pre-trained models in real-world prediction tasks.

dataset, prediction, video, (15 more...)

arXiv.org Artificial Intelligence

2410.15461

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards

Brunke, Lukas, Zhang, Yanni, Römer, Ralf, Naimer, Jack, Staykov, Nikola, Zhou, Siqi, Schoellig, Angela P.

arXiv.org Artificial IntelligenceOct-19-2024

Ensuring safe interactions in human-centric environments requires robots to understand and adhere to constraints recognized by humans as "common sense" (e.g., "moving a cup of water above a laptop is unsafe as the water may spill" or "rotating a cup of water is unsafe as it can lead to pouring its content"). Recent advances in computer vision and machine learning have enabled robots to acquire a semantic understanding of and reason about their operating environments. While extensive literature on safe robot decision-making exists, semantic understanding is rarely integrated into these formulations. In this work, we propose a semantic safety filter framework to certify robot inputs with respect to semantically defined constraints (e.g., unsafe spatial relationships, behaviours, and poses) and geometrically defined constraints (e.g., environment-collision and self-collision constraints). In our proposed approach, given perception inputs, we build a semantic map of the 3D environment and leverage the contextual reasoning capabilities of large language models to infer semantically unsafe conditions. These semantically unsafe conditions are then mapped to safe actions through a control barrier certification formulation. We evaluated our semantic safety filter approach in teleoperated tabletop manipulation tasks and pick-and-place tasks, demonstrating its effectiveness in incorporating semantic constraints to ensure safe robot operation beyond collision avoidance.

constraint, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.15185

Country:

North America > Canada (0.29)
Europe (0.28)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.65)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.54)

Add feedback