AITopics

2501.17704

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Colorado (0.04)
North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Raza, Mohammad, Milic-Frayling, Natasa

Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers

arXiv.org Artificial IntelligenceJan-28-2025

Robustness of reasoning remains a significant challenge for large language models, and addressing it is essential for the practical applicability of AI-driven reasoning systems. We introduce Semantic Self-Verification (SSV), a novel approach that addresses the key challenge in combining language models with the rigor of logical solvers: to accurately formulate the reasoning problem from natural language to the formal language of the solver. SSV uses a consistency-based approach to produce strong abstract formalizations of problems using concrete instantiations that are generated by the model and verified by the solver. In addition to significantly advancing the overall reasoning accuracy over the state-of-the-art, a key novelty that this approach presents is a feature of verification that has near-perfect precision over a significant coverage of cases, as we demonstrate on open reasoning benchmarks. We propose such *near-certain reasoning* as a new approach to reduce the need for manual verification in many cases, taking us closer to more dependable and autonomous AI reasoning systems.

artificial intelligence, logic & formal reasoning, natural language, (17 more...)

2501.16961

Country:

Asia > Indonesia > Bali (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Indiana > Lake County > Gary (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

arXiv.org Artificial IntelligenceJan-28-2025

Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning

Gan, Zeyu, Liao, Yun, Liu, Yong

Test-time scaling, which is also often referred to as slow-thinking, has been demonstrated to enhance multi-step reasoning in large language models (LLMs). However, despite its widespread utilization, the mechanisms underlying slow-thinking methods remain poorly understood. This paper explores the mechanisms of external slow-thinking from a theoretical standpoint. We begin by examining the snowball error effect within the LLM reasoning process and connect it to the likelihood of correct reasoning using information theory. Building on this, we show that external slow-thinking methods can be interpreted as strategies to mitigate the error probability. We further provide a comparative analysis of popular external slow-thinking approaches, ranging from simple to complex, highlighting their differences and interrelationships. Our findings suggest that the efficacy of these methods is not primarily determined by the specific framework employed, and that expanding the search scope or the model's internal reasoning capacity may yield more sustained improvements in the long term. We open-source our code at https://github.com/ZyGan1999/Snowball-Errors-and-Probability.

large language model, machine learning, natural language, (16 more...)

2501.15602

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

arXiv.org Artificial IntelligenceJan-28-2025

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

Chow, Wei, Mao, Jiageng, Li, Boyi, Seita, Daniel, Guizilini, Vitor, Wang, Yue

Understanding the physical world is a fundamental challenge in embodied AI, critical for enabling agents to perform complex tasks and operate safely in real-world environments. While Vision-Language Models (VLMs) have shown great promise in reasoning and task planning for embodied agents, their ability to comprehend physical phenomena remains extremely limited. To close this gap, we introduce PhysBench, a comprehensive benchmark designed to evaluate VLMs' physical world understanding capability across a diverse set of tasks. PhysBench contains 10,002 entries of interleaved video-image-text data, categorized into four major domains: physical object properties, physical object relationships, physical scene understanding, and physics-based dynamics, further divided into 19 subclasses and 8 distinct capability dimensions. Our extensive experiments, conducted on 75 representative VLMs, reveal that while these models excel in common-sense reasoning, they struggle with understanding the physical world -- likely due to the absence of physical knowledge in their training data and the lack of embedded physical priors. To tackle the shortfall, we introduce PhysAgent, a novel framework that combines the generalization strengths of VLMs with the specialized expertise of vision models, significantly enhancing VLMs' physical understanding across a variety of tasks, including an 18.4\% improvement on GPT-4o. Furthermore, our results demonstrate that enhancing VLMs' physical world understanding capabilities can help embodied agents such as MOKA. We believe that PhysBench and PhysAgent offer valuable insights and contribute to bridging the gap between VLMs and physical world understanding.

ieee cvf international conference, large language model, machine learning, (19 more...)

2501.16411

Country:

North America > United States > California (0.13)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Information Technology (1.00)
Education > Educational Setting (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Neural Information Processing SystemsJan-27-2025, 14:18:03 GMT

Reviews: Modeling Conceptual Understanding in Image Reference Games

Summary --- Consider a speaker agent and many listeners where listeners perceive differently (e.g., some know what cat furr looks like and others don't). This paper proposes an image reference game and develops a speaker that performs better at the reference game by modeling listener abilities. For example, one person might be able to visually classify many specific dog breeds wheras another person might not know anything about what dogs look like. The speaker utters image attributes which the listener uses to distinguish between the two images. Reference Game Flow: There are two stages of interaction analogous to meta-learning setups: practice and evaluation.

agent, image reference game, listener, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.81)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.65)
Information Technology > Artificial Intelligence > Robots (0.51)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.40)

Neural Information Processing SystemsJan-27-2025, 14:17:52 GMT

Reviews: Modeling Conceptual Understanding in Image Reference Games

Reviewers all voted to accept this submission and had their concerns generally addressed by the rebuttal. They were impressed by the clarity of the experimental setting and empirical results.

image reference game

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.40)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.40)

Neural Information Processing SystemsJan-27-2025, 02:41:15 GMT

Reviews: Bayesian Optimization with Unknown Search Space

Applying Bayesian optimization to expensive black-box problems needs to specify the bound of search space. However, when tackling a completely new problem, there is no prior knowledge to guarantee that the specified search space contains the global optimum. The paper proposes an approach to deal with this situation. In the approach, the user first specifies an initial search space; then the bound of search space automatically expands as the iteration proceeds; finally the algorithm will return a solution achieving \epsilon-accuracy. The key is how to expand the search space.

bayesian optimization, search space, unknown search space, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Neural Information Processing SystemsJan-27-2025, 02:41:04 GMT

Reviews: Bayesian Optimization with Unknown Search Space

This paper proposes an algorithm to expand the search space for Bayesian optimization. The reviewers thought the work tackles an important problem and would be of interest to the community. The claims are well supported by empirical evidence and the paper is clearly written. There were concerns about the practicality of the method and that the work is a combination of well-known techniques. Because the paper presents a relatively novel approach and substantiates the claims with strong supporting evidence, it seems to be above the bar of acceptance.

bayesian optimization, unknown search space

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.71)

Kirchweger, Markus, Xia, Hai, Peitl, Tomáš, Szeider, Stefan

Smart Cubing for Graph Search: A Comparative Study

arXiv.org Artificial IntelligenceJan-27-2025

Propositional satisfiability (SAT) solvers based on conflict-driven clause learning can solve huge instances with millions of variables and clauses [Fichte et al., 2023a]. However, for hard instances, particularly in combinatorial problems, parallelization becomes necessary. The cube-and-conquer technique has proven highly effective for such problems, most notably in resolving the Pythagorean triples conjecture [Heule et al., 2016]. In cube-and-conquer, a look-ahead solver first partitions the search space into disjoint subproblems via cubes (partial assignments), which are then solved independently by CDCL solvers. This independence enables efficient parallel solving. When encoding combinatorial problems into SAT, particularly those involving graphs, we often encounter highly symmetric search spaces. Many mutually isomorphic graphs satisfy the same constraints, but a solver needs to check only one representative, the canonical element, from each isomorphism class. Standard CDCL solvers cannot leverage these symmetries, and static symmetry breaking methods cannot break all symmetries [Codish et al., 2019]. SAT Modulo Symmetries (SMS) [Kirchweger and Szeider, 2021; Kirchweger and Szeider, 2024] addresses this limitation through dynamic symmetry breaking, using a custom propagator that learns symmetry-breaking predicates during the search.

artificial intelligence, graph, machine learning, (18 more...)

2501.17201

Country:

Europe > Austria > Vienna (0.14)
Europe > Italy (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Vujinovic, Aleksandar, Kovacevic, Aleksandar

ACT-JEPA: Joint-Embedding Predictive Architecture Improves Policy Representation Learning

arXiv.org Artificial IntelligenceJan-27-2025

Learning efficient representations for decision-making policies is a challenge in imitation learning (IL). Current IL methods require expert demonstrations, which are expensive to collect. Consequently, they often have underdeveloped world models. Self-supervised learning (SSL) offers an alternative by allowing models to learn from diverse, unlabeled data, including failures. However, SSL methods often operate in raw input space, making them inefficient. In this work, we propose ACT-JEPA, a novel architecture that integrates IL and SSL to enhance policy representations. We train a policy to predict (1) action sequences and (2) abstract observation sequences. The first objective uses action chunking to improve action prediction and reduce compounding errors. The second objective extends this idea of chunking by predicting abstract observation sequences. We utilize Joint-Embedding Predictive Architecture to predict in abstract representation space, allowing the model to filter out irrelevant details, improve efficiency, and develop a robust world model. Our experiments show that ACT-JEPA improves the quality of representations by learning temporal environment dynamics. Additionally, the model's ability to predict abstract observation sequences results in representations that effectively generalize to action sequence prediction. ACT-JEPA performs on par with established baselines across a range of decision-making tasks.

artificial intelligence, machine learning, representation, (17 more...)

2501.14622

Country:

Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)