AITopics

1407.0286

Country:

North America > United States (0.45)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Giorlandino, Alessio, Goldt, Sebastian, Maillard, Antoine

Factual recall in linear associative memories: sharp asymptotics and mechanistic insights

arXiv.org Machine Learning, May-12-2026

Large language models demonstrate remarkable ability in factual recall, yet the fundamental limits of storing and retrieving input-output associations with neural networks remain unclear. We study these limits in a minimal setting: a linear associative memory that maps $p$ input embeddings in $\mathbb{R}^d$ to their corresponding $d$-dimensional targets via a single layer, requiring each mapped input to be well separated from all other targets. Unlike in supervised classification, this strict separation induces $p$ constraints per association and produces strong correlations between constraints that make a direct characterisation of the storage capacity difficult. Here, we provide a precise characterisation of this capacity in the following way. We first introduce a decoupled model in which each input has its own independent set of competing outputs, and provide numerical and analytical evidence that this decoupled model is equivalent to the original model in terms of storage capacity, spectra of the learnt weights, and storage mechanism. Using tools from statistical physics, we show that the decoupled model can store up to $p_c \log p_c / d^2 = 1 / 2$ associations, and generalise the computation of $p_c$ to linear two-layer architectures. Our analysis also gives mechanistic insight into how the optimal solution improves over a naïve Hebbian learning rule: rather than boosting input-output alignments with broad fluctuations, the optimal solution raises the correct scores just above the extreme-value threshold set by the competing outputs. These findings give a sharp statistical-physics characterisation of factual storage in linear networks and provide a baseline for understanding the memory capacity of more realistic neural architectures.
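The naïve Hebbian baseline mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the unit-norm Gaussian embeddings, the sizes `d = 64` and `p = 10`, and every function name below are assumptions chosen for illustration. The weights are the sum of outer products $W = \sum_\mu y_\mu x_\mu^\top$, and an association counts as recalled when the mapped input $W x_\mu$ scores highest against its own target.

```python
import math
import random


def unit_gaussian(d, rng):
    """Draw a random Gaussian vector and scale it to unit norm (assumed embedding model)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def hebbian_weights(xs, ys):
    """Hebbian rule: W = sum over associations of the outer product y x^T."""
    d_out, d_in = len(ys[0]), len(xs[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for x, y in zip(xs, ys):
        for i in range(d_out):
            row, yi = W[i], y[i]
            for j in range(d_in):
                row[j] += yi * x[j]
    return W


def recall_accuracy(W, xs, ys):
    """Fraction of inputs whose mapped vector W x scores highest on its own target."""
    correct = 0
    for mu, x in enumerate(xs):
        mapped = [dot(row, x) for row in W]
        scores = [dot(y, mapped) for y in ys]
        best = max(range(len(ys)), key=scores.__getitem__)
        if best == mu:
            correct += 1
    return correct / len(xs)


rng = random.Random(0)
d, p = 64, 10  # illustrative sizes, far below the capacity regime studied in the paper
xs = [unit_gaussian(d, rng) for _ in range(p)]
ys = [unit_gaussian(d, rng) for _ in range(p)]
W = hebbian_weights(xs, ys)
accuracy = recall_accuracy(W, xs, ys)
print(f"Hebbian recall accuracy with p={p}, d={d}: {accuracy:.2f}")
```

Raising `p` at fixed `d` in this sketch makes the crosstalk between stored pairs grow, which is the regime where the abstract says the optimal solution departs from the Hebbian rule.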

cit, machine learning, natural language

2605.10795

Country: Europe (0.92)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Dec-10-2025

Sparse Variable Projection in Robotic Perception: Exploiting Separable Structure for Efficient Nonlinear Optimization

Papalia, Alan, Sanderson, Nikolas, Han, Haoyu, Yang, Heng, Singh, Hanumant, Everett, Michael

Robotic perception often requires solving large nonlinear least-squares (NLS) problems. While sparsity has been well-exploited to scale solvers, a complementary and underexploited structure is separability, where some variables (e.g., visual landmarks) appear linearly in the residuals and, for any estimate of the remaining variables (e.g., poses), have a closed-form solution. Variable projection (VarPro) methods are a family of techniques that exploit this structure by analytically eliminating the linear variables and presenting a reduced problem in the remaining variables that has favorable properties. However, VarPro has seen limited use in robotic perception; a major challenge arises from gauge symmetries (e.g., cost invariance to global shifts and rotations), which are common in perception and induce specific computational challenges in standard VarPro approaches. We present a VarPro scheme designed for problems with gauge symmetries that jointly exploits separability and sparsity. Our method can be applied as a one-time preprocessing step to construct a matrix-free Schur complement operator. This operator allows efficient evaluation of costs, gradients, and Hessian-vector products of the reduced problem and readily integrates with standard iterative NLS solvers. We provide precise conditions under which our method applies, and describe extensions when these conditions are only partially met. Across synthetic and real benchmarks in SLAM, SNL, and SfM, our approach achieves up to 2$\times$-35$\times$ faster runtimes than state-of-the-art methods while maintaining accuracy. We release an open-source C++ implementation and all datasets from our experiments.
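The basic variable projection idea described in this abstract can be sketched on a toy separable problem. This is only an illustration under stated assumptions, not the paper's method: it ignores gauge symmetries, sparsity, and the Schur complement operator. The model $y_i = a\,e^{-b t_i}$, the noise-free data, the search interval, and all function names are hypothetical. The amplitude `a` enters linearly, so for any decay rate `b` it has a closed-form least-squares solution, leaving a reduced one-dimensional problem in `b`.

```python
import math


def basis(b, ts):
    """Nonlinear basis phi_i(b) = exp(-b * t_i)."""
    return [math.exp(-b * t) for t in ts]


def best_amplitude(b, ts, ys):
    """Closed-form least-squares amplitude a*(b) for a fixed decay rate b."""
    phi = basis(b, ts)
    return sum(y * p for y, p in zip(ys, phi)) / sum(p * p for p in phi)


def reduced_cost(b, ts, ys):
    """Cost in b alone, after the linear variable a has been eliminated."""
    a = best_amplitude(b, ts, ys)
    return sum((y - a * p) ** 2 for y, p in zip(ys, basis(b, ts)))


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal scalar function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


# Synthetic, noise-free data from assumed ground truth a = 2.0, b = 1.3.
ts = [0.1 * i for i in range(30)]
ys = [2.0 * math.exp(-1.3 * t) for t in ts]

b_hat = golden_section_min(lambda b: reduced_cost(b, ts, ys), 0.1, 5.0)
a_hat = best_amplitude(b_hat, ts, ys)
print(f"estimated a = {a_hat:.4f}, b = {b_hat:.4f}")
```

The outer solver only ever sees `b`; the linear variable is recovered once at the end from `best_amplitude`, which is the same division of labour the abstract describes for poses and landmarks.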

artificial intelligence, matrix, optimization problem

2512.07969

Country: North America > United States (0.68)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

arXiv.org Artificial Intelligence, Nov-25-2025

Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models

Li, Jinzhe, Li, Gengxu, Chang, Yi, Wu, Yuan

Large language models (LLMs) have witnessed rapid advancements, demonstrating remarkable capabilities. However, a notable vulnerability persists: LLMs often uncritically accept flawed or contradictory premises, leading to inefficient reasoning and unreliable outputs. This emphasizes the significance of possessing the Premise Critique Ability for LLMs, defined as the capacity to proactively identify and articulate errors in input premises. Most existing studies assess LLMs' reasoning ability in ideal settings, largely ignoring their vulnerabilities when faced with flawed premises. Thus, we introduce the Premise Critique Bench (PCBench), designed by incorporating four error types across three difficulty levels, paired with multi-faceted evaluation metrics. We conducted systematic evaluations of 15 representative LLMs. Our findings reveal: (1) Most models rely heavily on explicit prompts to detect errors, with limited autonomous critique; (2) Premise critique ability depends on question difficulty and error type, with direct contradictions being easier to detect than complex or procedural errors; (3) Reasoning ability does not consistently correlate with the premise critique ability; (4) Flawed premises trigger overthinking in reasoning models, markedly lengthening responses due to repeated attempts at resolving conflicts. These insights underscore the urgent need to enhance LLMs' proactive evaluation of input validity, positioning premise critique as a foundational capability for developing reliable, human-centric systems. The code is available at https://github.com/MLGroupJLU/Premise_Critique.

arxiv preprint arxiv, large language model, machine learning

2505.23715

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Nov-4-2025

Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?

Yan, Kai, Xu, Yufei, Du, Zhengyin, Yao, Xuesong, Wang, Zheyu, Guo, Xiaowen, Chen, Jiecao

The rapid escalation in the difficulty of LLM benchmarks in recent years, from elementary school-level to frontier problems, has created the impression among researchers that we are only inches away from surpassing human intelligence. However, does the LLMs' remarkable reasoning ability indeed come from true intelligence by human standards, or are they simply reciting solutions witnessed during training at an Internet level? To study this problem, we propose RoR-Bench, a novel, multi-modal benchmark for detecting LLMs' recitation behavior when they are asked simple reasoning problems with subtly shifted conditions, and conduct empirical analysis on our benchmark. Surprisingly, we found that existing cutting-edge LLMs unanimously exhibit extremely severe recitation behavior; by changing one phrase in the condition, top models such as OpenAI-o1 and DeepSeek-R1 can suffer a 60 percent performance loss on elementary school-level arithmetic and reasoning problems. Such findings are a wake-up call to the LLM community that compels us to re-evaluate the true intelligence level of cutting-edge LLMs.

arxiv preprint arxiv, large language model, machine learning

2504.00509

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Education > Educational Setting > K-12 Education (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Nov-4-2025

A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization

Gan, Min, Chen, Guang-Yong, Yi, Yang, Yang, Lin

The proliferation of saddle points, rather than poor local minima, is increasingly understood to be a primary obstacle in large-scale non-convex optimization for machine learning. Variable elimination algorithms, like Variable Projection (VarPro), have long been observed to exhibit superior convergence and robustness in practice, yet a principled understanding of why they so effectively navigate these complex energy landscapes has remained elusive. In this work, we provide a rigorous geometric explanation by comparing the optimization landscapes of the original and reduced formulations. Through a rigorous analysis based on Hessian inertia and the Schur complement, we prove that variable elimination fundamentally reshapes the critical point structure of the objective function, revealing that local maxima in the reduced landscape are created from, and correspond directly to, saddle points in the original formulation. Our findings are illustrated on the canonical problem of non-convex matrix factorization, visualized directly on two-parameter neural networks, and finally validated in training deep Residual Networks, where our approach yields dramatic improvements in stability and convergence to superior minima. This work goes beyond explaining an existing method; it establishes landscape simplification via saddle point transformation as a powerful principle that can guide the design of a new generation of more robust and efficient optimization algorithms.
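The correspondence this abstract states between saddle points of the original problem and local maxima of the reduced problem can be checked on a two-variable toy. The objective below, a regularised scalar factorisation $f(u, v) = (1 - uv)^2 + \lambda (u^2 + v^2)$ with $\lambda = 0.5$, is an assumption chosen for illustration and is not taken from the paper. At the origin the Hessian is indefinite (a saddle), while its Schur complement with respect to the eliminated variable $v$ is negative, so $u = 0$ is a local maximum of the reduced function $r(u) = \min_v f(u, v)$.

```python
import math

LAM = 0.5  # hypothetical regularisation strength; any value in (0, 1) gives a saddle at the origin


def f(u, v):
    """Toy objective: regularised scalar factorisation of the target 1."""
    return (1.0 - u * v) ** 2 + LAM * (u * u + v * v)


def v_star(u):
    """Closed-form minimiser over v for fixed u (f is a convex quadratic in v)."""
    return u / (u * u + LAM)


def reduced(u):
    """Reduced objective after eliminating v."""
    return f(u, v_star(u))


# Analytic Hessian of f at the critical point (0, 0): [[2*LAM, -2], [-2, 2*LAM]].
h_uu, h_uv, h_vv = 2.0 * LAM, -2.0, 2.0 * LAM

# Eigenvalues of the symmetric 2x2 Hessian.
mean = 0.5 * (h_uu + h_vv)
radius = math.sqrt((0.5 * (h_uu - h_vv)) ** 2 + h_uv ** 2)
eig_lo, eig_hi = mean - radius, mean + radius

# Schur complement of the v-block: the curvature of the reduced objective at u = 0.
schur = h_uu - h_uv * h_uv / h_vv

# Independent check: central finite difference of the reduced objective at u = 0.
eps = 1e-4
fd_curvature = (reduced(eps) - 2.0 * reduced(0.0) + reduced(-eps)) / eps ** 2

print(f"Hessian eigenvalues at the origin: {eig_lo:.3f}, {eig_hi:.3f}")
print(f"Schur complement: {schur:.3f}, finite-difference curvature: {fd_curvature:.3f}")
```

The sign pattern follows from inertia additivity: the positive $v$-block contributes one positive eigenvalue and the negative Schur complement one negative eigenvalue, which together make the full Hessian indefinite.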

artificial intelligence, machine learning, saddle point

2511.01234

Country:

Asia > China (0.46)
North America (0.46)
Europe (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Neural Information Processing Systems, Oct-3-2025, 07:38:38 GMT

2291d2ec3b3048d1a6f86c2c4591b7e0-Reviews.html

The goal of this work is to automatically discover latent domains in a training set, which are subsequently used in a domain adaptation framework to yield improved classification performance on a test set. The paper defines a function that measures the difference between two feature vectors over a specified kernel. The goal is to partition the data points into domains such that the function is maximized over the set of points across each pair of domains. The problem is formulated as an integer programming problem with two constraints: each point is assigned to exactly one domain, and the distribution over class labels in each domain must match the input distribution over the entire point set.

constraint, dataset, latent domain

Neural Information Processing Systems

Country: North America > United States > Nevada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Wang, Fangji, Tsiotras, Panagiotis

BaB-prob: Branch and Bound with Preactivation Splitting for Probabilistic Verification of Neural Networks

arXiv.org Machine Learning, Oct-1-2025

Branch-and-bound with preactivation splitting has been shown to be highly effective for deterministic verification of neural networks. In this paper, we extend this framework to the probabilistic setting. We propose BaB-prob, which iteratively divides the original problem into subproblems by splitting preactivations and leverages linear bounds computed by linear bound propagation to bound the probability for each subproblem. We prove soundness and completeness of BaB-prob for feedforward-ReLU neural networks. Furthermore, we introduce the notion of uncertainty level and design two efficient strategies for preactivation splitting, yielding BaB-prob-ordered and BaB+BaBSR-prob. We evaluate BaB-prob on untrained networks, MNIST and CIFAR-10 models, and VNN-COMP 2025 benchmarks. Across these settings, our approach consistently outperforms state-of-the-art approaches in medium- to high-dimensional input problems.

equation, preactivation, probability

2509.25647

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)

Genre:

Research Report (1.00)
Overview > Innovation (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Aug-6-2025

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Lin, Yong, Tang, Shange, Lyu, Bohan, Yang, Ziran, Chung, Jui-Hui, Zhao, Haoyu, Jiang, Lai, Geng, Yihan, Ge, Jiawei, Sun, Jingruo, Wu, Jiayun, Gesi, Jiri, Lu, Ximing, Acuna, David, Yang, Kaiyu, Lin, Hongzhou, Choi, Yejin, Chen, Danqi, Arora, Sanjeev, Jin, Chi

We introduce Goedel-Prover-V2, a series of open-source language models that set a new state-of-the-art in automated theorem proving. Built on the standard expert iteration and reinforcement learning pipeline, our approach incorporates three key innovations: (1) Scaffolded data synthesis: We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems; (2) Verifier-guided self-correction: We enable the model to iteratively revise its proofs by leveraging feedback from the Lean compiler; (3) Model averaging: We merge model checkpoints to mitigate the decrease in model output diversity in later stages of training. Our small model, Goedel-Prover-V2-8B, reaches 84.6% pass@32 on MiniF2F and outperforms DeepSeek-Prover-V2-671B under the same metric, despite being 80X smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA by a large margin. Additionally, our flagship model solves 86 problems on PutnamBench at pass@184, securing the first place among open-source models on the leaderboard, surpassing DeepSeek-Prover-V2-671B's record of solving 47 problems by pass@1024 with a significantly smaller model size and compute budget. At the time of its release (July-August 2025), Goedel-Prover-V2 achieves the strongest overall performance among all open-source theorem provers. It also ranks among the top-performing models--including closed-source systems with publicly reported performance--under a constrained test-time compute budget. Our models, code, and data are released at https://github.com/Goedel-LM/Goedel-Prover-V2.

arxiv preprint arxiv, large language model, machine learning

2508.03613

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.88)

Colle, Vincenzo, Sana, Mohamed, Piovesan, Nicola, De Domenico, Antonio, Ayed, Fadhel, Debbah, Merouane

TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving

arXiv.org Artificial Intelligence, Jun-13-2025

The increasing adoption of artificial intelligence in telecommunications has raised interest in the capability of Large Language Models (LLMs) to address domain-specific, mathematically intensive tasks. Although recent advancements have improved the performance of LLMs in general mathematical reasoning, their effectiveness within specialized domains, such as signal processing, network optimization, and performance analysis, remains largely unexplored. To address this gap, we introduce TeleMath, the first benchmark dataset specifically designed to evaluate LLM performance in solving mathematical problems with numerical solutions in the telecommunications domain. Comprising 500 question-answer (QnA) pairs, TeleMath covers a wide spectrum of topics in the telecommunications field. This paper outlines the proposed QnAs generation pipeline, starting from a selected seed of problems crafted by Subject Matter Experts. The evaluation of a wide range of open-source LLMs reveals that best performance on TeleMath is achieved by recent models explicitly designed for mathematical or logical reasoning. In contrast, general-purpose models, even those with a large number of parameters, often struggle with these challenges. We have released the dataset and the evaluation code to ease result reproducibility and support future research.

artificial intelligence, large language model, natural language

2506.10674

Country:

Europe (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report (0.82)

Industry: Telecommunications (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)