Goto

Collaborating Authors

 midpoint



Reversing Large Language Models for Efficient Training and Fine-Tuning

Gal, Eshed, Eliasof, Moshe, Turek, Javier, Ascher, Uri, Treister, Eran, Haber, Eldad

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are known for their expensive and time-consuming training. Thus, oftentimes, LLMs are fine-tuned to address a specific task, given the pretrained weights of a pre-trained LLM considered a foundation model. In this work, we introduce memory-efficient, reversible architectures for LLMs, inspired by symmetric and symplectic differential equations, and investigate their theoretical properties. Different from standard, baseline architectures that store all intermediate activations, the proposed models use time-reversible dynamics to retrieve hidden states during backpropagation, relieving the need to store activations. This property allows for a drastic reduction in memory consumption, allowing for the processing of larger batch sizes for the same available memory, thereby offering improved throughput. In addition, we propose an efficient method for converting existing, non-reversible LLMs into reversible architectures through fine-tuning, rendering our approach practical for exploiting existing pre-trained models. Our results show comparable or improved performance on several datasets and benchmarks, on several LLMs, building a scalable and efficient path towards reducing the memory and computational costs associated with both training from scratch and fine-tuning of LLMs.


The Unified Cognitive Consciousness Theory for Language Models: Anchoring Semantics, Thresholds of Activation, and Emergent Reasoning

Chang, Edward Y., Kaya, Zeyneb N., Chang, Ethan

arXiv.org Artificial Intelligence

We propose semantic anchoring, a unified account of how large language models turn pretrained capacity into goal-directed behavior: external structure (in-context examples, retrieval, or light tuning) binds the model's latent patterns to desired targets. Unified Contextual Control Theory (UCCT) formalizes this via anchoring strength $S = ρ_d - d_r - \log k$, where $ρ_d$ measures target cohesion in representation space, $d_r$ measures mismatch from prior knowledge, and $k$ is the anchor budget. UCCT predicts threshold-like performance flips and strictly generalizes in-context learning, reading retrieval and fine-tuning as anchoring variants. Three controlled studies provide evidence. Experiment 1 demonstrates cross-domain anchoring rebinding strong priors in text and vision. Experiment 2 varies representational familiarity via numeral bases (base-10/8/9) at fixed complexity, yielding ordered thresholds and transfer patterns tracking $ρ_d$, $d_r$, and $S$. Experiment 3 establishes a geometry-to-behavior correlate: layer-wise peak anchoring and trajectory area predict few-shot thresholds $θ_{50}$. UCCT offers testable theory and practical metrics for optimizing prompts, retrieval, and tuning.


Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions

Duan, Boyan, Liang, Xiao, Lu, Shuai, Wang, Yaoxiang, Shen, Yelong, Chang, Kai-Wei, Wu, Ying Nian, Yang, Mao, Chen, Weizhu, Gong, Yeyun

arXiv.org Artificial Intelligence

Automated theorem proving in Euclidean geometry, particularly for International Mathematical Olympiad (IMO) level problems, remains a major challenge and an important research focus in Artificial Intelligence. In this paper, we present a highly efficient method for geometry theorem proving that runs entirely on CPUs without relying on neural network-based inference. Our initial study shows that a simple random strategy for adding auxiliary points can achieve silver-medal level human performance on IMO. Building on this, we propose HAGeo, a Heuristic-based method for adding Auxiliary constructions in Geometric deduction that solves 28 of 30 problems on the IMO-30 benchmark, achieving gold-medal level performance and surpassing AlphaGeometry, a competitive neural network-based approach, by a notable margin. To evaluate our method and existing approaches more comprehensively, we further construct HAGeo-409, a benchmark consisting of 409 geometry problems with human-assessed difficulty levels. Compared with the widely used IMO-30, our benchmark poses greater challenges and provides a more precise evaluation, setting a higher bar for geometry theorem proving.



Automated proving in planar geometry based on the complex number identity method and elimination

Kovács, Zoltán, Peng, Xicheng

arXiv.org Artificial Intelligence

We improve the complex number identity proving method to a fully automated procedure, based on elimination ideals. By using declarative equations or rewriting each real-relational hypothesis $h_i$ to $h_i-r_i$, and the thesis $t$ to $t-r$, clearing the denominators and introducing an extra expression with a slack variable, we eliminate all free and relational point variables. From the obtained ideal $I$ in $\mathbb{Q}[r,r_1,r_2,\ldots]$ we can find a conclusive result. It plays an important role that if $r_1,r_2,\ldots$ are real, $r$ must also be real if there is a linear polynomial $p(r)\in I$, unless division by zero occurs when expressing $r$. Our results are presented in Mathematica, Maple and in a new version of the Giac computer algebra system. Finally, we present a prototype of the automated procedure in an experimental version of the dynamic geometry software GeoGebra.


Supplementary Materials Roadmap

Neural Information Processing Systems

In this supplementary material, we provide "full versions" of Sections 2-4 from the main submission, Fact 2.5 (Uniform bound on entries of Gaussian vector) . For g N (0, Id), g/ nullg null is identical in distribution to v . We can expand the expectation and apply Fact B.2 to get We will also need the following stability result for affine linear thresholds. Putting all of these ingredients together, we can now complete the proof of the main Lemma B.1 of By Lemma B.6 applied to the projection of f to the two-dimensional This notion is motivated by Lemma 4.4 in Section C.1 where we study the critical points We first collect some elementary consequences of closeness. Suppose < 2/ 2. If ( v In the rest of the paper we will take to be small, so Lemma 3.3 will always apply.



GeoThought: A Dataset for Enhancing Mathematical Geometry Reasoning in Vision-Language Models

Shi, Nannan, Qin, Chuanyu, Song, Shipeng, Luo, Man

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated strong reasoning capabilities in text-based mathematical problem solving; however, when adapted to visual reasoning tasks, particularly geometric problem solving, their performance substantially declines because geometric problems present unique challenges. Specifically, these challenges stem from two key factors: first, the intrinsic complexity of geometry requiring detailed image comprehension and multi-step reasoning, and second, the limitations of existing datasets which lack sufficient scale, diversity, and explicit reasoning traces, consequently hindering effective model training. To address these challenges, we developed the GeoThoughts dataset, a comprehensive geometric reasoning corpus with two subsets: Geo-Thought-6K with 6,243 samples and its augmented version Geo-Thought-Augmented-10K containing 10,834 samples. Each entry includes visual descriptions, step-by-step solutions, explicit reasoning chains, reflection steps, and final answers. Using this dataset, we developed GeoThought-MLLM, a mathematical reasoning multimodal model that generates detailed thinking processes during problem-solving. Our model outperforms existing benchmarks in geometric tasks, demonstrating that training with our Chain-of-Thought dataset improves geometric reasoning capabilities across both in-domain and out-of-domain settings. Finally, we analyze failure cases and observe that errors primarily arise from incorrect interpretation of mathematical concepts or spatial misjudgment. By invoking CoT to correct these mistakes, the model produces correct answers.