 subtraction


Neural Arithmetic Logic Units

Andrew Trask, Felix Hill, Scott E. Reed, Jack Rae, Chris Dyer, Phil Blunsom

Neural Information Processing Systems

Specifically, one frequently observes failures when quantities that lie outside the numerical range used during training are encountered at test time, even when the target function is simple (e.g., it depends only on aggregating counts or linear extrapolation). This failure pattern indicates that the learned behavior is better characterized by memorization than by systematic abstraction.


Appendix: Representing Hyperbolic Space Accurately using Multi-Component Floats

Neural Information Processing Systems

Renormalize algorithm to reduce the number of components. Algorithm 4: Scale-Expansion, modified from [4]. Input: m-components expansion (a ... More importantly, we show in Alg. ... At the start of the training, we train models with an initial "burn-in" phase ... We mention an interesting tuning result here. Taking the training of the halfspace model over the WordNet Mammal for example, we vary the learning rate with the batch size, as shown in Table 1. We found that, with a larger batch size, if the learning rate is properly increased, the embedding performance of the converged model can nearly match the best performance of the converged model trained with a smaller batch size.


Can LLMs subtract numbers?

Jobanputra, Mayank, Walter, Nils Philipp, Mehta, Maitrey, Veseli, Blerta, Chapple, Evan Parker Kelly, Wang, Yifan, Chetani, Sneha, Pavlick, Ellie, Vergari, Antonio, Demberg, Vera

arXiv.org Artificial Intelligence

We present a systematic study of subtraction in large language models (LLMs). While prior benchmarks emphasize addition and multiplication, subtraction has received comparatively little attention despite being structurally distinct as a non-commutative operation. We evaluate eight pretrained LLMs spanning four families on addition and subtraction problems. Our experiments reveal that subtraction accuracy lags behind addition by a wide margin. We find that the errors for ($a-b$) are concentrated in cases where ($a
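As an illustration of the kind of per-operation comparison described above, here is a minimal sketch of an exact-match evaluation harness. It is not the authors' benchmark: the prompt format, the operand range, and the names `make_problems`, `accuracy_by_op`, and `order_blind_solver` are all assumptions made for this example, and the toy solver merely stands in for a model.

```python
import random

def make_problems(n, lo=0, hi=999, seed=0):
    """Build n addition and n subtraction problems over the same operand pairs."""
    rng = random.Random(seed)
    problems = []
    for _ in range(n):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        problems.append((f"{a}+{b}=", str(a + b), "add"))
        problems.append((f"{a}-{b}=", str(a - b), "sub"))
    return problems

def accuracy_by_op(problems, solver):
    """Exact-match accuracy of solver(prompt) -> str, reported per operation."""
    correct = {"add": 0, "sub": 0}
    total = {"add": 0, "sub": 0}
    for prompt, answer, op in problems:
        total[op] += 1
        if solver(prompt).strip() == answer:
            correct[op] += 1
    return {op: correct[op] / total[op] for op in total}

def order_blind_solver(prompt):
    """Hypothetical stand-in for a model: it ignores operand order.

    That is harmless for addition (commutative) but not for subtraction.
    """
    expr = prompt.rstrip("=")
    if "+" in expr:
        a, b = expr.split("+")
        return str(int(a) + int(b))
    a, b = expr.split("-")  # operands are non-negative, so there is one "-"
    return str(abs(int(a) - int(b)))

scores = accuracy_by_op(make_problems(200), order_blind_solver)
```

Replacing `order_blind_solver` with a function that queries a real model would give the per-operation accuracies; the toy solver only shows why a non-commutative operation can expose errors that addition hides.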


Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers

Kadlčík, Marek, Štefánik, Michal, Mickus, Timothee, Spiegel, Michal, Kuchař, Josef

arXiv.org Artificial Intelligence

Pretrained language models (LMs) are prone to arithmetic errors. Existing work showed limited success in probing numeric values from models' representations, indicating that these errors can be attributed to the inherent unreliability of distributionally learned embeddings in representing exact quantities. However, we observe that previous probing methods are inadequate for the emergent structure of learned number embeddings with sinusoidal patterns. In response, we propose a novel probing technique that decodes numeric values from input embeddings with near-perfect accuracy across a range of open-source LMs. This proves that after pre-training alone, LMs represent numbers with remarkable precision. Finally, we find that the embeddings' precision, judged by our probe's accuracy, explains a large portion of LMs' errors in elementary arithmetic, and show that aligning the embeddings with the pattern our probes discover can mitigate these errors.
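To make the idea of sinusoidal number embeddings concrete, the toy sketch below encodes an integer as sine and cosine features and decodes it by nearest match. It is not the probe proposed in the paper: the periods in `PERIODS`, the functions `embed` and `decode`, and the dot-product decoder are assumptions chosen only to show that such a representation can carry an exact value.

```python
import math

# Assumed periods for this toy example; real learned embeddings need not use these.
PERIODS = (10, 100, 1000)

def embed(n):
    """Encode integer n as (sin, cos) pairs, one pair per period."""
    feats = []
    for p in PERIODS:
        angle = 2 * math.pi * n / p
        feats.extend((math.sin(angle), math.cos(angle)))
    return feats

def decode(vec, candidates=range(1000)):
    """Return the candidate whose embedding has the largest dot product with vec."""
    def score(n):
        return sum(x * y for x, y in zip(vec, embed(n)))
    return max(candidates, key=score)

recovered = [decode(embed(n)) for n in range(0, 1000, 37)]
```

The dot product of two such embeddings is a sum of cosines of the phase differences, so within one full cycle of the longest period it peaks only at the true value. A probe that assumed a purely linear relationship between embedding and value would not exploit this periodic structure.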



Reveal and Release: Iterative LLM Unlearning with Self-generated Data

Xie, Linxi, Teng, Xin, Ke, Shichang, Wen, Hongyi, Wang, Shengjie

arXiv.org Artificial Intelligence

Large language model (LLM) unlearning has demonstrated effectiveness in removing the influence of undesirable data (also known as forget data). Existing approaches typically assume full access to the forget dataset, overlooking two key challenges: (1) forget data is often privacy-sensitive, rare, or legally regulated, making it expensive or impractical to obtain; and (2) the distribution of available forget data may not align with how that information is represented within the model. To address these limitations, we propose a "Reveal-and-Release" method to unlearn with self-generated data, where we prompt the model to reveal what it knows using optimized instructions. To fully utilize the self-generated forget data, we propose an iterative unlearning framework, where we make incremental adjustments to the model's weight space with parameter-efficient modules trained on the forget data. Experimental results demonstrate that our method balances the tradeoff between forget quality and utility preservation.


The Domain Mixed Unit: A New Neural Arithmetic Layer

Curry, Paul

arXiv.org Artificial Intelligence

The Domain Mixed Unit (DMU) is a new neural arithmetic unit that learns a single parameter gate G that mixes a state between log-space and linear-space representations while performing either addition (DMU add) or subtraction (DMU sub) in said space. These are the two initializations proposed for the DMU: one covering addition and multiplication, and another covering subtraction and division. The DMU achieves state-of-the-art performance on the NALM Benchmark, a dataset designed to test the ability of neural arithmetic units to generalize arithmetic operations, specifically performing with the highest percentage solved over all seeds on multiplication and division. Neural Arithmetic Units (NAUs) are specialized sub-units or networks designed to interpretably represent arithmetic operations while maintaining differentiability, allowing gradients to flow through them during training. These units can be integrated into larger neural architectures to provide explicit arithmetic capabilities.
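The following is a minimal numeric sketch of what mixing between log-space and linear-space arithmetic can look like. It is not the DMU's actual formulation, which the abstract does not spell out: here a fixed scalar `gate` blends the two outputs, whereas the DMU learns its gate and mixes a state. The function names `mixed_add` and `mixed_sub` are hypothetical.

```python
import math

def mixed_add(xs, gate):
    """Blend linear-space addition with log-space addition.

    gate = 0 gives the plain sum; gate = 1 gives exp(sum(log x)), i.e. the product.
    Inputs must be positive so that the logarithm is defined.
    """
    linear = sum(xs)
    log_space = math.exp(sum(math.log(x) for x in xs))
    return gate * log_space + (1 - gate) * linear

def mixed_sub(a, b, gate):
    """Blend linear-space subtraction with log-space subtraction.

    gate = 0 gives a - b; gate = 1 gives exp(log a - log b), i.e. a / b.
    """
    linear = a - b
    log_space = math.exp(math.log(a) - math.log(b))
    return gate * log_space + (1 - gate) * linear

total = mixed_add([2.0, 3.0], gate=0.0)        # linear space: 2 + 3
product = mixed_add([2.0, 3.0], gate=1.0)      # log space: 2 * 3
difference = mixed_sub(6.0, 3.0, gate=0.0)     # linear space: 6 - 3
quotient = mixed_sub(6.0, 3.0, gate=1.0)       # log space: 6 / 3
```

This shows why one addition-style unit can cover addition and multiplication while one subtraction-style unit can cover subtraction and division: adding or subtracting logarithms corresponds to multiplying or dividing the original values.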


State Algebra for Propositional Logic

Lesnik, Dmitry, Schäfer, Tobias

arXiv.org Artificial Intelligence

This paper presents State Algebra, a novel framework designed to represent and manipulate propositional logic using algebraic methods. The framework is structured as a hierarchy of three representations: Set, Coordinate, and Row Decomposition. These representations anchor the system in well-known semantics while facilitating the computation using a powerful algebraic engine. A key aspect of State Algebra is its flexibility in representation. We show that although the default reduction of a state vector is not canonical, a unique canonical form can be obtained by applying a fixed variable order during the reduction process. This highlights a trade-off: by foregoing guaranteed canonicity, the framework gains increased flexibility, potentially leading to more compact representations of certain classes of problems. We explore how this framework provides tools to articulate both search-based and knowledge compilation algorithms and discuss its natural extension to probabilistic logic and Weighted Model Counting.


Addition in Four Movements: Mapping Layer-wise Information Trajectories in LLMs

Yan, Yao

arXiv.org Artificial Intelligence

Multi-digit addition is a clear probe of the computational power of large language models. To dissect the internal arithmetic processes in LLaMA-3-8B-Instruct, we combine linear probing with logit-lens inspection. Inspired by the step-by-step manner in which humans perform addition, we propose and analyze a coherent four-stage trajectory in the forward pass: (1) Formula-structure representations become linearly decodable first, while the answer token is still far down the candidate list. (2) Core computational features then emerge prominently. (3) At deeper activation layers, numerical abstractions of the result become clearer, enabling near-perfect detection and decoding of the individual digits in the sum. (4) Near the output, the model organizes and generates the final content, with the correct token reliably occupying the top rank. This trajectory suggests a hierarchical process that favors internal computation over rote memorization. We release our code and data to facilitate reproducibility.