arithmetic operation
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step
Language model systems often enable LLMs to generate code for arithmetic operations to achieve accurate calculations. However, this approach compromises speed and security, and fine-tuning risks the language model losing its prior capabilities. We propose a framework that enables exact arithmetic in *a single autoregressive step*, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture that performs arithmetic.
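The core idea can be pictured as a lightweight controller head that reads the LLM's hidden state and routes the numeric operands through an exact symbolic calculator instead of letting the decoder emit digit tokens. The sketch below is illustrative only: the controller weights, operand buffer, and operation set are hypothetical stand-ins, not the authors' architecture.

```python
import numpy as np

# Hypothetical sketch: a linear "controller" head maps the LLM hidden state
# to scores over symbolic operations; the selected operation is then executed
# exactly on the parsed operands in a single step.
OPS = {
    0: ("add", lambda a, b: a + b),
    1: ("sub", lambda a, b: a - b),
    2: ("mul", lambda a, b: a * b),
    3: ("div", lambda a, b: a / b),
}

def occam_step(hidden_state: np.ndarray, operands: tuple[float, float],
               W_ctrl: np.ndarray) -> float:
    """Select an arithmetic op from the hidden state and apply it exactly."""
    logits = W_ctrl @ hidden_state          # (num_ops,) controller scores
    op_idx = int(np.argmax(logits))         # hard routing decision
    _, fn = OPS[op_idx]
    return fn(*operands)                    # exact symbolic arithmetic

# Toy usage with random weights standing in for a trained controller head.
rng = np.random.default_rng(0)
h = rng.standard_normal(16)                 # stand-in for an LLM hidden state
W = rng.standard_normal((len(OPS), 16))
print(occam_step(h, (12.5, 4.0), W))
```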
Composing Parameter-Efficient Modules with Arithmetic Operation
As an efficient alternative to conventional full fine-tuning, parameter-efficient fine-tuning (PEFT) is becoming the prevailing method to adapt pretrained language models. In PEFT, a lightweight module is learned on each dataset while the underlying pretrained language model remains unchanged, resulting in multiple compact modules representing diverse skills when applied to various domains and tasks. In this paper, we propose to compose these parameter-efficient modules through linear arithmetic operations in the weight space, thereby integrating different module capabilities. Specifically, we first define addition and negation operators for the modules, and then further compose these two basic operators to perform flexible arithmetic. Our approach requires no additional training and enables highly flexible module composition. We apply different arithmetic operations to compose the parameter-efficient modules for (1) distribution generalization, (2) multi-tasking, (3) detoxifying, and (4) domain transfer. Additionally, we extend our approach to detoxify Alpaca-LoRA, the latest instruction-tuned large language model based on LLaMA. Empirical results demonstrate that our approach produces new and effective parameter-efficient modules that significantly outperform existing ones across all settings.
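Concretely, if each PEFT module is a set of weight deltas, addition merges skills and negation removes a behaviour. The snippet below is a minimal sketch of that weight-space arithmetic for LoRA-style deltas, assuming the composed modules share parameter shapes; the helper names and toy weights are illustrative, not the paper's API.

```python
import numpy as np

# Illustrative sketch: treat each parameter-efficient module as a dict of
# weight deltas and compose modules by linear arithmetic in weight space.
Module = dict[str, np.ndarray]

def add(m1: Module, m2: Module, alpha: float = 1.0) -> Module:
    """Addition operator: merge two modules' capabilities."""
    return {k: m1[k] + alpha * m2[k] for k in m1}

def negate(m: Module) -> Module:
    """Negation operator: subtract a module's behaviour (e.g., toxicity)."""
    return {k: -v for k, v in m.items()}

# Example composition with no additional training:
# task_module + alpha * (-toxic_module), i.e. keep task skills, remove toxicity.
rng = np.random.default_rng(0)
task_module = {"layer0.lora_A": rng.standard_normal((8, 64))}
toxic_module = {"layer0.lora_A": rng.standard_normal((8, 64))}
detoxified = add(task_module, negate(toxic_module), alpha=0.5)
print(detoxified["layer0.lora_A"].shape)
```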
Neural Power Units
Conventional neural networks can approximate simple arithmetic operations, but fail to generalize beyond the range of numbers that were seen during training. Neural Arithmetic Units aim to overcome this difficulty, but current arithmetic units are either limited to operating on positive numbers or can only represent a subset of arithmetic operations. We introduce the Neural Power Unit (NPU) that operates on the full domain of real numbers and is capable of learning arbitrary power functions in a single layer. The NPU thus fixes the shortcomings of existing arithmetic units and extends their expressivity. We achieve this by using complex arithmetic without requiring a conversion of the network to complex numbers. A simplification of the unit to the RealNPU yields a highly transparent model. We show that the NPUs outperform their competitors in terms of accuracy and sparsity on artificial arithmetic datasets, and that the RealNPU can discover the governing equations of a dynamical system only from data.
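The key trick is taking logarithms in the complex plane so that negative inputs stay in-domain: the input's magnitude goes through a log-linear-exp path while its sign is carried by a phase term. The sketch below is one reading of the real-weight (RealNPU-style) simplification, reconstructed from the abstract rather than the authors' reference code; the parametrisation details are assumptions.

```python
import numpy as np

def real_npu_forward(x: np.ndarray, W: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Sketch of a RealNPU-style layer computing (the real part of) prod_j x_j ** W_ij.

    Negative inputs are handled by splitting x into magnitude and sign and
    carrying the sign through a cosine of the accumulated phase, i.e. a
    complex logarithm with real weights.
    """
    r = np.maximum(np.abs(x), eps)        # magnitudes, clipped away from zero
    k = (x < 0).astype(x.dtype)           # 1 where the input is negative
    log_mag = W @ np.log(r)               # real part of the complex log
    phase = np.pi * (W @ k)               # accumulated sign information
    return np.exp(log_mag) * np.cos(phase)

# Toy check: with W = [[2, 1]] the unit computes x0**2 * x1, including for a
# negative input, where plain log-exp arithmetic units break down.
x = np.array([-3.0, 2.0])
W = np.array([[2.0, 1.0]])
print(real_npu_forward(x, W))             # approx [18.]
```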
Dissecting the Ledger: Locating and Suppressing "Liar Circuits" in Financial Large Language Models
Large Language Models (LLMs) are increasingly deployed in high-stakes financial domains, yet they suffer from specific, reproducible hallucinations when performing arithmetic operations. Current mitigation strategies often treat the model as a black box. In this work, we propose a mechanistic approach to intrinsic hallucination detection. By applying Causal Tracing to the GPT-2 XL architecture on the ConvFinQA benchmark, we identify a dual-stage mechanism for arithmetic reasoning: a distributed computational scratchpad in middle layers (L12-L30) and a decisive aggregation circuit in late layers (specifically Layer 46). We verify this mechanism via an ablation study, demonstrating that suppressing Layer 46 reduces the model's confidence in hallucinatory outputs by 81.8%. Furthermore, we demonstrate that a linear probe trained on this layer generalizes to unseen financial topics with 98% accuracy, suggesting a universal geometry of arithmetic deception.
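The detection component described here amounts to a linear probe on a single layer's hidden states. The snippet below sketches such a probe with an off-the-shelf logistic regression, using synthetic activations and labels as stand-ins for residual-stream vectors extracted at the aggregation layer; the data and dimensions are hypothetical, and on random data the score is near chance rather than the accuracy reported in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch: train a linear probe on one layer's hidden states to
# flag hallucinatory arithmetic answers. Activations and labels are synthetic.
rng = np.random.default_rng(0)
d_model = 1600                                  # GPT-2 XL hidden size
acts = rng.standard_normal((512, d_model))      # one vector per model answer
labels = rng.integers(0, 2, size=512)           # 1 = hallucinated, 0 = correct

probe = LogisticRegression(max_iter=1000)
probe.fit(acts[:400], labels[:400])             # fit on held-in examples
print("held-out accuracy:", probe.score(acts[400:], labels[400:]))
```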