metamathqa
POME: Post Optimization Model Edit via Muon-style Projection
Liu, Yong, Fu, Di, Luo, Yang, Zhu, Zirui, Cheng, Minhao, Hsieh, Cho-Jui, You, Yang
We introduce Post-Optimization Model Edit (POME), a new algorithm that enhances the performance of fine-tuned large language models using only their pretrained and fine-tuned checkpoints, without requiring extra data or further optimization. The core idea is to apply a Muon-style projection to $\Delta W$, the difference between the fine-tuned and pretrained weights. This projection uses truncated singular value decomposition (SVD) to equalize the influence of dominant update directions and prune small singular values, which often represent noise. As a simple post-processing step, POME is completely decoupled from the training pipeline: it requires zero modifications and imposes no overhead, making it universally compatible with any optimizer or distributed framework. POME delivers consistent gains, boosting average performance by +2.5\% on GSM8K and +1.0\% on code generation. Its broad applicability -- from 7B foundation models to 72B RLHF-instructed models -- establishes it as a practical, zero-cost enhancement for any fine-tuning pipeline. Code is available at https://github.com/NUS-HPC-AI-Lab/POME.
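The core operation can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the released implementation: the rank cutoff `k` and the Frobenius-norm rescaling rule are placeholders for whatever the official code uses.

```python
import torch

def pome_edit(w_pre: torch.Tensor, w_ft: torch.Tensor, k: int) -> torch.Tensor:
    """Muon-style projection of the fine-tuning update (illustrative sketch).

    Keeps the top-k singular directions of delta_w = w_ft - w_pre,
    equalizes their influence, and discards the small-singular-value
    tail, which often represents noise.
    """
    delta_w = w_ft - w_pre
    u, s, vh = torch.linalg.svd(delta_w, full_matrices=False)
    u_k, vh_k = u[:, :k], vh[:k, :]  # truncate: keep top-k directions only
    # U_k @ V_k^T has all retained singular values equal to 1, as in
    # Muon's orthogonalized update; rescaling to match the original
    # update's Frobenius norm is an assumption of this sketch.
    projected = u_k @ vh_k
    scale = torch.linalg.norm(delta_w) / torch.linalg.norm(projected)
    return w_pre + scale * projected
```

Because the edit touches only the two checkpoints, it can be applied per weight matrix after training completes, which is what makes it optimizer- and framework-agnostic.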
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > Middle East > Jordan (0.05)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Context-Free Synthetic Data Mitigates Forgetting
Bansal, Parikshit, Sanghavi, Sujay
Fine-tuning a language model often degrades its existing performance on other tasks, due to a shift in the model parameters; this phenomenon is often referred to as (catastrophic) forgetting. We are interested in mitigating this in settings where we have access only to the model weights, not to its training data or recipe. A natural approach is to penalize the KL divergence between the original model and the new one. Our main realization is that a simple process - which we term context-free generation - allows for an approximately unbiased estimate of this KL divergence. We show that augmenting a fine-tuning dataset with context-free generations mitigates forgetting in two settings: (a) preserving the zero-shot performance of pretrained-only models, and (b) preserving the reasoning performance of thinking models. We show that contextual synthetic data, and even a portion of the pretraining data, are less effective. We also investigate the effect of choices such as generation temperature and data ratios. We present results for OLMo-1B in the pretrained-only setting and R1-Distill-Llama-8B in the reasoning setting.
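A minimal sketch of the context-free generation step, assuming a Hugging Face causal LM; the checkpoint name and sampling hyperparameters below are placeholders, not the paper's exact recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "allenai/OLMo-1B"  # placeholder; the paper also uses R1-Distill-Llama-8B
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

def context_free_generations(n: int, max_len: int = 256, temp: float = 1.0):
    """Sample sequences from the original model given no context (BOS only).

    Training the fine-tuned model with cross-entropy on such samples
    approximates penalizing KL(original || fine-tuned), because the
    samples come from the original model's own unconditional distribution.
    """
    bos = torch.tensor([[tok.bos_token_id]])
    outs = []
    for _ in range(n):
        ids = model.generate(bos, do_sample=True, temperature=temp,
                             max_length=max_len,
                             pad_token_id=tok.eos_token_id)
        outs.append(tok.decode(ids[0], skip_special_tokens=True))
    return outs

# These generations are then mixed into the fine-tuning dataset
# alongside the target-task data, at some ratio (a tuning choice).
```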
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
Yan, Yuchen, Shen, Yongliang, Liu, Yang, Jiang, Jin, Xu, Xin, Zhang, Mengdi, Shao, Jian, Zhuang, Yueting
Mathematical reasoning represents a critical frontier in advancing large language models (LLMs). While step-by-step approaches have emerged as the dominant paradigm for mathematical problem-solving in LLMs, the quality of reasoning steps in training data fundamentally constrains model performance. Recent studies have demonstrated that more detailed intermediate steps can enhance model performance, yet existing methods for step expansion either require more powerful external models or incur substantial computational costs. In this paper, we introduce MathFimer, a novel framework for mathematical reasoning step expansion inspired by the "Fill-in-the-middle" task from code completion. By decomposing solution chains into prefix-suffix pairs and training models to reconstruct missing intermediate steps, we develop a specialized model, MathFimer-7B, on our carefully curated NuminaMath-FIM dataset. We then apply this model to enhance existing mathematical reasoning datasets by inserting detailed intermediate steps into their solution chains, creating MathFimer-expanded versions. Through comprehensive experiments on multiple mathematical reasoning datasets, including MathInstruct and MetaMathQA, we demonstrate that models trained on MathFimer-expanded data consistently outperform their counterparts trained on original data across benchmarks such as GSM8K and MATH. Our approach offers a practical, scalable solution for enhancing mathematical reasoning capabilities in LLMs without relying on powerful external models or expensive inference procedures.
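The prefix-suffix decomposition can be sketched as follows; the step delimiter and prompt template are illustrative assumptions, not the actual NuminaMath-FIM format.

```python
def make_fim_examples(question: str, steps: list[str]) -> list[dict]:
    """Decompose a solution chain into fill-in-the-middle training pairs.

    For each interior step i, the model sees the problem, the prefix
    (steps before i), and the suffix (steps after i), and is trained to
    reconstruct the missing middle step.
    """
    examples = []
    for i in range(1, len(steps) - 1):
        prefix = "\n".join(steps[:i])
        suffix = "\n".join(steps[i + 1:])
        prompt = (f"Problem: {question}\n"
                  f"<prefix>\n{prefix}\n<suffix>\n{suffix}\n<middle>\n")
        examples.append({"prompt": prompt, "target": steps[i]})
    return examples
```

At expansion time, the trained model is instead queried between consecutive steps of an existing solution, and its reconstructed "middle" is inserted as an additional intermediate step.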
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Soup to go: mitigating forgetting during continual learning with model averaging
Kleiman, Anat, Dziugaite, Gintare Karolina, Frankle, Jonathan, Kakade, Sham, Paul, Mansheej
In continual learning, where task data arrives in a sequence, fine-tuning on later tasks often leads to performance degradation on earlier tasks. This is especially pronounced when the tasks come from diverse domains. In this setting, how can we mitigate catastrophic forgetting of earlier tasks and retain what the model has learned with minimal computational expense? Inspired by model-merging methods and L2 regression, we propose Sequential Fine-tuning with Averaging (SFA), a method that merges the currently training model with earlier checkpoints during the course of training. SOTA approaches typically maintain a data buffer of past tasks or impose a penalty at each gradient step. In contrast, our method achieves comparable results without the need to store past data or keep multiple copies of parameters for each gradient step. Furthermore, our method outperforms common merging techniques such as Task Arithmetic, TIES Merging, and WiSE-FT, as well as penalty methods like L2 regularization and Elastic Weight Consolidation. In turn, our method offers insight into the benefits of merging partially trained models during training across both image and language domains.
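A minimal PyTorch-style sketch of the averaging step, assuming a single stored checkpoint summarizing earlier tasks and an averaging interval `every`; the mixing weight `beta` is a placeholder for the paper's coefficient.

```python
import torch

@torch.no_grad()
def sfa_average(model, past_state: dict, beta: float = 0.5):
    """Merge the in-training model with an earlier checkpoint, in place.

    new_param = beta * past_param + (1 - beta) * current_param.
    Invoked periodically during fine-tuning on the current task, so the
    model never drifts too far from what earlier tasks learned.
    """
    for name, p in model.named_parameters():
        p.mul_(1.0 - beta).add_(past_state[name], alpha=beta)

# Inside the training loop (sketch):
#   if step % every == 0:
#       sfa_average(model, past_state, beta=0.5)
```

No past data is stored and no per-step penalty is computed; the only extra memory is the stored checkpoint being averaged in.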
S$^3$c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners
Yan, Yuchen, Jiang, Jin, Liu, Yang, Cao, Yixin, Xu, Xin, Zhang, Mengdi, Cai, Xunliang, Shao, Jian
Self-correction is a novel method that can stimulate the latent reasoning abilities of large language models (LLMs). It involves detecting and correcting errors during the inference process when LLMs solve reasoning problems. However, recent works do not treat self-correction as a spontaneous, intrinsic capability of LLMs; instead, such correction is achieved through post-hoc generation, external knowledge introduction, multi-model collaboration, and similar techniques. In this paper, we propose a series of mathematical LLMs called S$^3$c-Math, which are able to perform Spontaneous Step-level Self-correction for Mathematical reasoning. This capability helps LLMs recognize whether their ongoing inference tends to contain errors and simultaneously correct these errors to produce a more reliable response. We propose a method that employs step-level sampling to construct step-wise self-correction data for achieving this ability, along with a training strategy that uses the constructed data to equip LLMs with spontaneous step-level self-correction capacities. Our data and methods have been demonstrated to be effective across various foundation LLMs, consistently showing significant progress in evaluations on GSM8K, MATH, and other mathematical benchmarks. To the best of our knowledge, we are the first to introduce the spontaneous step-level self-correction ability of LLMs in mathematical reasoning.
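One plausible way to assemble such data from step-level samples is sketched below; the correction marker and the `is_correct` verifier are placeholders for the paper's actual construction pipeline.

```python
def build_self_correction_example(question: str, sampled_steps: list[str],
                                  reference_steps: list[str], is_correct) -> str:
    """Construct one step-level self-correction training sequence (sketch).

    Walk a sampled solution step by step; at the first erroneous step
    (as judged by `is_correct`, e.g. comparison against reference steps
    or an answer check), keep the error, insert a correction marker,
    and continue with the corrected steps.
    """
    out = [question]
    for i, step in enumerate(sampled_steps):
        out.append(step)
        if not is_correct(step, reference_steps, i):
            out.append("[CORRECTION] The previous step contains an error.")
            out.extend(reference_steps[i:])  # corrected continuation
            break
    return "\n".join(out)
```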
- Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Singapore (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
LoRA Learns Less and Forgets Less
Biderman, Dan, Ortiz, Jose Gonzalez, Portes, Jacob, Paul, Mansheej, Greengard, Philip, Jennings, Connor, King, Daniel, Havens, Sam, Chiley, Vitaliy, Frankle, Jonathan, Blakeney, Cody, Cunningham, John P.
Low-Rank Adaptation (LoRA) is a widely used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low-rank perturbations to selected weight matrices. In this work, we compare the performance of LoRA and full finetuning on two target domains, programming and mathematics. We consider both the instruction finetuning ($\approx$100K prompt-response pairs) and continued pretraining ($\approx$10B unstructured tokens) data regimes. Our results show that, in most settings, LoRA substantially underperforms full finetuning. Nevertheless, LoRA exhibits a desirable form of regularization: it better maintains the base model's performance on tasks outside the target domain. We show that LoRA provides stronger regularization than common techniques such as weight decay and dropout; it also helps maintain more diverse generations. We show that full finetuning learns perturbations with a rank 10-100X greater than typical LoRA configurations, possibly explaining some of the reported gaps. We conclude by proposing best practices for finetuning with LoRA.
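For reference, the low-rank perturbation LoRA trains can be written as a small PyTorch module. This is the standard formulation; the rank `r`, scaling `alpha`, and init scheme below are the usual defaults, not this paper's specific configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank perturbation B @ A.

    Only A and B -- r * (in + out) parameters versus in * out for full
    finetuning -- are trained, which is the source of both the memory
    savings and, per the paper, the stronger regularization.
    """
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze base weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r                           # standard scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # B @ A starts at zero, so training begins from the base model.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

The paper's rank observation says full finetuning's effective $\Delta W$ has rank 10-100X larger than the `r` used here, which may explain where LoRA falls short.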
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.67)
Augmenting Math Word Problems via Iterative Question Composing
Liu, Haoxiong, Zhang, Yifan, Luo, Yifan, Yao, Andrew Chi-Chih
Despite the advancements of large language models (LLMs) in mathematical reasoning, solving competition-level math problems remains a significant challenge, especially for open-source LLMs without external tools. We introduce the MMIQC dataset, comprising a mixture of processed web data and synthetic question-response pairs, aimed at enhancing the mathematical reasoning capabilities of base language models. Models fine-tuned on MMIQC consistently surpass their counterparts on the MATH benchmark across various model sizes. Notably, Qwen-72B-MMIQC achieves 45.0% accuracy, exceeding the previous open-source state-of-the-art by 8.2% and outperforming the initial version of GPT-4 released in 2023. Extensive evaluation on Hungarian high school finals suggests that the improvement generalizes to unseen data. Our ablation study on MMIQC reveals that a large part of the improvement can be attributed to our novel augmentation method, Iterative Question Composing (IQC), which iteratively composes new questions from seed problems using an LLM and applies rejection sampling through another LLM. The MMIQC dataset is available on the HuggingFace hub at https://huggingface.co/datasets/Vivacem/MMIQC. Our code is available at https://github.com/iiis-ai/IterativeQuestionComposing.
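The IQC loop can be sketched as below; `compose_llm`, `answer_llm`, and `verify` stand in for the two LLM calls and the rejection-sampling check, and the prompt wording is illustrative rather than the paper's.

```python
def iterative_question_composing(seed: str, compose_llm, answer_llm, verify,
                                 n_iters: int = 4, n_samples: int = 8) -> list:
    """One IQC chain: iteratively compose new questions from a seed problem.

    Each iteration asks one LLM to compose a new question building on the
    current one, then rejection-samples answers from a second LLM,
    keeping only (question, answer) pairs that pass verification.
    """
    accepted, question = [], seed
    for _ in range(n_iters):
        question = compose_llm(
            f"Compose a new math question based on the following one: {question}")
        for _ in range(n_samples):
            answer = answer_llm(question)
            if verify(question, answer):          # rejection sampling
                accepted.append({"question": question, "answer": answer})
                break
    return accepted
```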
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Yu, Longhui, Jiang, Weisen, Shi, Han, Yu, Jincheng, Liu, Zhengying, Zhang, Yu, Kwok, James T., Li, Zhenguo, Weller, Adrian, Liu, Weiyang
Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far from satisfactory for solving mathematical problems due to the complex reasoning procedures. To bridge this gap, we propose MetaMath, a finetuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives, which results in a new dataset called MetaMathQA. Experimental results on two popular benchmarks for mathematical reasoning (i.e., GSM8K and MATH) demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.5% on GSM8K and 19.8% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%. Particularly, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release the MetaMathQA dataset, MetaMath models of different sizes, and the training code for public use.
[Figure residue, partially recovered: a GSM8K question about James buying packs of beef (4 pounds each, $5.50 per pound) is rewritten as "What is the total amount that James paid when he purchased 5 packs of beef, each weighing 4 pounds, at a price of $5.50 per pound?"; a backward variant masks the pack count: "James buys x packs of beef that are 4 pounds each. The price of beef is $5.50 per pound."]
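The bootstrapping itself can be sketched as a simple rewrite-and-filter loop; `rewrite_llm`, `solve_llm`, `check`, and the perspective prompts below are hypothetical stand-ins for the paper's rephrasing and backward-question (mask a number as x) augmentations.

```python
# Hypothetical prompt templates for two rewriting perspectives:
# forward rephrasing, and a backward variant that masks a number as x.
PERSPECTIVES = [
    "Rephrase this question without changing its answer: {q}",
    "Rewrite this question replacing one given number with x, and ask "
    "for x given the original answer: {q}",
]

def bootstrap_questions(seed_questions: list[str],
                        rewrite_llm, solve_llm, check) -> list[dict]:
    """Bootstrap MetaMathQA-style training data from seed questions (sketch).

    Each seed question is rewritten from several perspectives; answers
    are generated and kept only if they pass `check` (e.g. agreement
    with the seed question's known answer).
    """
    data = []
    for q in seed_questions:
        for template in PERSPECTIVES:
            new_q = rewrite_llm(template.format(q=q))
            ans = solve_llm(new_q)
            if check(new_q, ans):
                data.append({"question": new_q, "answer": ans})
    return data
```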
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > China > Hong Kong (0.04)