Math Word Problems
Structured Reasoning with Tree-of-Thoughts for Bengali Math Word Problems
Mahmood, Aurprita, Alam, Sabrin, Sagor, Neloy Kumer, Hadi, Md. Abdul, Islam, Md. Sehab Al, Islam, Minhajul
Mathematical Word Problems (MWPs) are among the most challenging tasks in natural language processing because they require both linguistic understanding and multi-step numerical reasoning. While Chain-of-Thought (CoT) prompting has shown promise, its linear structure often propagates errors, limiting overall effectiveness. To address this limitation, we present a systematic study of Tree-of-Thought (ToT) reasoning for Bengali MWPs using the SOMADHAN dataset. Owing to computational and token-cost constraints, we evaluate a curated set of 100 representative problems across multiple large language models (LLMs), including GPT-OSS and LLaMA variants, under standard prompting, CoT, and ToT strategies. Our results show that CoT improves baseline accuracy from 78% (standard prompting) to 83% on average, while ToT further increases performance by up to 5 percentage points, achieving 88% accuracy with GPT-OSS-120B. These improvements indicate that ToT is particularly effective in medium-to-large-scale models but may offer less advantage for smaller ones. Overall, our findings establish ToT as a robust framework for solving mathematical problems in low-resource languages such as Bengali. More broadly, this study shows that structured reasoning methods like ToT can provide more reliable and globally consistent outcomes than CoT, paving the way for better reasoning strategies in multilingual NLP.
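For readers unfamiliar with the technique, the sketch below shows how a ToT search over partial solutions can be organized in a few lines of Python. It is a minimal illustration, not the authors' implementation: the `llm` helper, the prompt wording, and the scoring scheme are all placeholders standing in for whatever model and templates the paper actually uses.

```python
# Minimal Tree-of-Thoughts sketch: breadth-first search over partial
# reasoning paths, keeping the top-k candidates at each depth.

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (assumption)."""
    raise NotImplementedError

def score(problem: str, partial: str) -> float:
    """Ask the model to rate a partial path from 0 to 1 (assumption)."""
    reply = llm(f"Problem: {problem}\nPartial solution: {partial}\n"
                "Rate the promise of this path from 0 to 1:")
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0

def tot_solve(problem: str, depth: int = 3, branch: int = 3, keep: int = 2) -> str:
    frontier = [""]  # partial reasoning paths
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for _ in range(branch):  # propose several next steps per path
                step = llm(f"Problem: {problem}\nSteps so far: {path}\n"
                           "Propose the next reasoning step:")
                candidates.append(path + "\n" + step)
        # Prune: keep only the highest-scoring partial paths.
        candidates.sort(key=lambda p: score(problem, p), reverse=True)
        frontier = candidates[:keep]
    return llm(f"Problem: {problem}\nReasoning: {frontier[0]}\nFinal answer:")
```

The key difference from CoT is visible in the pruning step: instead of committing to one linear chain, several partial paths compete at each depth, so an early arithmetic slip can be abandoned rather than propagated.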
Mathematics Isn't Culture-Free: Probing Cultural Gaps via Entity and Scenario Perturbations
Tomar, Aditya, Sahoo, Nihar Ranjan, Mittal, Ashish, Murthy, Rudra, Bhattacharyya, Pushpak
Although mathematics is often considered culturally neutral, the way mathematical problems are presented can carry implicit cultural context. Existing benchmarks like GSM8K are predominantly rooted in Western norms, including names, currencies, and everyday scenarios. In this work, we create culturally adapted variants of the GSM8K test set for five regions (Africa, India, China, Korea, and Japan) using prompt-based transformations followed by manual verification. We evaluate six large language models (LLMs), ranging from 8B to 72B parameters, across five prompting strategies to assess their robustness to cultural variation in math problem presentation. Our findings reveal a consistent performance gap: models perform best on the original US-centric dataset and comparatively worse on culturally adapted versions. However, models with reasoning capabilities are more resilient to these shifts, suggesting that deeper reasoning helps bridge cultural presentation gaps in mathematical tasks.
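As a toy illustration of the kind of perturbation involved (the paper itself uses prompt-based transformations with manual verification, not rule-based substitution), a simple entity swap might look like the following; the name lists and currency pairs are invented for this example.

```python
# Toy cultural adaptation of a GSM8K-style problem by entity substitution.
# The lookup tables below are hypothetical, invented for illustration.
REGION_ENTITIES = {
    "India": {"names": ["Aarav", "Priya"], "currency": ("$", "₹")},
    "Japan": {"names": ["Haruto", "Yui"], "currency": ("$", "¥")},
}

def adapt(problem: str, region: str, western_names=("John", "Sarah")) -> str:
    """Swap US-centric names and currency for region-specific ones."""
    entities = REGION_ENTITIES[region]
    for old, new in zip(western_names, entities["names"]):
        problem = problem.replace(old, new)
    old_cur, new_cur = entities["currency"]
    return problem.replace(old_cur, new_cur)

print(adapt("John gave Sarah $5 and kept $3.", "India"))
# -> Aarav gave Priya ₹5 and kept ₹3.
```

Note that the numbers and arithmetic stay fixed; only the surface presentation changes, which is exactly the variable the benchmark is designed to isolate.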
Foundation of Intelligence: Review of Math Word Problems from Human Cognition Perspective
Huang, Zhenya, Liu, Jiayu, Lin, Xin, Ma, Zhiyuan, Xue, Shangzi, Xiao, Tong, Liu, Qi, Teh, Yee Whye, Chen, Enhong
Math word problem (MWP) solving is a fundamental research topic in artificial intelligence (AI) dating back to the 1960s. This research aims to advance the reasoning abilities of AI by mirroring human-like cognitive intelligence. The mainstream technological paradigm has evolved from early rule-based methods to deep learning models, and is rapidly advancing towards large language models. However, the field still lacks a systematic taxonomy of MWP research along with a discussion of current development trends. Therefore, in this paper, we comprehensively review related research in MWP solving through the lens of human cognition, to demonstrate how recent AI models are advancing in simulating human cognitive abilities. Specifically, we summarize five crucial cognitive abilities for MWP solving: Problem Understanding, Logical Organization, Associative Memory, Critical Thinking, and Knowledge Learning. Focusing on these abilities, we review the two mainstream families of MWP solvers from the past 10 years, neural network solvers and LLM-based solvers, and discuss the core human-like abilities they demonstrate in their intricate problem-solving processes. Moreover, we rerun all the representative MWP solvers and report their performance on five mainstream benchmarks for a unified comparison. To the best of our knowledge, this survey is the first to comprehensively analyze the influential MWP research of the past decade from the perspective of human reasoning cognition and to provide an integrative comparison across existing approaches. We hope it can inspire further research in AI reasoning. Our repository is released at https://github.com/Ljyustc/FoI-MWP.
Iterative LLM-Based Generation and Refinement of Distracting Conditions in Math Word Problems
Yang, Kaiqi, Li, Hang, Chu, Yucheng, Liu, Zitao, Tian, Mi, Liu, Hui
Mathematical reasoning serves as a crucial testbed for the intelligence of large language models (LLMs), and math word problems (MWPs) are a popular type of math problem. Most MWP datasets consist of problems containing only the necessary information, while problems with distracting and excessive conditions are often overlooked. Prior works have tested popular LLMs and found a dramatic performance drop in the presence of distracting conditions. However, datasets of MWPs with distracting conditions are limited, and most suffer from low difficulty and out-of-context expressions. This makes the distracting conditions easy to identify and exclude, reducing the credibility of benchmarking on them. Moreover, adding distracting conditions may also change the reasoning and answers, requiring intensive labor to check and rewrite the solutions. To address these issues, we design an iterative framework that generates distracting conditions using LLMs. We develop a set of prompts to revise MWPs from different perspectives and cognitive levels, encouraging the generation of distracting conditions as well as suggestions for further revision. Another advantage is the shared solutions between original and revised problems: we explicitly guide the LLMs to generate distracting conditions that do not alter the original solutions, thus avoiding the need to generate new solutions. This framework is efficient and easy to deploy, reducing the overhead of generating MWPs with distracting conditions while maintaining data quality.
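The revise-and-check loop the abstract describes can be pictured with a short sketch. Everything here is an assumption made for illustration: `llm` is a placeholder for any chat-completion call, the prompts paraphrase the idea rather than reproduce the authors' templates, and the discard-on-mismatch policy is one simple way to enforce that distractors leave the solution unchanged.

```python
def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (assumption)."""
    raise NotImplementedError

def add_distractors(problem: str, solution: str, rounds: int = 3) -> str:
    """Iteratively add distracting conditions that keep the solution fixed."""
    revised = problem
    for _ in range(rounds):
        candidate = llm(
            "Rewrite this math word problem by adding one distracting "
            "condition that is plausible in context but must NOT change "
            f"the solution.\nProblem: {revised}\nSolution to preserve: {solution}"
        )
        # Invariance check: the revised problem must still yield the answer.
        check = llm(f"Solve step by step; give only the final answer:\n{candidate}")
        if check.strip() == solution.strip():
            revised = candidate  # keep the revision and build on it
        # otherwise discard the candidate and try again from `revised`
    return revised
```

In the paper's framework the same invariance requirement is built into the prompts themselves, which is what lets the original and revised problems share a single solution.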
Bridging the Culture Gap: A Framework for LLM-Driven Socio-Cultural Localization of Math Word Problems in Low-Resource Languages
Azime, Israel Abebe, Belay, Tadesse Destaw, Klakow, Dietrich, Slusallek, Philipp, Chhabra, Anshuman
Large language models (LLMs) have demonstrated significant capabilities in solving mathematical problems expressed in natural language. However, multilingual and culturally-grounded mathematical reasoning in low-resource languages lags behind English due to the scarcity of socio-cultural task datasets that reflect accurate native entities such as person names, organization names, and currencies. Existing multilingual benchmarks are predominantly produced via translation and typically retain English-centric entities, owing to the high cost associated with human annotator-based localization. Moreover, automated localization tools are limited, and hence, truly localized datasets remain scarce. To bridge this gap, we introduce a framework for LLM-driven cultural localization of math word problems that automatically constructs datasets with native names, organizations, and currencies from existing sources. We find that translated benchmarks can obscure true multilingual math ability under appropriate socio-cultural contexts. Through extensive experiments, we also show that our framework can help mitigate English-centric entity bias and improve robustness when native entities are introduced across various languages.
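A minimal sketch of what LLM-driven localization with a cheap automatic guard could look like is given below; the `llm` helper, the prompt wording, and the number-preservation check are our assumptions, not the framework's actual components.

```python
import re

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (assumption)."""
    raise NotImplementedError

def localize(problem: str, language: str) -> str:
    """Rewrite a math word problem with native entities, keeping the math."""
    localized = llm(
        f"Rewrite this math word problem for a {language}-speaking context: "
        "replace person names, organization names, and the currency with "
        "native equivalents, but keep every number and the required "
        f"arithmetic unchanged.\n\n{problem}"
    )
    # Cheap guard: localization must not alter the numeric content.
    if re.findall(r"\d+(?:\.\d+)?", localized) != re.findall(r"\d+(?:\.\d+)?", problem):
        raise ValueError("localization changed the numbers; reject and retry")
    return localized
```

The guard captures the framework's stated goal in miniature: surface entities may change freely, but the underlying math task must survive localization intact.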
Solving Math Word Problems Using Estimation Verification and Equation Generation
Piehl, Mitchell, Wilson, Dillon, Kalita, Ananya, Kalita, Jugal
Large Language Models (LLMs) excel at various tasks, including problem-solving and question-answering. However, LLMs often find Math Word Problems (MWPs) challenging because solving them requires a range of reasoning and mathematical abilities with which LLMs seem to struggle. Recent efforts have helped LLMs solve more complex MWPs with improved prompts. This study proposes a novel method that initially prompts an LLM to create equations from a decomposition of the question, followed by using an external symbolic equation solver to produce an answer. To ensure the accuracy of the obtained answer, inspired by an established recommendation of math teachers, the LLM is instructed to solve the MWP a second time, but this time with the objective of estimating the correct answer instead of solving it exactly. The estimate is then compared to the generated answer for verification. If verification fails, an iterative rectification process is employed to ensure the correct answer is eventually found. This approach achieves new state-of-the-art results on datasets used by prior published research on numeric and algebraic MWPs, improving the previous best results by nearly two percent on average. In addition, the approach obtains satisfactory results on trigonometric MWPs, a task that, to the best of the authors' knowledge, has not been previously attempted. This study also introduces two new datasets, SVAMPClean and Trig300, to further advance the testing of LLMs' reasoning abilities.
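The generate-solve-verify pipeline is concrete enough to sketch. Below, sympy stands in for the external symbolic equation solver the abstract mentions; the `llm` placeholder, the prompt wording, the single-unknown equation format, and the 20% agreement tolerance are all assumptions made for illustration, not the authors' exact setup.

```python
import sympy

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (assumption)."""
    raise NotImplementedError

def solve_mwp(problem: str, max_retries: int = 3, tol: float = 0.2):
    x = sympy.symbols("x")
    for _ in range(max_retries):
        # 1. Ask the model to decompose the question into one equation in x.
        eq_text = llm("Decompose the question and write a single equation "
                      f"in x, formatted as 'lhs = rhs':\n{problem}")
        lhs, rhs = eq_text.split("=", 1)
        roots = sympy.solve(sympy.Eq(sympy.sympify(lhs), sympy.sympify(rhs)), x)
        if not roots:
            continue  # unsolvable equation; regenerate it
        answer = float(roots[0])
        # 2. Independently *estimate* the answer, as a teacher would advise.
        estimate = float(llm("Estimate, without solving exactly, the numeric "
                             f"answer to:\n{problem}"))
        # 3. Accept the exact answer only if it is close to the estimate.
        if estimate != 0 and abs(answer - estimate) / abs(estimate) <= tol:
            return answer
    return None  # rectification budget exhausted
```

The estimation pass acts like a teacher's sanity check: it is produced independently of the generated equation, so a mistranslated equation and a wrong estimate are unlikely to agree by accident.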