Reveal-Bangla: A Dataset for Cross-Lingual Multi-Step Reasoning Evaluation

Islam, Khondoker Ittehadul, Sarti, Gabriele

Dec-4-2025–arXiv.org Artificial Intelligence

Language models have demonstrated remarkable performance on complex multi-step reasoning tasks. However, their evaluation has been predominantly confined to high-resource languages such as English. In this paper, we introduce a manually translated Bangla multi-step reasoning dataset derived from the English Reveal dataset, featuring both binary and non-binary question types. We conduct a controlled evaluation of English-centric and Bangla-centric multilingual small language models on the original dataset and our translated version to compare their ability to exploit relevant reasoning steps to produce correct answers. Our results show that, in comparable settings, reasoning context is beneficial for more challenging non-binary questions, but models struggle to employ relevant Bangla reasoning steps effectively. We conclude by exploring how reasoning steps contribute to models' predictions, highlighting different trends across models and languages.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Dec-4-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.94)
- Asia (0.68)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Education > Curriculum > Subject-Specific Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Cognitive Science > Problem Solving (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found