Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning

Lee, Kevin; Spiewak, Russell; Walsh, James

Nov-27-2025–arXiv.org Artificial Intelligence

Scientific reasoning with large language models (LLMs) in heliophysics involves more than recalling facts: it requires incorporating physical assumptions, keeping units consistent, and presenting answers in clear scientific formats through coordinated approaches. To address these challenges, we present Reasoning With a Star, a new heliophysics dataset for reasoning, together with an initial benchmarking approach. The data are constructed from problem sets of the National Aeronautics and Space Administration (NASA) and University Corporation for Atmospheric Research (UCAR) Living With a Star summer school and compiled into a ready-to-use question-and-answer structure with question contexts, reasoning steps, expected answer types, ground-truth targets, format hints, and metadata. A programmatic grader checks predictions using unit-aware numerical tolerance, symbolic equivalence, and schema validation. We benchmark a single-shot baseline and four multi-agent patterns, finding that decomposing workflows according to systems engineering principles outperforms direct prompting on problems that require deductive reasoning rather than pure inductive recall.
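
The abstract names the grader's three checks but does not give code for them. The sketch below shows one way they might fit together; it is not the authors' implementation. The field names (`question`, `answer_type`, `ground_truth`, `variables`), the small hand-written unit table, the 2% numeric tolerance, and the use of random-point evaluation in place of a computer-algebra equivalence check are all assumptions made for illustration.

```python
import math
import random

# Hypothetical record schema: field name -> required Python type.
REQUIRED_FIELDS = {"question": str, "answer_type": str, "ground_truth": object}

# Hypothetical unit table: unit -> (physical dimension, factor to SI).
# A real grader would more likely rely on a dedicated units library.
UNITS = {
    "m": ("length", 1.0), "km": ("length", 1e3), "cm": ("length", 1e-2),
    "s": ("time", 1.0), "min": ("time", 60.0), "h": ("time", 3600.0),
    "T": ("magnetic_field", 1.0), "nT": ("magnetic_field", 1e-9),
    "G": ("magnetic_field", 1e-4),
}

# Math names that symbolic answers are allowed to reference.
SAFE_NAMES = {name: getattr(math, name) for name in ("sqrt", "exp", "log", "sin", "cos", "pi")}


def validate_schema(record):
    """Return a list of schema errors (empty if the record is well-formed)."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors


def numeric_match(pred, target, rel_tol=0.02):
    """Compare (value, unit) pairs after converting both to SI units."""
    (p_val, p_unit), (t_val, t_unit) = pred, target
    if p_unit not in UNITS or t_unit not in UNITS:
        return False  # unknown unit: cannot grade safely
    (p_dim, p_scale), (t_dim, t_scale) = UNITS[p_unit], UNITS[t_unit]
    if p_dim != t_dim:
        return False  # dimensionally incompatible answer
    return math.isclose(p_val * p_scale, t_val * t_scale, rel_tol=rel_tol)


def symbolic_match(pred_expr, target_expr, variables, trials=20, rel_tol=1e-9):
    """Spot-check equivalence by evaluating both expressions at random points."""
    rng = random.Random(0)  # fixed seed keeps grading reproducible
    for _ in range(trials):
        point = {v: rng.uniform(0.5, 2.0) for v in variables}
        # Empty builtins limits name lookup; it is NOT a security sandbox.
        scope = {"__builtins__": {}, **SAFE_NAMES, **point}
        try:
            a = eval(pred_expr, scope)
            b = eval(target_expr, scope)
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12):
                return False
        except Exception:
            # Unparseable or non-numeric expressions count as a mismatch.
            return False
    return True


def grade(record, prediction):
    """Dispatch on the (assumed) answer_type field after schema validation."""
    if validate_schema(record):
        return False
    kind = record["answer_type"]
    if kind == "numeric":
        return numeric_match(prediction, record["ground_truth"])
    if kind == "symbolic":
        return symbolic_match(prediction, record["ground_truth"], record.get("variables", ()))
    return prediction == record["ground_truth"]  # fallback: exact match
```

For example, a numeric target of `(1.5e8, "km")` accepts `(1.5e11, "m")` but rejects `(1.5e8, "s")`, and a symbolic target `B**2/(2*mu0)` with variables `("B", "mu0")` accepts `B*B/(2*mu0)`. Random-point evaluation can in principle accept two different expressions that happen to agree at every sampled point, which is one reason a production grader would more plausibly delegate symbolic checks to a computer algebra system.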

large language model, machine learning, natural language, (20 more...)

Country:
- North America > United States > California > Los Angeles County (0.28)

Genre:
- Workflow (1.00)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (0.34)

Industry:
- Energy (0.46)
- Aerospace & Defense (0.46)
- Government
  - Space Agency (0.34)
  - Regional Government > North America Government
    - United States Government (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.94)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)