Reasoning With a Star: A Heliophysics Dataset and Benchmark for Agentic Scientific Reasoning
Kevin Lee, Russell Spiewak, James Walsh
arXiv.org Artificial Intelligence
Scientific reasoning with large language models (LLMs) in heliophysics involves more than recalling facts: it requires incorporating physical assumptions, maintaining consistent units, and presenting answers in clear scientific formats, all of which call for coordinated approaches. To address these challenges, we present Reasoning With a Star, a new heliophysics dataset for reasoning tasks, together with an initial benchmarking approach. The data are constructed from National Aeronautics and Space Administration (NASA) and University Corporation for Atmospheric Research (UCAR) Living With a Star summer school problem sets and compiled into a readily consumable question-and-answer structure that includes question contexts, reasoning steps, expected answer types, ground-truth targets, format hints, and metadata. A programmatic grader checks predictions using unit-aware numerical tolerance, symbolic equivalence, and schema validation. We benchmark a single-shot baseline and four multi-agent patterns and find that decomposing workflows through systems engineering principles outperforms direct prompting on problems that require deductive reasoning rather than pure inductive recall.
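The abstract names the record fields and the three grading checks but does not show how they are implemented. The sketch below is a minimal, hypothetical illustration: the `Record` field names, the `answer_type` values, the toy `UNITS` table, and the functions `grade_numeric`, `grade_symbolic`, `validate_schema`, and `grade` are all assumptions, not the authors' code or schema. Symbolic equivalence is approximated here by evaluating both expressions at random points, a swapped-in stand-in for a computer algebra system.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Any


# Hypothetical record layout mirroring the fields listed in the abstract;
# the names and answer_type values are assumptions, not the dataset's schema.
@dataclass
class Record:
    question: str
    context: str
    reasoning_steps: list
    answer_type: str  # assumed values: "numeric", "symbolic", "structured"
    target: Any
    format_hint: str = ""
    metadata: dict = field(default_factory=dict)


# Toy unit table (assumption): unit -> (dimension, factor to SI).
UNITS = {
    "m": ("length", 1.0), "km": ("length", 1e3), "AU": ("length", 1.495978707e11),
    "s": ("time", 1.0), "hr": ("time", 3600.0),
    "m/s": ("speed", 1.0), "km/s": ("speed", 1e3),
}


def grade_numeric(pred, target, rel_tol=1e-2):
    """Unit-aware check on (value, unit) pairs: convert both to SI, then compare."""
    (pv, pu), (tv, tu) = pred, target
    if pu not in UNITS or tu not in UNITS:
        return False  # unknown unit: cannot verify
    (pdim, pf), (tdim, tf) = UNITS[pu], UNITS[tu]
    if pdim != tdim:
        return False  # dimensional mismatch is always wrong
    return math.isclose(pv * pf, tv * tf, rel_tol=rel_tol)


# Only these math names are visible to expressions; builtins are disabled.
# Note: this is NOT a security sandbox, only a convenience for the sketch.
_SAFE = {name: getattr(math, name) for name in ("sqrt", "exp", "log", "sin", "cos", "pi")}


def grade_symbolic(pred, target, variables, trials=8, tol=1e-9):
    """Stand-in for CAS equivalence: compare both expressions at random points."""
    rng = random.Random(0)  # fixed seed so grading is reproducible
    for _ in range(trials):
        env = {**_SAFE, **{v: rng.uniform(0.5, 2.0) for v in variables}}
        try:
            a = eval(pred, {"__builtins__": {}}, env)
            b = eval(target, {"__builtins__": {}}, env)
            if not math.isclose(a, b, rel_tol=tol, abs_tol=tol):
                return False
        except Exception:
            return False  # unparsable, unknown name, or non-real result
    return True


def validate_schema(pred, target):
    """Structured answers must have exactly the target's keys and value types."""
    return (isinstance(pred, dict) and pred.keys() == target.keys()
            and all(isinstance(pred[k], type(target[k])) for k in target))


def grade(record, prediction):
    """Dispatch on the (assumed) answer_type field."""
    if record.answer_type == "numeric":
        return grade_numeric(prediction, record.target)
    if record.answer_type == "symbolic":
        return grade_symbolic(prediction, record.target,
                              record.metadata.get("variables", []))
    if record.answer_type == "structured":
        return validate_schema(prediction, record.target) and prediction == record.target
    return False


# Usage: a solar wind speed given in km/s matches a target stated in m/s.
print(grade_numeric((400.0, "km/s"), (4.0e5, "m/s")))  # True
```

Random-point testing can accept two non-equivalent expressions that happen to agree on the sampled domain, so a real grader would more plausibly rely on a units library and a computer algebra system; the sketch only shows how the three checks might be separated and dispatched per answer type.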
Nov-27-2025