SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models Dan Zhang
–Neural Information Processing Systems
Large Language Models (LLMs) have shown promise in assisting scientific discovery. However, such applications are currently limited by LLMs' deficiencies in understanding intricate scientific concepts, deriving symbolic equations, and solving advanced numerical calculations. To bridge these gaps, we introduce SciInstruct, a suite of scientific instructions for training scientific language models capable of college-level scientific reasoning. Central to our approach is a novel self-reflective instruction annotation framework to address the data scarcity challenge in the science domain. This framework leverages existing LLMs to generate step-by-step reasoning for unlabelled scientific questions, followed by a process of self-reflective critic-and-revise.
Neural Information Processing Systems
May-28-2025, 07:02:36 GMT
- Country:
- North America > United States (0.28)
- Genre:
- Research Report > New Finding (0.45)
- Industry:
- Education (0.92)
- Technology: