OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery

Prabhakar, Vignesh, Islam, Md Amirul, Atanas, Adam, Wang, Yao-Ting, Han, Joah, Jhunjhunwala, Aastha, Apte, Rucha, Clark, Robert, Xu, Kang, Wang, Zihan, Liu, Kai

Mar-21-2025–arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable potential in advancing scientific knowledge and addressing complex challenges. In this work, we introduce OmniScience, a specialized large reasoning model for general science, developed through three key components: (1) domain adaptive pretraining on a carefully curated corpus of scientific literature, (2) instruction tuning on a specialized dataset to guide the model in following domain-specific tasks, and (3) reasoning-based knowledge distillation through fine-tuning to significantly enhance its ability to generate contextually relevant and logically sound responses. We demonstrate the versatility of OmniScience by developing a battery agent that efficiently ranks molecules as potential electrolyte solvents or additives. Comprehensive evaluations reveal that OmniScience is competitive with state-of-the-art large reasoning models on the GPQA Diamond and domain-specific battery benchmarks, while outperforming all public reasoning and non-reasoning models with similar parameter counts. We further demonstrate via ablation experiments that domain adaptive pretraining and reasoning-based knowledge distillation are critical to attain our performance levels, across benchmarks.

large language model, machine learning, resistance, (17 more...)

arXiv.org Artificial Intelligence

Mar-21-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report (1.00)

Industry:
- Electrical Industrial Apparatus (1.00)
- Energy > Energy Storage (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found