Multi-Model Synthetic Training for Mission-Critical Small Language Models
Nolan Platt, Pragyansmita Nayak
arXiv.org Artificial Intelligence
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across many domains, yet their application to specialized fields remains constrained by the scarcity and complexity of domain-specific training data. We present a novel approach that achieves a 261x cost reduction for maritime intelligence by using LLMs as one-time teachers rather than using them directly for inference. Our method transforms 3.2 billion Automatic Identification System (AIS) vessel-tracking records into 21,543 synthetic question-answer pairs through multi-model generation (GPT-4o and o3-mini), preventing overfitting and ensuring accurate reasoning. We show that smaller, cheaper models, when fine-tuned properly, can achieve accuracy comparable to larger models that are prohibitively expensive to run continuously. Our work contributes to the growing field of synthetic dataset generation for specialized AI applications and presents a highly reproducible framework for domains where manual annotation is infeasible. Beyond expanding research on specialized small language models, our approach has immediate applications in maritime safety, security operations, and vessel traffic management systems across industries.

In recent years, Large Language Models (LLMs) have proven successful across diverse natural language tasks, but their use in specialized domains faces a major challenge: the cost of continuous LLM inference, which can reach thousands of dollars per day for real-time systems [1].
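The abstract describes turning AIS vessel-tracking records into synthetic Q&A pairs via two teacher models. A minimal sketch of that pipeline shape might look as follows; the function names, the AIS record fields, and the deduplication step are illustrative assumptions, not the authors' code, and real generation would call the GPT-4o and o3-mini APIs rather than return pairs locally.

```python
# Hypothetical sketch: render an AIS record as a generation prompt, then
# merge Q&A pairs produced by two teacher models, dropping duplicate
# questions so no single model's phrasing dominates the dataset.

def record_to_prompt(record: dict) -> str:
    """Render one AIS vessel-tracking record as a Q&A generation prompt.
    Field names (mmsi, lat, lon, sog) are assumed, not from the paper."""
    return (
        f"Vessel MMSI {record['mmsi']} reported position "
        f"({record['lat']:.4f}, {record['lon']:.4f}) at "
        f"{record['sog']} knots. Write a question-answer pair that "
        f"tests reasoning about this track."
    )

def merge_model_outputs(pairs_a: list, pairs_b: list) -> list:
    """Combine (question, answer) pairs from two teacher models,
    keeping the first occurrence of each normalized question."""
    seen, merged = set(), []
    for question, answer in pairs_a + pairs_b:
        key = question.strip().lower()
        if key not in seen:
            seen.add(key)
            merged.append((question, answer))
    return merged
```

Merging outputs from two different model families is one plausible way to realize the paper's stated goal of preventing overfitting to a single teacher's style.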
Sep-17-2025