ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining

Kim, Seonwu; Na, Yohan; Kim, Kihun; Cho, Hanhee; Lim, Geun; Kim, Mintae; Park, Seongik; Kim, Ki Hyun; Han, Youngsub; Jeon, Byoung-Ki

Oct-24-2025, arXiv.org Artificial Intelligence

The emergence of open-source large language models (LLMs) has expanded opportunities for enterprise applications; however, many organizations still lack the infrastructure to deploy and maintain large-scale models. As a result, small LLMs (sLLMs) have become a practical alternative despite inherent performance limitations. While Domain Adaptive Continual Pretraining (DACP) has been explored for domain adaptation, its utility in commercial settings remains under-examined. In this study, we validate the effectiveness of a DACP-based recipe across diverse foundation models and service domains, producing DACP-applied sLLMs (ixi-GEN). Through extensive experiments and real-world evaluations, we demonstrate that ixi-GEN models achieve substantial gains in target-domain performance while preserving general capabilities, offering a cost-efficient and scalable solution for enterprise-level deployment.

large language model, machine learning, natural language

Country:
- Asia (0.28)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Banking & Finance (1.00)
- Information Technology (0.93)
- Telecommunications (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)
