Towards a Small Language Model Lifecycle Framework

Parsa Miraghaei, Sergio Moreschini, Antti Kolehmainen, David Hästbacka

arXiv.org Artificial Intelligence 

Benchmark suites such as MMLU and HellaSwag measure core capabilities but are vulnerable to data contamination, making careful curation and transparent reporting essential [OS21], [OS2], [OS13], [OS6]. Trustworthiness evaluation covers robustness to adversarial inputs, privacy protection, reliability (including hallucination rates and output consistency), and safety concerns such as toxicity and bias [OS2], [OS6], all of which are vital for user-facing or high-stakes deployments. Resource efficiency, spanning computational cost, memory, energy, and deployment overhead, is particularly important for SLMs and shapes deployment strategies in constrained environments [OS5], [OS6]. Automated evaluation methods range from statistical scorers such as BLEU and ROUGE to model-based and hybrid approaches; the model-based and hybrid methods align more closely with human judgment while scaling far better than manual review [OS29], [OS30]. Ultimately, evaluation should be an integrated, continuous process that informs model iteration, balances performance with sustainability and safety, and supports real-world usability at scale.
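To make the contamination concern concrete, one common screening heuristic is to flag benchmark items whose word n-grams appear verbatim in the training corpus. The Python sketch below is a minimal version of that heuristic; the 13-gram window, the whitespace tokenizer, and the placeholder corpus are illustrative assumptions, not a procedure prescribed by the cited studies.

```python
from typing import Set, Tuple

def ngrams(text: str, n: int = 13) -> Set[Tuple[str, ...]]:
    """Word-level n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(item: str, corpus_index: Set[Tuple[str, ...]], n: int = 13) -> bool:
    """Flag a benchmark item that shares any verbatim n-gram with the training corpus."""
    return not ngrams(item, n).isdisjoint(corpus_index)

# Build the corpus index once, then screen every benchmark item against it.
training_docs = ["... training document text ..."]        # placeholder corpus
corpus_index: Set[Tuple[str, ...]] = set()
for doc in training_docs:
    corpus_index |= ngrams(doc)

benchmark_items = ["... benchmark question text ..."]     # placeholder benchmark
flagged = [q for q in benchmark_items if is_contaminated(q, corpus_index)]
```

Verbatim overlap misses paraphrased leakage, which is one reason the text stresses transparent reporting rather than reliance on any single automated filter.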
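For the resource-efficiency dimension, per-request latency and memory headroom are the kinds of numbers that drive deployment decisions on constrained hardware. Below is a minimal, framework-agnostic sketch; `generate_fn` is a hypothetical stand-in for the SLM's inference call, and only the Python heap is traced (accelerator memory would need framework-specific counters such as torch.cuda.max_memory_allocated).

```python
import time
import tracemalloc
from statistics import mean, quantiles
from typing import Callable, Dict, List

def profile_inference(generate_fn: Callable[[str], str],
                      prompts: List[str], warmup: int = 2) -> Dict[str, float]:
    """Measure per-request latency and peak Python heap use for an inference callable."""
    for p in prompts[:warmup]:            # warm caches and lazy initialisation
        generate_fn(p)

    tracemalloc.start()
    latencies = []
    for p in prompts:
        t0 = time.perf_counter()
        generate_fn(p)
        latencies.append(time.perf_counter() - t0)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "mean_latency_s": mean(latencies),
        "p95_latency_s": quantiles(latencies, n=20)[18],  # 95th-percentile cut point
        "peak_heap_mb": peak_bytes / 1e6,
    }

# Example with a trivial stand-in model:
stats = profile_inference(lambda p: p.upper(), [f"prompt {i}" for i in range(40)])
print(stats)
```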
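The statistical scorers named above are cheap to run because they reduce evaluation to n-gram overlap with a reference. A minimal sketch using two widely used Python implementations (nltk for BLEU, Google's rouge-score package for ROUGE) follows; the example strings are placeholders.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "the small model runs entirely on the edge device"
candidate = "the compact model runs fully on the edge device"

# BLEU: n-gram precision against the reference, smoothed for short sentences.
bleu = sentence_bleu(
    [reference.split()], candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE-1 and ROUGE-L: unigram and longest-common-subsequence F-measures.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```

Overlap scores are fast and reproducible but correlate only loosely with human judgment on open-ended generation, which is what motivates the model-based and hybrid approaches.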
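One simple reading of "hybrid" is a weighted blend of a statistical score and a model-based judgment. The sketch below assumes a hypothetical `judge` callable (any wrapper that sends a prompt to a stronger judge model and returns its text reply) and an illustrative 1-5 rubric; neither the blend weight nor the rubric comes from the cited surveys.

```python
from typing import Callable
from rouge_score import rouge_scorer

def hybrid_score(prompt: str, answer: str, reference: str,
                 judge: Callable[[str], str], alpha: float = 0.5) -> float:
    """Blend a statistical overlap score with a model-based rating on a 0-1 scale."""
    # Statistical component: ROUGE-L F-measure against the reference answer.
    overlap = rouge_scorer.RougeScorer(["rougeL"]).score(reference, answer)
    statistical = overlap["rougeL"].fmeasure

    # Model-based component: ask the judge model for a 1-5 rating, then normalise.
    verdict = judge(
        f"Question: {prompt}\nCandidate answer: {answer}\n"
        "Rate the answer from 1 (poor) to 5 (excellent). Reply with the number only."
    )
    rating = (min(max(int(verdict.strip()[0]), 1), 5) - 1) / 4

    return alpha * statistical + (1 - alpha) * rating
```

In practice the judge's prompt and rating scale dominate reliability, so the rubric itself should be reported alongside the scores, consistent with the transparency point above.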