Bielik v3 Small: Technical Report
Krzysztof Ociepa, Łukasz Flis, Remigiusz Kinas, Krzysztof Wróbel, Adrian Gwoździej
We introduce Bielik v3, a series of parameter-efficient generative text models (1.5B and 4.5B) optimized for Polish language processing. These models demonstrate that smaller, well-optimized architectures can achieve performance comparable to much larger counterparts while requiring substantially fewer computational resources. Our approach incorporates several key innovations: a custom Polish tokenizer (APT4) that significantly improves token efficiency, Weighted Instruction Cross-Entropy Loss to balance learning across instruction types, and Adaptive Learning Rate that dynamically adjusts based on training progress. Trained on a meticulously curated corpus of 292 billion tokens spanning 303 million documents, these models excel across multiple benchmarks, including the Open PL LLM Leaderboard, Complex Polish Text Understanding Benchmark, Polish EQ-Bench, and Polish Medical Leaderboard. The 4.5B parameter model achieves results competitive with models 2-3 times its size, while the 1.5B model delivers strong performance despite its extremely compact profile. These advances establish new benchmarks for parameter-efficient language modeling in less-represented languages, making high-quality Polish language AI more accessible for resource-constrained applications.
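The abstract names Weighted Instruction Cross-Entropy Loss but does not spell out its form. Below is a minimal PyTorch sketch of one plausible reading: per-token cross-entropy averaged within each example, then scaled by a weight tied to that example's instruction type. The function name `weighted_instruction_ce_loss`, the per-example weighting granularity, and the `-100` ignore-label convention are assumptions for illustration, not details taken from the report.

```python
import torch
import torch.nn.functional as F

def weighted_instruction_ce_loss(logits, labels, instruction_weights):
    """Token-level cross-entropy, scaled per example by an instruction-type weight.

    logits:              (batch, seq_len, vocab) model outputs
    labels:              (batch, seq_len) target ids; -100 marks ignored positions
    instruction_weights: (batch,) weight assigned to each example's instruction type
    """
    batch, seq_len, vocab = logits.shape
    # Per-token loss, keeping the (batch, seq_len) structure.
    token_loss = F.cross_entropy(
        logits.reshape(-1, vocab),
        labels.reshape(-1),
        ignore_index=-100,
        reduction="none",
    ).reshape(batch, seq_len)
    valid = (labels != -100).float()
    # Mean loss over the valid tokens in each example.
    per_example = (token_loss * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1)
    # Weighted mean over the batch; normalising by the weight sum makes the
    # weights set relative emphasis rather than rescale the loss magnitude.
    return (per_example * instruction_weights).sum() / instruction_weights.sum()

# Example usage (shapes and weight values are illustrative):
logits = torch.randn(2, 8, 32000)
labels = torch.randint(0, 32000, (2, 8))
weights = torch.tensor([1.0, 2.0])  # e.g. up-weight an under-represented type
loss = weighted_instruction_ce_loss(logits, labels, weights)
```

Under this reading, choosing larger weights for rare or harder instruction categories keeps frequent categories from dominating the gradient, which matches the abstract's stated goal of balancing learning across instruction types.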
arXiv.org Artificial Intelligence
May 12, 2025