Bielik v3 Small: Technical Report
Krzysztof Ociepa, Łukasz Flis, Remigiusz Kinas, Krzysztof Wróbel, Adrian Gwoździej
We introduce Bielik v3, a series of parameter-efficient generative text models (1.5B and 4.5B) optimized for Polish language processing. These models demonstrate that smaller, well-optimized architectures can achieve performance comparable to much larger counterparts while requiring substantially fewer computational resources. Our approach incorporates several key innovations: a custom Polish tokenizer (APT4) that significantly improves token efficiency, Weighted Instruction Cross-Entropy Loss to balance learning across instruction types, and Adaptive Learning Rate that dynamically adjusts based on training progress. Trained on a meticulously curated corpus of 292 billion tokens spanning 303 million documents, these models excel across multiple benchmarks, including the Open PL LLM Leaderboard, Complex Polish Text Understanding Benchmark, Polish EQ-Bench, and Polish Medical Leaderboard. The 4.5B parameter model achieves results competitive with models 2-3 times its size, while the 1.5B model delivers strong performance despite its extremely compact profile. These advances establish new benchmarks for parameter-efficient language modeling in less-represented languages, making high-quality Polish language AI more accessible for resource-constrained applications.
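The abstract names Weighted Instruction Cross-Entropy Loss but does not spell out its form. Below is a minimal PyTorch sketch of one plausible reading: per-token cross-entropy averaged within each example, then scaled by a weight tied to that example's instruction type. The function name `weighted_instruction_ce_loss`, the per-example weighting granularity, and the `-100` ignore-label convention are assumptions for illustration, not details taken from the report.

```python
import torch
import torch.nn.functional as F

def weighted_instruction_ce_loss(logits, labels, instruction_weights):
    """Token-level cross-entropy, scaled per example by an instruction-type weight.

    logits:              (batch, seq_len, vocab) model outputs
    labels:              (batch, seq_len) target ids; -100 marks ignored positions
    instruction_weights: (batch,) weight assigned to each example's instruction type
    """
    batch, seq_len, vocab = logits.shape
    # Per-token loss, keeping the (batch, seq_len) structure.
    token_loss = F.cross_entropy(
        logits.reshape(-1, vocab),
        labels.reshape(-1),
        ignore_index=-100,
        reduction="none",
    ).reshape(batch, seq_len)
    valid = (labels != -100).float()
    # Mean loss over the valid tokens in each example.
    per_example = (token_loss * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1)
    # Weighted mean over the batch; normalising by the weight sum makes the
    # weights set relative emphasis rather than rescale the loss magnitude.
    return (per_example * instruction_weights).sum() / instruction_weights.sum()

# Example usage (shapes and weight values are illustrative):
logits = torch.randn(2, 8, 32000)
labels = torch.randint(0, 32000, (2, 8))
weights = torch.tensor([1.0, 2.0])  # e.g. up-weight an under-represented type
loss = weighted_instruction_ce_loss(logits, labels, weights)
```

Under this reading, choosing larger weights for rare or harder instruction categories keeps frequent categories from dominating the gradient, which matches the abstract's stated goal of balancing learning across instruction types.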
arXiv.org Artificial Intelligence
May 12, 2025