PLLuM: A Family of Polish Large Language Models
Kocoń, Jan, Piasecki, Maciej, Janz, Arkadiusz, Ferdinan, Teddy, Radliński, Łukasz, Koptyra, Bartłomiej, Oleksy, Marcin, Woźniak, Stanisław, Walkowiak, Paweł, Wojtasik, Konrad, Moska, Julia, Naskręt, Tomasz, Walkowiak, Bartosz, Gniewkowski, Mateusz, Szyc, Kamil, Motyka, Dawid, Banach, Dawid, Dalasiński, Jonatan, Rudnicka, Ewa, Alberski, Bartłomiej, Walkowiak, Tomasz, Szczęsny, Aleksander, Markiewicz, Maciej, Bernaś, Tomasz, Mazur, Hubert, Żyta, Kamil, Tykierko, Mateusz, Chodak, Grzegorz, Kajdanowicz, Tomasz, Kazienko, Przemysław, Karlińska, Agnieszka, Seweryn, Karolina, Kołos, Anna, Chrabąszcz, Maciej, Lorenc, Katarzyna, Krasnodębska, Aleksandra, Wilczek, Artur, Dziewulska, Katarzyna, Betscher, Paula, Cieślińska, Zofia, Kowol, Katarzyna, Mikoś, Daria, Trzciński, Maciej, Krutul, Dawid, Kozłowski, Marek, Dadas, Sławomir, Poświata, Rafał, Perełkiewicz, Michał, Grębowiec, Małgorzata, Kazuła, Maciej, Białas, Marcin, Roszko, Roman, Roszko, Danuta, Vaičenonienė, Jurgita, Utka, Andrius, Levchuk, Paweł, Kowalski, Paweł, Prawdzic-Jankowska, Irena, Ogrodniczuk, Maciej, Borys, Monika, Bulińska, Anna, Gumienna, Wiktoria, Kieraś, Witold, Komosińska, Dorota, Krasnowska-Kieraś, Katarzyna, Kobyliński, Łukasz, Lewandowska, Martyna, Łaziński, Marek, Łątkowski, Mikołaj, Mastalerz, Dawid, Milewicz, Beata, Mykowiecka, Agnieszka Anna, Peljak-Łapińska, Angelika, Penno, Sandra, Przybysz, Zuzanna, Rudolf, Michał, Rybak, Piotr, Saputa, Karolina, Tomaszewska, Aleksandra, Wawer, Aleksander, Woliński, Marcin, Wołoszyn, Joanna, Wróblewska, Alina, Żuk, Bartosz, Żarnecki, Filip, Kaczyński, Konrad, Cichosz, Anna, Deckert, Zuzanna, Garnys, Monika, Grabarczyk, Izabela, Janowski, Wojciech, Karasińska, Sylwia, Kujawiak, Aleksandra, Misztela, Piotr, Szymańska, Maria, Walkusz, Karolina, Siek, Igor, Kwiatkowski, Jakub, Pęzik, Piotr
–arXiv.org Artificial Intelligence
Large Language Models (LLMs) play a central role in modern artificial intelligence, yet their development has been primarily focused on English, resulting in limited support for other languages. We present PLLuM (Polish Large Language Model), the largest open-source family of foundation models tailored specifically for the Polish language. Developed by a consortium of major Polish research institutions, PLLuM addresses the need for high-quality, transparent, and culturally relevant language models beyond the English-centric commercial landscape. We describe the development process, including the construction of a new 140-billion-token Polish text corpus for pre-training, a 77k custom instructions dataset, and a 100k preference optimization dataset. A key component is a Responsible AI framework that incorporates strict data governance and a hybrid module for output correction and safety filtering. We detail the models' architecture, training procedures, and alignment techniques for both base and instruction-tuned variants, and demonstrate their utility in a downstream task within public administration. By releasing these models publicly, PLLuM aims to foster open research and strengthen sovereign AI technologies in Poland.
arXiv.org Artificial Intelligence
Nov-7-2025
- Country:
- Africa > Mozambique (0.04)
- Asia
- Europe
- Belarus (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Poland
- Greater Poland Province > Poznań (0.04)
- Lesser Poland Province > Kraków (0.04)
- Lower Silesia Province > Wroclaw (0.04)
- Masovia Province > Warsaw (0.04)
- Łódź Province > Łódź (0.04)
- Świętokrzyskie Province > Kielce (0.04)
- Romania > Sud - Muntenia Development Region
- Giurgiu County > Giurgiu (0.04)
- Russia (0.04)
- Switzerland (0.04)
- North America
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- Florida > Miami-Dade County
- Miami (0.04)
- New Mexico > Bernalillo County
- Albuquerque (0.04)
- Florida > Miami-Dade County
- Mexico > Mexico City
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- Genre:
- Overview (1.00)
- Research Report > New Finding (0.92)
- Industry:
- Education (1.00)
- Energy (0.67)
- Government > Regional Government
- Europe Government (0.46)
- Information Technology > Security & Privacy (1.00)
- Law
- Criminal Law (0.67)
- Intellectual Property & Technology Law (1.00)
- Law Enforcement & Public Safety (1.00)
- Technology: