BenCzechMark: A Czech-centric Multitask and Multimetric Benchmark for Large Language Models with Duel Scoring Mechanism
Fajcik, Martin, Docekal, Martin, Dolezal, Jan, Ondrej, Karel, Beneš, Karel, Kapsa, Jan, Smrz, Pavel, Polok, Alexander, Hradis, Michal, Neverilova, Zuzana, Horak, Ales, Sabol, Radoslav, Stefanik, Michal, Jirkovsky, Adam, Adamczyk, David, Hyner, Petr, Hula, Jan, Kydlicek, Hynek
We present BenCzechMark (BCM), the first comprehensive Czech-language benchmark designed for large language models, offering diverse tasks, multiple task formats, and multiple evaluation metrics. Its scoring system is grounded in statistical significance theory and aggregates across tasks in a manner inspired by social preference theory. Our benchmark encompasses 50 challenging tasks with corresponding test datasets, primarily in native Czech, 11 of them newly collected. These tasks span 8 categories and cover diverse domains, including historical Czech news, essays by pupils and language learners, and spoken word. Furthermore, we collect and clean the BUT-Large Czech Collection, the largest publicly available clean Czech language corpus, and use it for (i) contamination analysis and (ii) continuous pretraining of the first Czech-centric 7B language model, with Czech-specific tokenization. We use our model as a baseline for comparison with publicly available multilingual models. Lastly, we release and maintain a leaderboard, currently with 44 model submissions; new models can be submitted at https://huggingface.co/spaces/CZLC/BenCzechMark.
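The duel-based idea behind such a scoring system can be sketched as follows: two models "duel" on a task by comparing their per-example scores with a paired significance test, and each model's final score is the fraction of duels it wins across all opponents and tasks. This is a minimal illustration under stated assumptions, not BCM's actual procedure; the paired sign test, the 0.05 significance level, and the win-rate aggregation are all assumptions made for the sketch.

```python
import math
from itertools import combinations

def sign_test_p(wins, losses):
    """One-sided sign test p-value: P(X >= wins) for X ~ Binomial(wins + losses, 0.5)."""
    n = wins + losses
    if n == 0:
        return 1.0
    return sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n

def duel(scores_a, scores_b, alpha=0.05):
    """Compare two models example-by-example on one task.

    Returns +1 if A is significantly better than B, -1 if B is significantly
    better than A, and 0 (a draw) otherwise.
    """
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(b > a for a, b in zip(scores_a, scores_b))
    if sign_test_p(wins, losses) < alpha:
        return 1
    if sign_test_p(losses, wins) < alpha:
        return -1
    return 0

def duel_win_rate(per_task_scores):
    """Aggregate pairwise duels into one score per model.

    per_task_scores: {model: {task: [per-example scores]}}.
    Returns, for each model, the fraction of its duels it won.
    """
    models = list(per_task_scores)
    tasks = list(next(iter(per_task_scores.values())))
    n_duels = (len(models) - 1) * len(tasks)  # duels each model takes part in
    points = {m: 0 for m in models}
    for a, b in combinations(models, 2):
        for task in tasks:
            outcome = duel(per_task_scores[a][task], per_task_scores[b][task])
            if outcome > 0:
                points[a] += 1
            elif outcome < 0:
                points[b] += 1
    return {m: points[m] / n_duels for m in models}
```

Note the role of the significance test: with only 4 examples, even a clean sweep (4 wins, 0 losses) gives p = 1/16 > 0.05, so the duel is a draw; a model only earns a point when its advantage is unlikely to be noise.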
Towards Multilingual LLM Evaluation for European Languages
Thellmann, Klaudia, Stadler, Bernhard, Fromm, Michael, Buschhoff, Jasper Schulze, Jude, Alex, Barth, Fabio, Leveling, Johannes, Flores-Herr, Nicolas, Köhler, Joachim, Jäkel, René, Ali, Mehdi
The rise of Large Language Models (LLMs) has revolutionized natural language processing across numerous languages and tasks. However, evaluating LLM performance in a consistent and meaningful way across multiple European languages remains challenging, especially due to the scarcity of language-parallel multilingual benchmarks. We introduce a multilingual evaluation approach tailored for European languages. We employ translated versions of five widely-used benchmarks to assess the capabilities of 40 LLMs across 21 European languages. Our contributions include examining the effectiveness of translated benchmarks, assessing the impact of different translation services, and offering a multilingual evaluation framework for LLMs that includes newly created datasets: EU20-MMLU, EU20-HellaSwag, EU20-ARC, EU20-TruthfulQA, and EU20-GSM8K. The benchmarks and results are made publicly available to encourage further research in multilingual LLM evaluation.
EuroLLM: Multilingual Language Models for Europe
Martins, Pedro Henrique, Fernandes, Patrick, Alves, João, Guerreiro, Nuno M., Rei, Ricardo, Alves, Duarte M., Pombal, José, Farajian, Amin, Faysse, Manuel, Klimaszewski, Mateusz, Colombo, Pierre, Haddow, Barry, de Souza, José G. C., Birch, Alexandra, Martins, André F. T.
The quality of open-weight LLMs has seen significant improvement, yet they remain predominantly focused on English. In this paper, we introduce the EuroLLM project, aimed at developing a suite of open-weight multilingual LLMs capable of understanding and generating text in all official European Union languages, as well as several additional relevant languages. We outline the progress made to date, detailing our data collection and filtering process, the development of scaling laws, the creation of our multilingual tokenizer, and the data mix and modeling configurations. Additionally, we release our initial models: EuroLLM-1.7B and EuroLLM-1.7B-Instruct and report their performance on multilingual general benchmarks and machine translation.