Crimean Tatar
The Crimean Tatar movement trying to ruin Russia's army from within
On the weekend, a power cut shut down a train line carrying Russian weapons and supplies to the front line through the region of Bryansk in western Russia, near the Ukrainian border. But this was no ordinary blackout.
- Asia > Russia (1.00)
- South America (0.41)
- North America > United States (0.41)
- (11 more...)
- Government > Regional Government > Europe Government > Russia Government (1.00)
- Government > Regional Government > Asia Government > Russia Government (1.00)
- Government > Military (1.00)
Deep Language Geometry: Constructing a Metric Space from LLM Weights
Shamrai, Maksym, Hamolia, Vladyslav
We introduce a novel framework that utilizes the internal weight activations of modern Large Language Models (LLMs) to construct a metric space of languages. Unlike traditional approaches based on hand-crafted linguistic features, our method automatically derives high-dimensional vector representations by computing weight importance scores via an adapted pruning algorithm. Our approach captures intrinsic language characteristics that reflect linguistic phenomena. We validate our approach across diverse datasets and multilingual LLMs, covering 106 languages. The results align well with established linguistic families while also revealing unexpected inter-language connections that may indicate historical contact or language evolution. The source code, computed language latent vectors, and visualization tool are made publicly available at https://github.com/mshamrai/deep-language-geometry.
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.73)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
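The pipeline the abstract describes — one high-dimensional importance vector per language, turned into a metric space by pairwise distances — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vectors below are random stand-ins for the pruning-derived weight-importance scores, and cosine distance is an assumed choice of metric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: one importance vector per language. In the paper
# these come from weight-importance scores computed via an adapted pruning
# algorithm; here they are random, purely to show the distance computation.
languages = ["en", "de", "tr", "az", "kk"]
vectors = {lang: rng.random(64) for lang in languages}

def cosine_distance(a, b):
    # 1 - cosine similarity: a common metric for high-dimensional
    # embeddings (an assumption -- the paper may use a different one).
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Pairwise distance matrix over the language set; clustering or a
# dendrogram over this matrix would recover language groupings.
n = len(languages)
dist = np.zeros((n, n))
for i, li in enumerate(languages):
    for j, lj in enumerate(languages):
        dist[i, j] = cosine_distance(vectors[li], vectors[lj])

print(np.round(dist, 3))
```

Given real importance vectors in place of the random ones, feeding `dist` into standard hierarchical clustering would be one way to compare the induced groupings against established linguistic families.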
TUMLU: A Unified and Native Language Understanding Benchmark for Turkic Languages
Isbarov, Jafar, Akhundjanova, Arofat, Hajili, Mammad, Huseynova, Kavsar, Gaynullin, Dmitry, Rzayev, Anar, Tursun, Osman, Saetov, Ilshat, Kharisov, Rinat, Belginova, Saule, Kenbayeva, Ariana, Alisheva, Amina, Turdubaeva, Aizirek, Köksal, Abdullatif, Rustamov, Samir, Ataman, Duygu
Being able to thoroughly assess massive multi-task language understanding (MMLU) capabilities is essential for advancing the applicability of multilingual language models. However, preparing such benchmarks in high-quality native language is often costly, which limits the representativeness of evaluation datasets. While recent efforts have focused on building more inclusive MMLU benchmarks, these are conventionally built using machine translation from high-resource languages, which can introduce errors and fail to account for the linguistic and cultural intricacies of the target languages. In this paper, we address the lack of a native-language MMLU benchmark for the under-represented Turkic language family, which has distinct morphosyntactic and cultural characteristics. We propose two benchmarks for Turkic-language MMLU. TUMLU is a comprehensive, multilingual, natively developed language understanding benchmark designed specifically for Turkic languages; it consists of middle- and high-school-level questions spanning 11 academic subjects in Azerbaijani, Crimean Tatar, Karakalpak, Kazakh, Tatar, Turkish, Uyghur, and Uzbek. We also present TUMLU-mini, a more concise, balanced, and manually verified subset of the dataset. Using this dataset, we systematically evaluate a diverse range of open and proprietary multilingual large language models (LLMs), including Claude, Gemini, GPT, and LLaMA, offering an in-depth analysis of their performance across languages, subjects, and alphabets. To promote further research and development in multilingual language understanding, we release TUMLU-mini and all corresponding evaluation scripts.
- North America > United States > California (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Oceania > Australia > Queensland (0.04)
- (11 more...)
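The per-language and per-subject analysis the TUMLU abstract describes amounts to grouping exact-match accuracy over multiple-choice items by metadata fields. A minimal sketch, assuming hypothetical item records (the questions, labels, and predictions below are invented placeholders, not benchmark data):

```python
from collections import defaultdict

# Hypothetical TUMLU-style records: each item carries a language, a
# subject, the gold answer letter, and a model's predicted letter.
items = [
    {"lang": "az", "subject": "biology", "gold": "B", "pred": "B"},
    {"lang": "az", "subject": "biology", "gold": "C", "pred": "A"},
    {"lang": "tr", "subject": "history", "gold": "A", "pred": "A"},
    {"lang": "kk", "subject": "history", "gold": "D", "pred": "D"},
]

def accuracy_by(items, key):
    """Exact-match accuracy grouped by one metadata field."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for it in items:
        total[it[key]] += 1
        correct[it[key]] += int(it["pred"] == it["gold"])
    return {k: correct[k] / total[k] for k in total}

print(accuracy_by(items, "lang"))     # accuracy broken down per language
print(accuracy_by(items, "subject"))  # accuracy broken down per subject
```

The same grouping applied to an "alphabet" field would reproduce the per-script breakdown the abstract mentions; the real evaluation scripts are released with TUMLU-mini.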