Simferopol
KyrgyzNLP: Challenges, Progress, and Future
Alekseev, Anton, Turatali, Timur
Large language models (LLMs) have excelled in numerous benchmarks, advancing AI applications in both linguistic and non-linguistic tasks. However, this has primarily benefited well-resourced languages, leaving less-resourced ones (LRLs) at a disadvantage. In this paper, we highlight the current state of the NLP field for one specific LRL: Kyrgyz (kyrgyz tili). Human evaluation, including annotated datasets created by native speakers, remains an irreplaceable component of reliable NLP performance, especially for LRLs where automatic evaluations can fall short. In recent assessments of the resources for Turkic languages, Kyrgyz, a severely under-resourced language spoken by millions, is labeled with the status 'Scraping By'. This is concerning given the growing importance of the language, not only in Kyrgyzstan but also among diaspora communities where it holds no official status. We review prior efforts in the field, noting that many of the publicly available resources have only recently been developed, with few exceptions beyond dictionaries (the processed data used for the analysis is presented at https://kyrgyznlp.github.io/). While recent papers have made some headway, much more remains to be done. Despite interest and support from both business and government sectors in the Kyrgyz Republic, the situation for Kyrgyz language resources remains challenging. We stress the importance of community-driven efforts to build these resources, ensuring the sustainability of future advancement. We then share our view of the most pressing challenges in Kyrgyz NLP. Finally, we propose a roadmap for future development in terms of research topics and language resources.
- Asia > Russia (0.14)
- Europe > Germany > Saxony > Leipzig (0.05)
- Asia > Kyrgyzstan > Chüy Region > Bishkek (0.04)
- (19 more...)
- Research Report (1.00)
- Overview > Growing Problem (0.34)
- Government (1.00)
- Media > News (0.46)
Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning
Xing, Junjie, He, Yeye, Zhou, Mengyu, Dong, Haoyu, Han, Shi, Zhang, Dongmei, Chaudhuri, Surajit
In this work, we propose Table-LLM-Specialist, or Table-Specialist for short, as a new self-trained fine-tuning paradigm specifically designed for table tasks. Our insight is that for each table task, there often exist two dual versions of the same task, one generative and one classification in nature. Leveraging their duality, we propose a Generator-Validator paradigm that iteratively generates and then validates training data from language models in order to fine-tune stronger Table-Specialist models that can specialize in a given task, without requiring manually labeled data. Our extensive evaluations suggest that Table-Specialist has (1) strong performance on diverse table tasks over vanilla language models: for example, Table-Specialist fine-tuned on GPT-3.5 not only outperforms vanilla GPT-3.5, but can often match or surpass GPT-4 level quality; (2) lower cost to deploy, because when Table-Specialist fine-tuned on GPT-3.5 achieves GPT-4 level quality, it becomes possible to deploy smaller models with lower latency and inference cost, with comparable quality; and (3) better generalizability when evaluated across multiple benchmarks, since Table-Specialist is fine-tuned on a broad range of training data systematically generated from diverse real tables. Our code and data will be available at https://github.com/microsoft/Table-LLM-Specialist.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Russia (0.14)
- Asia > Russia (0.14)
- (73 more...)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Media (1.00)
- (3 more...)
Ukraine attacked Russian village with cluster munitions: Governor
The governor of Russia's Belgorod region has said that Ukraine fired cluster munitions at a village near the Ukrainian border on Friday, but that there were no casualties or damage. The governor made the statement on Saturday during a daily briefing on his Telegram channel, without providing visual evidence. There was no immediate comment from Ukrainian authorities. "In Belgorod district, 21 artillery shells and three cluster munitions from a multiple-launch rocket system were fired at the village of Zhuravlevka," Governor Vyacheslav Gladkov said. Ukraine received cluster bombs from the United States this month, but it has pledged to use them only to dislodge concentrations of enemy soldiers. Cluster munitions contain dozens of small bomblets that rain shrapnel over a wide area, and they are banned in many countries due to the potential danger they pose to civilians.
- Europe > Russia > Central Federal District > Belgorod Oblast > Belgorod (0.50)
- Asia > Russia (0.35)
- North America > United States (0.26)
- (4 more...)
- Government > Regional Government > Europe Government > Ukraine Government (0.37)
- Government > Military > Army (0.37)
Efficient and Flexible Topic Modeling using Pretrained Embeddings and Bag of Sentences
Pre-trained language models have led to a new state of the art in many NLP tasks. However, for topic modeling, statistical generative models such as LDA are still prevalent, and these do not easily allow incorporating contextual word vectors. They might yield topics that do not align very well with human judgment. In this work, we propose a novel topic modeling and inference algorithm. We suggest a bag of sentences (BoS) approach using sentences as the unit of analysis. We leverage pre-trained sentence embeddings by combining generative process models with clustering. We derive a fast inference algorithm based on expectation maximization, hard assignments, and an annealing process. Our evaluation shows that our method yields state-of-the-art results with relatively low computational demands. Our method is more flexible than prior works leveraging word embeddings, since it provides the possibility to customize topic-document distributions using priors. Code is at https://github.com/JohnTailor/BertSenClu.
- Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.04)
- Asia > Middle East > Iran (0.04)
- North America > Canada > Manitoba > Winnipeg Metropolitan Region > Winnipeg (0.04)
- (13 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Russian forces in Kherson alert as Ukraine plans next move
After recapturing Kherson city, Ukraine kept Russian forces guessing about its next move, pinning down occupying troops in defensive positions and rendering them unavailable for offensive operations. Some 30,000 Russian troops that withdrew from the west bank of the Dnieper river earlier this month were entrenching themselves in the Zaporizhia and Kherson regions during the 39th week of the war, Major-General Vadym Skibitskyi, deputy head of Ukrainian military intelligence, told the Kyiv Post. "[The Russians] are waiting for our liberation offensive, that's why they have created a defensive line in Kherson, another on the administrative border of [Kherson and] Crimea, and another in the northern Crimea region," Skibitskyi said. "The enemy is on the defensive in the Zaporizhzhia direction," said Ukraine's general staff. "In the Kryvyi Rih and Kherson directions, the enemy is creating an echeloned defence system, improving fortification equipment and logistical support of advanced units, and not stopping artillery fire at the positions of our troops and settlements on the right bank of the Dnipro River."
- Europe > Ukraine > Kherson Oblast > Kherson (1.00)
- Asia > Russia (1.00)
- Europe > Ukraine > Zaporizhia Oblast > Zaporizhia (0.28)
- (10 more...)
- Government > Military (1.00)
- Government > Regional Government > Europe Government > Russia Government (0.69)
- Government > Regional Government > Asia Government > Russia Government (0.69)
Parallel Stochastic Mirror Descent for MDPs
Tiapkin, Daniil, Stonyakin, Fedor, Gasnikov, Alexander
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, we propose a variant of Stochastic Mirror Descent for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints. We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method. Using this algorithm, we get the first parallel algorithm for average-reward MDPs with a generative model. One of the main features of the presented method is low communication costs in a distributed centralized setting.
- Asia > Russia (0.05)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- North America > United States (0.04)
- (3 more...)
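For readers unfamiliar with the Stochastic Mirror Descent named in the abstract above, here is a minimal sketch of the textbook entropic variant on the probability simplex. It is not the authors' algorithm: it omits the functional constraints, the MDP structure, and the parallelism. The function names, the linear toy objective, and the step size are all illustrative assumptions.

```python
import math
import random


def entropic_smd(stochastic_grad, dim, steps, step_size, seed=0):
    """Textbook stochastic mirror descent with the entropy mirror map.

    Assumption: the feasible set is the probability simplex, so the mirror
    step is a multiplicative-weights update. Returns the average iterate.
    """
    rng = random.Random(seed)
    log_w = [0.0] * dim          # unnormalised log-weights (dual variables)
    avg = [0.0] * dim            # running average of the iterates
    for t in range(1, steps + 1):
        # Map log-weights to a point on the simplex (stable softmax).
        m = max(log_w)
        w = [math.exp(v - m) for v in log_w]
        total = sum(w)
        x = [wi / total for wi in w]
        # Update the running average with the current iterate.
        avg = [a + (xi - a) / t for a, xi in zip(avg, x)]
        # Take a noisy gradient step in the dual (log) space.
        g = stochastic_grad(x, rng)
        log_w = [v - step_size * gi for v, gi in zip(log_w, g)]
    return avg


def noisy_linear_grad(cost, noise_std):
    """Hypothetical oracle: gradient of <cost, x> plus Gaussian noise."""
    def grad(x, rng):
        return [c + rng.gauss(0.0, noise_std) for c in cost]
    return grad


if __name__ == "__main__":
    cost = [0.9, 0.2, 0.6]       # toy objective: minimise <cost, x>
    x_avg = entropic_smd(noisy_linear_grad(cost, 0.1), dim=3,
                         steps=2000, step_size=0.1)
    print([round(v, 3) for v in x_avg])
```

On this toy problem the average iterate concentrates on the coordinate with the smallest cost. Keeping the state as log-weights rather than probabilities avoids taking the logarithm of a value that has underflowed to zero.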
Twitter data could have been a source of Kremlin intelligence during the 2014 Ukraine conflict
Kremlin analysts could have used Twitter as a source of military intelligence to inform their actions in the 2014 Russia–Ukraine conflict, a study has found. University of California experts showed that location-tagged tweets by Ukraine residents could have been used to map out sentiments towards Russia in real time. The map they made of pro-Kremlin regions turned out to bear a striking resemblance to the actual areas to which Russia dispatched its special forces. Specifically, this included Crimea and regions in the far east of Ukraine, where the incoming forces would have been most likely to be seen as liberators. In contrast, the data could also reveal those areas where dispatching forces would have led to greater resistance and corresponding casualties and costs.
- Asia > Russia (1.00)
- Europe > Ukraine > Luhansk Oblast > Luhansk (0.15)
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.07)
- (10 more...)
- Government > Regional Government > Europe Government > Russia Government (1.00)
- Government > Regional Government > Asia Government > Russia Government (1.00)