Russia-Ukraine war: List of key events, day 1,457

Al Jazeera

Russian forces launched 448 attacks on 34 settlements in Ukraine's front-line Zaporizhia region in a single day, injuring a six-year-old child and damaging homes, cars and other infrastructure, regional governor Ivan Fedorov wrote on the Telegram app. Russian drone, missile and artillery attacks on Ukraine's Kherson region injured five people and damaged homes, including seven high-rise buildings, the local military administration said on Telegram. Russian attacks also continued in Ukraine's Dnipropetrovsk and Sumy regions, but local officials there noted that "fortunately, no people were injured".


Ukraine says it carried out first-ever underwater drone strike on Russian submarine in Novorossiysk

FOX News



Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy

Duffy, Alexander, Paech, Samuel J, Shastri, Ishana, Karpinski, Elizabeth, Alloui-Cros, Baptiste, Marques, Tyler, Olson, Matthew Lyle

arXiv.org Artificial Intelligence

We present the first evaluation harness that enables any out-of-the-box, local Large Language Model (LLM) to play full-press Diplomacy without fine-tuning or specialized training. Previous work required frontier LLMs or fine-tuning because of the high complexity and information density of Diplomacy's game state. Combined with the high variance of matches, these factors made Diplomacy prohibitive to study. In this work, we used data-driven iteration to optimize a textual game-state representation such that a 24B model can reliably complete matches without any fine-tuning. We develop tooling to facilitate hypothesis testing and statistical analysis, and we present case studies on persuasion, aggressive playstyles, and performance across a range of models. We conduct a variety of experiments across many popular LLMs, finding that larger models perform best but smaller models still play adequately. We also introduce Critical State Analysis: an experimental protocol for rapidly iterating on and analyzing key moments in a game in depth. Our harness democratizes the evaluation of strategic reasoning in LLMs by eliminating the need for fine-tuning, and it provides insights into how these capabilities emerge naturally from widely used LLMs. Our code is available in the supplement and will be open sourced.


DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy

Xu, Kaixuan, Chai, Jiajun, Li, Sicheng, Fu, Yuqian, Zhu, Yuanheng, Zhao, Dongbin

arXiv.org Artificial Intelligence

Diplomacy is a complex multiplayer game that requires both cooperation and competition, posing significant challenges for AI systems. Traditional methods rely on equilibrium search to generate extensive game data for training, which demands substantial computational resources. Large Language Models (LLMs) offer a promising alternative, leveraging pre-trained knowledge to achieve strong performance with relatively small-scale fine-tuning. However, applying LLMs to Diplomacy remains challenging due to the exponential growth of possible action combinations and the intricate strategic interactions among players. To address this challenge, we propose DipLLM, a fine-tuned LLM-based agent that learns equilibrium policies for Diplomacy. DipLLM employs an autoregressive factorization framework to simplify the complex task of multi-unit action assignment into a sequence of unit-level decisions. By defining an equilibrium policy within this framework as the learning objective, we fine-tune the model using only 1.5% of the data required by the state-of-the-art Cicero model, surpassing its performance. Our results demonstrate the potential of fine-tuned LLMs for tackling complex strategic decision-making in multiplayer games.
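The core idea of the autoregressive factorization can be sketched in a few lines. The sketch below is an illustrative assumption, not DipLLM's actual implementation: the function name `factorized_actions`, the toy unit/order names, and the greedy scoring function stand in for the paper's learned equilibrium policy. The point it demonstrates is structural: instead of scoring every joint combination of orders (exponential in the number of units), each unit picks its order in sequence, conditioned on the orders already chosen.

```python
def factorized_actions(units, legal_actions, score):
    """Assign one order per unit autoregressively.

    units: ordered list of unit identifiers.
    legal_actions: dict mapping each unit to its legal orders.
    score: callable (unit, action, previous_choices) -> float,
           standing in for a learned policy's preference.

    Each unit's choice conditions on the tuple of earlier choices,
    turning an exponential joint action space into a linear
    sequence of unit-level decisions.
    """
    chosen = []
    for unit in units:
        best = max(legal_actions[unit],
                   key=lambda a: score(unit, a, tuple(chosen)))
        chosen.append((unit, best))
    return chosen

# Toy example: two French units on a Diplomacy spring turn.
units = ["A PAR", "A MAR"]
legal = {"A PAR": ["H", "PAR-BUR"], "A MAR": ["H", "MAR-SPA"]}
orders = factorized_actions(units, legal, lambda u, a, prev: len(a))
```

A real agent would replace the greedy `max` with sampling from the LLM's conditional distribution over orders, but the factorization itself is what keeps the decision space tractable.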


Ukraine's 'Spiderweb' drone assault forces Russia to shelter, move aircraft

Al Jazeera

Russia's increased sense of vulnerability may be the most important result of a recent large-scale Ukrainian drone attack named Operation Spiderweb, experts tell Al Jazeera. The operation destroyed as much as a third of Russia's strategic bomber fleet on the tarmac of four airfields deep inside Russia on June 1. Days later, Russia started to build shelters for its bombers and relocate them. An open source intelligence (OSINT) researcher nicknamed Def Mon posted time-lapse satellite photographs on social media showing major excavations at the Kirovskoe airfield in annexed Crimea as well as in Sevastopol, Gvardiyskoye and Saki, where Russia was constructing shelters for military aircraft. They reported similar work at several airbases in Russia, including the Engels base, which was targeted in Ukraine's attacks on June 1.


SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?

Yao, Jianzhu, Wang, Kevin, Hsieh, Ryan, Zhou, Haisu, Zou, Tianqing, Cheng, Zerui, Wang, Zhangyang, Viswanath, Pramod

arXiv.org Artificial Intelligence

Reasoning and strategic behavior in social interactions is a hallmark of intelligence. This form of reasoning is significantly more sophisticated than isolated planning or reasoning tasks in static settings (e.g., math problem solving). In this paper, we present Strategic Planning, Interaction, and Negotiation (SPIN-Bench), a new multi-domain evaluation designed to measure the intelligence of strategic planning and social reasoning. While many existing benchmarks focus on narrow planning or single-agent reasoning, SPIN-Bench combines classical PDDL tasks, competitive board games, cooperative card games, and multi-agent negotiation scenarios in one unified framework. The framework includes both a benchmark and an arena that simulates and evaluates a variety of social settings to test the reasoning and strategic behavior of AI agents. We formulate the benchmark SPIN-Bench by systematically varying action spaces, state complexity, and the number of interacting agents to simulate a variety of social settings where success depends not only on methodical, step-wise decision making, but also on conceptual inference of other (adversarial or cooperative) participants. Our experiments reveal that while contemporary LLMs handle basic fact retrieval and short-range planning reasonably well, they encounter significant performance bottlenecks in tasks requiring deep multi-hop reasoning over large state spaces and socially adept coordination under uncertainty. We envision SPIN-Bench as a catalyst for future research on robust multi-agent planning, social reasoning, and human-AI teaming. Project Website: https://spinbench.github.io/


License Plate Images Generation with Diffusion Models

Shpir, Mariia, Shvai, Nadiya, Nakib, Amir

arXiv.org Artificial Intelligence

Despite the evident practical importance of license plate recognition (LPR), corresponding research is limited by the volume of publicly available datasets due to privacy regulations such as the General Data Protection Regulation (GDPR). To address this challenge, synthetic data generation has emerged as a promising approach. In this paper, we propose to synthesize realistic license plates (LPs) using diffusion models, inspired by recent advances in image and video generation. In our experiments a diffusion model was successfully trained on a Ukrainian LP dataset, and 1,000 synthetic images were generated for detailed analysis. Through manual classification and annotation of the generated images, we performed a thorough study of the model output, covering success rate, character distributions, and types of failures. Our contributions include experimental validation of the efficacy of diffusion models for LP synthesis, along with insights into the characteristics of the generated data. Furthermore, we have prepared a synthetic dataset consisting of 10,000 LP images, publicly available at https://zenodo.org/doi/10.5281/zenodo.13342102. The conducted experiments empirically confirm the usefulness of synthetic data for the LPR task. Despite the initial performance gap between models trained on real and synthetic data, expanding the training set with pseudolabeled synthetic data improves LPR accuracy by 3% over the baseline.


Swift Cross-Dataset Pruning: Enhancing Fine-Tuning Efficiency in Natural Language Understanding

Nguyen, Binh-Nguyen, He, Yang

arXiv.org Artificial Intelligence

Dataset pruning aims to select a subset of a dataset for efficient model training. While data efficiency in natural language processing has primarily focused on within-corpus scenarios during model pre-training, efficient dataset pruning for task-specific fine-tuning across diverse datasets remains challenging due to variability in dataset sizes, data distributions, class imbalance and label spaces. Current cross-dataset pruning techniques for fine-tuning often rely on computationally expensive sample ranking processes, typically requiring full dataset training or reference models. We address this gap by proposing Swift Cross-Dataset Pruning (SCDP). Specifically, our approach uses TF-IDF embeddings with geometric median to rapidly evaluate sample importance. We then apply dataset size-adaptive pruning to ensure diversity: for smaller datasets, we retain samples far from the geometric median, while for larger ones, we employ distance-based stratified pruning. Experimental results on six diverse datasets demonstrate the effectiveness of our method, spanning various tasks and scales while significantly reducing computational resources. Source code is available at: https://github.com/he-y/NLP-Dataset-Pruning
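The abstract's pipeline (TF-IDF embeddings, geometric median as a reference point, keep samples far from it on small datasets) can be sketched roughly as follows. This is a minimal stand-in under stated assumptions, not the authors' code: the function names are invented, the TF-IDF is a bare-bones whitespace-token version, and the geometric median is approximated with Weiszfeld's iteration. The larger-dataset branch (distance-based stratified pruning) is omitted for brevity.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Minimal TF-IDF over whitespace tokens (illustrative only)."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))          # document frequency per term
    vocab = sorted(df)
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append([tf[t] / len(toks) * idf[t] for t in vocab])
    return vecs

def geometric_median(vecs, iters=100):
    """Approximate the geometric median with Weiszfeld's algorithm."""
    dim = len(vecs[0])
    m = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    for _ in range(iters):
        num, den = [0.0] * dim, 0.0
        for v in vecs:
            d = math.dist(v, m)
            if d < 1e-12:             # skip points coinciding with m
                continue
            w = 1.0 / d
            den += w
            for i in range(dim):
                num[i] += w * v[i]
        if den == 0:
            break
        m = [x / den for x in num]
    return m

def prune_far(docs, keep):
    """Small-dataset rule from the abstract: retain the `keep`
    samples FARTHEST from the geometric median, favoring diversity."""
    vecs = tfidf_vectors(docs)
    med = geometric_median(vecs)
    ranked = sorted(range(len(docs)),
                    key=lambda i: -math.dist(vecs[i], med))
    return [docs[i] for i in sorted(ranked[:keep])]
```

The key design point the sketch illustrates is cost: ranking needs only embedding distances to a single reference point, with no reference model or full training run.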


BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation

Li, Bryan, Haider, Samar, Luo, Fiona, Agashe, Adwait, Callison-Burch, Chris

arXiv.org Artificial Intelligence

Large language models excel at creative generation but continue to struggle with the issues of hallucination and bias. While retrieval-augmented generation (RAG) provides a framework for grounding LLMs' responses in accurate and up-to-date information, it still raises the question of bias: which sources should be selected for inclusion in the context? And how should their importance be weighted? In this paper, we study the challenge of cross-lingual RAG and present a dataset to investigate the robustness of existing systems at answering queries about geopolitical disputes, which exist at the intersection of linguistic, cultural, and political boundaries. Our dataset is sourced from Wikipedia pages containing information relevant to the given queries and we investigate the impact of including additional context, as well as the composition of this context in terms of language and source, on an LLM's response. Our results show that existing RAG systems continue to be challenged by cross-lingual use cases and suffer from a lack of consistency when they are provided with competing information in multiple languages. We present case studies to illustrate these issues and outline steps for future research to address these challenges. We make our dataset and code publicly available at https://github.com/manestay/bordIRlines.


Ukraine's navy chief says Russian warships are leaving Crimean hub in Black Sea

FOX News

The Russian navy's Black Sea Fleet has been forced to rebase nearly all its combat-ready warships from occupied Crimea to other locations, and its main naval hub is becoming ineffectual because of attacks by Kyiv, Ukraine's navy chief said. Vice-Admiral Oleksiy Neizhpapa said Ukrainian missile and naval drone strikes had caused heavy damage to the Sevastopol base, a logistics hub handling repairs, maintenance, training and ammunition storage, among other important functions for Russia. "They were established over many decades, possibly centuries. And clearly they are now losing this hub," Neizhpapa told Reuters in a rare interview in the port city of Odesa ahead of Ukraine Navy Day on Sunday. More than 28 months since Russia's full-scale invasion, Kyiv has dealt a series of stinging blows to Moscow in the Black Sea, although Ukrainian ground troops are on the back foot across a sprawling front.