stalker
Bi-Level Knowledge Transfer for Multi-Task Multi-Agent Reinforcement Learning
Multi-Agent Reinforcement Learning (MARL) has achieved remarkable success in various real-world scenarios, but its high cost of online training makes it impractical to learn each task from scratch. To enable effective policy reuse, we consider the problem of zero-shot generalization from offline data across multiple tasks. While prior work focuses on transferring individual skills of agents, we argue that effective policy transfer across tasks should also capture team-level coordination knowledge. In this paper, we propose Bi-Level Knowledge Transfer (BiKT) for Multi-Task MARL, which performs knowledge transfer at both the individual and team levels. At the individual level, we extract transferable individual skill embeddings from offline MARL trajectories.
EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making
Cheng, Yang, Wang, Zilai, Ma, Weiyu, Zhu, Wenhui, Deng, Yue, Zhao, Jian
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse domains, including programming, planning, and decision-making. However, their performance often degrades when faced with highly complex problem instances that require deep reasoning over long horizons. In such cases, direct problem-solving approaches can lead to inefficiency or failure due to the lack of structured intermediate guidance. To address this, we propose a novel self-evolving framework, EvoCurr, in which a dedicated curriculum-generation LLM constructs a sequence of problem instances with gradually increasing difficulty, tailored to the solver LLM's learning progress. The curriculum adapts dynamically, easing challenges when the solver struggles and escalating them when success is consistent, thus maintaining an optimal learning trajectory. This approach enables the solver LLM, implemented as a code-generation model producing Python decision-tree scripts, to progressively acquire the skills needed for complex decision-making tasks. Experimental results on challenging decision-making benchmarks show that our method significantly improves task success rates and solution efficiency compared to direct-solving baselines. These findings suggest that LLM-driven curriculum learning holds strong potential for enhancing automated reasoning in real-world, high-complexity domains.
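The adaptation rule described in this abstract (ease the difficulty when the solver struggles, escalate it when success is consistent) can be illustrated with a minimal sketch. This is not EvoCurr's implementation: the class name `AdaptiveCurriculum`, the integer difficulty levels, the sliding window, and the two thresholds are all assumptions made for illustration, and the LLM calls that would generate and solve each instance are left out.

```python
from collections import deque


class AdaptiveCurriculum:
    """Hypothetical difficulty scheduler (illustrative, not the paper's code)."""

    def __init__(self, min_level=1, max_level=10, window=3,
                 demote_at=0.0, promote_at=1.0):
        # All parameters are assumed values chosen for the example.
        self.level = min_level
        self.min_level = min_level
        self.max_level = max_level
        self.demote_at = demote_at      # success rate at or below this eases the task
        self.promote_at = promote_at    # success rate at or above this escalates it
        self.recent = deque(maxlen=window)

    def record(self, solved: bool) -> int:
        """Record one solver attempt and return the next difficulty level."""
        self.recent.append(solved)
        # Only adapt once a full window of attempts has been observed.
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if rate >= self.promote_at:
                self.level = min(self.level + 1, self.max_level)
                self.recent.clear()
            elif rate <= self.demote_at:
                self.level = max(self.level - 1, self.min_level)
                self.recent.clear()
        return self.level


if __name__ == "__main__":
    curriculum = AdaptiveCurriculum()
    for outcome in [True, True, True, False, False, False]:
        print(outcome, "->", curriculum.record(outcome))
```

Clearing the window after each level change is a design choice in this sketch: it keeps results from the previous difficulty from influencing the next decision.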
SC2Arena and StarEvolve: Benchmark and Self-Improvement Framework for LLMs in Complex Decision-Making Tasks
Shen, Pengbo, Wang, Yaqing, Mu, Ni, Luan, Yao, Xie, Runpeng, Yang, Senhao, Wang, Lexiang, Hu, Hao, Xu, Shuang, Yang, Yiqin, Xu, Bo
Evaluating large language models (LLMs) in complex decision-making is essential for advancing AI's ability for strategic planning and real-time adaptation. However, existing benchmarks for tasks like StarCraft II fail to capture the game's full complexity, such as its complete game context, diverse action spaces, and all playable races. To address this gap, we present SC2Arena, a benchmark that fully supports all playable races and low-level action spaces, and optimizes text-based observations to tackle spatial reasoning challenges. Complementing this, we introduce StarEvolve, a hierarchical framework that integrates strategic planning with tactical execution, featuring iterative self-correction and continuous improvement via fine-tuning on high-quality gameplay data. Its key components include a Planner-Executor-Verifier structure to break down gameplay, and a scoring system for selecting high-quality training samples. Comprehensive analysis using SC2Arena provides valuable insights into developing generalist agents that were not possible with previous benchmarks. Experimental results also demonstrate that our proposed StarEvolve achieves superior performance in strategic planning. Our code, environment, and algorithms are publicly available.
'Close to perfect': readers' favourite games of 2025 so far
Enshrouded is a beautiful combination of Minecraft, Skyrim and resource gathering that makes it at least three games in one. My daughter told me I would love it and I ignored her for too long. I've tackled Elden Ring, but much prefer the often gentler combat of Enshrouded. It sometimes makes me feel like an elite fighter, then other times kicks my arse in precisely the right measures. Its real joy is the flexibility to spend your time doing whatever tickles your fancy. I'll spend a few hours growing crops to make a cake or smelting metals for better armour, then knock off a few quests to unlock new materials and weapons.
Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation
Liu, Sicong, Shu, Yang, Guo, Chenjuan, Yang, Bin
Learning cooperative multi-agent policies from offline multi-task data that can generalize to unseen tasks with varying numbers of agents and targets is an attractive problem in many scenarios. Although aggregating general behavior patterns among multiple tasks as skills to improve policy transfer is a promising approach, two primary challenges hinder the further advancement of skill learning in offline multi-task MARL. Firstly, extracting general cooperative behaviors from various action sequences as common skills fails to bring cooperative temporal knowledge into them. Secondly, existing works only involve common skills and cannot adaptively choose independent knowledge as task-specific skills in each task for fine-grained action execution. To tackle these challenges, we propose Hierarchical and Separate Skill Discovery (HiSSD), a novel approach for generalizable offline multi-task MARL through skill learning. HiSSD leverages a hierarchical framework that jointly learns common and task-specific skills. The common skills learn cooperative temporal knowledge and enable in-sample exploitation for offline multi-task MARL. The task-specific skills represent the priors of each task and achieve task-guided fine-grained action execution. To verify the advancement of our method, we conduct experiments on multi-agent MuJoCo and SMAC benchmarks. After training the policy using HiSSD on offline multi-task data, the empirical results show that HiSSD assigns effective cooperative behaviors and obtains superior performance in unseen tasks.
Atomfall, the survival game that draws from classic British sci-fi
The year is 1962 and you've just woken up in the shadow of the Windscale (now Sellafield) nuclear power station in Cumbria, five years after its catastrophic meltdown. Trapped in the sizeable quarantine zone surrounding the accident site, you must stay alive long enough to figure out how to escape – a task made rather more challenging by the presence of aggressive cultists, irradiated monsters and highly territorial terror bees. Imagine Stalker, but set in northern England, and you're edging towards what Oxford-based developer Rebellion has in store. Fallout may seem like another obvious inspiration for this irradiated game world, but after playing a two-hour demo, it's clear the game draws more from classic British sci-fi. Here you are, stuck in the picturesque Lake District, with its lush woodlands, gurgling rivers and dry-stone walls.
Few is More: Task-Efficient Skill-Discovery for Multi-Task Offline Multi-Agent Reinforcement Learning
Wang, Xun, Li, Zhuoran, Zhong, Hai, Huang, Longbo
As a data-driven approach, offline MARL learns superior policies solely from offline datasets, making it ideal for domains rich in historical data but with high interaction costs and risks. However, most existing methods are task-specific and require retraining for new tasks, leading to redundancy and inefficiency. To address this issue, in this paper, we propose a task-efficient multi-task offline MARL algorithm, Skill-Discovery Conservative Q-Learning (SD-CQL). Unlike existing offline skill-discovery methods, SD-CQL discovers skills by reconstructing the next observation. It then evaluates fixed and variable actions separately and employs behavior-regularized conservative Q-learning to execute the optimal action for each skill. This approach eliminates the need for local-global alignment and enables strong multi-task generalization from limited small-scale source tasks. Extensive experiments on StarCraft II demonstrate the superior generalization performance and task-efficiency of SD-CQL. It achieves the best performance on $\textbf{10}$ out of $14$ task sets, with up to $\textbf{65\%}$ improvement on individual task sets, and is within $4\%$ of the best baseline on the remaining four.
Artificial Intelligence: Lockheed Martin and Red Hat to collaborate on Military Drone Systems
Lockheed Martin and Red Hat, Inc. announced their collaboration to advance artificial intelligence (AI) innovation on Lockheed Martin's unmanned military platforms. The adoption of newly developed Red Hat Device Edge technology will enable Lockheed Martin's unmanned systems to operate safely in geographically constrained environments and improve the processing of sensor-derived information. In a recent demonstration, Lockheed Martin used Red Hat Device Edge on a Stalker UAS to show how AI-enhanced sensing can advance joint operations across domains. The Stalker used onboard sensors and AI to adapt in real time to a threat environment. As reported by the company, the Stalker was flying an intelligence, surveillance and reconnaissance (ISR) mission to detect a simulated military target.
This facial recognition website can turn anyone into a cop -- or a stalker
In one PimEyes thread on 4chan from October, an anonymous user posted a digital collage, titled "Complete Exposure" and a woman's name, filled with sensitive details of their personal life. It was unclear whether all the photos had been surfaced by PimEyes, or even whether they were all of the same woman. But the collage was scarily comprehensive, including photos of her standing in the middle-school classroom where she teaches, her driver's license, school badge, wedding announcement, the outside of her home and her home address.
Briefly Noted Book Reviews
Philosophers have long debated the nature of consciousness. This probing study takes an evolutionary approach, examining "experience in general" not only in humans but in much of the animal kingdom. Animals, it argues, developed consciousness gradually, through such biological innovations as centralized nervous systems and the ability to distinguish one's actions from external forces, which have given rise to "varieties of subjectivity." The author is crisp on a subject notorious for abstraction, dissecting fuzzy philosophical metaphors and weaving in lively descriptions of the octopuses, whale sharks, and banded shrimp he observes on scuba dives off the coasts of Australia. Born in 1797 in Düsseldorf, then under Napoleonic occupation, Heine remained a committed liberal even as Germany turned inward after the Congress of Vienna.