COMPASS: Combinatorial Optimization with Policy Adaptation using Latent Space Search
Combinatorial Optimization underpins many real-world applications and yet, designing performant algorithms to solve these complex, typically NP-hard, problems remains a significant research challenge. Reinforcement Learning (RL) provides a versatile framework for designing heuristics across a broad spectrum of problem domains. However, despite notable progress, RL has not yet supplanted industrial solvers as the go-to solution. Current approaches emphasize pre-training heuristics that construct solutions, but often rely on search procedures with limited variance, such as stochastically sampling numerous solutions from a single policy, or employing computationally expensive fine-tuning of the policy on individual problem instances. Building on the intuition that performant search at inference time should be anticipated during pre-training, we propose COMPASS, a novel RL approach that parameterizes a distribution of diverse and specialized policies conditioned on a continuous latent space. We evaluate COMPASS across three canonical problems - Travelling Salesman, Capacitated Vehicle Routing, and Job-Shop Scheduling - and demonstrate that our search strategy (i) outperforms state-of-the-art approaches in 9 out of 11 standard benchmarking tasks and (ii) generalizes better, surpassing all other approaches on a set of 18 procedurally transformed instance distributions.
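The latent-space search described in the abstract can be illustrated with a minimal, self-contained sketch: a policy conditioned on a continuous latent vector z is rolled out for many sampled z, and the best-performing latent is kept, with no per-instance fine-tuning. All names here (`evaluate_policy`, the toy quadratic cost landscape) are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def evaluate_policy(instance, z, rng):
    """Hypothetical stand-in for rolling out a latent-conditioned
    policy pi(. | instance, z) and returning the solution cost.
    A toy quadratic landscape keeps the sketch runnable."""
    target = instance["optimum_z"]
    return float(np.sum((z - target) ** 2)) + rng.normal(0.0, 0.01)

def latent_space_search(instance, latent_dim=8, n_samples=256, seed=0):
    """Search the continuous latent space for the specialized policy
    that performs best on this single instance (no fine-tuning)."""
    rng = np.random.default_rng(seed)
    best_z, best_cost = None, float("inf")
    for _ in range(n_samples):
        z = rng.uniform(-1.0, 1.0, size=latent_dim)  # candidate latent
        cost = evaluate_policy(instance, z, rng)
        if cost < best_cost:
            best_z, best_cost = z, cost
    return best_z, best_cost

instance = {"optimum_z": np.zeros(8)}
z_star, cost = latent_space_search(instance)
```

In practice the random sampling above would be replaced by a smarter searcher (e.g. an evolution strategy over z), but the interface stays the same: evaluate conditioned policies, keep the best latent.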
COMPASS: Cooperative Multi-Agent Persistent Monitoring using Spatio-Temporal Attention Network
Zhang, Xingjian, Wang, Yizhuo, Sartoretti, Guillaume
Persistent monitoring of dynamic targets is essential in real-world applications such as disaster response, environmental sensing, and wildlife conservation, where mobile agents must continuously gather information under uncertainty. We propose COMPASS, a multi-agent reinforcement learning (MARL) framework that enables decentralized agents to persistently monitor multiple moving targets efficiently. We model the environment as a graph, where nodes represent spatial locations and edges capture topological proximity, allowing agents to reason over structured layouts and revisit informative regions as needed. Each agent independently selects actions based on a shared spatio-temporal attention network that we design to integrate historical observations and spatial context. We model target dynamics using Gaussian Processes (GPs), which support principled belief updates and enable uncertainty-aware planning. We train COMPASS using centralized value estimation and decentralized policy execution under an adaptive reward setting. Our extensive experiments demonstrate that COMPASS consistently outperforms strong baselines in uncertainty reduction, target coverage, and coordination efficiency across dynamic multi-target scenarios.
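The GP belief update that supports the uncertainty-aware planning mentioned above can be sketched with the standard zero-mean GP posterior equations; the functions below are generic GP regression, not the paper's code, and an agent could rank candidate locations by the posterior variance they return.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) between 1-D location arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-3):
    """Posterior mean and variance of a zero-mean GP after observing
    (x_obs, y_obs) -- the kind of belief update an agent can use to
    decide which uncertain region to revisit next."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_query, x_obs)
    K_ss = rbf_kernel(x_query, x_query)
    alpha = np.linalg.solve(K, y_obs)
    mean = K_s @ alpha
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

x_obs = np.array([0.0, 1.0, 2.0])   # visited locations
y_obs = np.array([0.0, 1.0, 0.0])   # measurements taken there
x_query = np.array([1.0, 5.0])      # one visited point, one far away
mean, var = gp_posterior(x_obs, y_obs, x_query)
# variance collapses near an observed point and stays near the
# prior variance far from all observations
```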
COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context
Wan, Guangya, Ling, Mingyang, Ren, Xiaoqi, Han, Rujun, Li, Sheng, Zhang, Zizhao
Long-horizon tasks that require sustained reasoning and multiple tool interactions remain challenging for LLM agents: small errors compound across steps, and even state-of-the-art models often hallucinate or lose coherence. We identify context management as the central bottleneck -- extended histories cause agents to overlook critical evidence or become distracted by irrelevant information, thus failing to replan or reflect on previous mistakes. To address this, we propose COMPASS (Context-Organized Multi-Agent Planning and Strategy System), a lightweight hierarchical framework that separates tactical execution, strategic oversight, and context organization into three specialized components: (1) a Main Agent that performs reasoning and tool use, (2) a Meta-Thinker that monitors progress and issues strategic interventions, and (3) a Context Manager that maintains concise, relevant progress briefs for different reasoning stages. Across three challenging benchmarks -- GAIA, BrowseComp, and Humanity's Last Exam -- COMPASS improves accuracy by up to 20% relative to both single- and multi-agent baselines. We further introduce a test-time scaling extension that elevates performance to match established DeepResearch agents, and a post-training pipeline that delegates context management to smaller models for enhanced efficiency.
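The three-component control loop can be sketched with toy stand-ins (all class names and the toy summation task are hypothetical, not COMPASS's actual interfaces): the Main Agent acts, the Meta-Thinker decides whether the plan is complete, and the Context Manager replaces the raw history with a concise brief at each step instead of letting it grow.

```python
class ContextManager:
    """Keeps a concise brief instead of the full history (illustrative)."""
    def summarize(self, history, task):
        return {"running_total": sum(obs for _, obs in history)}

class MainAgent:
    """Tactical executor: processes one item per step (illustrative)."""
    def step(self, task, brief, step):
        item = task["items"][step]
        return f"add {item}", item

class MetaThinker:
    """Strategic monitor: decides when the plan is complete (illustrative)."""
    def review(self, step, task):
        return "done" if step + 1 == len(task["items"]) else "continue"

def run_episode(task, agent, thinker, ctx, max_steps=100):
    history, brief = [], ctx.summarize([], task)
    for step in range(min(max_steps, len(task["items"]))):
        action, obs = agent.step(task, brief, step)
        history.append((action, obs))
        if thinker.review(step, task) == "done":
            break
        brief = ctx.summarize(history, task)  # replace, don't append
    return ctx.summarize(history, task)

task = {"items": [3, 4, 5]}
result = run_episode(task, MainAgent(), MetaThinker(), ContextManager())
```

The key design point the sketch tries to capture is that the Main Agent only ever sees the brief, never the full history, which is what keeps long-horizon context bounded.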
Learning to Orient Surfaces by Self-supervised Spherical CNNs (Supplementary Material)
In this section, we study how the data augmentation carried out while training on local surface patches improves the robustness of Compass against self-occlusions and missing parts. Results for 3DMatch are shown in Table 1: the performance gain achieved by Compass when deploying the proposed data augmentation validates its importance. Indeed, without the proposed augmentation, FLARE performs better than Compass on this dataset. Comparing Compass on 3DMatch and 3DMatch rotated, thanks to the rotation equivariance of Spherical CNNs we are able to achieve similar performance on both datasets.
We propose a more general framework that can also be adopted to orient whole objects and perform rotation-invariant shape classification.
Moreover, this was recently shown by Bai et al. in "D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features"; we will add this information to the revised version. As suggested, we will use only the term "orientation" and will modify it in the final version of the paper. We agree that the domain of the Spherical CNN feature maps is key, and we will highlight it better in the final version. Since we seek a single rotation, the loss function in (6) is applied once, and only to the last layer of the network.
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Liu, Runze, Wang, Jiakang, Shi, Yuling, Xie, Zhihui, An, Chenxin, Zhang, Kaiyan, Zhao, Jian, Gu, Xiaodong, Lin, Lei, Hu, Wenping, Li, Xiu, Zhang, Fuzheng, Zhou, Guorui, Gai, Kun
Reinforcement Learning (RL) has shown remarkable success in enhancing the reasoning capabilities of Large Language Models (LLMs). Process-Supervised RL (PSRL) has emerged as a more effective paradigm compared to outcome-based RL. However, existing PSRL approaches suffer from limited exploration efficiency, both in terms of branching positions and sampling. In this paper, we introduce a novel PSRL framework (AttnRL), which enables efficient exploration for reasoning models. Motivated by the preliminary observation that steps exhibiting high attention scores correlate with reasoning behaviors, we propose to branch from positions with high attention values. Furthermore, we develop an adaptive sampling strategy that accounts for problem difficulty and historical batch size, ensuring that the whole training batch maintains non-zero advantage values. To further improve sampling efficiency, we design a one-step off-policy training pipeline for PSRL. Extensive experiments on multiple challenging mathematical reasoning benchmarks demonstrate that our method consistently outperforms prior approaches in performance as well as sampling and training efficiency.
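The attention-guided branching idea can be illustrated with a small helper that selects the top-k highest-attention step indices as branch points, kept a minimum distance apart; this is an illustrative selection rule, not AttnRL's exact procedure.

```python
import numpy as np

def pick_branch_positions(attn_scores, k=2, min_gap=1):
    """Pick up to k step indices with the highest attention scores as
    branching points for new rollouts; min_gap keeps chosen positions
    more than that many steps apart (illustrative rule)."""
    order = np.argsort(attn_scores)[::-1]   # highest score first
    chosen = []
    for idx in order:
        if all(abs(int(idx) - c) > min_gap for c in chosen):
            chosen.append(int(idx))
        if len(chosen) == k:
            break
    return sorted(chosen)

# Per-step attention scores from a rollout (made-up values).
scores = np.array([0.1, 0.9, 0.15, 0.8, 0.2, 0.85])
branches = pick_branch_positions(scores)
```

The `min_gap` constraint stands in for the more general goal of not wasting the rollout budget on near-duplicate branch points.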