Search
Learning-Based TSP-Solvers Tend to Be Overly Greedy
Deep learning has shown significant potential in solving combinatorial optimization problems such as the Euclidean traveling salesman problem (TSP). However, most training and test instances for existing TSP algorithms are generated randomly from specific distributions like uniform distribution. This has led to a lack of analysis and understanding of the performance of deep learning algorithms in out-of-distribution (OOD) generalization scenarios, which has a close relationship with the worst-case performance in the combinatorial optimization field. For data-driven algorithms, the statistical properties of randomly generated datasets are critical. This study constructs a statistical measure called nearest-neighbor density to verify the asymptotic properties of randomly generated datasets and reveal the greedy behavior of learning-based solvers, i.e., always choosing the nearest neighbor nodes to construct the solution path. Based on this statistical measure, we develop interpretable data augmentation methods that rely on distribution shifts or instance perturbations and validate that the performance of the learning-based solvers degenerates much on such augmented data. Moreover, fine-tuning learning-based solvers with augmented data further enhances their generalization abilities. In short, we decipher the limitations of learning-based TSP solvers tending to be overly greedy, which may have profound implications for AI-empowered combinatorial optimization solvers.
Lipschitz Lifelong Monte Carlo Tree Search for Mastering Non-Stationary Tasks
Monte Carlo Tree Search (MCTS) has proven highly effective in solving complex planning tasks by balancing exploration and exploitation using Upper Confidence Bound for Trees (UCT). However, existing work have not considered MCTS-based lifelong planning, where an agent faces a non-stationary series of tasks -- e.g., with varying transition probabilities and rewards -- that are drawn sequentially throughout the operational lifetime. This paper presents LiZero for Lipschitz lifelong planning using MCTS. We propose a novel concept of adaptive UCT (aUCT) to transfer knowledge from a source task to the exploration/exploitation of a new task, depending on both the Lipschitz continuity between tasks and the confidence of knowledge in in Monte Carlo action sampling. We analyze LiZero's acceleration factor in terms of improved sampling efficiency and also develop efficient algorithms to compute aUCT in an online fashion by both data-driven and model-based approaches, whose sampling complexity and error bounds are also characterized. Experiment results show that LiZero significantly outperforms existing MCTS and lifelong learning baselines in terms of much faster convergence (3$\sim$4x) to optimal rewards. Our results highlight the potential of LiZero to advance decision-making and planning in dynamic real-world environments.
Doubly Robust Monte Carlo Tree Search
We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions. Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS's superior performance over standard MCTS. In Tic-Tac-Toe, DR-MCTS achieves an 88% win rate compared to a 10% win rate for standard MCTS. In compound VirtualHome tasks, DR-MCTS attains a 20.7% success rate versus 10.3% for standard MCTS. Our scaling analysis reveals that DR-MCTS exhibits better sample efficiency, notably outperforming standard MCTS with larger language models while using a smaller model. These results underscore DR-MCTS's potential for efficient decision-making in complex, real-world scenarios where sample efficiency is paramount.
A Direct Semi-Exhaustive Search Method for Robust, Partial-to-Full Point Cloud Registration
Cheng, Richard, Papozov, Chavdar, Helmick, Dan, Tjersland, Mark
Point cloud registration refers to the problem of finding the rigid transformation that aligns two given point clouds, and is crucial for many applications in robotics and computer vision. The main insight of this paper is that we can directly optimize the point cloud registration problem without correspondences by utilizing an algorithmically simple, yet computationally complex, semi-exhaustive search approach that is very well-suited for parallelization on modern GPUs. Our proposed algorithm, Direct Semi-Exhaustive Search (DSES), iterates over potential rotation matrices and efficiently computes the inlier-maximizing translation associated with each rotation. It then computes the optimal rigid transformation based on any desired distance metric by directly computing the error associated with each transformation candidate $\{R, t\}$. By leveraging the parallelism of modern GPUs, DSES outperforms state-of-the-art methods for partial-to-full point cloud registration on the simulated ModelNet40 benchmark and demonstrates high performance and robustness for pose estimation on a real-world robotics problem (https://youtu.be/q0q2-s2KSuA).
OneBatchPAM: A Fast and Frugal K-Medoids Algorithm
de Mathelin, Antoine, Cecchi, Nicolas Enrique, Deheeger, François, Mougeot, Mathilde, Vayatis, Nicolas
This paper proposes a novel k-medoids approximation algorithm to handle large-scale datasets with reasonable computational time and memory complexity. We develop a local-search algorithm that iteratively improves the medoid selection based on the estimation of the k-medoids objective. A single batch of size m << n provides the estimation, which reduces the required memory size and the number of pairwise dissimilarities computations to O(mn), instead of O(n^2) compared to most k-medoids baselines. We obtain theoretical results highlighting that a batch of size m = O(log(n)) is sufficient to guarantee, with strong probability, the same performance as the original local-search algorithm. Multiple experiments conducted on real datasets of various sizes and dimensions show that our algorithm provides similar performances as state-of-the-art methods such as FasterPAM and BanditPAM++ with a drastically reduced running time.
KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search
Luo, Haoran, E, Haihong, Guo, Yikai, Lin, Qika, Wu, Xiaobao, Mu, Xinyu, Liu, Wenhao, Song, Meina, Zhu, Yifan, Tuan, Luu Anh
Knowledge Base Question Answering (KBQA) aims to answer natural language questions with a large-scale structured knowledge base (KB). Despite advancements with large language models (LLMs), KBQA still faces challenges in weak KB awareness, imbalance between effectiveness and efficiency, and high reliance on annotated data. To address these challenges, we propose KBQA-o1, a novel agentic KBQA method with Monte Carlo Tree Search (MCTS). It introduces a ReAct-based agent process for stepwise logical form generation with KB environment exploration. Moreover, it employs MCTS, a heuristic search method driven by policy and reward models, to balance agentic exploration's performance and search space. With heuristic exploration, KBQA-o1 generates high-quality annotations for further improvement by incremental fine-tuning. Experimental results show that KBQA-o1 outperforms previous low-resource KBQA methods with limited annotated data, boosting Llama-3.1-8B model's GrailQA F1 performance to 78.5% compared to 48.5% of the previous sota method with GPT-3.5-turbo.
Node Classification and Search on the Rubik's Cube Graph with GNNs
This study focuses on the application of deep geometric models to solve the 3x3x3 Rubik's Cube. We begin by discussing the cube's graph representation and defining distance as the model's optimization objective. The distance approximation task is reformulated as a node classification problem, effectively addressed using Graph Neural Networks (GNNs). After training the model on a random subgraph, the predicted classes are used to construct a heuristic for $A^*$ search. We conclude with experiments comparing our heuristic to that of the DeepCubeA model.
A Variational Perspective on Generative Protein Fitness Optimization
Bogensperger, Lea, Narnhofer, Dominik, Allam, Ahmed, Schindler, Konrad, Krauthammer, Michael
The goal of protein fitness optimization is to discover new protein variants with enhanced fitness for a given use. The vast search space and the sparsely populated fitness landscape, along with the discrete nature of protein sequences, pose significant challenges when trying to determine the gradient towards configurations with higher fitness. We introduce Variational Latent Generative Protein Optimization (VLGPO), a variational perspective on fitness optimization. Our method embeds protein sequences in a continuous latent space to enable efficient sampling from the fitness distribution and combines a (learned) flow matching prior over sequence mutations with a fitness predictor to guide optimization towards sequences with high fitness. VLGPO achieves state-of-the-art results on two different protein benchmarks of varying complexity. Moreover, the variational design with explicit prior and likelihood functions offers a flexible plug-and-play framework that can be easily customized to suit various protein design tasks.
Scaling Flaws of Verifier-Guided Search in Mathematical Reasoning
Yu, Fei, Li, Yingru, Wang, Benyou
Large language models (LLMs) struggle with multi-step reasoning, where inference-time scaling has emerged as a promising strategy for performance improvement. Verifier-guided search outperforms repeated sampling when sample size is limited by selecting and prioritizing valid reasoning paths. However, we identify a critical limitation: scaling flaws, prevalent across different models (Mistral 7B and DeepSeekMath 7B), benchmarks (GSM8K and MATH), and verifiers (outcome value models and process reward models). As sample size increases, verifier-guided search exhibits diminishing advantages and eventually underperforms repeated sampling. Our analysis attributes this to verifier failures, where imperfect verifiers misrank candidates and erroneously prune all valid paths. These issues are further exacerbated in challenging and out-of-distribution problems, restricting search effectiveness. To mitigate verifier failures, we explore reducing reliance on verifiers and conduct preliminary investigations using two simple methods. Our findings reveal fundamental limitations in verifier-guided search and suggest future directions.
Robot localization aided by quantum algorithms
Antero, Unai, Sierra, Basilio, Oñativia, Jon, Ruiz, Alejandra, Osaba, Eneko
Localization is a vital aspect of mobile robotics, enabling robots to navigate their environment efficiently and avoid obstacles. Without localization, mobile robots would be unable to determine their position and orientation, making it challenging to plan a path or make informed decisions about their movement (Olson [2000]). Localization allows mobile robots to create an internal map of their environment, which is essential for tasks such as surveying, manipulation, inspection, and delivery (Huang and Lin [2023]). In fact, localization is what enables mobile robots to perform tasks autonomously, making informed decisions about their actions and movements without human intervention. The quality of localization is heavily dependent on the generation of accurate maps, which is a computationally intensive task. Probabilistic localization methods, such as the Adaptive-Monte Carlo localization (AMCL) algorithm, have been widely used in mobile robotics due to their accuracy and robustness (Kristensen and Jensfelt [2003]). However, these methods can be computationally demanding, especially when dealing with large maps or high-resolution sensor data. AMCL, in particular, uses a combination of sensor data and prior map knowledge to determine the probable location of a robot on a given map, but its computation complexity is proportional to the area of the grid of the map (Alshikh Khalil and Hatem [2022]). Recently, the integration of light detection and ranging (LiDAR) sensors has improved the accuracy of localization methods, but the computational requirements remain a challenge (Huang and Lin [2023]).