Search
Practical Topics in Optimization
In an era where data-driven decision-making and computational efficiency are paramount, optimization plays a foundational role in advancing fields such as mathematics, computer science, operations research, machine learning, and beyond. From refining machine learning models to improving resource allocation and designing efficient algorithms, optimization techniques serve as essential tools for tackling complex problems. This book aims to provide both an introductory guide and a comprehensive reference, equipping readers with the necessary knowledge to understand and apply optimization methods within their respective fields. Our primary goal is to demystify the inner workings of optimization algorithms, including black-box and stochastic optimizers, by offering both formal and intuitive explanations. Starting from fundamental mathematical principles, we derive key results to ensure that readers not only learn how these techniques work but also understand when and why to apply them effectively. By striking a careful balance between theoretical depth and practical application, this book serves a broad audience, from students and researchers to practitioners seeking robust optimization strategies.
TSS GAZ PTP: Towards Improving Gumbel AlphaZero with Two-stage Self-play for Multi-constrained Electric Vehicle Routing Problems
Wang, Hui, Zhang, Xufeng, Zhang, Xiaoyu, Ding, Zhenhuan, Mu, Chaoxu
Recently, Gumbel AlphaZero (GAZ) was proposed to solve classic combinatorial optimization problems such as TSP and JSSP by creating a carefully designed competition model (consisting of a learning player and a competitor player), which leverages the idea of self-play. However, if the competitor is too strong or too weak, the effectiveness of self-play training can be reduced, particularly in complex CO problems. To address this problem, we further propose a two-stage self-play strategy to improve the GAZ method (named TSS GAZ PTP). In the first stage, the learning player uses the enhanced policy network based on the Gumbel Monte Carlo Tree Search (MCTS), and the competitor uses the historical best trained policy network (acts as a greedy player). In the second stage, we employ Gumbel MCTS for both players, which makes the competition fiercer so that both players can continuously learn smarter trajectories. We first investigate the performance of our proposed TSS GAZ PTP method on TSP since it is also used as a test problem by the original GAZ. The results show the superior performance of TSS GAZ PTP . Then we extend TSS GAZ PTP to deal with multi-constrained Electric V ehicle Routing Problems (EVRP), which is a recently well-known real application research topic and remains challenging as a complex CO problem. Impressively, the experimental results show that the TSS GAZ PTP outperforms the state-of-the-art Deep Reinforcement Learning methods in all types of instances tested and outperforms the optimization solver in tested large-scale instances, indicating the importance and promising of employing more dynamic self-play strategies for complex CO problems.
Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning
Lin, Qingwen, Xu, Boyan, Li, Zijian, Hao, Zhifeng, Zhang, Keli, Cai, Ruichu
Recently, Long Chain-of-Thoughts (CoTs) have gained widespread attention for improving the reasoning capabilities of Large Language Models (LLMs). This necessitates that existing LLMs, which lack the ability to generate Long CoTs, to acquire such capability through post-training methods. Without additional training, LLMs typically enhance their mathematical reasoning abilities through inference scaling methods such as MCTS. However, they are hindered by the large action space and inefficient search strategies, making it challenging to generate Long CoTs effectively. To tackle this issue, we propose constraining the action space and guiding the emergence of Long CoTs through a refined search strategy. In our proposed Constrained Monte Carlo Tree Search (C-MCTS) framework, we limit the actions selected from a constrained action space, which is divided into five disjoint subsets: \emph{understanding}, \emph{planning}, \emph{reflection}, \emph{coding}, and \emph{summary}. Each subset is further constrained to a small number of predefined prompts, rather than allowing LLMs to generate actions arbitrarily. Additionally, we refine the search strategy by incorporating prior knowledge about the action sets, such as a human-like partial order of the action subsets and the pretrained process reward models. These strategies work together to significantly reduce the vast search space of Long CoTs. Extensive evaluations on mathematical reasoning benchmarks show that, under zero-shot settings, our method enables the 7B model to achieve reasoning capabilities that surpass those of the 72B model.
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
Wang, Ante, Song, Linfeng, Tian, Ye, Yu, Dian, Mi, Haitao, Duan, Xiangyu, Tu, Zhaopeng, Su, Jinsong, Yu, Dong
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs), but at the cost of increased computational resources. In this work, we identify two key challenges contributing to this inefficiency: $\textit{over-exploration}$ due to redundant states with semantically equivalent content, and $\textit{under-exploration}$ caused by high variance in verifier scoring leading to frequent trajectory switching. To address these issues, we propose FETCH, an e$\textbf{f}$fici$\textbf{e}$nt $\textbf{t}$ree sear$\textbf{ch}$ framework, which is a flexible, plug-and-play system compatible with various tree search algorithms. Our framework mitigates over-exploration by merging semantically similar states using agglomerative clustering of text embeddings obtained from a fine-tuned SimCSE model. To tackle under-exploration, we enhance verifiers by incorporating temporal difference learning with adjusted $\lambda$-returns during training to reduce variance, and employing a verifier ensemble to aggregate scores during inference. Experiments on GSM8K, GSM-Plus, and MATH datasets demonstrate that our methods significantly improve reasoning accuracy and computational efficiency across four different tree search algorithms, paving the way for more practical applications of LLM-based reasoning. The code will be released upon acceptance.
Game-Of-Goals: Using adversarial games to achieve strategic resilience
Our objective in this paper is to develop a machinery that makes a given organizational strategic plan resilient to the actions of competitor agents (adverse environmental actions). We assume that we are given a goal tree representing strategic goals (can also be seen business requirements for a software systems) with the assumption that competitor agents are behaving in a maximally adversarial fashion(opposing actions against our sub goals or goals in general). We use game tree search methods (such as minimax) to select an optimal execution strategy(at a given point in time), such that it can maximize our chances of achieving our (high level) strategic goals. Our machinery helps us determine which path to follow(strategy selection) to achieve the best end outcome. This is done by comparing alternative execution strategies available to us via an evaluation function. Our evaluation function is based on the idea that we want to make our execution plans defensible(future-proof) by selecting execution strategies that make us least vulnerable to adversarial actions by the competitor agents.
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Guo, Yingqing, Yang, Yukang, Yuan, Hui, Wang, Mengdi
Training-free guidance enables controlled generation in diffusion and flow models, but most existing methods assume differentiable objectives and rely on gradients. This work focuses on training-free guidance addressing challenges from non-differentiable objectives and discrete data distributions. We propose an algorithmic framework TreeG: Tree Search-Based Path Steering Guidance, applicable to both continuous and discrete settings in diffusion and flow models. TreeG offers a unified perspective on training-free guidance: proposing candidates for the next step, evaluating candidates, and selecting the best to move forward, enhanced by a tree search mechanism over active paths or parallelizing exploration. We comprehensively investigate the design space of TreeG over the candidate proposal module and the evaluation function, instantiating TreeG into three novel algorithms. Our experiments show that TreeG consistently outperforms the top guidance baselines in symbolic music generation, small molecule generation, and enhancer DNA design, all of which involve non-differentiable challenges. Additionally, we identify an inference-time scaling law showing TreeG's scalability in inference-time computation.
Planning of Heuristics: Strategic Planning on Large Language Models with Monte Carlo Tree Search for Automating Heuristic Optimization
Mu, Chaoxu, Zhang, Xufeng, Wang, Hui
Heuristics have achieved great success in solving combinatorial optimization problems (COPs). However, heuristics designed by humans require too much domain knowledge and testing time. Given the fact that Large Language Models (LLMs) possess strong capabilities to understand and generate content, and a knowledge base that covers various domains, which offer a novel way to automatically optimize heuristics. Therefore, we propose Planning of Heuristics (PoH), an optimization method that integrates the self-reflection of LLMs with the Monte Carlo Tree Search (MCTS), a well-known planning algorithm. PoH iteratively refines generated heuristics by evaluating their performance and providing improvement suggestions. Our method enables to iteratively evaluate the generated heuristics (states) and improve them based on the improvement suggestions (actions) and evaluation results (rewards), by effectively simulating future states to search for paths with higher rewards. In this paper, we apply PoH to solve the Traveling Salesman Problem (TSP) and the Flow Shop Scheduling Problem (FSSP). The experimental results show that PoH outperforms other hand-crafted heuristics and Automatic Heuristic Design (AHD) by other LLMs-based methods, and achieves the significant improvements and the state-of-the-art performance of our proposed method in automating heuristic optimization with LLMs to solve COPs.
Boosting Generalization in Diffusion-Based Neural Combinatorial Solver via Energy-guided Sampling
Lei, Haoyu, Zhou, Kaiwen, Li, Yinchuan, Chen, Zhitang, Farnia, Farzan
Diffusion-based Neural Combinatorial Optimization (NCO) has demonstrated effectiveness in solving NP-complete (NPC) problems by learning discrete diffusion models for solution generation, eliminating hand-crafted domain knowledge. Despite their success, existing NCO methods face significant challenges in both cross-scale and cross-problem generalization, and high training costs compared to traditional solvers. While recent studies have introduced training-free guidance approaches that leverage pre-defined guidance functions for zero-shot conditional generation, such methodologies have not been extensively explored in combinatorial optimization. To bridge this gap, we propose a general energy-guided sampling framework during inference time that enhances both the cross-scale and cross-problem generalization capabilities of diffusion-based NCO solvers without requiring additional training. We provide theoretical analysis that helps understanding the cross-problem transfer capability. Our experimental results demonstrate that a diffusion solver, trained exclusively on the Traveling Salesman Problem (TSP), can achieve competitive zero-shot solution generation on TSP variants, such as Prize Collecting TSP (PCTSP) and the Orienteering Problem (OP), through energy-guided sampling across different problem scales.
Recent Advances in Malware Detection: Graph Learning and Explainability
Shokouhinejad, Hossein, Razavi-Far, Roozbeh, Mohammadian, Hesamodin, Rabbani, Mahdi, Ansong, Samuel, Higgins, Griffin, Ghorbani, Ali A
The rapid evolution of malware has necessitated the development of sophisticated detection methods that go beyond traditional signature-based approaches. Graph learning techniques have emerged as powerful tools for modeling and analyzing the complex relationships inherent in malware behavior, leveraging advancements in Graph Neural Networks (GNNs) and related methods. This survey provides a comprehensive exploration of recent advances in malware detection, focusing on the interplay between graph learning and explainability. It begins by reviewing malware analysis techniques and datasets, emphasizing their foundational role in understanding malware behavior and supporting detection strategies. The survey then discusses feature engineering, graph reduction, and graph embedding methods, highlighting their significance in transforming raw data into actionable insights, while ensuring scalability and efficiency. Furthermore, this survey focuses on explainability techniques and their applications in malware detection, ensuring transparency and trustworthiness. By integrating these components, this survey demonstrates how graph learning and explainability contribute to building robust, interpretable, and scalable malware detection systems. Future research directions are outlined to address existing challenges and unlock new opportunities in this critical area of cybersecurity.
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Mall, Utkarsh, Phoo, Cheng Perng, Chiquier, Mia, Hariharan, Bharath, Bala, Kavita, Vondrick, Carl
Visual data is used in numerous different scientific workflows ranging from remote sensing to ecology. As the amount of observation data increases, the challenge is not just to make accurate predictions but also to understand the underlying mechanisms for those predictions. Good interpretation is important in scientific workflows, as it allows for better decision-making by providing insights into the data. This paper introduces an automatic way of obtaining such interpretable-by-design models, by learning programs that interleave neural networks. We propose DiSciPLE (Discovering Scientific Programs using LLMs and Evolution) an evolutionary algorithm that leverages common sense and prior knowledge of large language models (LLMs) to create Python programs explaining visual data. Additionally, we propose two improvements: a program critic and a program simplifier to improve our method further to synthesize good programs. On three different real-world problems, DiSciPLE learns state-of-the-art programs on novel tasks with no prior literature. For example, we can learn programs with 35% lower error than the closest non-interpretable baseline for population density estimation.