Search
TOAST: Fast and scalable auto-partitioning based on principled static analysis
Alabed, Sami, Grewe, Dominik, Rink, Norman Alexander, Samsikova, Masha, Sitdikov, Timur, Swietlik, Agnieszka, Vytiniotis, Dimitrios, Belov, Daniel
Partitioning large machine learning models across distributed accelerator systems is a complex process, requiring a series of interdependent decisions that are further complicated by internal sharding ambiguities. Consequently, existing auto-partitioners often suffer from out-of-memory errors or are prohibitively slow when exploring the exponentially large space of possible partitionings. To mitigate this, they artificially restrict the search space, but this approach frequently yields infeasible solutions that violate device memory constraints or lead to sub-optimal performance. We propose a system that combines a novel static compiler analysis with a Monte Carlo Tree Search. Our analysis constructs an efficient decision space by identifying (i) tensor dimensions requiring identical sharding, and (ii) partitioning "conflicts" that require resolution. Our system significantly outperforms state-of-the-art industrial methods across diverse hardware platforms and model architectures, discovering previously unknown, superior solutions, and the process is fully automated even for complex and large models.
e-boost: Boosted E-Graph Extraction with Adaptive Heuristics and Exact Solving
Yin, Jiaqi, Song, Zhan, Chen, Chen, Cai, Yaohui, Zhang, Zhiru, Yu, Cunxi
E-graphs have attracted growing interest in many fields, particularly in logic synthesis and formal verification. E-graph extraction is a challenging NP-hard combinatorial optimization problem. It requires identifying optimal terms from exponentially many equivalent expressions, serving as the primary performance bottleneck in e-graph based optimization tasks. However, traditional extraction methods face a critical trade-off: heuristic approaches offer speed but sacrifice optimality, while exact methods provide optimal solutions but face prohibitive computational costs on practical problems. We present e-boost, a novel framework that bridges this gap through three key innovations: (1) parallelized heuristic extraction that leverages weak data dependence to compute DAG costs concurrently, enabling efficient multi-threaded performance without sacrificing extraction quality; (2) adaptive search space pruning that employs a parameterized threshold mechanism to retain only promising candidates, dramatically reducing the solution space while preserving near-optimal solutions; and (3) initialized exact solving that formulates the reduced problem as an Integer Linear Program with warm-start capabilities, guiding solvers toward high-quality solutions faster. Across the diverse benchmarks in formal verification and logic synthesis fields, e-boost demonstrates 558x runtime speedup over traditional exact approaches (ILP) and 19.04% performance improvement over the state-of-the-art extraction framework (SmoothE). In realistic logic synthesis tasks, e-boost produces 7.6% and 8.1% area improvements compared to conventional synthesis tools with two different technology mapping libraries. e-boost is available at https://github.com/Yu-Maryland/e-boost.
SPL-LNS: Sampling-Enhanced Large Neighborhood Search for Solving Integer Linear Programs
Feng, Shengyu, Sun, Zhiqing, Yang, Yiming
Large Neighborhood Search (LNS) is a common heuristic in combinatorial optimization that iteratively searches over a large neighborhood of the current solution for a better one. Recently, neural network-based LNS solvers have achieved great success in solving Integer Linear Programs (ILPs) by learning to greedily predict the locally optimal solution for the next neighborhood proposal. However, this greedy approach raises two key concerns: (1) to what extent this greedy proposal suffers from local optima, and (2) how can we effectively improve its sample efficiency in the long run . To address these questions, this paper first formulates LNS as a stochastic process, and then introduces SPL-LNS, a sampling-enhanced neural LNS solver that leverages locally-informed proposals to escape local optima. We also develop a novel hindsight relabeling method to efficiently train SPL-LNS on self-generated data. Experimental results demonstrate that SPL-LNS substantially surpasses prior neural LNS solvers for various ILP problems of different sizes.
An Efficient Hybridization of Graph Representation Learning and Metaheuristics for the Constrained Incremental Graph Drawing Problem
Charytitsch, Bruna C. B., Nascimento, Marรญa C. V.
Hybridizing machine learning techniques with metaheuristics has attracted significant attention in recent years. Many attempts employ supervised or reinforcement learning to support the decision-making of heuristic methods. However, in some cases, these techniques are deemed too time-consuming and not competitive with hand-crafted heuristics. This paper proposes a hybridization between metaheuristics and a less expensive learning strategy to extract the latent structure of graphs, known as Graph Representation Learning (GRL). For such, we approach the Constrained Incremental Graph Drawing Problem (C-IGDP), a hierarchical graph visualization problem. There is limited literature on methods for this problem, for which Greedy Randomized Search Procedures (GRASP) heuristics have shown promising results. In line with this, this paper investigates the gains of incorporating GRL into the construction phase of GRASP, which we refer to as Graph Learning GRASP (GL-GRASP). In computational experiments, we first analyze the results achieved considering different node embedding techniques, where deep learning-based strategies stood out. The evaluation considered the primal integral measure that assesses the quality of the solutions according to the required time for such. According to this measure, the best GL-GRASP heuristics demonstrated superior performance than state-of-the-art literature GRASP heuristics for the problem. A scalability test on newly generated denser instances under a fixed time limit further confirmed the robustness of the GL-GRASP heuristics.
Hyper Yoshimura: How a slight tweak on a classical folding pattern unleashes meta-stability for deployable robots
Zhou, Ziyang, Phalak, Yogesh, Deshpande, Vishrut, O'Brien, Ethan, Walker, Ian, Li, Suyi
Deployable structures inspired by origami have provided lightweight, compact, and reconfigurable solutions for various robotic and architectural applications. However, creating an integrated structural system that can effectively balance the competing requirements of high packing efficiency, simple deployment, and precise morphing into multiple load-bearing configurations remains a significant challenge. This study introduces a new class of hyper-Yoshimura origami, which exhibits a wide range of kinematically admissible and locally metastable states, including newly discovered symmetric "self-packing" and asymmetric "pop-out" states. This metastability is achieved by breaking a design rule of Yoshimura origami that has been in place for many decades. To this end, this study derives a new set of mathematically rigorous design rules and geometric formulations. Based on this, forward and inverse kinematic strategies are developed to stack hyper-Yoshimura modules into deployable booms that can approximate complex 3D shapes. Finally, this study showcases the potential of hyper-Yoshimura with a meter-scale pop-up cellphone charging station deployed at our university's bus transit station, along with a 3D-printed, scaled prototype of a space crane that can function as an object manipulator, solar tracking device, or high-load-bearing structure. These results establish hyper-Yoshimura as a promising platform for deployable and adaptable robotic systems in both terrestrial and space environments.
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization
Sun, Weiwei, Feng, Shengyu, Li, Shanda, Yang, Yiming
Although LLM-based agents have attracted significant attention in domains such as software engineering and machine learning research, their role in advancing combinatorial optimization (CO) remains relatively underexplored. This gap underscores the need for a deeper understanding of their potential in tackling structured, constraint-intensive problems -- a pursuit currently limited by the absence of comprehensive benchmarks for systematic investigation. To address this, we introduce CO-Bench, a benchmark suite featuring 36 real-world CO problems drawn from a broad range of domains and complexity levels. CO-Bench includes structured problem formulations and curated data to support rigorous investigation of LLM agents. We evaluate multiple agentic frameworks against established human-designed algorithms, revealing the strengths and limitations of existing LLM agents and identifying promising directions for future research. CO-Bench is publicly available at https://github.com/sunnweiwei/CO-Bench.