Optimization
Online Learning with Probing for Sequential User-Centric Selection
Xu, Tianyi, Chen, Yiting, Li, Henger, Bian, Zheyong, Dall'Anese, Emiliano, Zheng, Zizhan
We formalize sequential decision-making with information acquisition as the probing-augmented user-centric selection (PUCS) framework, where a learner first probes a subset of arms to obtain side information on resources and rewards, and then assigns $K$ plays to $M$ arms. PUCS covers applications such as ridesharing, wireless scheduling, and content recommendation, in which both resources and payoffs are initially unknown and probing is costly. For the offline setting with known distributions, we present a greedy probing algorithm with a constant-factor approximation guarantee $ฮถ= (e-1)/(2e-1)$. For the online setting with unknown distributions, we introduce OLPA, a stochastic combinatorial bandit algorithm that achieves a regret bound $\mathcal{O}(\sqrt{T} + \ln^{2} T)$. We also prove a lower bound $ฮฉ(\sqrt{T})$, showing that the upper bound is tight up to logarithmic factors. Experiments on real-world data demonstrate the effectiveness of our solutions.
ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization
Chen, Shengzhuang, Ouyang, Xu, Pearce, Michael Arthur Leopold, Hartvigsen, Thomas, Schwarz, Jonathan Richard
Determining the optimal data mixture for large language model training remains a challenging problem with an outsized impact on performance. In practice, language model developers continue to rely on heuristic exploration since no learning-based approach has emerged as a reliable solution. In this work, we propose to view the selection of training data mixtures as a black-box hyperparameter optimization problem, for which Bayesian Optimization is a well-established class of appropriate algorithms. Firstly, we cast data mixture learning as a sequential decision-making problem, in which we aim to find a suitable trade-off between the computational cost of training exploratory (proxy-) models and final mixture performance. Secondly, we systematically explore the properties of transferring mixtures learned at a small scale to larger-scale experiments, providing insights and highlighting opportunities for research at a modest scale. By proposing Multi-fidelity Bayesian Optimization as a suitable method in this common scenario, we introduce a natural framework to balance experiment cost with model fit, avoiding the risks of overfitting to smaller scales while minimizing the number of experiments at high cost. We present results for pre-training and instruction finetuning across models ranging from 1 million to 7 billion parameters, varying from simple architectures to state-of-the-art models and benchmarks spanning dozens of datasets. We demonstrate consistently strong results relative to a wide range of baselines, resulting inspeed-ups of over 500% in determining the best data mixture on our largest experiments. In addition, we broaden access to research by sharing ADMIRE IFT Runs, a dataset of 460 full training & evaluation runs worth over 13,000 GPU hours, greatly reducing the cost of conducting research in this area.
SparseMap: A Sparse Tensor Accelerator Framework Based on Evolution Strategy
Zhao, Boran, Zhai, Haiming, Yuan, Zihang, Liu, Hetian, Xia, Tian, Zhao, Wenzhe, Ren, Pengju
The growing demand for sparse tensor algebra (SpTA) in machine learning and big data has driven the development of various sparse tensor accelerators. However, most existing manually designed accelerators are limited to specific scenarios, and it's time-consuming and challenging to adjust a large number of design factors when scenarios change. Therefore, automating the design of SpTA accelerators is crucial. Nevertheless, previous works focus solely on either mapping (i.e., tiling communication and computation in space and time) or sparse strategy (i.e., bypassing zero elements for efficiency), leading to suboptimal designs due to the lack of comprehensive consideration of both. A unified framework that jointly optimizes both is urgently needed. However, integrating mapping and sparse strategies leads to a combinatorial explosion in the design space(e.g., as large as $O(10^{41})$ for the workload $P_{32 \times 64} \times Q_{64 \times 48} = Z_{32 \times 48}$). This vast search space renders most conventional optimization methods (e.g., particle swarm optimization, reinforcement learning and Monte Carlo tree search) inefficient. To address this challenge, we propose an evolution strategy-based sparse tensor accelerator optimization framework, called SparseMap. SparseMap constructing a more comprehensive design space with the consideration of both mapping and sparse strategy. We introduce a series of enhancements to genetic encoding and evolutionary operators, enabling SparseMap to efficiently explore the vast and diverse design space. We quantitatively compare SparseMap with prior works and classical optimization methods, demonstrating that SparseMap consistently finds superior solutions.
DIT: Dimension Reduction View on Optimal NFT Rarity Meters
Belousov, Dmitry, Yanovich, Yury
Non-fungible tokens (NFTs) have become a significant digital asset class, each uniquely representing virtual entities such as artworks. These tokens are stored in collections within smart contracts and are actively traded across platforms on Ethereum, Bitcoin, and Solana blockchains. The value of NFTs is closely tied to their distinctive characteristics that define rarity, leading to a growing interest in quantifying rarity within both industry and academia. While there are existing rarity meters for assessing NFT rarity, comparing them can be challenging without direct access to the underlying collection data. The Rating over all Rarities (ROAR) benchmark addresses this challenge by providing a standardized framework for evaluating NFT rarity. This paper explores a dimension reduction approach to rarity design, introducing new performance measures and meters, and evaluates them using the ROAR benchmark. Our contributions to the rarity meter design issue include developing an optimal rarity meter design using non-metric weighted multidimensional scaling, introducing Dissimilarity in Trades (DIT) as a performance measure inspired by dimension reduction techniques, and unveiling the non-interpretable rarity meter DIT, which demonstrates superior performance compared to existing methods.
The Maximum Coverage Model and Recommendation System for UAV Vertiports Location Planning
Hua, Chunliang, Hu, Xiao, Sun, Jiayang, Yang, Zeyuan
As urban aerial mobility (UAM) infrastructure development accelerates globally, cities like Shenzhen are planning large-scale vertiport networks (e.g., 1,200+ facilities by 2026). Existing planning frameworks remain inadequate for this complexity due to historical limitations in data granularity and real-world applicability. This paper addresses these gaps by first proposing the Capacitated Dynamic Maximum Covering Location Problem (CDMCLP), a novel optimization framework that simultaneously models urban-scale spatial-temporal demand, heterogeneous user behaviors, and infrastructure capacity constraints. Building on this foundation, we introduce an Integrated Planning Recommendation System that combines CDMCLP with socio-economic factors and dynamic clustering initialization. This system leverages adaptive parameter tuning based on empirical user behavior to generate practical planning solutions. Validation in a Chinese center city demonstrates the effectiveness of the new optimization framework and recommendation system. Under the evaluation and optimization of CDMCLP, the quantitative performance of traditional location methods are exposed and can be improved by 38\%--52\%, while the recommendation system shows user-friendliness and the effective integration of complex elements. By integrating mathematical rigor with practical implementation considerations, this hybrid approach bridges the gap between theoretical location modeling and real-world UAM infrastructure planning, offering municipalities a pragmatic tool for vertiport network design.
Trust Region Constrained Measure Transport in Path Space for Stochastic Optimal Control and Inference
Blessing, Denis, Berner, Julius, Richter, Lorenz, Domingo-Enrich, Carles, Du, Yuanqi, Vahdat, Arash, Neumann, Gerhard
Solving stochastic optimal control problems with quadratic control costs can be viewed as approximating a target path space measure, e.g. via gradient-based optimization. In practice, however, this optimization is challenging in particular if the target measure differs substantially from the prior. In this work, we therefore approach the problem by iteratively solving constrained problems incorporating trust regions that aim for approaching the target measure gradually in a systematic way. It turns out that this trust region based strategy can be understood as a geometric annealing from the prior to the target measure, where, however, the incorporated trust regions lead to a principled and educated way of choosing the time steps in the annealing path. We demonstrate in multiple optimal control applications that our novel method can improve performance significantly, including tasks in diffusion-based sampling, transition path sampling, and fine-tuning of diffusion models.
EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization
Maheshwari, Chinmay, Pimpalkhare, Chinmay, Chatterjee, Debasish
Min-max optimization arises in many domains such as game theory, adversarial machine learning, etc., with gradient-based methods as a typical computational tool. Beyond convex-concave min-max optimization, the solutions found by gradient-based methods may be arbitrarily far from global optima. In this work, we present an algorithmic apparatus for computing globally optimal solutions in convex-non-concave and non-convex-concave min-max optimization. For former, we employ a reformulation that transforms it into a non-concave-convex max-min optimization problem with suitably defined feasible sets and objective function. The new form can be viewed as a generalization of Sion's minimax theorem. Next, we introduce EXOTIC-an Exact, Optimistic, Tree-based algorithm for solving the reformulated max-min problem. EXOTIC employs an iterative convex optimization solver to (approximately) solve the inner minimization and a hierarchical tree search for the outer maximization to optimistically select promising regions to search based on the approximate solution returned by convex optimization solver. We establish an upper bound on its optimality gap as a function of the number of calls to the inner solver, the solver's convergence rate, and additional problem-dependent parameters. Both our algorithmic apparatus along with its accompanying theoretical analysis can also be applied for non-convex-concave min-max optimization. In addition, we propose a class of benchmark convex-non-concave min-max problems along with their analytical global solutions, providing a testbed for evaluating algorithms for min-max optimization. Empirically, EXOTIC outperforms gradient-based methods on this benchmark as well as on existing numerical benchmark problems from the literature. Finally, we demonstrate the utility of EXOTIC by computing security strategies in multi-player games with three or more players.