Optimization
Automated Feature Labeling with Token-Space Gradient Descent
Schulz, Julian, Fallows, Seamus
We present a novel approach to feature labeling using gradient descent in token-space. While existing methods typically use language models to generate hypotheses about feature meanings, our method directly optimizes label representations by using a language model as a discriminator to predict feature activations. We formulate this as a multi-objective optimization problem in token-space, balancing prediction accuracy, entropy minimization, and linguistic naturalness. Our proof-of-concept experiments demonstrate successful convergence to interpretable single-token labels across diverse domains, including features for detecting animals, mammals, Chinese text, and numbers. Although our current implementation is constrained to single-token labels and relatively simple features, the results suggest that token-space gradient descent could become a valuable addition to the interpretability researcher's toolkit. Recent work in mechanistic interpretability has made significant progress in decomposing neural networks into interpretable features.
Policy Gradient for LQR with Domain Randomization
Fujinami, Tesshu, Lee, Bruce D., Matni, Nikolai, Pappas, George J.
-- Domain randomization (DR) enables sim-to-real transfer by training controllers on a distribution of simulated environments, with the goal of achieving robust performance in the real world. Although DR is widely used in practice and is often solved using simple policy gradient (PG) methods, understanding of its theoretical guarantees remains limited. T oward addressing this gap, we provide the first convergence analysis of PG methods for domain-randomized linear quadratic regulation (LQR). We show that PG converges globally to the minimizer of a finite-sample approximation of the DR objective under suitable bounds on the heterogeneity of the sampled systems. We also quantify the sample-complexity associated with achieving a small performance gap between the sample-average and population-level objectives. Additionally, we propose and analyze a discount-factor annealing algorithm that obviates the need for an initial jointly stabilizing controller, which may be challenging to find. Empirical results support our theoretical findings and highlight promising directions for future work, including risk-sensitive DR formulations and stochastic PG algorithms. Domain randomization (DR) has emerged as a dominant paradigm to enable transfer of policies optimized in simulation to the real world by randomizing simulator parameters during training [1-3]. In doing so, just as with robust control, DR accounts for discrepancies between the model used in simulation to synthesize a policy and the system that it is deployed on. Since DR does not solely focus on optimizing the worst-case performance, it can result in less conservative controller performance while still ensuring robust stability with high probability. Furthermore, DR can be easily implemented via first order methods. This makes it straightforward to incorporate into a wide variety of reinforcement learning schemes and to benefit from the increasing availability of parallel computation. Despite the ease with which DR can be implemented using first order methods, ensuring convergence of these methods remains a critical challenge, with practitioners relying upon complex scheduling of various hyperparameters in the optimization procedure [3].
Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantees via Constrained Mean-Field Reinforcement Learning
Jusup, Matej, Zhang, Kenan, Hu, Zhiyuan, Pásztor, Barna, Krause, Andreas, Corman, Francesco
The rapid expansion of ride-sourcing services such as Uber, Lyft, and Didi Chuxing has fundamentally reshaped urban transportation by offering flexible, on-demand mobility via mobile applications. Despite their convenience, these platforms confront significant operational challenges, particularly vehicle rebalancing - the strategic repositioning of thousands of vehicles to address spatiotemporal mismatches in supply and demand. Inadequate rebalancing results in prolonged rider waiting times, inefficient vehicle utilization, and inequitable distribution of services, leading to disparities in driver availability and income. To tackle these complexities, we introduce scalable continuous-state mean-field control (MFC) and reinforcement learning (MFRL) models that explicitly represent each vehicle's precise location and employ continuous repositioning actions guided by the distribution of other vehicles. To ensure equitable service distribution, an accessibility constraint is integrated within our optimal control formulation, balancing operational efficiency with equitable access to the service across geographic regions. Our approach acknowledges realistic conditions, including inherent stochasticity in transitions, the simultaneous occurrence of vehicle-rider matching, vehicles' rebalancing and cruising, and variability in rider behaviors. Crucially, we relax the traditional mean-field assumption of equal supply-demand volume, better reflecting practical scenarios. Extensive empirical evaluation using real-world data-driven simulation of Shenzhen demonstrates the real-time efficiency and robustness of our approach at the scale of tens of thousands of vehicles. The code is available at https://github.com/mjusup1501/mf-vehicle-rebalancing.
Many-to-Many Matching via Sparsity Controlled Optimal Transport
Liu, Weijie, Bao, Han, Yamada, Makoto, Huang, Zenan, Zheng, Nenggan, Qian, Hui
Many-to-many matching seeks to match multiple points in one set and multiple points in another set, which is a basis for a wide range of data mining problems. It can be naturally recast in the framework of Optimal Transport (OT). However, existing OT methods either lack the ability to accomplish many-to-many matching or necessitate careful tuning of a regularization parameter to achieve satisfactory results. This paper proposes a novel many-to-many matching method to explicitly encode many-to-many constraints while preventing the degeneration into one-to-one matching. The proposed method consists of the following two components. The first component is the matching budget constraints on each row and column of a transport plan, which specify how many points can be matched to a point at most. The second component is the deformed $q$-entropy regularization, which encourages a point to meet the matching budget maximally. While the deformed $q$-entropy was initially proposed to sparsify a transport plan, we employ it to avoid the degeneration into one-to-one matching. We optimize the objective via a penalty algorithm, which is efficient and theoretically guaranteed to converge. Experimental results on various tasks demonstrate that the proposed method achieves good performance by gleaning meaningful many-to-many matchings.
Pro-Routing: Proactive Routing of Autonomous Multi-Capacity Robots for Pickup-and-Delivery Tasks
Garces, Daniel, Gil, Stephanie
We consider a multi-robot setting, where we have a fleet of multi-capacity autonomous robots that must service spatially distributed pickup-and-delivery requests with fixed maximum wait times. Requests can be either scheduled ahead of time or they can enter the system in real-time. In this setting, stability for a routing policy is defined as the cost of the policy being uniformly bounded over time. Most previous work either solve the problem offline to theoretically maintain stability or they consider dynamically arriving requests at the expense of the theoretical guarantees on stability. In this paper, we aim to bridge this gap by proposing a novel proactive rollout-based routing framework that adapts to real-time demand while still provably maintaining the stability of the learned routing policy. We derive provable stability guarantees for our method by proposing a fleet sizing algorithm that obtains a sufficiently large fleet that ensures stability by construction. To validate our theoretical results, we consider a case study on real ride requests for Harvard's evening Van System. We also evaluate the performance of our framework using the currently deployed smaller fleet size. In this smaller setup, we compare against the currently deployed routing algorithm, greedy heuristics, and Monte-Carlo-Tree-Search-based algorithms. Our empirical results show that our framework maintains stability when we use the sufficiently large fleet size found in our theoretical results. For the smaller currently deployed fleet size, our method services 6% more requests than the closest baseline while reducing median passenger wait times by 33%.
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Schnaus, Dominik, Araslanov, Nikita, Cremers, Daniel
The platonic representation hypothesis suggests that vision and language embeddings become more homogeneous as model and dataset sizes increase. In particular, pairwise distances within each modality become more similar. This suggests that as foundation models mature, it may become possible to match vision and language embeddings in a fully unsupervised fashion, i.e. without parallel data. We present the first feasibility study, and investigate conformity of existing vision and language foundation models in the context of unsupervised, or "blind", matching. First, we formulate unsupervised matching as a quadratic assignment problem and introduce a novel heuristic that outperforms previous solvers. We also develop a technique to find optimal matching problems, for which a non-trivial match is very likely. Second, we conduct an extensive study deploying a range of vision and language models on four datasets. Our analysis reveals that for many problem instances, vision and language representations can be indeed matched without supervision. This finding opens up the exciting possibility of embedding semantic knowledge into other modalities virtually annotation-free. As a proof of concept, we showcase an unsupervised classifier, which achieves non-trivial classification accuracy without any image-text annotation.
Traffic Engineering in Large-scale Networks with Generalizable Graph Neural Networks
Zhou, Fangtong, Liu, Xiaorui, Yu, Ruozhou, Xue, Guoliang
--Traffic engineering (TE) in large-scale computer networks has become a fundamental yet challenging problem, owing to the swift growth of global-scale cloud wide-area networks or backbone low-Earth-orbit satellite constellations. T o address the scalability issue of traditional TE algorithms, learning-based approaches have been proposed, showing potential of significant efficiency improvement over state-of-the-art methods. Nevertheless, the intrinsic limitations of existing learning-based methods hinder their practical application: they are not generalizable across diverse topologies and network conditions, incur excessive training overhead, and do not respect link capacities by default. This paper proposes TELGEN, a novel TE algorithm that learns to solve TE problems efficiently in large-scale networks, while achieving superior generalizability across diverse network conditions. TELGEN is based on the novel idea of transforming the problem of "predicting the optimal TE solution" into "predicting the optimal TE algorithm", which enables TELGEN to learn and efficiently approximate the end-to-end solving process of classical optimal TE algorithms. The learned algorithm is agnostic to the exact network topology or traffic patterns, and can efficiently solve TE problems given arbitrary inputs and generalize well to unseen topologies and demands. TELGEN achieved less than 3% optimality gap while ensuring feasibility in all cases, even when the test network had up to 20 more nodes than the largest in training. It also saved up to 84% solving time than classical optimal solver, and could reduce training time per epoch and solving time by 2 -4 orders of magnitude than latest learning algorithms on the largest networks. Traffic Engineering (TE) is becoming increasingly crucial amid the exponential growth in Internet traffic. Xue (xue@asu.edu) is with the School of Computing and Augmented Intelligence at the Arizona State University, Tempe, AZ, 85287, USA. The research of Zhou and Y u was supported in part by NSF grants 2045539 and 2433966. The research of Xue was sponsored in part by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-23-2-0225. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. Personal use of this material is permitted. Usually, TE is implemented by a central controller that has a global view of the network and can make informed decisions about routing and traffic splitting to optimize traffic [26]. With the emergence of large-scale and dynamic networks, classical TE faces fundamental challenges in terms of scalability, responsiveness and performance.
Rack Position Optimization in Large-Scale Heterogeneous Data Centers
Chen, Chang-Lin, Chen, Jiayu, Lan, Tian, Zhao, Zhaoxia, Dong, Hongbo, Aggarwal, Vaneet
As rapidly growing AI computational demands accelerate the need for new hardware installation and maintenance, this work explores optimal data center resource management by balancing operational efficiency with fault tolerance through strategic rack positioning considering diverse resources and locations. Traditional mixed-integer programming (MIP) approaches often struggle with scalability, while heuristic methods may result in significant sub-optimality. To address these issues, this paper presents a novel two-tier optimization framework using a high-level deep reinforcement learning (DRL) model to guide a low-level gradient-based heuristic for local search. The high-level DRL agent employs Leader Reward for optimal rack type ordering, and the low-level heuristic efficiently maps racks to positions, minimizing movement counts and ensuring fault-tolerant resource distribution. This approach allows scalability to over 100,000 positions and 100 rack types. Our method outperformed the gradient-based heuristic by 7\% on average and the MIP solver by over 30\% in objective value. It achieved a 100\% success rate versus MIP's 97.5\% (within a 20-minute limit), completing in just 2 minutes compared to MIP's 1630 minutes (i.e., almost 4 orders of magnitude improvement). Unlike the MIP solver, which showed performance variability under time constraints and high penalties, our algorithm consistently delivered stable, efficient results - an essential feature for large-scale data center management.
Aligning Diffusion Model with Problem Constraints for Trajectory Optimization
Diffusion models have recently emerged as effective generative frameworks for trajectory optimization, capable of producing high-quality and diverse solutions. However, training these models in a purely data-driven manner without explicit incorporation of constraint information often leads to violations of critical constraints, such as goal-reaching, collision avoidance, and adherence to system dynamics. To address this limitation, we propose a novel approach that aligns diffusion models explicitly with problem-specific constraints, drawing insights from the Dynamic Data-driven Application Systems (DDDAS) framework. Our approach introduces a hybrid loss function that explicitly measures and penalizes constraint violations during training. Furthermore, by statistically analyzing how constraint violations evolve throughout the diffusion steps, we develop a re-weighting strategy that aligns predicted violations to ground truth statistics at each diffusion step. Evaluated on a tabletop manipulation and a two-car reach-avoid problem, our constraint-aligned diffusion model significantly reduces constraint violations compared to traditional diffusion models, while maintaining the quality of trajectory solutions. This approach is well-suited for integration into the DDDAS framework for efficient online trajectory adaptation as new environmental data becomes available.
GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models
Ji, Wenkang, Chen, Huaben, Chen, Mingyang, Zhu, Guobin, Xu, Lufeng, Groß, Roderich, Zhou, Rui, Cao, Ming, Zhao, Shiyu
The present paradigm of developing multi-robot systems follows a complex and labor-intensive process that involves steps like task analysis, algorithm design, code programming, simulation validation, and real-world deployment. This paradigm requires skilled professionals who are familiar with both theories and software/hardware implementation, incurring high costs in human resources. Moreover, it does not adapt well to dynamically changing tasks: the emergence of a new task requires the repetition of the complex process. Automatic generation and deployment of control policies for multi-robot systems is an appealing paradigm, as it promises substantial savings in terms of human effort and other resources [3-5]. However, this paradigm is nontrivial to realize as a multi-robot system as a whole cannot be programmed directly; rather, a desired collective behavior can be achieved only by programming each individual robot, which relies on its locally available information. Previous methods for automatic development of multi-robot swarming are primarily based on optimization techniques [3, 5]. For instance, an objective function is first crafted to mathematically describe a desired task and then optimized to generate policies through methods such as evolutionary computation [5-7] or systematic search [8]. Despite their promise, these optimization methods face the common limitation of requiring manual crafting of objective functions.