Goto

Collaborating Authors

 Optimization


Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients

arXiv.org Artificial Intelligence

Building upon the success of low-rank adapter (LoRA), low-rank gradient projection (LoRP) has emerged as a promising solution for memory-efficient fine-tuning. However, existing LoRP methods typically treat each row of the gradient matrix as the default projection unit, leaving the role of projection granularity underexplored. In this work, we propose a novel framework, VLoRP, that extends low-rank gradient projection by introducing an additional degree of freedom for controlling the trade-off between memory efficiency and performance, beyond the rank hyper-parameter. Through this framework, we systematically explore the impact of projection granularity, demonstrating that finer-grained projections lead to enhanced stability and efficiency even under a fixed memory budget. Regarding the optimization for VLoRP, we present ProjFactor, an adaptive memory-efficient optimizer, that significantly reduces memory requirement while ensuring competitive performance, even in the presence of gradient accumulation. Additionally, we provide a theoretical analysis of VLoRP, demonstrating the descent and convergence of its optimization trajectory under both SGD and ProjFactor. Extensive experiments are conducted to validate our findings, covering tasks such as commonsense reasoning, MMLU, and GSM8K.


Scalable Speed-ups for the SMS-EMOA from a Simple Aging Strategy

arXiv.org Artificial Intelligence

Different from single-objective evolutionary algorithms, where non-elitism is an established concept, multi-objective evolutionary algorithms almost always select the next population in a greedy fashion. In the only notable exception, Bian, Zhou, Li, and Qian (IJCAI 2023) proposed a stochastic selection mechanism for the SMS-EMOA and proved that it can speed up computing the Pareto front of the bi-objective jump benchmark with problem size $n$ and gap parameter $k$ by a factor of $\max\{1,2^{k/4}/n\}$. While this constitutes the first proven speed-up from non-elitist selection, suggesting a very interesting research direction, it has to be noted that a true speed-up only occurs for $k \ge 4\log_2(n)$, where the runtime is super-polynomial, and that the advantage reduces for larger numbers of objectives as shown in a later work. In this work, we propose a different non-elitist selection mechanism based on aging, which exempts individuals younger than a certain age from a possible removal. This remedies the two shortcomings of stochastic selection: We prove a speed-up by a factor of $\max\{1,Θ(k)^{k-1}\}$, regardless of the number of objectives. In particular, a positive speed-up can already be observed for constant $k$, the only setting for which polynomial runtimes can be witnessed. Overall, this result supports the use of non-elitist selection schemes, but suggests that aging-based mechanisms can be considerably more powerful than stochastic selection mechanisms.


Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression

arXiv.org Artificial Intelligence

High-throughput neural network inference requires coordinating many optimization decisions, including parallel tiling, microkernel selection, and data layout. The product of these decisions forms a search space of programs which is typically intractably large. Existing approaches (e.g., auto-schedulers) often address this problem by sampling this space heuristically. In contrast, we introduce a dynamic-programming-based approach to explore more of the search space by iteratively decomposing large program specifications into smaller specifications reachable from a set of rewrites, then composing a final program from each rewrite that minimizes an affine cost model. To reduce memory requirements, we employ a novel memoization table representation, which indexes specifications by coordinates in $Z_{\geq 0}$ and compresses identical, adjacent solutions. This approach can visit a much larger set of programs than prior work. To evaluate the approach, we developed Morello, a compiler which lowers specifications roughly equivalent to a few-node XLA computation graph to x86. Notably, we found that an affine cost model is sufficient to surface high-throughput programs. For example, Morello synthesized a collection of matrix multiplication benchmarks targeting a Zen 1 CPU, including a 1x2048x16384, bfloat16-to-float32 vector-matrix multiply, which was integrated into Google's gemma.cpp.


Triangle-Decomposable Graphs for Isoperimetric Robots

arXiv.org Artificial Intelligence

-- Isoperimetric robots are large scale, untethered inflatable robots that can undergo large shape changes, but have only been demonstrated in one 3D shape-an octahedron. These robots consist of independent triangles that can change shape while maintaining their perimeter by moving the relative position of their joints. We introduce an optimization routine that determines if an arbitrary graph can be partitioned into unique triangles, and thus be constructed as an isoperimetric robotic system. We enumerate all minimally rigid graphs that can be constructed with unique triangles up to 9 nodes (7 triangles), and characterize the workspace of one node of each these robots. We also present a method for constructing larger graphs that can be partitioned by assembling subgraphs that are already partitioned into triangles. This enables a wide variety of isoperimetric robot configurations. Robotic systems will be capable of an increasing number of tasks if they can change shape to perform a variety of tasks and safely interact with people. One type of robot with the potential for large shape change and human-safe interaction is the isoperimetric robot, first introduced in [1], and with an example shown in Figure 1. This robot consists of inflated fabric beams as the primary structural members, with robotic rollers that pinch the tube, reducing the local bending stiffness to create a region of the tube that acts as a rotational joint.


Phasing Through the Flames: Rapid Motion Planning with the AGHF PDE for Arbitrary Objective Functions and Constraints

arXiv.org Artificial Intelligence

Figure 1: This paper introduces BLAZE, a Phase 1 - Phase 2 Affine Geometric Heat Flow (AGHF) framework, to rapidly solve optimal control problems while respecting robot constraints and avoiding obstacles. It begins with an initial trajectory (shown in orange with the color gradient illustrating the evolution in time starting from darkest and going to lightest) that may violate constraints (e.g., the second and fourth pose of the arm are in collision with the boxes and outlined in red). If the initial trajectory is infeasible, BLAZE enters Phase 1, where it evolves the trajectory into a trajectory that satisfies all constraints (e.g., in the blue trajectory, the Kinova arm has been moved out of collision with the boxes). Once the trajectory satisfies all constraints, Phase 2 begins, optimizing the motion to minimize a user-specified cost function while maintaining feasibility (optimized trajectory shown green). BLAZE optimizes the trajectory to reach a target configuration while avoiding the obstacles while considering the full dynamical model of the arm. Note that optimal control (including Phase 1 and Phase 2) for this 14 dimensional state space model is completed within 3s while satisfying input, state, and collision avoidance constraints. Abstract --The generation of optimal trajectories for high-dimensional robotic systems under constraints remains computationally challenging due to the need to simultaneously satisfy dynamic feasibility, input limits, and task-specific objectives while searching over high-dimensional spaces. Recent approaches using the Affine Geometric Heat Flow (AGHF) Partial Differential Equation (PDE) have demonstrated promising results, generating dynamically feasible trajectories for complex systems like the Digit V3 humanoid within seconds. These methods efficiently solve trajectory optimization problems over a two-dimensional domain by evolving an initial trajectory to minimize control effort. However, these AGHF approaches are limited to a single type of optimal control problem (i.e., minimizing the integral of squared control norms) and typically require initial guesses that satisfy constraints to ensure satisfactory convergence. These limitations restrict the potential utility of the AGHF PDE especially when trying to synthesize trajectories for robotic systems. This paper generalizes the AGHF formulation to accommodate arbitrary cost functions, significantly expanding the classes of trajectories that can be generated. This work also introduces a Phase 1 - Phase 2 Algorithm that enables the use of constraint-violating initial guesses while guaranteeing satisfactory convergence. The effectiveness of the proposed method is demonstrated through comparative evaluations against state-of-the-art techniques across various dynamical systems and challenging trajectory generation problems. Optimal Control is a powerful tool for motion planning and control of advanced robotic systems.


Discovering dynamical laws for speech gestures

arXiv.org Artificial Intelligence

A fundamental challenge in the cognitive sciences is discovering the dynamics that govern behaviour. Take the example of spoken language, which is characterised by a highly variable and complex set of physical movements that map onto the small set of cognitive units that comprise language. What are the fundamental dynamical principles behind the movements that structure speech production? In this study, we discover models in the form of symbolic equations that govern articulatory gestures during speech. A sparse symbolic regression algorithm is used to discover models from kinematic data on the tongue and lips. We explore these candidate models using analytical techniques and numerical simulations, and find that a second-order linear model achieves high levels of accuracy, but a nonlinear force is required to properly model articulatory dynamics in approximately one third of cases. This supports the proposal that an autonomous, nonlinear, second-order differential equation is a viable dynamical law for articulatory gestures in speech. We conclude by identifying future opportunities and obstacles in data-driven model discovery and outline prospects for discovering the dynamical principles that govern language, brain and behaviour.


Global Collinearity-aware Polygonizer for Polygonal Building Mapping in Remote Sensing

arXiv.org Artificial Intelligence

This paper addresses the challenge of mapping polygonal buildings from remote sensing images and introduces a novel algorithm, the Global Collinearity-aware Polygonizer (GCP). GCP, built upon an instance segmentation framework, processes binary masks produced by any instance segmentation model. The algorithm begins by collecting polylines sampled along the contours of the binary masks. These polylines undergo a refinement process using a transformer-based regression module to ensure they accurately fit the contours of the targeted building instances. Subsequently, a collinearity-aware polygon simplification module simplifies these refined polylines and generate the final polygon representation. This module employs dynamic programming technique to optimize an objective function that balances the simplicity and fidelity of the polygons, achieving globally optimal solutions. Furthermore, the optimized collinearity-aware objective is seamlessly integrated into network training, enhancing the cohesiveness of the entire pipeline. The effectiveness of GCP has been validated on two public benchmarks for polygonal building mapping. Further experiments reveal that applying the collinearity-aware polygon simplification module to arbitrary polylines, without prior knowledge, enhances accuracy over traditional methods such as the Douglas-Peucker algorithm. This finding underscores the broad applicability of GCP. The code for the proposed method will be made available at https://github.com/zhu-xlab.


An Efficient Real-Time Planning Method for Swarm Robotics Based on an Optimal Virtual Tube

arXiv.org Artificial Intelligence

Swarm robotics navigating through unknown obstacle environments is an emerging research area that faces challenges. Performing tasks in such environments requires swarms to achieve autonomous localization, perception, decision-making, control, and planning. The limited computational resources of onboard platforms present significant challenges for planning and control. Reactive planners offer low computational demands and high re-planning frequencies but lack predictive capabilities, often resulting in local minima. Long-horizon planners, on the other hand, can perform multi-step predictions to reduce deadlocks but cost much computation, leading to lower re-planning frequencies. This paper proposes a real-time optimal virtual tube planning method for swarm robotics in unknown environments, which generates approximate solutions for optimal trajectories through affine functions. As a result, the computational complexity of approximate solutions is $O(n_t)$, where $n_t$ is the number of parameters in the trajectory, thereby significantly reducing the overall computational burden. By integrating reactive methods, the proposed method enables low-computation, safe swarm motion in unknown environments. The effectiveness of the proposed method is validated through several simulations and experiments.


A Provably Convergent Plug-and-Play Framework for Stochastic Bilevel Optimization

arXiv.org Artificial Intelligence

A Provably Convergent Plug-and-Play Framework for Stochastic Bilevel Optimization Tianshu Chu Dachuan Xu Wei Yao Chengming Yu Jin Zhang Abstract Bilevel optimization has recently attracted significant attention in machine learning due to its wide range of applications and advanced hierarchical optimization capabilities. In this paper, we propose a plug-and-play framework, named PnPBO, for developing and analyzing stochastic bilevel optimization methods. This framework integrates both modern unbiased and biased stochastic estimators into the single-loop bilevel optimization framework introduced in [9], with several improvements. In the implementation of PnPBO, all stochastic estimators for different variables can be independently incorporated, and an additional moving average technique is applied when using an unbiased estimator for the upper-level variable. In the theoretical analysis, we provide a unified convergence and complexity analysis for PnPBO, demonstrating that the adaptation of various stochastic estimators (including PAGE, ZeroSARAH, and mixed strategies) within the PnPBO framework achieves optimal sample complexity, comparable to that of single-level optimization. This resolves the open question of whether the optimal complexity bounds for solving bilevel optimization are identical to those for single-level optimization. Keyword bilevel optimization, stochastic optimization, plug-and-play, sample complexity 1 Introduction Bilevel optimization (BLO) effectively addresses challenges arising from hierarchical optimization, where the decision variables in the upper level are also involved in the lower level. In recent years, BLO has gained increasing attention due to its extensive and effective applications, including hyperparameter optimization [13], meta-learning [20], continual learning [17], and reinforcement learning [40]. In this paper, we focus on the nonconvex-strongly-convex BLO problem under classical assumptions, formulated as follows min x R d x H (x):= f (x, y ( x)) s.t. This work was partially presented at International Conference on Machine Learning (ICML) 2024 ([7]). School of Science, Beijing University of Posts and Telecommunications; National Center for Applied Mathematics Shenzhen; yucm@bupt.edu.cn The bilevel structure typically makes the objective H (x) nonconvex, except in a few special cases. Therefore, our goal is to develop efficient gradient-based algorithms to find an ϵ-stationary point of H (x), specifically, a point x that satisfies the inequality E H (x) 2 ϵ in the stochastic setting [14]. To achieve this, a promising approach is to adapt stochastic methods from single-level optimization, which we refer to as stochastic estimators, to the bilevel optimization context.


DexFlow: A Unified Approach for Dexterous Hand Pose Retargeting and Interaction

arXiv.org Artificial Intelligence

DexFlow: A Unified Approach for Dexterous Hand Pose Retargeting and Interaction Xiaoyi Lin 1, Kunpeng Y ao 2, Lixin Xu 3,Xueqiang Wang 4,Xuetao Li 1,Y uchen Wang 1,Miao Li 4, Abstract -- Despite advances in hand-object interaction modeling, generating realistic dexterous manipulation data for robotic hands remains a challenge. Retargeting methods often suffer from low accuracy and fail to account for hand-object interactions, leading to artifacts like interpenetration. Generative methods, lacking human hand priors, produce limited and unnatural poses. We propose a data transformation pipeline that combines human hand and object data from multiple sources for high-precision retargeting. Our approach uses a differential loss constraint to ensure temporal consistency and generates contact maps to refine hand-object interactions.