Goto

Collaborating Authors

 Optimization


Multilinear and Linear Programs for Partially Identifiable Queries in Quasi-Markovian Structural Causal Models

arXiv.org Artificial Intelligence

We investigate partially identifiable queries in a class of causal models. We focus on acyclic Structural Causal Models that are quasi-Markovian (that is, each endogenous variable is connected with at most one exogenous confounder). We look into scenarios where endogenous variables are observed (and a distribution over them is known), while exogenous variables are not fully specified. This leads to a representation that is in essence a Bayesian network where the distribution of root variables is not uniquely determined. In such circumstances, it may not be possible to precisely compute a probability value of interest. We thus study the computation of tight probability bounds, a problem that has been solved by multilinear programming in general, and by linear programming when a single confounded component is intervened upon. We present a new algorithm to simplify the construction of such programs by exploiting input probabilities over endogenous variables. For scenarios with a single intervention, we apply column generation to compute a probability bound through a sequence of auxiliary linear integer programs, thus showing that a representation with polynomial cardinality for exogenous variables is possible. Experiments show column generation techniques to be superior to existing methods.


Segmented Trajectory Optimization for Autonomous Parking in Unstructured Environments

arXiv.org Artificial Intelligence

This paper presents a Segmented Trajectory Optimization (STO) method for autonomous parking, which refines an initial trajectory into a dynamically feasible and collision-free one using an iterative SQP-based approach. STO maintains the maneuver strategy of the high-level global planner while allowing curvature discontinuities at switching points to improve maneuver efficiency. To ensure safety, a convex corridor is constructed via GJK-accelerated ellipse shrinking and expansion, serving as safety constraints in each iteration. Numerical simulations in perpendicular and reverse-angled parking scenarios demonstrate that STO enhances maneuver efficiency while ensuring safety. Moreover, computational performance confirms its practicality for real-world applications.


Deliberate Planning of 3D Bin Packing on Packing Configuration Trees

arXiv.org Artificial Intelligence

Online 3D Bin Packing Problem (3D-BPP) has widespread applications in industrial automation. Existing methods usually solve the problem with limited resolution of spatial discretization, and/or cannot deal with complex practical constraints well. We propose to enhance the practical applicability of online 3D-BPP via learning on a novel hierarchical representation, packing configuration tree (PCT). PCT is a full-fledged description of the state and action space of bin packing which can support packing policy learning based on deep reinforcement learning (DRL). The size of the packing action space is proportional to the number of leaf nodes, making the DRL model easy to train and well-performing even with continuous solution space. We further discover the potential of PCT as tree-based planners in deliberately solving packing problems of industrial significance, including large-scale packing and different variations of BPP setting. A recursive packing method is proposed to decompose large-scale packing into smaller sub-trees while a spatial ensemble mechanism integrates local solutions into global. For different BPP variations with additional decision variables, such as lookahead, buffering, and offline packing, we propose a unified planning framework enabling out-of-the-box problem solving. Extensive evaluations demonstrate that our method outperforms existing online BPP baselines and is versatile in incorporating various practical constraints. The planning process excels across large-scale problems and diverse problem variations. We develop a real-world packing robot for industrial warehousing, with careful designs accounting for constrained placement and transportation stability. Our packing robot operates reliably and efficiently on unprotected pallets at 10 seconds per box. It achieves averagely 19 boxes per pallet with 57.4% space utilization for relatively large-size boxes.


A proximal augmented Lagrangian method for nonconvex optimization with equality and inequality constraints

arXiv.org Machine Learning

We propose an inexact proximal augmented Lagrangian method (P-ALM) for nonconvex structured optimization problems. The proposed method features an easily implementable rule not only for updating the penalty parameters, but also for adaptively tuning the proximal term. It allows the penalty parameter to grow rapidly in the early stages to speed up progress, while ameliorating the issue of ill-conditioning in later iterations, a well-known drawback of the traditional approach of linearly increasing the penalty parameters. A key element in our analysis lies in the observation that the augmented Lagrangian can be controlled effectively along the iterates, provided an initial feasible point is available. Our analysis, while simple, provides a new theoretical perspective about P-ALM and, as a by-product, results in similar convergence properties for its non-proximal variant, the classical augmented Lagrangian method (ALM). Numerical experiments, including convex and nonconvex problem instances, demonstrate the effectiveness of our approach.


Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization

arXiv.org Machine Learning

This paper studies the complexity of finding an $ε$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method, F${}^2$SA, achieving the $\tilde{\mathcal{O}}(ε^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $Ω(ε^{-4})$ complexity lower bound in its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F$^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods F${}^2$SA-$p$ that uses $p$th-order finite difference for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(p ε^{4-p/2})$ for $p$th-order smooth problems. Finally, we demonstrate that the $Ω(ε^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds for the lower-level variable, indicating that the upper bound of F${}^2$SA-$p$ is nearly optimal in the highly smooth region $p = Ω( \log ε^{-1} / \log \log ε^{-1})$.


Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients

arXiv.org Artificial Intelligence

Federated learning enables collaborative model training across numerous edge devices without requiring participants to share data; however, memory and communication constraints on these edge devices may preclude their participation in training. We consider a setting in which a subset of edge devices are below a critical memory or communication threshold required to conduct model updates. Under typical federated optimization algorithms, these devices are excluded from training which renders their data inaccessible and increases system induced bias. We are inspired by MeZO, a zeroth-order method used for memory-efficient fine-tuning. The increased variance inherent to zeroth-order gradient approximations has relegated previous zeroth-order optimizers exclusively to the domain of fine tuning; a limitation we seek to correct. We devise a federated, memory-efficient zeroth-order optimizer, ZOWarmUp that permits zeroth-order training from a random initialization. ZOWarmUp leverages differing client capabilities and careful variance reduction techniques to facilitate participation of under-represented, low-resource clients in model training. Like other federated zeroth-order methods, ZOWarmUp eliminates the need for edge devices to transmit their full gradients to the server and instead relies on only a small set of random seeds, rendering the up-link communication cost negligible. We present experiments using various datasets and model architectures to show that ZOWarmUp is a robust algorithm that can can be applied under a wide variety of circumstances. For systems with a high proportion of edge devices that would otherwise be excluded from training, this algorithm provides access to a greater volume and diversity of data, thus improving training outcomes. Over the last decade, there has been a notable increase in data generation, processing, and storage on edge devices. These devices can be used to train advanced neural networks, spanning applications from object detection in autonomous driving to facial recognition on smartphones.


Unsupervised Learning based Element Resource Allocation for Reconfigurable Intelligent Surfaces in mmWave Network

arXiv.org Artificial Intelligence

The increasing demand for high data rates and seamless connectivity in wireless systems has sparked significant interest in reconfigurable intelligent surfaces (RIS) and artificial intelligence-based wireless applications. RIS typically comprises passive reflective antenna elements that control the wireless propagation environment by adequately tuning the phase of the reflective elements. The allocation of RIS elements to multipleuser equipment (UEs) is crucial for efficiently utilizing RIS. In this work, we formulate a joint optimization problem that optimizes the RIS phase configuration and resource allocation under an $α$-fair scheduling framework and propose an efficient way of allocating RIS elements. Conventional iterative optimization methods, however, suffer from exponentially increasing computational complexity as the number of RIS elements increases and also complicate the generation of training labels for supervised learning. To overcome these challenges, we propose a five-layer fully connected neural network (FNN) combined with a preprocessing technique to significantly reduce input dimensionality, lower computational complexity, and enhance scalability. The simulation results show that our proposed NN-based solution reduces computational overhead while significantly improving system throughput by 6.8% compared to existing RIS element allocation schemes. Furthermore, the proposed system achieves better performance while reducing computational complexity, making it significantly more scalable than the iterative optimization algorithms.


Vibration Damping in Underactuated Cable-suspended Artwork -- Flying Belt Motion Control

arXiv.org Artificial Intelligence

This paper presents a comprehensive refurbishment of the interactive robotic art installation Standards and Double Standards by Rafael Lozano-Hemmer. The installation features an array of belts suspended from the ceiling, each actuated by stepper motors and dynamically oriented by a vision-based tracking system that follows the movements of exhibition visitors. The original system was limited by oscillatory dynamics, resulting in torsional and pendulum-like vibrations that constrained rotational speed and reduced interactive responsiveness. To address these challenges, the refurbishment involved significant upgrades to both hardware and motion control algorithms. A detailed mathematical model of the flying belt system was developed to accurately capture its dynamic behavior, providing a foundation for advanced control design. An input shaping method, formulated as a convex optimization problem, was implemented to effectively suppress vibrations, enabling smoother and faster belt movements. Experimental results demonstrate substantial improvements in system performance and audience interaction. This work exemplifies the integration of robotics, control engineering, and interactive art, offering new solutions to technical challenges in real-time motion control and vibration damping for large-scale kinetic installations.


LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) models scale efficiently by activating only a subset of experts per token, offering a computationally sparse alternative to dense architectures. While prior post-training optimizations, such as inter- and intra-expert pruning, reduce memory usage they provide limited gains in inference-time compute efficiency. Moreover, existing MoE architectures typically activate a fixed number of experts uniformly across all layers, resulting in redundant computation and suboptimal performance. In this work, we first demonstrate that MoE pruning strategies improve only the memory footprint but do not significantly improve inference performance on GPU using optimized frameworks such as vLLM. To address this, we introduce LExI, a data-free optimization technique that determines the optimal number of active experts per layer in a pretrained MoE model. LExI leverages only the model weights to estimate the relative importance of each layer and adaptively assigns the number of active experts accordingly per layer. Experiments on state-of-the-art language and vision MoE benchmarks demonstrate that LExI significantly outperforms traditional MoE pruning approaches in terms of inference efficiency with negligible accuracy loss. For example, using LExI, Qwen1.5-MoE achieves the same throughput on Nvidia H100 GPU with 10% better accuracy than traditional expert pruning.


Communication Efficient Robotic Mixed Reality with Gaussian Splatting Cross-Layer Optimization

arXiv.org Artificial Intelligence

Realizing low-cost communication in robotic mixed reality (RoboMR) systems presents a challenge, due to the necessity of uploading high-resolution images through wireless channels. This paper proposes Gaussian splatting (GS) RoboMR (GSMR), which enables the simulator to opportunistically render a photo-realistic view from the robot's pose by calling ``memory'' from a GS model, thus reducing the need for excessive image uploads. However, the GS model may involve discrepancies compared to the actual environments. To this end, a GS cross-layer optimization (GSCLO) framework is further proposed, which jointly optimizes content switching (i.e., deciding whether to upload image or not) and power allocation (i.e., adjusting to content profiles) across different frames by minimizing a newly derived GSMR loss function. The GSCLO problem is addressed by an accelerated penalty optimization (APO) algorithm that reduces computational complexity by over $10$x compared to traditional branch-and-bound and search algorithms. Moreover, variants of GSCLO are presented to achieve robust, low-power, and multi-robot GSMR. Extensive experiments demonstrate that the proposed GSMR paradigm and GSCLO method achieve significant improvements over existing benchmarks on both wheeled and legged robots in terms of diverse metrics in various scenarios. For the first time, it is found that RoboMR can be achieved with ultra-low communication costs, and mixture of data is useful for enhancing GS performance in dynamic scenarios.