Optimization
Reviews: Shadowing Properties of Optimization Algorithms
The paper presents a theoretical analysis of how well a discrete dynamic flow approximates the flow/solution of a corresponding ODE for gradient descent and heavy ball methods, e.g., how the trajectory of the discrete method with a small enough step size does not deviate too much from the trajectory of the ODE. The main theoretical results are somewhat limited, i.e., restricted to small step sizes and quadratic functions, but are of interest.
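To make the comparison concrete, here is a minimal sketch (not taken from the paper) that measures how far gradient descent drifts from the exact gradient-flow solution on a separable quadratic. The function name `gd_vs_flow`, the diagonal quadratic, and the choice to start both trajectories from the same point are assumptions made for illustration; the standard notion of shadowing also allows the ODE trajectory to start from a slightly different point.

```python
import math


def gd_vs_flow(curvatures, x0, step, num_steps):
    """Largest distance between gradient descent iterates and the exact
    gradient-flow solution sampled at times t_k = k * step.

    The objective is f(x) = 0.5 * sum(a_i * x_i**2), an illustrative choice.
    """
    x = list(x0)
    worst = 0.0
    for k in range(num_steps + 1):
        t = k * step
        # Exact solution of x_i'(t) = -a_i * x_i(t), coordinate by coordinate.
        flow = [xi0 * math.exp(-a * t) for a, xi0 in zip(curvatures, x0)]
        gap = math.sqrt(sum((xi - fi) ** 2 for xi, fi in zip(x, flow)))
        worst = max(worst, gap)
        # Gradient descent step: x <- x - step * grad f(x), grad_i = a_i * x_i.
        x = [xi - step * a * xi for a, xi in zip(curvatures, x)]
    return worst


# Same time horizon (t = 10), two different step sizes.
gap_coarse = gd_vs_flow([1.0, 4.0], [1.0, 1.0], step=0.1, num_steps=100)
gap_fine = gd_vs_flow([1.0, 4.0], [1.0, 1.0], step=0.01, num_steps=1000)
```

Under these assumptions, `gap_fine` should come out smaller than `gap_coarse`, which is the small-step-size regime the review refers to.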
Predictive Lagrangian Optimization for Constrained Reinforcement Learning
Zhang, Tianqi, Yuan, Puzhen, Zhan, Guojian, Lin, Ziyu, Lyu, Yao, Qin, Zhenzhi, Duan, Jingliang, Zhang, Liping, Li, Shengbo Eben
Constrained optimization is widely used in reinforcement learning for addressing complex control tasks. From the perspective of dynamical systems, iteratively solving a constrained optimization problem can be framed as the temporal evolution of a feedback control system. Classical constrained optimization methods, such as penalty and Lagrangian approaches, inherently use proportional and integral feedback controllers. In this paper, we propose a more generic equivalence framework that connects constrained optimization with feedback control systems, with the aim of developing more effective constrained RL algorithms. First, we define each step of the system evolution as determining the Lagrange multiplier by solving a multiplier feedback optimal control problem (MFOCP). In this problem, the control input is the multiplier, the state is the policy parameters, the dynamics are described by policy gradient descent, and the objective is to minimize constraint violations. Then, we introduce a multiplier guided policy learning (MGPL) module to update the policy parameters. We prove that the resulting optimal policy, achieved by alternating MFOCP and MGPL, aligns with the solution of the primal constrained RL problem, thereby establishing our equivalence framework. Furthermore, we point out that the existing PID Lagrangian method is merely one special case within our framework, namely the one that uses a PID controller. Our framework also accommodates various other feedback controllers, thereby facilitating the development of new algorithms. As a representative example, we employ model predictive control (MPC) as the feedback controller and consequently propose a new algorithm called predictive Lagrangian optimization (PLO). Numerical experiments demonstrate its superiority over the PID Lagrangian method, achieving a feasible region up to 7.2% larger and a comparable average reward.
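The feedback-control reading of the multiplier update can be sketched as follows. This is a simplified, hypothetical PID-style controller, not the authors' PLO algorithm or their MFOCP solver: the class name `PIDMultiplier`, the gain names, and the clipping choices are assumptions for illustration. Setting only the integral gain gives a projected dual-ascent (Lagrangian) update, and setting only the proportional gain gives a penalty-like update.

```python
class PIDMultiplier:
    """Hypothetical PID-style update for a Lagrange multiplier.

    `violation` is the constraint cost minus its limit, so a positive
    value means the current policy is infeasible.
    """

    def __init__(self, kp=0.0, ki=0.0, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_violation = 0.0

    def update(self, violation):
        # Integral term: accumulated violation, kept non-negative.
        self.integral = max(0.0, self.integral + violation)
        # Derivative term: reacts only when the violation is growing.
        derivative = max(0.0, violation - self.prev_violation)
        self.prev_violation = violation
        # The multiplier itself must stay non-negative.
        return max(
            0.0,
            self.kp * violation + self.ki * self.integral + self.kd * derivative,
        )


# Integral-only control reproduces lambda <- max(0, lambda + ki * violation).
integral_only = PIDMultiplier(ki=0.5)
lambdas = [integral_only.update(v) for v in (1.0, 1.0, -3.0)]
```

In a training loop, the returned multiplier would weight the constraint term in the policy objective before the next policy update; that loop is omitted here because it depends on the RL setup.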
Safe and Agile Transportation of Cable-Suspended Payload via Multiple Aerial Robots
Wang, Yongchao, Wang, Junjie, Zhou, Xiaobin, Yang, Tiankai, Xu, Chao, Gao, Fei
Transporting a heavy payload using multiple aerial robots (MARs) is an efficient way to extend the load capacity of a single aerial robot. However, existing schemes for the multiple aerial robots transportation system (MARTS) still lack the capability to generate a collision-free and dynamically feasible trajectory in real time and to track an agile trajectory, especially when no sensors are available to measure the states of the payload and cable. Therefore, they are limited to low-agility transportation in simple environments. To bridge the gap, we propose complete planning and control schemes for the MARTS, achieving safe and agile aerial transportation (SAAT) of a cable-suspended payload in complex environments. Flatness maps for the aerial robot are derived that consider the complete kinematical constraint and the dynamical coupling between each aerial robot and the payload. To improve responsiveness when generating safe, dynamically feasible, and agile trajectories in complex environments, a real-time spatio-temporal trajectory planning scheme is proposed for the MARTS. In addition, we remove the reliance on state measurement for both the payload and cable, as well as on closed-loop control of the payload, and propose a fully distributed control scheme for tracking the agile trajectory that is robust against imprecise payload mass and non-point-mass payloads. The proposed schemes are extensively validated through benchmark comparisons, ablation studies, and simulations. Finally, extensive real-world experiments are conducted on a MARTS composed of three aerial robots with onboard computers and sensors. The results validate the efficiency and robustness of our proposed schemes for SAAT in complex environments.
Causally-Aware Unsupervised Feature Selection Learning
Shen, Zongxin, Huang, Yanyong, Wang, Dongjie, Ma, Minbo, Lv, Fengmao, Li, Tianrui
Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.
Difference vs. Quotient: A Novel Algorithm for Dominant Eigenvalue Problem
The computation of the dominant eigenvector of symmetric positive semidefinite matrices is a cornerstone operation in numerous machine learning applications. Traditional approaches predominantly rely on the constrained Quotient formulation, which underpins most existing methods. However, these methods often suffer from challenges related to computational efficiency and dependence on spectral prior knowledge. This paper introduces a novel perspective by reformulating the eigenvalue problem using an unconstrained Difference formulation. This new approach sheds light on classical methods, revealing that the power method can be interpreted as a specific instance of Difference of Convex Algorithms. Building on this insight, we develop a generalized family of Difference-Type methods, which encompasses the power method as a special case. Within this family, we propose the Split-Merge algorithm, which achieves maximal acceleration without spectral prior knowledge and operates solely through matrix-vector products, making it both efficient and easy to implement. Extensive empirical evaluations on both synthetic and real-world datasets highlight that the Split-Merge algorithm achieves over a $\boldsymbol{10\times}$ speedup compared to the basic power method, offering significant advancements in efficiency and practicality for large-scale machine learning problems.
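For reference, the baseline that the abstract compares against can be sketched in a few lines. This is the basic power method only, written to use nothing but matrix-vector products; it is not the Split-Merge algorithm, whose update rule is not given in the abstract. The helper names `matvec` and `power_method`, the stopping rule, and the small test matrix are assumptions for illustration.

```python
import math


def matvec(matrix, vector):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def power_method(apply_matrix, v0, num_iters=200, tol=1e-12):
    """Basic power method; `apply_matrix` is any function computing A @ v.

    Assumes v0 is nonzero and not orthogonal to the dominant eigenvector.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    eigenvalue = 0.0
    for _ in range(num_iters):
        w = apply_matrix(v)
        # Rayleigh quotient of the current unit vector.
        new_eigenvalue = sum(wi * vi for wi, vi in zip(w, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        converged = abs(new_eigenvalue - eigenvalue) < tol
        eigenvalue = new_eigenvalue
        if converged:
            break
    return eigenvalue, v


# Symmetric positive definite example with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
dominant_value, dominant_vector = power_method(lambda v: matvec(A, v), [1.0, 0.0])
```

The convergence rate of this baseline depends on the ratio of the two largest eigenvalues, which is why accelerated variants are of interest for matrices with a small spectral gap.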
An efficient nonconvex reformulation of stagewise convex optimization problems
Convex optimization problems with staged structure appear in several contexts, including optimal control, verification of deep neural networks, and isotonic regression. Off-the-shelf solvers can solve these problems but may scale poorly. We develop a nonconvex reformulation designed to exploit this staged structure. Our reformulation has only simple bound constraints, enabling solution via projected gradient methods and their accelerated variants. The method automatically generates a sequence of primal and dual feasible solutions to the original convex problem, making optimality certification easy.
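The appeal of simple bound constraints is that the projection step reduces to coordinate-wise clipping. The sketch below shows a plain projected gradient loop on a toy box-constrained problem; it does not implement the paper's nonconvex reformulation, and the function name `projected_gradient`, the fixed step size, and the toy objective are assumptions for illustration.

```python
def projected_gradient(grad, lower, upper, x0, step, num_iters=500):
    """Minimize a smooth function subject to lower <= x <= upper.

    `grad` returns the gradient as a list; the projection onto the box
    is a coordinate-wise clip, which is what makes bound constraints cheap.
    """
    x = [min(max(xi, lo), hi) for xi, lo, hi in zip(x0, lower, upper)]
    for _ in range(num_iters):
        g = grad(x)
        x = [
            min(max(xi - step * gi, lo), hi)
            for xi, gi, lo, hi in zip(x, g, lower, upper)
        ]
    return x


# Toy problem: minimize 0.5 * ((x0 - 2)**2 + (x1 + 1)**2) on the box [0, 1]^2.
# The unconstrained minimizer (2, -1) lies outside the box, so the constrained
# minimizer is its projection onto the box.
solution = projected_gradient(
    grad=lambda x: [x[0] - 2.0, x[1] + 1.0],
    lower=[0.0, 0.0],
    upper=[1.0, 1.0],
    x0=[0.5, 0.5],
    step=0.5,
)
```

Accelerated variants keep the same clipping step and change only how the point passed to `grad` is formed.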
Review for NeurIPS paper: Deep Statistical Solvers
The paper proposes new theoretical results regarding the universal approximation property of graph convolutional neural networks and trains such networks to (approximately) solve optimization problems defined on graphs, in particular those arising from the discretization of PDEs. The solver is trained directly from the model energy. The paper was recognized by reviewers as making an interesting contribution and meeting the quality standards. The authors are invited to submit the final version including the rebuttal points, addressing all minor revision issues and the literature connections mentioned. Showing the applicability boundaries by studying failure cases would also be highly appreciated.
Reviews: Scalable Global Optimization via Local Bayesian Optimization
Major * I found this paper to be very exciting, presenting a promising methodology addressing some of the most critical bottlenecks of Bayesian Optimization, with a focus on large data sets (being therefore relevant for high-dimensional BO as well, where sample sizes typically need to be substantially increased with the dimension). So, one is far from filling the space, right? Not using these for some good reason is one thing, but putting it the way it is put here sounds like it is not possible to go batch-sequential with EI... * In the main contributions presented throughout Section 3, two main ideas are confounded here: splitting the data so as to obtain local models AND using TS as infill criterion. Which is (most) responsible for improved performances over the state of the art? Minor (selected points) * Page 1: What does "outputscales" mean?
Review for NeurIPS paper: Fair regression with Wasserstein barycenters
Weaknesses: My biggest worry is that I'm not sure whether this work adds significantly new contributions compared to the previous literature that uses optimal transport theory for fair classification. It seems to be a modification of the post-processing approach in "Wasserstein Fair Classification" (Jiang et al.). I would be happy to increase the score if the authors can highlight the challenges faced in adapting approaches from previous work to this regression problem and explain why these challenges are not trivial. I wish there were a little more discussion about looking at these fairness-constrained optimization problems through the lens of optimal transport theory; the paper only considers demographic parity, but a discussion of whether this approach extends immediately to other fairness notions, such as equalized odds (appropriately 're-defined' for the regression problem), would be welcome. Also, I wonder whether it's possible to allow for some slack when considering demographic parity (the difference can be at most some epsilon).