Optimization
Orientation-Aware Model Predictive Control with Footstep Adaptation for Dynamic Humanoid Walking
Ding, Yanran, Khazoom, Charles, Chignoli, Matthew, Kim, Sangbae
This paper proposes a novel orientation-aware model predictive control (MPC) for dynamic humanoid walking that can plan footstep locations online. Instead of a point-mass model, this work uses the augmented single rigid body model (aSRBM) to enable the MPC to leverage orientation dynamics and stepping strategy within a unified optimization framework. With the footstep location as part of the decision variables in the aSRBM, the MPC can reason about stepping within the kinematic constraints. A task-space controller (TSC) tracks the body pose and swing leg references output from the MPC, while exploiting the full-order dynamics of the humanoid. The proposed control framework is suitable for real-time applications since both MPC and TSC are formulated as quadratic programs. Simulation investigations show that the orientation-aware MPC-based framework is more robust against external torque disturbance compared to state-of-the-art controllers using the point mass model, especially when the torso undergoes large angular excursion. The same control framework can also enable the MIT Humanoid to overcome uneven terrains, such as traversing a wave field.
Online Stochastic Optimization with Wasserstein Based Non-stationarity
Jiang, Jiashuo, Li, Xiaocheng, Zhang, Jiawei
We consider a general online stochastic optimization problem with multiple budget constraints over a horizon of finite time periods. In each time period, a reward function and multiple cost functions are revealed, and the decision maker needs to specify an action from a convex and compact action set to collect the reward and consume the budget. Each cost function corresponds to the consumption of one budget. In each period, the reward and cost functions are drawn from an unknown distribution, which is non-stationary across time. The objective of the decision maker is to maximize the cumulative reward subject to the budget constraints. This formulation captures a wide range of applications including online linear programming and network revenue management, among others. In this paper, we consider two settings: (i) a data-driven setting where the true distribution is unknown but a prior estimate (possibly inaccurate) is available; (ii) an uninformative setting where the true distribution is completely unknown. We propose a unified Wasserstein-distance based measure to quantify the inaccuracy of the prior estimate in setting (i) and the non-stationarity of the system in setting (ii). We show that the proposed measure leads to a necessary and sufficient condition for the attainability of a sublinear regret in both settings. For setting (i), we propose a new algorithm, which takes a primal-dual perspective and integrates the prior information of the underlying distributions into an online gradient descent procedure in the dual space. The algorithm also naturally extends to the uninformative setting (ii). Under both settings, we show the corresponding algorithm achieves a regret of optimal order. In numerical experiments, we demonstrate how the proposed algorithms can be naturally integrated with the re-solving technique to further boost the empirical performance.
An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering
Piccialli, Veronica, Russo, Anna Russo, Sudoso, Antonio M.
The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is traditionally considered an unsupervised learning task. In recent years, the use of background knowledge to improve the cluster quality and promote interpretability of the clustering process has become a hot research topic at the intersection of mathematical optimization and machine learning research. The problem of taking advantage of background information in data clustering is called semi-supervised or constrained clustering. In this paper, we present a branch-and-cut algorithm for semi-supervised MSSC, where background knowledge is incorporated as pairwise must-link and cannot-link constraints. For the lower bound procedure, we solve the semidefinite programming relaxation of the MSSC discrete optimization model, and we use a cutting-plane procedure for strengthening the bound. For the upper bound, instead, by using integer programming tools, we use an adaptation of the k-means algorithm to the constrained case. For the first time, the proposed global optimization algorithm efficiently manages to solve real-world instances up to 800 data points with different combinations of must-link and cannot-link constraints and with a generic number of features. This problem size is about four times larger than the one of the instances solved by state-of-the-art exact algorithms.
Distributed Projection-free Algorithm for Constrained Aggregative Optimization
In this paper, we focus on solving a distributed convex aggregative optimization problem in a network, where each agent has its own cost function which depends not only on its own decision variables but also on the aggregated function of all agents' decision variables. The decision variable is constrained within a feasible set. In order to minimize the sum of the cost functions when each agent only knows its local cost function, we propose a distributed Frank-Wolfe algorithm based on gradient tracking for the aggregative optimization problem where each node maintains two estimates, namely an estimate of the sum of agents' decision variable and an estimate of the gradient of global function. The algorithm is projection-free, but only involves solving a linear optimization to get a search direction at each step. We show the convergence of the proposed algorithm for convex and smooth objective functions over a time-varying network. Finally, we demonstrate the convergence and computational efficiency of the proposed algorithm via numerical simulations.
3D Labeling Tool
Rachwan, John, Zalaket, Charbel
Training and testing supervised object detection models require a large collection of images with ground truth labels. Labels define object classes in the image, as well as their locations, shape, and possibly other information such as pose. The labeling process has proven extremely time consuming, even with the presence of manpower. We introduce a novel labeling tool for 2D images as well as 3D triangular meshes: 3D Labeling Tool (3DLT). This is a standalone, feature-heavy and cross-platform software that does not require installation and can run on Windows, macOS and Linux-based distributions. Instead of labeling the same object on every image separately like current tools, we use depth information to reconstruct a triangular mesh from said images and label the object only once on the aforementioned mesh. We use registration to simplify 3D labeling, outlier detection to improve 2D bounding box calculation and surface reconstruction to expand labeling possibility to large point clouds. Our tool is tested against state of the art methods and it greatly surpasses them in terms of speed while preserving accuracy and ease of use.
Personalized Promotion Decision Making Based on Direct and Enduring Effect Predictions
Yang, Jie, Li, Yilin, Jobson, Deddy
Promotions have been trending in the e-commerce marketplace to build up customer relationships and guide customers towards the desired actions. Since incentives are effective to engage customers and customers have different preferences for different types of incentives, the demand for personalized promotion decision making is increasing over time. However, research on promotion decision making has focused specifically on purchase conversion during the promotion period (the direct effect), while generally disregarding the enduring effect in the post promotion period. To achieve a better lift return on investment (lift ROI) on the enduring effect of the promotion and improve customer retention and loyalty, we propose a framework of multiple treatment promotion decision making by modeling each customer's direct and enduring response. First, we propose a customer direct and enduring effect (CDEE) model which predicts the customer direct and enduring response. With the help of the predictions of the CDEE, we personalize incentive allocation to optimize the enduring effect while keeping the cost under the budget. To estimate the effect of decision making, we apply an unbiased evaluation approach of business metrics with randomized control trial (RCT) data. We compare our method with benchmarks using two promotions in Mercari and achieve significantly better results.
Annealed Training for Combinatorial Optimization on Graphs
Sun, Haoran, Guha, Etash K., Dai, Hanjun
The hardness of combinatorial optimization (CO) problems hinders collecting solutions for supervised learning. However, learning neural networks for CO problems is notoriously difficult in lack of the labeled data as the training is easily trapped at local optima. In this work, we propose a simple but effective annealed training framework for CO problems. In particular, we transform CO problems into unbiased energy-based models (EBMs). We carefully selecting the penalties terms so as to make the EBMs as smooth as possible. Then we train graph neural networks to approximate the EBMs. To prevent the training from being stuck at local optima near the initialization, we introduce an annealed loss function. An experimental evaluation demonstrates that our annealed training framework obtains substantial improvements. In four types of CO problems, our method achieves performance substantially better than other unsupervised neural methods on both synthetic and real-world graphs.
Learning for MPC with Stability & Safety Guarantees
The combination of learning methods with Model Predictive Control (MPC) has attracted a significant amount of attention in the recent literature. The hope of this combination is to reduce the reliance of MPC schemes on accurate models, and to tap into the fast developing machine learning and reinforcement learning tools to exploit the growing amount of data available for many systems. In particular, the combination of reinforcement learning and MPC has been proposed as a viable and theoretically justified approach to introduce explainable, safe and stable policies in reinforcement learning. However, a formal theory detailing how the safety and stability of an MPC-based policy can be maintained through the parameter updates delivered by the learning tools is still lacking. This paper addresses this gap. The theory is developed for the generic Robust MPC case, and applied in simulation in the robust tube-based linear MPC case, where the theory is fairly easy to deploy in practice. The paper focuses on Reinforcement Learning as a learning tool, but it applies to any learning method that updates the MPC parameters online.
Towards Fairness-Aware Multi-Objective Optimization
Yu, Guo, Ma, Lianbo, Du, Wei, Du, Wenli, Jin, Yaochu
Recent years have seen the rapid development of fairness-aware machine learning in mitigating unfairness or discrimination in decision-making in a wide range of applications. However, much less attention has been paid to the fairness-aware multi-objective optimization, which is indeed commonly seen in real life, such as fair resource allocation problems and data driven multi-objective optimization problems. This paper aims to illuminate and broaden our understanding of multi-objective optimization from the perspective of fairness. To this end, we start with a discussion of user preferences in multi-objective optimization and then explore its relationship to fairness in machine learning and multi-objective optimization. Following the above discussions, representative cases of fairness-aware multiobjective optimization are presented, further elaborating the importance of fairness in traditional multi-objective optimization, data-driven optimization and federated optimization. Finally, challenges and opportunities in fairness-aware multi-objective optimization are addressed. We hope that this article makes a small step forward towards understanding fairness in the context of optimization and promote research interest in fairness-aware multi-objective optimization.
On Controller Tuning with Time-Varying Bayesian Optimization
Brunzema, Paul, von Rohr, Alexander, Trimpe, Sebastian
Changing conditions or environments can cause system dynamics to vary over time. To ensure optimal control performance, controllers should adapt to these changes. When the underlying cause and time of change is unknown, we need to rely on online data for this adaptation. In this paper, we will use time-varying Bayesian optimization (TVBO) to tune controllers online in changing environments using appropriate prior knowledge on the control objective and its changes. Two properties are characteristic of many online controller tuning problems: First, they exhibit incremental and lasting changes in the objective due to changes to the system dynamics, e.g., through wear and tear. Second, the optimization problem is convex in the tuning parameters. Current TVBO methods do not explicitly account for these properties, resulting in poor tuning performance and many unstable controllers through over-exploration of the parameter space. We propose a novel TVBO forgetting strategy using Uncertainty-Injection (UI), which incorporates the assumption of incremental and lasting changes. The control objective is modeled as a spatio-temporal Gaussian process (GP) with UI through a Wiener process in the temporal domain. Further, we explicitly model the convexity assumptions in the spatial dimension through GP models with linear inequality constraints. In numerical experiments, we show that our model outperforms the state-of-the-art method in TVBO, exhibiting reduced regret and fewer unstable parameter configurations.