Optimization
Sequential Learning of the Pareto Front for Multi-objective Bandits
Crépon, Elise, Garivier, Aurélien, Koolen, Wouter M
We study the problem of sequential learning of the Pareto front in multi-objective multi-armed bandits. An agent is faced with K possible arms to pull. At each turn she picks one, and receives a vector-valued reward. When she thinks she has enough information to identify the Pareto front of the different arm means, she stops the game and gives an answer. We are interested in designing algorithms such that the answer given is correct with probability at least 1-$\delta$. Our main contribution is an efficient implementation of an algorithm achieving the optimal sample complexity when the risk $\delta$ is small. With K arms in d dimensions p of which are in the Pareto set, the algorithm runs in time O(Kp^d) per round.
Joint Pricing and Resource Allocation: An Optimal Online-Learning Approach
Xu, Jianyu, Wang, Xuan, Wang, Yu-Xiang, Jiang, Jiashuo
The problem of dynamic pricing examines strategies of setting and adjusting prices in response to varying customer behaviors and market conditions. The mainstream of existing works on dynamic pricing, including Kleinberg and Leighton (2003); Broder and Rusmevichientong (2012); Cohen et al. (2020); Wang et al. (2021b), focuses on the estimation of demand curves while putting aside the decisions on the supply side. Another series of literature, including Besbes and Zeevi (2009); Chen et al. (2019, 2021a); Keskin et al. (2022), takes supply and inventories into account. However, these works simplify the supply cost as uniform and static, underestimating the difficulty of allocating products through sophisticated supply chains among multiple parties such as factories, warehouses, and retailers. On the other hand, the problem of resource allocation - to serve different demand classes with various types of resources - presents a complex challenge within the field of operations research. Analogous to online dynamic pricing, the recent proliferation of e-platforms has magnified the importance of developing online allocation algorithms that efficiently manage supply and demand on the fly while maximizing cumulative utilities.
Optimal Survey Design for Private Mean Estimation
Chen, Yu-Wei, Pasupathy, Raghu, Awan, Jordan A.
This work identifies the first privacy-aware stratified sampling scheme that minimizes the variance for general private mean estimation under the Laplace, Discrete Laplace (DLap) and Truncated-Uniform-Laplace (TuLap) mechanisms within the framework of differential privacy (DP). We view stratified sampling as a subsampling operation, which amplifies the privacy guarantee; however, to have the same final privacy guarantee for each group, different nominal privacy budgets need to be used depending on the subsampling rate. Ignoring the effect of DP, traditional stratified sampling strategies risk significant variance inflation. We phrase our optimal survey design as an optimization problem, where we determine the optimal subsampling sizes for each group with the goal of minimizing the variance of the resulting estimator. We establish strong convexity of the variance objective, propose an efficient algorithm to identify the integer-optimal design, and offer insights on the structure of the optimal design.
Review for NeurIPS paper: First Order Constrained Optimization in Policy Space
Strengths: The idea behind this work is very clear. Theorem 1 can be easily derived using duality, but the difficulty lies in how to solve it. The paper is creative in its simplification on the update of dual varaibles lambda and mu. Without this, the algorithm would be pretty much a primal-dual gradient method, which requires the primal part to be evaluated accurately enough before the update of dual variables, which is difficult in practice. For lambda, the paper takes advantage of its similarity to the temperature term in maximum entropy RL and thus fixes it during training.
Reviews: Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD
Update: Thank you for the feedback, I have read it as well as other reviews. Compared with the vast literature on obtaining upper bounds on convergence rates of stochastic convex optimization problems, less work has been done towards deriving corresponding lower bounds that depict the fundamental hardness of these problems. This paper aims to fill this gap and proposes a general framework for comparing upper and lower bounds. The framework also suggests potential future research directions for obtaining better convergence rates. In addition, this paper proved, for all round t, a lower bound on the expected convergence rate of SGD over any diminishing step-size sequences when applied to strongly convex problems. This bound shows that the step-size schemes proposed in recent work (Gower et al. 2019, and Nguyen et al. 2018) are optimal within a dimension independent constant factor.
Reviews: Compiler Auto-Vectorization with Imitation Learning
This paper uses imitation learning to solve the compiler auto-vectorization problem. It trains an agent to mimic the optimal solution generated by an integer linear programming solver. It outperforms production-level compiler LLVM in the experiments. Originality: The novelty is incremental. This paper directly combines well-known techniques and does not make any new contribution from the machine learning perspective.
Reviews: A Primal Dual Formulation For Deep Learning With Constraints
The paper converts the constrained optimization problem to min-max optimization using Lagrangian function. To show the efficacy of the model, three experiments are conducted in 5.1 SRL, 5.2 NER and 5.3 Fine grained entity typing. The paper brings in a structured way of training with output constraints. However, I am not sure how much gain this model has on top of fixed weight on constraints (Metha et al 2018 & Diligenti et al 2017) with the provided experiments. Also while the experiments seem convincing as itself, it is hard to see how much significance this work brings in as the baselines significantly differ with related work. Also, it would give a better picture of this method if the paper could provide more analysis: an analysis on convergence, an analysis on experiment results on why more labeled data sometimes hurt, etc. [originality ] 1. The full Lagrangian expression and linking the output constraint to the model parameter and optimizing them with subgradient seems novel. However, how exactly the authors formulate f(w) is unclear to me. Is it just following the way Diligenti 2017 does it?
Optimization Landscapes Learned: Proxy Networks Boost Convergence in Physics-based Inverse Problems
Goyal, Girnar, Holl, Philipp, Agrawal, Sweta, Thuerey, Nils
Solving inverse problems in physics is central to understanding complex systems and advancing technologies in various fields. Iterative optimization algorithms, commonly used to solve these problems, often encounter local minima, chaos, or regions with zero gradients. This is due to their overreliance on local information and highly chaotic inverse loss landscapes governed by underlying partial differential equations (PDEs). In this work, we show that deep neural networks successfully replicate such complex loss landscapes through spatio-temporal trajectory inputs. They also offer the potential to control the underlying complexity of these chaotic loss landscapes during training through various regularization methods. We show that optimizing on network-smoothened loss landscapes leads to improved convergence in predicting optimum inverse parameters over conventional momentum-based optimizers such as BFGS on multiple challenging problems.
A scalable adaptive deep Koopman predictive controller for real-time optimization of mixed traffic flow
Lyu, Hao, Guo, Yanyong, Liu, Pan, Zheng, Nan, Wang, Ting
The use of connected automated vehicle (CAV) is advocated to mitigate traffic oscillations in mixed traffic flow consisting of CAVs and human driven vehicles (HDVs). This study proposes an adaptive deep Koopman predictive control framework (AdapKoopPC) for regulating mixed traffic flow. Firstly, a Koopman theory-based adaptive trajectory prediction deep network (AdapKoopnet) is designed for modeling HDVs car-following behavior. AdapKoopnet enables the representation of HDVs behavior by a linear model in a high-dimensional space. Secondly, the model predictive control is employed to smooth the mixed traffic flow, where the combination of the linear dynamic model of CAVs and linear prediction blocks from AdapKoopnet is embedded as the predictive model into the AdapKoopPC. Finally, the predictive performance of the prosed AdapKoopnet is verified using the HighD naturalistic driving dataset. Furthermore, the control performance of AdapKoopPC is validated by the numerical simulations. Results demonstrate that the AdapKoopnet provides more accuracy HDVs predicted trajectories than the baseline nonlinear models. Moreover, the proposed AdapKoopPC exhibits more effective control performance with less computation cost compared with baselines in mitigating traffic oscillations, especially at the low CAVs penetration rates. The code of proposed AdapKoopPC is open source.