cumulative constraint
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Online convex optimization for cumulative constraints
We propose the algorithms for online convex optimization which lead to cumulative squared constraint violations of the form $\sum\limits_{t=1}^T\big([g(x_t)]_+\big)^2=O(T^{1-\beta})$, where $\beta\in(0,1)$. Previous literature has focused on long-term constraints of the form $\sum\limits_{t=1}^Tg(x_t)$. There, strictly feasible solutions can cancel out the effects of violated constraints. In contrast, the new form heavily penalizes large constraint violations and cancellation effects cannot occur. Furthermore, useful bounds on the single step constraint violation $[g(x_t)]_+$ are derived. For convex objectives, our regret bounds generalize existing bounds, and for strongly convex objectives we give improved regret bounds. In numerical experiments, we show that our algorithm closely follows the constraint boundary leading to low cumulative violation.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Reviews: Online convex optimization for cumulative constraints
In this setting our objective is to provide low regret with respect to the sequence of losses as well as with respect to the constraints. This setting is useful when one would like to avoid projections which might be costly. Previous works focused on the regret with the respect to the cumulative constraints, where constraint violation may be compensated by constrained satisfaction. In this work the authors consider a more appropriate measure of regret which accounts only for constraints violation. In this case, the authors come up with a new algorithm (similar in spirit to previous approaches), which provides regret guarantees with respect to the sum of squared constrained violations. They also extend their approach to the strongly convex setting.
Huang
Within the recently proposed Universal Booleanization framework, we consider the Cumulative constraint, for which the original Boolean encoding proves ineffective, and present a new Boolean encoding that causes the SAT solver to simulate, largely, the search strategy used by some of the best-performing native methods. Apart from providing motivation for future research in a similar direction, we obtain a significantly enhanced version of Universal Booleanization for problems containing Cumulative constraints.
Online convex optimization for cumulative constraints
Yuan, Jianjun, Lamperski, Andrew
There, strictly feasible solutions can cancel out the effects of violated constraints. In contrast, the new form heavily penalizes large constraint violations and cancellation effects cannot occur. Furthermore, useful bounds on the single step constraint violation $[g(x_t)]_ $ are derived. For convex objectives, our regret bounds generalize existing bounds, and for strongly convex objectives we give improved regret bounds. In numerical experiments, we show that our algorithm closely follows the constraint boundary leading to low cumulative violation. Papers published at the Neural Information Processing Systems Conference.
IPO: Interior-point Policy Optimization under Constraints
Liu, Yongshuai, Ding, Jiaxin, Liu, Xin
In this paper, we study reinforcement learning (RL) algorithms to solve real-world decision problems with the objective of maximizing the long-term reward as well as satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement with performance guarantees and can handle general types of cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms, in terms of reward maximization and constraint satisfaction. Introduction Recent advances have demonstrated significant potentials of deep reinforcement learning (RL) in solving complex sequential decision and control problems, e.g., the Atari game (Mnih et al. 2015), robotics (Andrychowicz et al. 2018), Go (Silver et al. 2016), etc. In such RL problems, the objective is to maximize the discounted cumulative reward. In many other problems, in addition to maximizing the reward, a policy needs to satisfy certain constraints.
- North America > United States > California > Yolo County > Davis (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Distributed Online Linear Regression
Yuan, Deming, Proutiere, Alexandre, Shi, Guodong
We study online linear regression problems in a distributed setting, where the data is spread over a network. In each round, each network node proposes a linear predictor, with the objective of fitting the \emph{network-wide} data. It then updates its predictor for the next round according to the received local feedback and information received from neighboring nodes. The predictions made at a given node are assessed through the notion of regret, defined as the difference between their cumulative network-wide square errors and those of the best off-line network-wide linear predictor. Various scenarios are investigated, depending on the nature of the local feedback (full information or bandit feedback), on the set of available predictors (the decision set), and the way data is generated (by an oblivious or adaptive adversary). We propose simple and natural distributed regression algorithms, involving, at each node and in each round, a local gradient descent step and a communication and averaging step where nodes aim at aligning their predictors to those of their neighbors. We establish regret upper bounds typically in ${\cal O}(T^{3/4})$ when the decision set is unbounded and in ${\cal O}(\sqrt{T})$ in case of bounded decision set.
- Oceania > Australia > New South Wales > Sydney (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- (2 more...)