open problem
Hardness of Online Sleeping Combinatorial Optimization Problems
We show that several online combinatorial optimization problems that admit efficient no-regret algorithms become computationally hard in the sleeping setting where a subset of actions becomes unavailable in each round. Specifically, we show that the sleeping versions of these problems are at least as hard as PAC learning DNF expressions, a long standing open problem. We show hardness for the sleeping versions of Online Shortest Paths, Online Minimum Spanning Tree, Online k-Subsets, Online k-Truncated Permutations, Online Minimum Cut, and Online Bipartite Matching. The hardness result for the sleeping version of the Online Shortest Paths problem resolves an open problem presented at COLT 2015 [Koolen et al., 2015].
- North America > United States (0.14)
- Asia > China > Hubei Province > Wuhan (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > Canada (0.04)
- Information Technology > Data Science > Data Mining > Big Data (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.97)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms
Many policy-based reinforcement learning (RL) algorithms can be viewed as instantiations of approximate policy iteration (PI), i.e., where policy improvement and policy evaluation are both performed approximately. In applications where the average reward objective is the meaningful performance metric, often discounted reward formulations are used with the discount factor being close to $1,$ which is equivalent to making the expected horizon very large.