Stronger NAS with Weaker Predictors Appendix A Implementation details of baseline methods
We show the runtime comparison of WeakNAS and its BO variant in Table 1. Improvement (EI) acquisition function [2] being extremely costly. Method Predictors Config Train proxy model (s/arch) Derive new samples (s/arch) WeakNAS MLP 4 layers @1000 hidden 8. 59 10 We compare the effect of using different architecture encodings in Table 2. As shown in Table 3. We conduct a controlled experiment on NAS-Bench-201 by varying the number of samples. Evolution [1] in all three subsets, with better stability indicated by confidence intervals.
A Appendix
A.1 Pseudocode for our search algorithm Our framework follows a standard search pipeline: 1. Candidate proposal: the search algorithm samples an optimizer from the search space. Search: the optimizer score is used to guide the search algorithm to propose new optimizers. Our set of operators is a subset of the full operator set presented in Section 4.1 of the NOS-RL paper. We refer the readers to "Further discussions on NOS-RL baseline" in Appendix D.3 for more ... For inspiration on what to add, the user might look into 1). With an augmented operator set, other components in our algorithm can largely remain the same.
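The propose, score, and guide loop described in A.1 can be illustrated with a minimal Python sketch. This is not the authors' pseudocode: the operator names in `OPERATORS`, the three-operator candidate encoding, the mutation-based `propose` step, and the `toy_score` function are all hypothetical stand-ins chosen only to make the loop runnable.

```python
import random

# Hypothetical operator set; the real set is defined in the paper, not here.
OPERATORS = ["grad", "sign", "momentum", "scale"]


def propose(rng, best):
    """Candidate proposal: sample a fresh candidate, or mutate the best so far."""
    if best is None:
        return tuple(rng.choice(OPERATORS) for _ in range(3))
    candidate = list(best)
    candidate[rng.randrange(len(candidate))] = rng.choice(OPERATORS)
    return tuple(candidate)


def toy_score(candidate):
    """Hypothetical score: counts one operator. A real score would come from
    training a model with the candidate optimizer."""
    return sum(1.0 for op in candidate if op == "momentum")


def search(num_rounds, seed=0):
    """Run the loop: propose, score, and let the score guide the next proposal."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    history = []
    for _ in range(num_rounds):
        candidate = propose(rng, best)
        score = toy_score(candidate)
        history.append((candidate, score))
        if score > best_score:  # the score steers future proposals
            best, best_score = candidate, score
    return best, best_score, history


best, best_score, history = search(num_rounds=50)
```

Adding an operator to `OPERATORS` changes only the search space; `search` itself is untouched, which mirrors the remark that an augmented operator set leaves the other components largely the same.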
Online Convex Optimization with Continuous Switching Constraint Guanghui Wang 1, Yuanyu Wan 1,2, Tianbao Yang
In many sequential decision making applications, the change of decision would bring an additional cost, such as the wear-and-tear cost associated with changing server status. To control the switching cost, we introduce the problem of online convex optimization with continuous switching constraint, where the goal is to achieve a small regret given a budget on the overall switching cost. We first investigate the hardness of the problem, and provide a lower bound of order Ω(√T) when the switching cost budget S = Ω(√T), and Ω(min{T/S, T}) when S = O(√T), where T is the time horizon. The essential idea is to carefully design an adaptive adversary, who can adjust the loss function according to the cumulative switching cost of the player incurred so far based on the orthogonal technique. We then develop a simple gradient-based algorithm which enjoys the minimax optimal regret bound. Finally, we show that, for strongly convex functions, the regret bound can be improved to O(log T) for S = Ω(log T), and O(min{T/exp(S) + S, T}) for S = O(log T).
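One way to see how a gradient-based method can respect a switching budget is the sketch below. It is not taken from the paper: it assumes the switching cost is the Euclidean distance between consecutive decisions, a decision set that is an L2 ball of radius `radius`, and gradients bounded in norm by `grad_bound`. Under those assumptions, projected online gradient descent moves at most `eta * grad_bound` per round, so capping the step size at `budget / (grad_bound * T)` keeps the total movement within the budget. All function and parameter names are hypothetical.

```python
import math


def project_l2_ball(w, radius):
    """Euclidean projection onto the ball {w : ||w|| <= radius}."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= radius:
        return list(w)
    return [x * radius / norm for x in w]


def budgeted_ogd(grad_fns, dim, radius, grad_bound, budget):
    """Projected OGD whose step size is capped so that total movement <= budget.

    Assumes every gradient returned by grad_fns has norm <= grad_bound.
    """
    T = len(grad_fns)
    diameter = 2.0 * radius
    # Usual OGD step size, shrunk further if the switching budget demands it.
    eta = min(budget / (grad_bound * T), diameter / (grad_bound * math.sqrt(T)))
    w = [0.0] * dim
    iterates = [w]
    switching_cost = 0.0
    for t in range(T - 1):
        g = grad_fns[t](w)
        w_next = project_l2_ball([wi - eta * gi for wi, gi in zip(w, g)], radius)
        # Projection is nonexpansive, so this step is at most eta * ||g||.
        switching_cost += math.dist(w, w_next)
        w = w_next
        iterates.append(w)
    return iterates, switching_cost


# Toy linear losses f_t(w) = <z_t, w> with unit-norm z_t (illustrative only).
T = 100
grad_fns = [lambda w, t=t: [math.cos(t), math.sin(t)] for t in range(T)]
iterates, cost = budgeted_ogd(grad_fns, dim=2, radius=1.0, grad_bound=1.0, budget=2.0)
```

With a small budget the cap `budget / (grad_bound * T)` is the binding term, and the standard OGD analysis then gives regret growing like T/S, which matches the order of the bound quoted above for S = O(√T).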
Minimax Regret for Stochastic Shortest Path
We study the Stochastic Shortest Path (SSP) problem in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem, the agent has no prior knowledge about the costs and dynamics of the model. She repeatedly interacts with the model for K episodes, and has to minimize her regret.
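The regret mentioned above is commonly formalized as the gap between the learner's total incurred cost and K times the optimal expected cost. The expression below is a sketch of that common formulation rather than a quotation from the paper. Apart from K, every symbol is an assumption introduced here: $I^k$ is the number of steps in episode $k$, $(s_i^k, a_i^k)$ is the state-action pair at step $i$ of episode $k$, $c$ is the cost function, $J^{\pi}(s_{\text{init}})$ is the expected total cost of policy $\pi$ from the initial state, and the minimum ranges over policies that reach the goal with probability 1.

```latex
R_K \;=\; \sum_{k=1}^{K} \sum_{i=1}^{I^k} c\bigl(s_i^k, a_i^k\bigr)
      \;-\; K \cdot \min_{\pi} J^{\pi}(s_{\text{init}})
```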