Below are our responses to individual questions and comments.
Reviewer #1: We thank you for your positive feedback, and we really appreciate the time and expertise you have invested in these reviews. It is still possible that there is another method to prove the result for regression. Presentation of Algorithm 2: We will make Algorithm 2 more formal and make the proof of Theorem 8 more readable. (A multi-class classification algorithm based on an ordinal regression machine.) Thanks for raising this issue; we will update Section 2.3 to clarify this point. Realizable or agnostic setting: For the sake of clear presentation, we only discussed the realizable setting in the paper.
Model-Based Transfer Learning for Contextual Reinforcement Learning
Deep reinforcement learning (RL) is a powerful approach to complex decision making. However, one issue that limits its practical application is its brittleness: it sometimes fails to train in the presence of small changes in the environment. Motivated by the success of zero-shot transfer, where pre-trained models perform well on related tasks, we consider the problem of selecting a good set of training tasks to maximize generalization performance across a range of tasks. Given the high cost of training, it is critical to select training tasks strategically, but it is not well understood how to do so. We therefore introduce Model-Based Transfer Learning (MBTL), which layers on top of existing RL methods to effectively solve contextual RL problems. MBTL models generalization performance in two parts: 1) the performance set point, modeled using Gaussian processes, and 2) the performance loss (generalization gap), modeled as a linear function of contextual similarity. MBTL combines these two pieces of information within a Bayesian optimization (BO) framework to strategically select training tasks. We show theoretically that the method exhibits sublinear regret in the number of training tasks and discuss conditions that further tighten the regret bounds.
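As a rough illustration of this two-part model, here is a minimal Python sketch of an MBTL-style selection loop. It is not the authors' implementation: the context grid, the gap slope, and the `train_and_evaluate` stub are all hypothetical stand-ins; the set point is modeled with a scikit-learn GP and the generalization gap is assumed linear in context distance, as described above.

```python
# Hypothetical sketch of an MBTL-style training-task selection loop.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

contexts = np.linspace(0.0, 1.0, 50)      # 1-D context space (illustrative)
slope = 0.5                                # assumed generalization-gap slope
trained_x, trained_y = [], []              # chosen training contexts, observed returns

def train_and_evaluate(x):
    """Placeholder: train an RL policy on context x, return its performance."""
    return np.exp(-3 * (x - 0.6) ** 2)     # toy ground truth, not from the paper

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-4)

for step in range(8):
    if trained_x:
        gp.fit(np.array(trained_x)[:, None], trained_y)
        mu, sd = gp.predict(contexts[:, None], return_std=True)
    else:
        mu, sd = np.zeros_like(contexts), np.ones_like(contexts)

    # Estimated performance on every target: best source set point minus linear gap.
    def coverage(sources, setpoints):
        if not sources:
            return np.zeros_like(contexts)
        gaps = slope * np.abs(contexts[:, None] - np.array(sources)[None, :])
        return np.max(np.array(setpoints)[None, :] - gaps, axis=1)

    current = coverage(trained_x, trained_y)
    # Acquisition: expected total improvement from training on candidate c,
    # using an optimistic (UCB-style) estimate of its set point.
    scores = []
    for i, c in enumerate(contexts):
        hypothetical = coverage(trained_x + [c], trained_y + [mu[i] + sd[i]])
        scores.append(np.sum(np.maximum(hypothetical, current) - current))
    nxt = contexts[int(np.argmax(scores))]

    trained_x.append(nxt)
    trained_y.append(train_and_evaluate(nxt))
```

Each step scores every candidate by how much an optimistically estimated new source task would lift the performance envelope over all target contexts, which is the BO flavor of the selection rule.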
Author feedback (53fde96fcc4b4ce72d7739202324cd49-AuthorFeedback.pdf):
Effect (i) was observed in several papers. Since our goal is to also achieve (i), we focus on the conditioned models for (ii) in Tab. 4. We have now conducted additional experiments for Tab. 4 and, as requested, have filled some gaps in the tables. For Tab. 1: yes, all models define a distribution on the same variables. R3: The only baseline is based on [13]; in Tab. 1 above, we also compare against this baseline. Model (3) does capture the correlations of error vectors within each region via the error term ε.
Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as the "adjoint method") for training neural ODEs on classification, density estimation, and inference approximation tasks. We also provide a theoretical justification of our approach using the logarithmic norm formalism. Our method allows faster model training than the reverse dynamic method, as confirmed by extensive numerical experiments on several standard benchmarks.
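To make the contrast with the reverse dynamic method concrete, here is a hedged sketch of the interpolation idea on a toy linear ODE. It is not the authors' code: the forward trajectory is stored as a dense interpolant (here SciPy's piecewise polynomial from `solve_ivp`), and the backward pass integrates only the adjoint equation, reading the state from the interpolant instead of re-solving the state ODE backward.

```python
# Minimal sketch: approximate adjoint gradients by interpolating the
# forward trajectory rather than re-solving the state ODE backward.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-1.0, -0.1]])   # toy linear dynamics dz/dt = A z

def f(t, z):
    return A @ z

# Forward pass with dense output: solve_ivp returns an interpolant of z(t),
# which stands in for the stored trajectory.
z0 = np.array([1.0, 0.0])
fwd = solve_ivp(f, (0.0, 10.0), z0, dense_output=True, rtol=1e-8)

def adjoint_rhs(t, a):
    # da/dt = -(df/dz)^T a, with the state read from the interpolant fwd.sol(t).
    # For this linear f the Jacobian is just A, independent of z; a nonlinear
    # model would evaluate its Jacobian at fwd.sol(t) here.
    _ = fwd.sol(t)
    return -A.T @ a

# Backward pass: integrate only the adjoint, from t = 10 down to t = 0.
dLdzT = np.array([1.0, 0.0])                # gradient of loss w.r.t. z(T)
bwd = solve_ivp(adjoint_rhs, (10.0, 0.0), dLdzT, rtol=1e-8)
dLdz0 = bwd.y[:, -1]                        # gradient w.r.t. the initial state
```

The reverse dynamic method would instead augment the backward ODE with the state itself; reading an interpolant avoids that extra integration, which is where the speedup comes from.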
Author feedback (53c6de78244e9f528eb3e1cda69699bb-AuthorFeedback.pdf):
We would like to thank the reviewers for their comments and suggestions. In particular, TimeNet is a seq2seq method relying on an autoencoding loss and using LSTMs as encoder and decoder. Such methods, including TimeNet, notably do not scale to long time series (as explained on lines 144-157), unlike ours. However, we did perform experiments on some datasets with different loss variants. We will add insights on this matter to the paper.
Transition Constrained Bayesian Optimization via Markov Decision Processes
Bayesian optimization is a methodology for optimizing black-box functions. Traditionally, it focuses on the setting where the search space can be queried arbitrarily. However, many real-life problems do not offer this flexibility; in particular, the search space of the next query may depend on previous ones. Example challenges arise in the physical sciences in the form of local movement constraints, required monotonicity in certain variables, and transitions influencing the accuracy of measurements.
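A minimal way to picture a transition constraint is a GP-UCB loop in which each new query must lie within a movement radius of the previous one. The sketch below is only illustrative and is not the paper's method, which plans multi-step query trajectories through an MDP; here a one-step greedy restriction stands in for the constraint, and the objective and radius are made up.

```python
# Illustrative sketch: GP-UCB where each query must stay within a movement
# radius of the previous one (a simple instance of a transition constraint).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                           # unknown black-box function (toy)
    return -np.sin(5 * x) - (x - 0.7) ** 2

grid = np.linspace(0.0, 1.0, 200)
radius = 0.1                                # max allowed move per step (assumed)
x_cur = 0.0
X, y = [x_cur], [objective(x_cur)]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6)
for _ in range(15):
    gp.fit(np.array(X)[:, None], y)
    mu, sd = gp.predict(grid[:, None], return_std=True)
    ucb = mu + 2.0 * sd
    feasible = np.abs(grid - x_cur) <= radius   # transition constraint
    ucb[~feasible] = -np.inf
    x_cur = grid[int(np.argmax(ucb))]
    X.append(x_cur); y.append(objective(x_cur))
```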
Contextual Decision-Making with Knapsacks Beyond the Worst Case
Rui Ai (School of Computer Science)
We study the framework of a dynamic decision-making scenario with resource constraints. In this framework, an agent, whose target is to maximize the total reward under the initial inventory, selects an action in each round upon observing a random request, leading to a reward and resource consumption that are further associated with an unknown random external factor. While previous research has already established an Õ(√T) worst-case regret for this problem, this work offers two results that go beyond the worst-case perspective: one on the worst-case gap between benchmarks and another on logarithmic regret rates. We first show that an Ω(√T) distance between the commonly used fluid benchmark and the online optimum is unavoidable when the former has a degenerate optimal solution. On the algorithmic side, we merge the re-solving heuristic with distribution estimation techniques and propose an algorithm that achieves an Õ(1) regret as long as the fluid LP has a unique and non-degenerate optimal solution. Furthermore, we prove that our algorithm maintains a near-optimal Õ(√T) regret even in the worst case and extend these results to the setting where the request and external factor are continuous. Regarding the information structure, our regret results are obtained under two feedback models: one where the algorithm observes the external factor at the end of each round, and one where it observes the external factor at the end of a round only when a non-null action is executed.
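To illustrate the algorithmic ingredients, here is a hedged sketch of a re-solving fluid-LP policy with empirical distribution estimation, for a single resource and accept/reject actions. The rewards, consumptions, and request distribution are invented for illustration, and the sketch omits the paper's treatment of external factors and feedback models.

```python
# Hedged sketch of a re-solving fluid-LP policy with distribution estimation.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
T, B = 1000, 300.0                          # horizon and initial inventory
rewards = np.array([1.0, 2.0, 4.0])         # reward per request type (made up)
costs = np.array([0.5, 1.0, 2.5])           # resource use if accepted (made up)
true_p = np.array([0.5, 0.3, 0.2])          # unknown request distribution

counts = np.ones(3)                         # smoothed empirical counts
total = 0.0
for t in range(T):
    j = rng.choice(3, p=true_p)             # observe the request
    counts[j] += 1
    p_hat = counts / counts.sum()
    # Re-solve the fluid LP with the remaining per-round budget:
    #   max  sum_k p_hat[k] * rewards[k] * x[k]
    #   s.t. sum_k p_hat[k] * costs[k]  * x[k] <= B / (T - t),  0 <= x <= 1
    res = linprog(c=-(p_hat * rewards),
                  A_ub=[p_hat * costs], b_ub=[B / (T - t)],
                  bounds=[(0, 1)] * 3, method="highs")
    x = res.x
    if rng.random() < x[j] and B >= costs[j]:   # accept with LP probability
        B -= costs[j]
        total += rewards[j]
```

The key step is re-solving the fluid LP every round with the current empirical distribution and the remaining per-round budget, then acting on the realized request according to the LP's (possibly fractional) solution.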
Procrastinating with Confidence: Near-Optimal, Anytime, Adaptive Algorithm Configuration
Robert Kleinberg, Kevin Leyton-Brown, Brendan Lucier, Devon Graham
Algorithm configuration methods optimize the performance of a parameterized heuristic algorithm on a given distribution of problem instances. Recent work introduced an algorithm configuration procedure ("Structured Procrastination") that provably achieves near-optimal performance with high probability and with nearly minimal runtime in the worst case. It also offers an anytime property: it keeps tightening its optimality guarantees the longer it is run. Unfortunately, Structured Procrastination is not adaptive to characteristics of the parameterized algorithm: it treats every input like the worst case. Follow-up work ("LeapsAndBounds") achieves adaptivity but trades away the anytime property. This paper introduces a new algorithm, "Structured Procrastination with Confidence", that preserves the near-optimality and anytime properties of Structured Procrastination while adding adaptivity. In particular, the new algorithm will perform dramatically faster in settings where many algorithm configurations perform poorly. We show empirically both that such settings arise frequently in practice and that the anytime property is useful for finding good configurations quickly.
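As a rough picture of how adaptivity, anytime behavior, and procrastination fit together, here is a simplified Python sketch; it compresses the paper's bookkeeping and is not the authors' procedure. Configurations are chosen by a lower confidence bound on capped mean runtime, and runs that hit their captime are deferred and retried later with a doubled cap; `run_with_cap` is a hypothetical stand-in for executing the target algorithm on one instance.

```python
# Simplified, hedged sketch of a Structured-Procrastination-with-Confidence
# style loop: run the config with the smallest LCB on capped mean runtime;
# timed-out runs are deferred ("procrastinated") and retried with doubled caps.
import heapq, math, random

def run_with_cap(config, instance, cap):
    """Placeholder: true runtime is exponential with config-dependent mean."""
    t = random.expovariate(1.0 / config)
    return min(t, cap), t <= cap            # (observed time, finished?)

configs = [0.5, 1.0, 3.0, 8.0]              # mean runtimes; smaller is better
stats = {c: {"n": 0, "sum": 0.0} for c in configs}
# Each configuration keeps a queue of (captime, instance id) work items.
queues = {c: [(1.0, i) for i in range(8)] for c in configs}
for q in queues.values():
    heapq.heapify(q)

for step in range(1, 400):
    def lcb(c):                             # lower confidence bound on mean runtime
        s = stats[c]
        if s["n"] == 0:
            return -math.inf                # force initial exploration
        return s["sum"] / s["n"] - math.sqrt(2 * math.log(step) / s["n"])
    c = min(configs, key=lcb)
    cap, inst = heapq.heappop(queues[c])
    obs, done = run_with_cap(c, inst, cap)
    stats[c]["n"] += 1
    stats[c]["sum"] += obs
    if not done:                            # procrastinate: retry later, doubled cap
        heapq.heappush(queues[c], (2 * cap, inst))
    else:
        heapq.heappush(queues[c], (cap, inst))

best = min(configs, key=lambda c: stats[c]["sum"] / max(stats[c]["n"], 1))
print(best)
```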