Search
Reinforcement Learning Constrained Beam Search for Parameter Optimization of Paper Drying Under Flexible Constraints
Chen, Siyuan, Yu, Hanshen, Yagoobi, Jamal, Shao, Chenhui
Existing approaches to enforcing design constraints in Reinforcement Learning (RL) applications often rely on training-time penalties in the reward function or training/inference-time invalid action masking, but these methods either cannot be modified after training, or are limited in the types of constraints that can be implemented. To address this limitation, we propose Reinforcement Learning Constrained Beam Search (RLCBS) for inference-time refinement in combinatorial optimization problems. This method respects flexible, inference-time constraints that support exclusion of invalid actions and forced inclusion of desired actions, and employs beam search to maximize sequence probability for more sensible constraint incorporation. RLCBS is extensible to RL-based planning and optimization problems that do not require real-time solution, and we apply the method to optimize process parameters for a novel modular testbed for paper drying. An RL agent is trained to minimize energy consumption across varying machine speed levels by generating optimal dryer module and air supply temperature configurations. Our results demonstrate that RLCBS outperforms NSGA-II under complex design constraints on drying module configurations at inference-time, while providing a 2.58-fold or higher speed improvement.
Reviews: Local Minimax Complexity of Stochastic Convex Optimization
While I would certainly agree with the claim that developing local minimax bounds is of interest, I feel that the interest of considering risks for 2-point subproblems is unclear. Note that many (probably, a majority of) known bounds in convex stochastic optimization are obtained on 2-point subproblems (see, e.g. In other words, the minimax risks (up to an absolute factor) for general classes of convex problems are attained already on the hardest 2-point subproblems (1-parametric families), so that, in the paper notation, R_T(\cal F)\sim \sup_{f\in \cal F} R_T(f, \cal F) On the other hand, it is unclear for me if the notion of R_T(f, \cal F) is of interest for its own sake (not as just a technical tool to construct a more general lower bound). For instance, it is not clear why and when the "2-point risk" R_T(f, \cal F) is an adequate measure of local complexity. Note that this is generally not the case in nonparametric estimation (except for few situations such as the case of linear functional estimation in [4-6]).
Reviews: Global Optimality of Local Search for Low Rank Matrix Recovery
This is a nice result. I am going to list a few nits I had about the paper as I read along. I think addressing some of these points would improve the presentation of the paper. There are a few cases which are not covered by the results. For instance, strict-saddle in noisy case local min are close to global in high rank, noisy case. A discussion about why these cases are not covered would be nice; I am assuming that it is not just straightforward modification of the current proof? 2. In practice, I believe that random init gradient descent without noise is sufficient.
Reviews: A Minimax Approach to Supervised Learning
The technical results appear to be correct and the experimental results (which I think are quite preliminary) suggest the minimax SVM might be a good idea. I think the idea of robust Bayes decision rules makes sense and the authors show how under squared loss a connection to the Huber loss emerges. My main comment is that the paper itself is a somewhat difficult read due to terseness at key places, which might limit the impact of the paper. So, the rest of my comments are just geared towards improving the clarity of the paper. Technically, in every instance where the authors apply Danskin's theorem, it was not really clear what form of Danskin's theorem was being used, and therefore it was difficult to follow the derivation.
Reviews: Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels
The paper is very interesting as it provides the first known lower bounds for the minimax rate of estimation of the MMD distance between two distributions based on finite random samples, considering general radial kernels, which match with the known upper bounds up to constants. Theses bounds lead to a classical parametric rate of estimation, not depending on the dimension of the data. The text could however be more fluent, and easier to read for nonspecialists of the minimax estimation theory. For instance, the results are all expressed with respect to the minimax probabilities. Could the link with the more classical minimax risk be clearly written?
Reviews: Hardness of Online Sleeping Combinatorial Optimization Problems
The paper solves an open problem presented at COLT 2015 [Koolen et. It defines OSCO problems as online combinatorial optimization problems in sleeping setting in which in each trial a subset of actions become unavailable due to unavailability of a subset of elements. Basically, the paper shows that OSCO problems are as hard as PAC-Learning of DNF expressions -- which is a long standing open problem. In a nutshell, the hardness result is shown as follows: the paper reduces the problem of "Online Agnostic Learnign of Disjuction" into "Hard Instances" of OSCO problems with per-action regret via a proof that has similarities to a proof in [Kanade and Steinke, 2014]. Interestingly enough, because of the online-to-batch conversion argument in [Kanade and Steinke, 2014], the hardness results seem to be also true for a benign form of adversary namely stochastic availabilities and loss (i.e. they are drawn from an unknown but fixed joint distribution). After proving the general hardness results, the paper provides "Hard Instances" of various well-known OSCO problems to establish their hardness in particularv -- including Online Sabotaged Shortest Path.
Reviews: Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers
This paper includes several results regarding the problem of function denoising over a grid of dimension 1. In particular, demonstrating that simple linear methods are not sufficient for denoising functions of bounded variation and that TV denoisers are minimax optimal (up to a log factor). Overall, I feel these are solid contributions which should be of interest to the mathematical statistics, machine learning and signal processing communities. My main criticism is that the simulations shown in Figure 3 of the main text and Figure 1 of the supplementary use rather artificial signals. The paper would be much stronger if simulations on real data sets would be performed (such as 2D images, 3D volumetric scans, etc.) and see real-world performance of TV denoising vs. Laplacian smoothing and Laplacian eigenmaps.
Reviews: A Non-convex One-Pass Framework for Generalized Factorization Machine and Rank-One Matrix Sensing
Major comments -------------- * An obvious major issue with this paper is the lack of experiments. How does the algorithm compare to gradient-based local search algorithms? I would be curious to see if it works better on i) Gaussian distributed data and ii) real data. Even if the results turn out to be similar, perhaps the authors can find some advantages to their algorithm such as, e.g., robustness to initialization. Recent works [1, 2] have called this variant "polynomial network" so the authors might want to use this name instead of "factorization machine".
Reviews: Finding significant combinations of features in the presence of categorical covariates
The paper is well written and clearly organized, especially where it motivates the problem and the goals of the paper. It is a novel point of view to aim to find all feature subsets for which a statistical association test rejects the null hypothesis, and thus allowing to correct for a confounding categorical covariate. The authors results in the scope of the paper demonstrates that FACS keeps the computational efficiency, statistical power and the ability to correct for multiple hypothesis testing of existing method. The introduction of the branch-and-bound algorithm for conditional statistical association tests is novel and well-explained. Along with the results, the authors provide good intuition behind the methods and the key aspects, such as the appropriate testability criterion for the CMH test and the pruning criterion for the CMH test.
Monte Carlo Tree Search with Boltzmann Exploration
Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the maximum entropy principle into an MCTS approach, utilising Boltzmann policies to sample actions, naturally encouraging more exploration. In this paper, we highlight a major limitation of MENTS: optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), that address these limitations and preserve the benefits of Boltzmann policies, such as allowing actions to be sampled faster by using the Alias method. Our empirical analysis shows that our algorithms show consistent high performance across several benchmark domains, including the game of Go.