Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
We study minimax methods for off-policy evaluation (OPE) using value functions and marginalized importance weights. Although they hold the promise of overcoming the exponential variance of traditional importance sampling, several key problems remain: (1) They require function approximation and are generally biased. For the sake of trustworthy OPE, is there any way to quantify the biases? In this paper we answer both questions positively. By slightly altering the derivation of previous methods (one from each style), we unify them into a single value interval that comes with a special type of double robustness: when either the value-function or the importance-weight class is well specified, the interval is valid and its length quantifies the misspecification of the other class.
Reviews: DetNAS: Backbone Search for Object Detection
This paper proposes a neural network search strategy for the object detection task. The problem is interesting and useful for many real applications. The paper gives a three-stage solution that can search pre-training-based detectors effectively and efficiently. Experiments on both COCO and VOC show the effectiveness of the proposed solution, and detection-based models are superior to classification-based models. The idea of searching network structure for detection with a pre-training stage is novel and interesting.
Boosting MCTS with Free Energy Minimization
Dao, Mawaba Pascal, Peter, Adrian M.
Active Inference, grounded in the Free Energy Principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework that integrates Monte Carlo Tree Search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key insight is that MCTS, already renowned for its search efficiency, can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. Concretely, the Cross-Entropy Method (CEM) is used to optimize action proposals at the root node, while tree expansions leverage reward modeling alongside intrinsic exploration bonuses. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability. Empirically, we benchmark our planner on a diverse set of continuous control tasks, where it demonstrates performance gains over both standalone CEM and MCTS with random rollouts.
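As a rough illustration of the CEM step mentioned in the abstract above, the following sketch refits a diagonal Gaussian proposal to the best-scoring action samples. It is not the authors' planner: `cem_optimize`, `toy_reward`, `toy_info_gain`, and the weight `0.1` are hypothetical names and values chosen for this example, and the information-gain term is a toy placeholder rather than a quantity estimated from a learned model.

```python
import math
import random
import statistics


def cem_optimize(score, dim, iterations=20, population=64,
                 elite_frac=0.25, init_std=3.0, seed=0):
    """Maximize `score` with the Cross-Entropy Method over a diagonal Gaussian."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate action vectors from the current proposal.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the highest-scoring candidates (the elites).
        elites = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the proposal mean and spread to the elites, per dimension.
        columns = list(zip(*elites))
        mean = [statistics.fmean(c) for c in columns]
        std = [statistics.pstdev(c) + 1e-6 for c in columns]
    return mean


def toy_reward(action):
    # Hypothetical extrinsic reward, peaked at (1, -2).
    return -((action[0] - 1.0) ** 2 + (action[1] + 2.0) ** 2)


def toy_info_gain(action):
    # Hypothetical exploration bonus that grows away from the origin.
    return 0.5 * math.log(1.0 + sum(a * a for a in action))


if __name__ == "__main__":
    # Blend expected reward with the (toy) information-gain bonus.
    best = cem_optimize(lambda a: toy_reward(a) + 0.1 * toy_info_gain(a), dim=2)
    print(best)
```

In a planner of the kind described, the scoring function would presumably come from tree statistics rather than a closed-form objective; the closed form here only keeps the sketch self-contained.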
Reviews: Theoretical Analysis of Adversarial Learning: A Minimax Approach
Originality: I find the approach original and interesting. Other works are cited, and the related work section is clear and detailed and gives a nice overview. My only concern is that the differences between [40] and the current work should be highlighted more clearly. In particular, it is unclear what the penalty parameter is and how their method of adversarial training relates to this work: do they optimize a different bound, or what quantities do they optimize, and do these quantities show up in the proposed bound? Quality: the work seems complete and sound as far as I could check. I could not check all the proofs in detail, but I read the work in great detail.
Review for NeurIPS paper: Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms
Additional Feedback: Post-rebuttal comments: I've read the rebuttal and the other reviews. The authors have addressed most of my concerns, and hence I increase my score. I hope the authors will make the suggested edits in the revised version and explain the role of their main assumption. Can you explain why things fail if this assumption does not hold? Can you make use of a prior (in the case it is informative)?
Review for NeurIPS paper: Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms
All reviewers agree that the paper considers a problem of relevance (bandits with many arms) and shows interesting results about simple-to-implement learning algorithms based on the greedy principle. However, one lingering concern that arose during the discussions among the reviewers was whether/how the results obtained in the paper apply to the case when the number of arms is larger than the time horizon of the game (k > T). It appears that the author response to this question has not been substantial. Though I can see that this will not be an issue -- the proof of Lemma 2 bounds regret with respect to the best possible reward of 1, the author(s) is/are requested to add a precise clarification of this regime in the updated version.
Reviews: Learning to Perform Local Rewriting for Combinatorial Optimization
After rebuttal: The discussion of the method's applicability in the rebuttal is convincing to me. I upgrade my score to 7. This paper proposes a learning-based approach for combinatorial optimization problems. Starting from an initial complete solution of the problem, several local rewriting updates are applied to the solution iteratively. In each rewriting step, a local region and an updating rule are picked to update the solution, and two networks are trained by reinforcement learning to pick local regions and updating rules.
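The rewriting loop summarized in this review can be sketched roughly as follows, on a toy tour-length problem. This is only an illustration under stated assumptions: the two learned networks are replaced by uniform random choices, the two rewriting rules and the accept-only-if-better test are choices made for this example, and all function names are hypothetical rather than taken from the paper.

```python
import math
import random


def tour_length(tour, points):
    """Total length of the closed tour visiting `points` in `tour` order."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def reverse_segment(tour, i, j):
    # Rewriting rule 1: reverse the visiting order inside the region [i, j].
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def swap_ends(tour, i, j):
    # Rewriting rule 2: swap the two cities at the region boundaries.
    new_tour = list(tour)
    new_tour[i], new_tour[j] = new_tour[j], new_tour[i]
    return new_tour


RULES = [reverse_segment, swap_ends]


def local_rewrite(points, steps=2000, seed=0):
    rng = random.Random(seed)
    tour = list(range(len(points)))  # initial complete solution
    best = tour_length(tour, points)
    for _ in range(steps):
        # Region picker: random stand-in for a learned region-selection policy.
        i, j = sorted(rng.sample(range(len(tour)), 2))
        # Rule picker: random stand-in for a learned rule-selection policy.
        rule = rng.choice(RULES)
        candidate = rule(tour, i, j)
        cost = tour_length(candidate, points)
        # Assumption for this sketch: keep only rewrites that improve the cost.
        if cost < best:
            tour, best = candidate, cost
    return tour, best
```

Replacing the two random pickers with trained policies is where the reinforcement learning described in the review would enter; the surrounding loop would stay the same.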
Reviews: Learning Local Search Heuristics for Boolean Satisfiability
This work is original in its use of deep reinforcement learning and graph neural networks to learn novel search control heuristics for SAT solving. While the techniques used are not novel themselves, the application domain is. The authors do a good job of surveying related work in this area and situating their contributions in this landscape. The paper is well-written and I found it very easy to follow the details of the proposed approach and the authors' results. Technically, the work presented is solid, though I have a few comments/suggestions here.
Reviews: Learning Local Search Heuristics for Boolean Satisfiability
The reviewers were positive about this paper based upon their initial read. The authors' response addressed their concerns, so they were even more comfortable with a positive outcome after the author response. I encourage the authors to incorporate their responses to the reviewer concerns into any final version of the paper.