What is in the model? A Comparison of variable selection criteria and model search approaches
Xu, Shuangshuang, Ferreira, Marco A. R., Tegge, Allison N.
For many scientific questions, understanding the underlying mechanism is the goal. To help investigators better understand the underlying mechanism, variable selection is a crucial step that permits the identification of the most associated regression variables of interest. A variable selection method consists of model evaluation using an information criterion and a search of the model space. Here, we provide a comprehensive comparison of variable selection methods using performance measures of correct identification rate (CIR), recall, and false discovery rate (FDR). We consider the BIC and AIC for evaluating models, and exhaustive, greedy, LASSO path, and stochastic search approaches for searching the model space; we also consider LASSO using cross validation. We perform simulation studies for linear and generalized linear models that parametrically explore a wide range of realistic sample sizes, effect sizes, and correlations among regression variables. We consider model spaces with a small and a larger number of potential regressors. The results show that the exhaustive search BIC and stochastic search BIC outperform the other methods when considering the performance measures on small and large model spaces, respectively. These approaches result in the highest CIR and lowest FDR, which collectively may support long-term efforts towards increasing replicability in research.
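As an illustration of the "exhaustive search BIC" combination named in the abstract, here is a minimal sketch for a Gaussian linear model. It is not the authors' code. The function names, the BIC form n log(RSS/n) + k log n (additive constants dropped), and the recall and FDR definitions are assumptions made for this example.

```python
import itertools

import numpy as np


def bic_linear(X, y, subset):
    """BIC (up to an additive constant) of an OLS fit on the given columns.

    Assumed form: n * log(RSS / n) + k * log(n), where k counts the
    intercept plus the selected regressors.
    """
    n = len(y)
    columns = [np.ones(n)] + [X[:, j] for j in subset]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1]
    return n * np.log(rss / n) + k * np.log(n)


def exhaustive_bic(X, y):
    """Score every subset of columns and return the one with the lowest BIC.

    The model space has 2**p members, so this is only feasible for small p.
    """
    p = X.shape[1]
    best_subset, best_score = None, np.inf
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            score = bic_linear(X, y, subset)
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


def recall_and_fdr(selected, truth):
    """Recall = TP / |truth|; FDR = FP / |selected| (assumed definitions)."""
    selected, truth = set(selected), set(truth)
    true_pos = len(selected & truth)
    recall = true_pos / len(truth) if truth else 1.0
    fdr = (len(selected) - true_pos) / len(selected) if selected else 0.0
    return recall, fdr
```

The exhaustive loop visits all 2^p subsets, which is presumably why the abstract pairs exhaustive search with small model spaces and stochastic search with large ones.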
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
Yang, Xiao-Wen, Zhang, Zihao, Cao, Jianuo, Zhou, Zhi, Li, Zenan, Guo, Lan-Zhe, Yao, Yuan, Chen, Taolue, Li, Yu-Feng, Ma, Xiaoxing
Large language models (LLMs) have recently demonstrated remarkable progress in formal theorem proving. Yet their ability to serve as practical assistants for mathematicians, filling in missing steps within complex proofs, remains underexplored. We identify this challenge as the task of subgoal completion, where an LLM must discharge short but nontrivial proof obligations left unresolved in a human-provided sketch. To study this problem, we introduce FormalML, a Lean 4 benchmark built from foundational theories of machine learning. Using a translation tactic that converts procedural proofs into declarative form, we extract 4937 problems spanning optimization and probability inequalities, with varying levels of difficulty. FormalML is the first subgoal completion benchmark to combine premise retrieval and complex research-level contexts. Evaluation of state-of-the-art provers highlights persistent limitations in accuracy and efficiency, underscoring the need for more capable LLM-based theorem provers for effective subgoal completion.
Decoupling Geometry from Optimization in 2D Irregular Cutting and Packing Problems: an Open-Source Collision Detection Engine
Gardeyn, Jeroen, Berghe, Greet Vanden, Wauters, Tony
Addressing irregular cutting and packing (C&P) optimization problems poses two distinct challenges: the geometric challenge of determining whether or not an item can be placed feasibly at a certain position, and the optimization challenge of finding a good solution according to some objective function. Until now, those tackling such problems have had to address both challenges simultaneously, requiring two distinct sets of expertise and a lot of research and development effort. One way to lower this barrier is to decouple the two challenges. In this paper we introduce a powerful collision detection engine (CDE) for 2D irregular C&P problems which assumes full responsibility for the geometric challenge. The CDE (i) allows users to focus with full confidence on their optimization challenge by abstracting geometry away and (ii) enables independent advances to propagate to all optimization algorithms built atop it. We present a set of core principles and design philosophies to model a general and adaptable CDE focused on maximizing performance, accuracy and robustness. These principles are accompanied by a concrete open-source implementation called jagua-rs. This paper together with its implementation serves as a catalyst for future advances in irregular C&P problems by providing a solid foundation which can either be used as it currently exists or be further improved upon. Funding: This research was supported by the Research Foundation -- Flanders (FWO) under grant numbers 1S71222N and K804824N.
To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning
Qin, Tian, Alvarez-Melis, David, Jelassi, Samy, Malach, Eran
Recent advancements in large language models (LLMs) have significantly improved their reasoning abilities, particularly through techniques involving search and backtracking. Backtracking naturally scales test-time compute by enabling sequential, linearized exploration via long chain-of-thought (CoT) generation. However, this is not the only strategy for scaling test-time compute: parallel sampling with best-of-N selection provides an alternative that generates diverse solutions simultaneously. Despite the growing adoption of sequential search, its advantages over parallel sampling, especially under a fixed compute budget, remain poorly understood. In this paper, we systematically compare these two approaches on two challenging reasoning tasks: CountDown and Sudoku. Surprisingly, we find that sequential search underperforms parallel sampling on CountDown but outperforms it on Sudoku, suggesting that backtracking is not universally beneficial. We identify two factors that can cause backtracking to degrade performance: (1) training on fixed search traces can lock models into suboptimal strategies, and (2) explicit CoT supervision can discourage implicit (non-verbalized) reasoning. Extending our analysis to reinforcement learning (RL), we show that models with backtracking capabilities benefit significantly from RL fine-tuning, while models without backtracking see limited, mixed gains. Together, these findings challenge the assumption that backtracking universally enhances LLM reasoning, instead revealing a complex interaction between task structure, training data, model scale, and learning paradigm.
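The two strategies contrasted in this abstract can be sketched on a toy CountDown-style puzzle with classical search in place of an LLM. This is an illustration only, not the paper's setup. The puzzle rules (every number must be used; only +, -, and * are allowed), the step accounting, and all function names are assumptions made for this example.

```python
import random
from itertools import combinations

# Binary operations allowed in this toy puzzle (division omitted for simplicity).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def successors(nums):
    """Yield every state reachable by merging two numbers with one operation."""
    for i, j in combinations(range(len(nums)), 2):
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        a, b = nums[i], nums[j]
        for op in OPS.values():
            yield rest + [op(a, b)]
        yield rest + [b - a]  # subtraction is not commutative


def backtracking_search(nums, target, budget):
    """Sequential depth-first search; one step is one state visited.

    Returns (solved, steps_used).
    """
    steps = 0
    stack = [list(nums)]
    while stack and steps < budget:
        state = stack.pop()
        steps += 1
        if len(state) == 1:
            if state[0] == target:
                return True, steps
            continue  # dead end: backtrack to the next state on the stack
        stack.extend(successors(state))
    return False, steps


def best_of_n(nums, target, budget, rng):
    """Independent random rollouts; keep the result closest to the target.

    Each rollout costs len(nums) - 1 steps, so the same budget buys
    budget // (len(nums) - 1) samples. Returns (solved, steps_used).
    """
    rollout_cost = len(nums) - 1
    n_samples = budget // rollout_cost
    best = None
    for _ in range(n_samples):
        state = list(nums)
        while len(state) > 1:
            state = rng.choice(list(successors(state)))
        if best is None or abs(state[0] - target) < abs(best - target):
            best = state[0]
    return best == target, n_samples * rollout_cost
```

Calling both functions with the same `budget` gives a rough fixed-compute comparison. The two step counts are only approximately commensurable, since one counts visited states and the other counts transitions.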
38af86134b65d0f10fe33d30dd76442e-Reviews.html
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper under review, Variational Guided Policy Search, introduces a new approach in which classical policy search is combined with, and improved by, trajectory optimization methods that serve as an exploration strategy. An optimization criterion with the goal of finding optimal policy parameters is decomposed with a variational approach. The variational distribution is approximated as a Gaussian distribution, which allows a solution with the iterative LQR algorithm. The overall algorithm uses expectation maximization to iterate between minimizing the KL divergence of the variational decomposition and maximizing the lower bound with respect to the policy parameters.
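The alternation described in this summary matches the standard variational decomposition used in expectation maximization, sketched below. The review gives no notation, so every symbol here is an assumption: \theta for the policy parameters, \tau for a trajectory, \mathcal{O} for the optimality event, and q for the variational distribution.

```latex
\log p(\mathcal{O} \mid \theta)
  = \mathcal{L}(q, \theta)
  + D_{\mathrm{KL}}\big( q(\tau) \,\|\, p(\tau \mid \mathcal{O}, \theta) \big),
\qquad
\mathcal{L}(q, \theta)
  = \int q(\tau) \log \frac{p(\mathcal{O}, \tau \mid \theta)}{q(\tau)} \, d\tau .

% E-step: with \theta fixed, choose q to minimize the KL term.
q \leftarrow \arg\min_{q} \; D_{\mathrm{KL}}\big( q(\tau) \,\|\, p(\tau \mid \mathcal{O}, \theta) \big)

% M-step: with q fixed, choose \theta to maximize the lower bound.
\theta \leftarrow \arg\max_{\theta} \; \mathcal{L}(q, \theta)
```

Because the KL term is non-negative, \mathcal{L}(q, \theta) is a lower bound on the log-likelihood, so neither step can decrease the objective when solved exactly.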
Thanks for recognizing our novelty and performance
We thank the reviewers for their detailed comments. By "generic" we mean that the model can be applied to different tasks that share the same problem structure. We're happy to revise the terminology and highlight which applications TGSL is appropriate for, namely those where the input and output show a certain resemblance. We mentioned in Line 269 that SA+MM cannot achieve reasonable performance. To the best of our knowledge, we are the first to work in this direction.
1579779b98ce9edb98dd85606f2c119d-Reviews.html
NIPS 2013, Neural Information Processing Systems, December 5 - 10, Lake Tahoe, Nevada, USA. Paper ID: 1046. Title: Convergence of Monte Carlo Tree Search in Simultaneous Move Games. Reviews: First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper studies Monte Carlo tree search in zero-sum extensive form games with perfect information and simultaneous moves. It is proved that the MCTS algorithm converges to an approximate Nash equilibrium under certain conditions. Empirical study confirms the formal result. The detailed comments are as follows. The result is useful and the presentation is clear.