Computational Learning Theory
Smart Cubing for Graph Search: A Comparative Study
Kirchweger, Markus, Xia, Hai, Peitl, Tomáš, Szeider, Stefan
Propositional satisfiability (SAT) solvers based on conflict-driven clause learning can solve huge instances with millions of variables and clauses [Fichte et al., 2023a]. However, for hard instances, particularly in combinatorial problems, parallelization becomes necessary. The cube-and-conquer technique has proven highly effective for such problems, most notably in resolving the Pythagorean triples conjecture [Heule et al., 2016]. In cube-and-conquer, a look-ahead solver first partitions the search space into disjoint subproblems via cubes (partial assignments), which are then solved independently by CDCL solvers. This independence enables efficient parallel solving. When encoding combinatorial problems into SAT, particularly those involving graphs, we often encounter highly symmetric search spaces. Many mutually isomorphic graphs satisfy the same constraints, but a solver needs to check only one representative, the canonical element, from each isomorphism class. Standard CDCL solvers cannot leverage these symmetries, and static symmetry breaking methods cannot break all symmetries [Codish et al., 2019]. SAT Modulo Symmetries (SMS) [Kirchweger and Szeider, 2021; Kirchweger and Szeider, 2024] addresses this limitation through dynamic symmetry breaking, using a custom propagator that learns symmetry-breaking predicates during the search.
Review for NeurIPS paper: Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?
Weaknesses: There have been several "proof-of-concept" papers using deep learning for SAT. This paper is an interesting paper, but yet another proof of concept, using relatively small size SAT instances. The paper falls short in terms of showing a true potential for improving the state of the art of Sat Solvers. The experimental section is limited: they mainly consider random 3SAT instances (at the phase transition, which is good) but they are relatively small instances (up to 250 variables when SAT solvers can solve considerably larger problems for satisfiable instances with thousands of variables and millions of clauses and hundreds of variables and thousands of clauses for unsat (see e.g., 2016 SAT competition)). They also consider graph coloring instances but again not very large size problems.
Review for NeurIPS paper: Optimal Learning from Verified Training Data
Summary and Contributions: This paper considers a problem of linear regression where some of your inputs are strategically corrupted. In particular, the algorithm is given some vector data x, which it uses to approximate y by w.x for some learned value w. However, an adversary can corrupt the value of x to x in an attempt to make the predicted value closer to z. Given a training set of triples (x,y,z) the objective is to learn a w that does as well as possible despite these corruptions. Formally, the adversary sends the algorithm x that minimizes w.x -z 2 gamma x-x 2. In particular, they try to minimize the error between the prediction and x without making x too far from z.
Review for NeurIPS paper: Optimal Learning from Verified Training Data
This paper focuses on solving least squares regression problem in the setting where some of the inputs are strategically corrupted. The authors show that certain special setting of the general problem, there exists an algorithm that achieves a provably optimal solution. Overall, the results are interesting and target an improved problems. The main concerns are regarding the significance of the results (as the setting targeted here seems to be fairly narrow) and providing a proper context wrt previous work (this hopefully can be remedied in the revision).
Review for NeurIPS paper: Learnability with Indirect Supervision Signals
Weaknesses: 1. Are there lower bounds to match the main upper bound for the main result, Thm. 4.2, Eq. 2? If there was only a single T which was the identity then "no" because you can get the faster rate from realizable PAC learning. How would this need to be generalized to have matching upper and lower bounds? At least a discussion of the matter would help make the limitations of the present work more clear. E.g. if there are only 2 indirect labels and 4 real labels, show a concrete example of learning the true labelling funciton for the more complex case.... Maybe just have linear 1-D thresholds with fixed distances between the transitions discontinuity in classes, with noisy subsetting. Both working through this analytically, to give intuition for why we can learn more labels with less labels is possible, and some empirical results would be useful.
Decision Making in Changing Environments: Robustness, Query-Based Learning, and Differential Privacy
We study the problem of interactive decision making in which the underlying environment changes over time subject to given constraints. We propose a framework, which we call \textit{hybrid Decision Making with Structured Observations} (hybrid DMSO), that provides an interpolation between the stochastic and adversarial settings of decision making. Within this framework, we can analyze local differentially private (LDP) decision making, query-based learning (in particular, SQ learning), and robust and smooth decision making under the same umbrella, deriving upper and lower bounds based on variants of the Decision-Estimation Coefficient (DEC). We further establish strong connections between the DEC's behavior, the SQ dimension, local minimax complexity, learnability, and joint differential privacy. To showcase the framework's power, we provide new results for contextual bandits under the LDP constraint.
Reviews: Covariate-Powered Empirical Bayes Estimation
This theory paper provides a number of novel results, including theoretical analysis of minimax bounds and an empirical analysis, for combinations of relatively simple statistical estimators and machine learning models of covariate information. The paper shows that these combinations improve on both the simple estimator alone and the machine learning model alone. The main concern raised by the reviewers is that the paper provides limited empirical validation. I disagree with this assessment, as the paper should be seen as a machine learning theory paper. As the proposed framework includes a number of advanced machine learning models, including XGBoost it should be very relevant for the NeurIPS community.
Reviews: Distribution-Independent PAC Learning of Halfspaces with Massart Noise
The paper gives a PAC learning algorithm for the basic problem of halfspaces in a model of learning with noise. The algorithm uses ideas from previous related results in the simpler model of random classification noise, with important new ideas. Learning with noise is a basic topic in learning theory. It can be argued that the most studied models (random misclassification noise and malicious noise) are unrealistically benign (even though the related SQ model is very important) or malicious, and there is a great need for the study of more realistic models. The Massart noise model is a candidate for such a model.
Reviews: Distribution-Independent PAC Learning of Halfspaces with Massart Noise
This is a very strong paper that makes impressive progress on the long-standing open problem of efficiently PAC learning halfspaces under the Massart noise model. While resolving the problem would involve getting within epsilon of the optimal error, achieving eta epsilon is a breakthrough and likely will fuel future results in learning theory.
Reviews: Globally Optimal Learning for Structured Elliptical Losses
Lemma 1) is incrementally built on previous work. Based on related work, it's hard to contextualize their work in the existing literature. The readers may have the following important questions. A clear explanation is needed for this. This work is significant in that they provide optimality proof that leads to more efficient optimization method for a wide range of robust elliptical losses including Gaussian, Generalized Gaussian, Huber, etc.