Sudoku
Causal language modeling can elicit search and reasoning capabilities on logic puzzles Kulin Shah
Causal language modeling using the Transformer architecture has yielded remarkable capabilities in Large Language Models (LLMs) over the last few years. However, the extent to which fundamental search and reasoning capabilities emerged within LLMs remains a topic of ongoing debate. In this work, we study if causal language modeling can learn a complex task such as solving Sudoku puzzles. To solve a Sudoku, the model is first required to search over all empty cells of the puzzle to decide on a cell to fill and then apply an appropriate strategy to fill the decided cell. Sometimes, the application of a strategy only results in thinning down the possible values in a cell rather than concluding the exact value of the cell. In such cases, multiple strategies are applied one after the other to fill a single cell. We observe that Transformer models trained on this synthetic task can indeed learn to solve Sudokus (our model solves 94.21% of the puzzles fully correctly) when trained on a logical sequence of steps taken by a solver. We find that training Transformers with the logical sequence of steps is necessary and without such training, they fail to learn Sudoku. We also extend our analysis to Zebra puzzles (known as Einstein puzzles) and show that the model solves 92.04% of the puzzles fully correctly.
Techniques for Symbol Grounding with SATNet
Many experts argue that the future of artificial intelligence is limited by the field's ability to integrate symbolic logical reasoning into deep learning architectures. The recently proposed differentiable MAXSAT solver, SATNet, was a breakthrough in its capacity to integrate with a traditional neural network and solve visual reasoning problems. For instance, it can learn the rules of Sudoku purely from image examples. Despite its success, SATNet was shown to succumb to a key challenge in neurosymbolic systems known as the Symbol Grounding Problem: the inability to map visual inputs to symbolic variables without explicit supervision ("label leakage"). In this work, we present a self-supervised pre-training pipeline that enables SATNet to overcome this limitation, thus broadening the class of problems that SATNet architectures can solve to include datasets where no intermediary labels are available at all. We demonstrate that our method allows SATNet to attain full accuracy even with a harder problem setup that prevents any label leakage. We additionally introduce a proofreading method that further improves the performance of SATNet architectures, beating the state-of-the-art on Visual Sudoku.
Causal language modeling can elicit search and reasoning capabilities on logic puzzles
Causal language modeling using the Transformer architecture has yielded remarkable capabilities in Large Language Models (LLMs) over the last few years. However, the extent to which fundamental search and reasoning capabilities emerged within LLMs remains a topic of ongoing debate. In this work, we study if causal language modeling can learn a complex task such as solving Sudoku puzzles. To solve a Sudoku, the model is first required to search over all empty cells of the puzzle to decide on a cell to fill and then apply an appropriate strategy to fill the decided cell. Sometimes, the application of a strategy only results in thinning down the possible values in a cell rather than concluding the exact value of the cell.
Review for NeurIPS paper: Assessing SATNet's Ability to Solve the Symbol Grounding Problem
Summary and Contributions: EDIT after author response: I thank the authors for their thoughtful response, and am happy to see the authors will address the issues around the framing/tone of the paper. I also like the suggestion to include a discussion of the different definitions of end-to-end; this discussion is likely to be of great value to the community, given the overloaded definition of end-to-end. To address these claims/elaborate further -- * SATNet-specific: In cases where the logical rules aren't known, it's not possible to use a discrete solving module, since the rules need to be learned. That is, using a classifier and a separate Sudoku solver wouldn't work for the SATNet experiments, as the premise of the SATNet experiment was that the rules of Sudoku weren't known. This allows learning to, e.g., be more data-efficient.
Assessing SATNet's Ability to Solve the Symbol Grounding Problem
SATNet is an award-winning MAXSAT solver that can be used to infer logical rules and integrated as a differentiable layer in a deep neural network [1]. It had been shown to solve Sudoku puzzles visually from examples of puzzle digit images, and was heralded as an impressive achievement towards the longstanding AI goal of combining pattern recognition with logical reasoning. In this paper, we clarify SATNet's capabilities by showing that in the absence of intermediate labels that identify individual Sudoku digit images with their logical representations, SATNet completely fails at visual Sudoku (0% test accuracy). More generally, the failure can be pinpointed to its inability to learn to assign symbols to perceptual phenomena, also known as the symbol grounding problem [2], which has long been thought to be a prerequisite for intelligent agents to perform real-world logical reasoning. We propose an MNIST based test as an easy instance of the symbol grounding problem that can serve as a sanity check for differentiable symbolic solvers in general.
Techniques for Symbol Grounding with SATNet
Many experts argue that the future of artificial intelligence is limited by the field's ability to integrate symbolic logical reasoning into deep learning architectures. The recently proposed differentiable MAXSAT solver, SATNet, was a breakthrough in its capacity to integrate with a traditional neural network and solve visual reasoning problems. For instance, it can learn the rules of Sudoku purely from image examples. Despite its success, SATNet was shown to succumb to a key challenge in neurosymbolic systems known as the Symbol Grounding Problem: the inability to map visual inputs to symbolic variables without explicit supervision ("label leakage"). In this work, we present a self-supervised pre-training pipeline that enables SATNet to overcome this limitation, thus broadening the class of problems that SATNet architectures can solve to include datasets where no intermediary labels are available at all.
Evaluating SAT and SMT Solvers on Large-Scale Sudoku Puzzles
Modern SMT solvers have revolutionized the approach to constraint satisfaction problems by integrating advanced theory reasoning and encoding techniques. In this work, we evaluate the performance of modern SMT solvers in Z3, CVC5 and DPLL(T) against a standard SAT solver in DPLL. By benchmarking these solvers on novel, diverse 25x25 Sudoku puzzles of various difficulty levels created by our improved Sudoku generator, we examine the impact of advanced theory reasoning and encoding techniques. Our findings demonstrate that modern SMT solvers significantly outperform classical SAT solvers. This work highlights the evolution of logical solvers and exemplifies the utility of SMT solvers in addressing large-scale constraint satisfaction problems.
Mathematical Definition and Systematization of Puzzle Rules
Maeda, Itsuki, Inoue, Yasuhiro
While logic puzzles have engaged individuals through problem-solving and critical thinking, the creation of new puzzle rules has largely relied on ad-hoc processes. Pencil puzzles, such as Slitherlink and Sudoku, represent a prominent subset of these games, celebrated for their intellectual challenges rooted in combinatorial logic and spatial reasoning. Despite extensive research into solving techniques and automated problem generation, a unified framework for systematic and scalable rule design has been lacking. Here, we introduce a mathematical framework for defining and systematizing pencil puzzle rules. This framework formalizes grid elements, their positional relationships, and iterative composition operations, allowing for the incremental construction of structures that form the basis of puzzle rules. Furthermore, we establish a formal method to describe constraints and domains for each structure, ensuring solvability and coherence. Applying this framework, we successfully formalized the rules of well-known Nikoli puzzles, including Slitherlink and Sudoku, demonstrating the formal representation of a significant portion (approximately one-fourth) of existing puzzles. These results validate the potential of the framework to systematize and innovate puzzle rule design, establishing a pathway to automated rule generation. By providing a mathematical foundation for puzzle rule creation, this framework opens avenues for computers, potentially enhanced by AI, to design novel puzzle rules tailored to player preferences, expanding the scope of puzzle diversity. Beyond its direct application to pencil puzzles, this work illustrates how mathematical frameworks can bridge recreational mathematics and algorithmic design, offering tools for broader exploration in logic-based systems, with potential applications in educational game design, personalized learning, and computational creativity.
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Ye, Jiacheng, Gao, Jiahui, Gong, Shansan, Zheng, Lin, Jiang, Xin, Li, Zhenguo, Kong, Lingpeng
Autoregressive language models, despite their impressive capabilities, struggle with complex reasoning and long-term planning tasks. We introduce discrete diffusion models as a novel solution to these challenges. Through the lens of subgoal imbalance, we demonstrate how diffusion models effectively learn difficult subgoals that elude autoregressive approaches. We propose Multi-granularity Diffusion Modeling (MDM), which prioritizes subgoals based on difficulty during learning. On complex tasks like Countdown, Sudoku, and Boolean Satisfiability Problems, MDM significantly outperforms autoregressive models without using search techniques. For instance, MDM achieves 91.5\% and 100\% accuracy on Countdown and Sudoku, respectively, compared to 45.8\% and 20.7\% for autoregressive models. Our work highlights the potential of diffusion-based approaches in advancing AI capabilities for sophisticated language understanding and problem-solving tasks.
Controllable Generation via Locally Constrained Resampling
Ahmed, Kareem, Chang, Kai-Wei, Broeck, Guy Van den
Autoregressive models have demonstrated an unprecedented ability at modeling the intricacies of natural language. However, they continue to struggle with generating complex outputs that adhere to logical constraints. Sampling from a fully-independent distribution subject to a constraint is hard. Sampling from an autoregressive distribution subject to a constraint is doubly hard: We have to contend not only with the hardness of the constraint but also the distribution's lack of structure. We propose a tractable probabilistic approach that performs Bayesian conditioning to draw samples subject to a constraint. Our approach considers the entire sequence, leading to a more globally optimal constrained generation than current greedy methods. Starting from a model sample, we induce a local, factorized distribution which we can tractably condition on the constraint. To generate samples that satisfy the constraint, we sample from the conditional distribution, correct for biases in the samples and resample. The resulting samples closely approximate the target distribution and are guaranteed to satisfy the constraints. We evaluate our approach on several tasks, including LLM detoxification and solving Sudoku puzzles. We show that by disallowing a list of toxic expressions our approach is able to steer the model's outputs away from toxic generations, outperforming similar approaches to detoxification. We conclude by showing that our approach achieves a perfect accuracy on Sudoku compared to <50% for GPT4-o and Gemini 1.5.