Goto

Collaborating Authors

 Search


Efficient Beam Search for Large Language Models Using Trie-Based Decoding

arXiv.org Artificial Intelligence

In Transformer-based sequence-to-sequence generation, beam search has proven effective in enhancing the quality of generated sequences compared to greedy decoding. Conventional beam search methods typically adopt either a sequential or batch-based approach. The sequential approach, while memory-efficient, requires multiple decoding passes to construct a complete search tree, leading to significantly slower inference. On the other hand, the batch-based approach enables parallel computation across beams, but at the expense of high memory consumption due to the need to maintain separate key-value (KV) caches for each beam. In this study, we introduce a novel trie (prefix-tree)-based parallel decoding method that addresses the memory inefficiency of batch-based beam search. By sharing a single KV cache among all beams that share the same prefix, the proposed method not only reduces memory consumption dramatically but also enables parallel decoding across all branches. This innovative use of a prefix tree offers an efficient alternative for beam search, achieving significant memory savings while preserving inference speed, making it particularly well-suited for memory-constrained environments or large-scale model deployments.


Investigating the Monte-Carlo Tree Search Approach for the Job Shop Scheduling Problem

arXiv.org Artificial Intelligence

The Job Shop Scheduling Problem (JSSP) is a well-known optimization problem in manufacturing, where the goal is to determine the optimal sequence of jobs across different machines to minimize a given objective. In this work, we focus on minimising the weighted sum of job completion times. We explore the potential of Monte Carlo Tree Search (MCTS), a heuristic-based reinforcement learning technique, to solve large-scale JSSPs, especially those with recirculation. We propose several Markov Decision Process (MDP) formulations to model the JSSP for the MCTS algorithm. In addition, we introduce a new synthetic benchmark derived from real manufacturing data, which captures the complexity of large, non-rectangular instances often encountered in practice. Our experimental results show that MCTS effectively produces good-quality solutions for large-scale JSSP instances, outperforming our constraint programming approach.


Using Code Generation to Solve Open Instances of Combinatorial Design Problems

arXiv.org Artificial Intelligence

The Handbook of Combinatorial Designs catalogs many types of combinatorial designs, together with lists of open instances for which existence has not yet been determined. We develop a constructive protocol CPro1, which uses Large Language Models (LLMs) to generate code that constructs combinatorial designs and resolves some of these open instances. The protocol starts from a definition of a particular type of design, and a verifier that reliably confirms whether a proposed design is valid. The LLM selects strategies and implements them in code, and scaffolding provides automated hyperparameter tuning and execution feedback using the verifier. Most generated code fails, but by generating many candidates, the protocol automates exploration of a variety of standard methods (e.g.


Generative quantum combinatorial optimization by means of a novel conditional generative quantum eigensolver

arXiv.org Artificial Intelligence

Quantum computing is entering a transformative phase with the emergence of logical quantum processors, which hold the potential to tackle complex problems beyond classical capabilities. While significant progress has been made, applying quantum algorithms to real-world problems remains challenging. Hybrid quantum-classical techniques have been explored to bridge this gap, but they often face limitations in expressiveness, trainability, or scalability. In this work, we introduce conditional Generative Quantum Eigensolver (conditional-GQE), a context-aware quantum circuit generator powered by an encoder-decoder Transformer. Focusing on combinatorial optimization, we train our generator for solving problems with up to 10 qubits, exhibiting nearly perfect performance on new problems. By leveraging the high expressiveness and flexibility of classical generative models, along with an efficient preference-based training scheme, conditional-GQE provides a generalizable and scalable framework for quantum circuit generation. Our approach advances hybrid quantum-classical computing and contributes to accelerate the transition toward fault-tolerant quantum computing.


COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models

arXiv.org Artificial Intelligence

We present COS(M+O)S, a System 2-inspired framework for open-ended plot development that systematically explores the vast space of possible story expansions, enabling a 3B-parameter language model to approach the plot quality of a 70B model on select short-story tasks. The method accomplishes this by combining Monte Carlo Tree Search (MCTS), guided by a step-level value model that rewards moderate surprisal (curiosity) while penalizing incoherence, and Odds Ratio Preference Optimization (ORPO) to fine-tune the policy on high-value plot expansions. This iterative reinforcement learning loop systematically explores multiple candidate plot branches, backpropagates quality signals, and adapts the policy for faster convergence, notably shifting the policy from puzzle-based Chain-of-Thought to more character-driven storytelling. In small-scale tests with short-story prompts, 67%-77% of participants favored COS(M+O)S's highest-rated expansions over lower-rated ones, suggesting that our learned value function aligns. GPT-4o ratings further show that COS(M+O)S surpasses naive single-pass decoding from Llama 3.2 3B by 0.59 SD, coming within 0.06 SD of Llama 3.1 70B (no significant difference, p=0.93). Pairwise comparisons with o1 place COS(M+O)S 1.5 SD above the 3B baseline and find no statistically significant gap from 70B. Nevertheless, absolute story quality remains modest, constrained by the small model's capacity and limited training data.


Review for NeurIPS paper: Online learning with dynamics: A minimax perspective

Neural Information Processing Systems

Post-rebuttal: I am satisfied with the rebuttal. I am interested to know more about "One reason why such rates are common in online learning is the connection of the sequential Rademacher complexity with uniform convergence of martingale difference sequences in the corresponding Banach space (see [2] for details)." If the paper is accepted and space is allowed, I suggest to elaborate on this more thoroughly.



Reviews: Improved Regret Bounds for Bandit Combinatorial Optimization

Neural Information Processing Systems

In particular, the gap in the analysis is due to my mis-reading the formula, and the response convinced me. However, the paper overall looks incremental, so it is a paper nice to have, but its acceptance seems to be depending on the quality of other papers.] The paper studies the bandit combinatorial optimization problem and improve the lower bound of the problem from \Omega(\sqrt{dk 3T/log T}) in the prior work [8] to \Omega(\sqrt{dk 3T}), removing a factor of 1/\sqrt{\log T} . This makes the regret dependency on T and k, d tight up to a logarithmic factor. The analysis is built upon prior work [2,8], with the major innovation being a design of new distribution of loss vectors (given in Eq.(8)) that leads to a better lower bound.



Reviews: Bayesian Optimization with Unknown Search Space

Neural Information Processing Systems

Applying Bayesian optimization to expensive black-box problems needs to specify the bound of search space. However, when tackling a completely new problem, there is no prior knowledge to guarantee that the specified search space contains the global optimum. The paper proposes an approach to deal with this situation. In the approach, the user first specifies an initial search space; then the bound of search space automatically expands as the iteration proceeds; finally the algorithm will return a solution achieving \epsilon-accuracy. The key is how to expand the search space.