input number
Seemingly Simple Planning Problems are Computationally Challenging: The Countdown Game
Katz, Michael, Kokel, Harsha, Sreedharan, Sarath
There is broad consensus that the inability to form long-term plans is one of the key limitations of current foundation models and agents. However, existing planning benchmarks remain woefully inadequate for truly measuring their planning capabilities. Most either focus on loosely defined tasks like travel planning or reuse domains and problems from the international planning competitions. The former tasks are hard to formalize and verify, while the latter were specifically designed to probe the weaknesses of existing automated planners. To address these shortcomings, we propose a procedure for creating a planning benchmark centered on the game Countdown, in which a player must form a target number from a list of input numbers through arithmetic operations. We discuss how this problem meets many of the desiderata for an ideal benchmark of planning capability: the domain admits an intuitive natural-language description for each problem instance, it is computationally challenging (NP-complete), and the instance space is rich enough that memorization is not a concern. We perform an extensive theoretical analysis, establishing the computational complexity result and demonstrating the advantage of our instance-generation procedure over public benchmarks. We evaluate a variety of existing LLM-assisted planning methods on instances generated with our procedure. Our results show that, unlike domains such as the 24 Game (a special case of Countdown), our proposed dynamic benchmark remains extremely challenging for existing LLM-based approaches.
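To make the task concrete, here is a minimal brute-force Countdown solver in Python. It is an illustrative sketch of the game itself, not the paper's instance-generation procedure, and it simplifies the usual rules by allowing absolute differences and only exact division; not every input number has to be used.

```python
from itertools import combinations

def countdown(numbers, target):
    """Brute-force search for a sequence of arithmetic steps that reaches `target`.

    Each step combines two remaining numbers with +, *, absolute difference,
    or exact division, replaces them with the result, and recurses.
    """
    def search(nums, trace):
        if target in nums:
            return trace
        if len(nums) < 2:
            return None
        for (i, a), (j, b) in combinations(enumerate(nums), 2):
            rest = [n for k, n in enumerate(nums) if k not in (i, j)]
            steps = [(a + b, f"{a}+{b}"), (a * b, f"{a}*{b}"),
                     (abs(a - b), f"|{a}-{b}|")]
            if b != 0 and a % b == 0:
                steps.append((a // b, f"{a}/{b}"))
            if a != 0 and b % a == 0:
                steps.append((b // a, f"{b}/{a}"))
            for value, step in steps:
                found = search(rest + [value], trace + [f"{step}={value}"])
                if found is not None:
                    return found
        return None

    return search(list(numbers), [])

# A 24-Game-style instance (the 24 Game is a special case of Countdown).
print(countdown([4, 7, 8, 8], 24))   # prints one sequence of steps that reaches 24
```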
A Graph-Partitioning Based Continuous Optimization Approach to Semi-supervised Clustering Problems
Liu, Wei, Liu, Xin, Ng, Michael K., Zhang, Zaikun
Semi-supervised clustering is a basic problem in various applications. Most existing methods require knowledge of the ideal cluster number, which is often difficult to obtain in practice. Besides, satisfying the must-link constraints is another major challenge for these methods. In this work, we view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset, where the similarity matrix includes a scaling parameter to reflect the must-link constraints. Utilizing a relaxation technique, we formulate the graph partitioning problem as a continuous optimization model that does not require the exact cluster number, but only an overestimate of it. We then propose a block coordinate descent algorithm to solve this model efficiently, and establish its convergence. Based on the obtained solution, we can construct clusters that theoretically satisfy the must-link constraints under mild assumptions. Furthermore, we verify the effectiveness and efficiency of the proposed method through comprehensive numerical experiments.
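As a rough illustration of the graph-based view only (not the authors' actual model or block coordinate descent solver), the sketch below builds a similarity graph, inflates must-link edges with a hypothetical scaling parameter, and rounds a spectral relaxation computed with an overestimated cluster count; all names and hyperparameters are placeholders.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def semi_supervised_spectral(X, must_links, k_max, sigma=1.0, boost=10.0):
    """Toy spectral-relaxation sketch of the graph-partitioning view.

    `boost` is a hypothetical scaling parameter that inflates similarity on
    must-link pairs, and `k_max` is an overestimate of the cluster number;
    the paper's actual model and solver are not reproduced here.
    """
    # Gaussian similarity graph over the data points.
    W = np.exp(-cdist(X, X) ** 2 / (2 * sigma ** 2))
    for i, j in must_links:                      # strengthen must-link edges
        W[i, j] = W[j, i] = boost * W[i, j]
    L = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian
    # Continuous relaxation: embed points with the k_max smallest eigenvectors.
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :k_max]
    # Round the relaxed solution back to discrete clusters.
    return KMeans(n_clusters=k_max, n_init=10).fit_predict(embedding)
```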
Stream of Search (SoS): Learning to Search in Language
Gandhi, Kanishk, Lee, Denise, Grand, Gabriel, Liu, Muxin, Cheng, Winson, Sharma, Archit, Goodman, Noah D.
Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequence of their actions several steps ahead. In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS). We propose a unified language for search that captures an array of different symbolic search strategies. We demonstrate our approach using the simple yet difficult game of Countdown, where the goal is to combine input numbers with arithmetic operations to reach a target number. We pretrain a transformer-based language model from scratch on a dataset of streams of search generated by heuristic solvers. We find that SoS pretraining increases search accuracy by 25% over models trained to predict only the optimal search trajectory. We further finetune this model with two policy improvement methods: Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned SoS models solve 36% of previously unsolved problems, including problems that cannot be solved by any of the heuristic solvers. Our results indicate that language models can learn to solve problems via search, self-improve to flexibly use different search strategies, and potentially discover new ones.
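The sketch below gives one way to picture "search as a flattened string" for Countdown: it serializes a depth-first search, including its dead ends and backtracking, into a single line of text. The tokens and layout are invented for illustration and are not the paper's SoS vocabulary or its heuristic solvers.

```python
def stream_of_search(numbers, target):
    """Serialize a depth-first Countdown search as one flat string."""
    lines = [f"Goal: {target} from {numbers}"]

    def dfs(nums):
        lines.append(f"Explore {nums}")
        if target in nums:
            lines.append("Found target")
            return True
        if len(nums) < 2:
            lines.append("Dead end, backtrack")
            return False
        for i in range(len(nums)):
            for j in range(len(nums)):
                if i == j:
                    continue
                a, b = nums[i], nums[j]
                rest = [n for k, n in enumerate(nums) if k not in (i, j)]
                ops = [(a + b, "+"), (a - b, "-"), (a * b, "*")]
                if b != 0 and a % b == 0:
                    ops.append((a // b, "/"))
                for value, op in ops:
                    lines.append(f"Try {a}{op}{b}={value}")
                    if dfs(rest + [value]):
                        return True
        lines.append("Dead end, backtrack")
        return False

    dfs(list(numbers))
    return " | ".join(lines)   # the flattened training string

print(stream_of_search([3, 5, 2], 16))   # includes dead ends, backtracking, and the solution
```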
Reverse That Number! Decoding Order Matters in Arithmetic Learning
Zhang-Li, Daniel, Lin, Nianyi, Yu, Jifan, Zhang, Zheyuan, Yao, Zijun, Zhang, Xiaokang, Hou, Lei, Zhang, Jing, Li, Juanzi
Recent advancements in pretraining have demonstrated that modern Large Language Models (LLMs) can effectively learn arithmetic operations. However, despite acknowledging the significance of digit order in arithmetic computation, current methodologies predominantly rely on sequential, step-by-step approaches for teaching LLMs arithmetic, leading to the conclusion that better performance requires ever finer-grained step-by-step procedures. Diverging from this conventional path, our work introduces a novel strategy that not only reevaluates digit order by prioritizing output from the least significant digit but also incorporates a step-by-step methodology to substantially reduce complexity. We have developed and applied this method in a comprehensive set of experiments. Compared to the previous state-of-the-art (SOTA) method, our findings reveal an overall improvement in accuracy while requiring only a third of the tokens typically used during training. To facilitate replication and further research, we have made our code and dataset publicly available at https://anonymous.4open.science/r/RAIT-9FB7/.
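A minimal sketch of the least-significant-digit-first idea, assuming a simple addition task: the target string is reversed so the model emits digits in the same order column-wise addition produces them (units first, with carries propagating leftward). The prompt format and field names are made up for illustration, not the paper's training format.

```python
def lsd_first_addition_example(a: int, b: int) -> dict:
    """Format one addition example with the answer written least-significant
    digit first; field names and prompt format are illustrative only."""
    answer = a + b
    return {
        "prompt": f"{a} + {b} =",
        # Reversed target: '936' becomes '639', matching the order in which
        # column-wise addition actually computes the digits.
        "target_reversed": str(answer)[::-1],
        "target_standard": str(answer),
    }

print(lsd_first_addition_example(347, 589))
# {'prompt': '347 + 589 =', 'target_reversed': '639', 'target_standard': '936'}
```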
Language Models Understand Numbers, at Least Partially
Zhu, Fangwei, Dai, Damai, Sui, Zhifang
Large language models (LLMs) have exhibited impressive competence in various tasks, but their opaque internal mechanisms hinder their use in mathematical problems. In this paper, we study a fundamental question: whether language models understand numbers, a basic element in math. Based on an assumption that LLMs should be capable of compressing numbers in their hidden states to solve mathematical problems, we construct a synthetic dataset comprising addition problems and utilize linear probes to read out input numbers from the hidden states. Experimental results support the existence of compressed numbers in LLMs. However, it is difficult to precisely reconstruct the original numbers, indicating that the compression process may not be lossless. Further experiments show that LLMs can utilize encoded numbers to perform arithmetic computations, and the computational ability scales up with the model size. Our preliminary research suggests that LLMs exhibit a partial understanding of numbers, offering insights for future investigations about the models' mathematical capability.
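A generic linear-probing recipe along the lines described above (not the paper's exact setup): hidden states would be collected from some layer of an LLM while it reads addition problems, and a ridge-regression probe is fit to recover the operand. The synthetic activations below only demonstrate the call pattern.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def fit_number_probe(hidden_states, input_numbers):
    """Fit a linear probe that tries to read an input number back out of
    hidden states; high held-out R^2 suggests the number is approximately
    linearly decodable, i.e. "compressed", in those states."""
    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, input_numbers, test_size=0.2, random_state=0)
    probe = Ridge(alpha=1.0).fit(X_train, y_train)
    return probe, probe.score(X_test, y_test)

# Synthetic stand-in for real LLM activations, just to show the call pattern.
rng = np.random.default_rng(0)
numbers = rng.integers(0, 1000, size=500)
states = np.outer(numbers, rng.normal(size=64)) + rng.normal(size=(500, 64))
probe, r2 = fit_number_probe(states, numbers)
print(f"held-out R^2 of the probe: {r2:.3f}")
```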
Understanding uncertainty and the value of visualisation in AI
Maths PhD student Alex Terenin recently presented his group's work at the 2021 International Conference on Artificial Intelligence and Statistics. AISTATS is a prestigious event that brings together researchers from the machine learning and statistics communities. One of the group's papers, Matérn Gaussian Processes on Graphs, won the Best Student Paper award at the event – congratulations! We caught up with Alex to find out more about his experience at the conference and as a PhD student at Imperial, why he's fascinated by research into uncertainty, and to get his thoughts on why the visual aspect of machine learning is vital. My research focuses on artificial intelligence, particularly on learning-based decision-making systems.
Generating Pusheen with AI
The BEGAN model uses a loss based on the Wasserstein distance, except it applies it to the autoencoder reconstruction losses on real and fake images rather than to the images themselves. In practice it drops the absolute value and minimizes the reconstruction loss on real images minus the reconstruction loss on fake images. Additionally, it introduces a weighting term on the fake reconstruction loss that changes in proportion to the difference between the fake and real reconstruction losses; this serves to maintain a balance between the discriminator and generator so that one does not easily win over the other. I'll be releasing the code soon, but to summarize, certain architectures and hyperparameters typically worked well for the discriminator and generator. I did a fair amount of hyperparameter searching to get a model that worked (keep scrolling for some failures), but I think it ended up being a pretty decent cat generator. Below is a training video of one of the models that I use in the demos, where every 250 steps I take 16 samples from the generator.
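For reference, here is a minimal sketch of the BEGAN objective and balance update described above, written as a plain Python function; gamma and lambda_k follow the BEGAN paper's conventions, and the values here are placeholders rather than the settings used for the cat generator.

```python
def began_losses(recon_real, recon_fake, k, gamma=0.5, lambda_k=0.001):
    """One step of the BEGAN objective and balance update.

    `recon_real` / `recon_fake` are the discriminator's (autoencoder's)
    reconstruction losses on a real batch and a generated batch.
    """
    d_loss = recon_real - k * recon_fake   # discriminator: real loss minus weighted fake loss
    g_loss = recon_fake                    # generator: minimize fake reconstruction loss
    # Update the weighting term k in proportion to how far the balance
    # gamma * L(real) - L(fake) has drifted, then clamp it to [0, 1].
    balance = gamma * recon_real - recon_fake
    k = min(max(k + lambda_k * balance, 0.0), 1.0)
    # Convergence measure commonly tracked during BEGAN training.
    convergence = recon_real + abs(balance)
    return d_loss, g_loss, k, convergence
```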