Understanding the Staged Dynamics of Transformers in Learning Latent Structure

Saha, Rohan, Aminmansour, Farzane, Fyshe, Alona

arXiv.org Artificial Intelligence

While transformers can discover latent structure from context, the dynamics of how they acquire different components of the latent structure remain poorly understood. In this work, we use the Alchemy benchmark to investigate the dynamics of latent structure learning. We train a small decoder-only transformer on three task variants: 1) inferring missing rules from partial contextual information, 2) composing simple rules to solve multi-step sequences, and 3) decomposing complex multi-step examples to infer intermediate steps. By factorizing each task into interpretable events, we show that the model acquires capabilities in discrete stages, first learning the coarse-grained rules before learning the complete latent structure. We also identify a crucial asymmetry: the model can compose fundamental rules robustly, but struggles to decompose complex examples to discover the fundamental rules. These findings offer new insights into how a transformer model learns latent structures, providing a granular view of how these capabilities evolve during training.



The mechanization of science illustrated by the Lean formalization of the multi-graded Proj construction

Mayeux, Arnaud, Zhang, Jujian

arXiv.org Artificial Intelligence

Efforts to mechanize aspects of scientific reasoning have been intertwined with the development of science from its earliest days. C1 "Whenever we have a long, difficult piece of algebra, and we have them more and more often these days, we could at least get the machine to check that the algebra was right before we went on and built further stages of derivation on top. Some people are working on such programs for algebra checking right now." C2 "Now I leave the region of known processes and enter the land of speculation. We can, I believe, reasonably expect that an algebra checking routine would not be around very long before someone would adapt the methods of heuristics that are presently being developed to the problem of doing algebra in a more creative way. The machine could supply several steps at a time, and be given only a guiding thread of a proof. The more successful the heuristics, the fewer steps we would have to supply."


Hypothesis Network Planned Exploration for Rapid Meta-Reinforcement Learning Adaptation

Jacobson, Maxwell Joseph, Xue, Yexiang

arXiv.org Artificial Intelligence

Meta Reinforcement Learning (Meta RL) trains agents that adapt to fast-changing environments and tasks. Current strategies often lose adaptation efficiency due to the passive nature of model exploration, which delays the agent's understanding of new transition dynamics. As a result, particularly fast-evolving tasks become impossible to solve. We propose a novel approach, Hypothesis Network Planned Exploration (HyPE), that integrates an active and planned exploration process via the hypothesis network to optimize adaptation speed. HyPE uses a generative hypothesis network to form potential models of state transition dynamics, then eliminates incorrect models through strategically devised experiments. Evaluated on a symbolic version of the Alchemy game, HyPE outpaces baseline methods in adaptation speed and model accuracy, validating its potential in enhancing reinforcement learning adaptation in rapidly evolving settings.
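The idea of eliminating incorrect transition models through chosen experiments can be illustrated with a minimal sketch. This is not the HyPE implementation: the toy world, the action names, and the functions `all_hypotheses`, `most_informative_action`, `eliminate`, and `identify_dynamics` are hypothetical, and the hypotheses are enumerated by brute force rather than produced by a generative network.

```python
from itertools import product

# Hypothetical toy world: each action shifts the state by either -1 or +1,
# and the agent does not know which. A hypothesis is one full assignment.
ACTIONS = ["red", "green", "blue"]


def all_hypotheses():
    """Enumerate every candidate transition model: action -> effect in {-1, +1}."""
    return [dict(zip(ACTIONS, effects))
            for effects in product([-1, 1], repeat=len(ACTIONS))]


def most_informative_action(hypotheses):
    """Pick the action whose predicted effect splits the surviving hypotheses most evenly."""
    def split_score(action):
        positives = sum(1 for h in hypotheses if h[action] == 1)
        return min(positives, len(hypotheses) - positives)
    return max(ACTIONS, key=split_score)


def eliminate(hypotheses, action, observed_effect):
    """Keep only the hypotheses that agree with the observed outcome."""
    return [h for h in hypotheses if h[action] == observed_effect]


def identify_dynamics(true_model):
    """Run planned experiments until a single hypothesis survives."""
    hypotheses = all_hypotheses()
    experiments = 0
    while len(hypotheses) > 1:
        action = most_informative_action(hypotheses)
        observed = true_model[action]  # outcome of the experiment in the real world
        hypotheses = eliminate(hypotheses, action, observed)
        experiments += 1
    return hypotheses[0], experiments


if __name__ == "__main__":
    model, n = identify_dynamics({"red": 1, "green": -1, "blue": 1})
    print(model, n)
```

In this toy setting each well-chosen experiment halves the hypothesis set, so the eight candidates are resolved in three experiments; a passive explorer that repeats an already-tested action would learn nothing from that step.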


Machine learning for potion development at Hogwarts

Kurz, Christoph F., König, Adriana N.

arXiv.org Artificial Intelligence

Objective: To determine whether machine learning methods can generate useful potion recipes for research and teaching at Hogwarts School of Witchcraft and Wizardry. Design: Using deep neural networks to classify generated recipes into a standard drug classification system. Setting: Hogwarts School of Witchcraft and Wizardry. Data sources: 72 potion recipes from the Hogwarts curriculum, extracted from the Harry Potter Wiki. Results: Most generated recipes fall into the categories of psychoanaleptics and dermatologicals. The number of recipes predicted for each category reflected the number of training recipes. Predicted probabilities were often above 90%, but some recipes were classified into 2 or more categories with similar probabilities, which complicates anticipating the predicted effects. Conclusions: Machine-learning-powered methods are able to generate potentially useful potion recipes for teaching and research at Hogwarts. This corresponds to similar efforts in the non-magical world where such methods have been applied to identify potentially effective drug combinations.


Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration

Jiang, Chentian, Ke, Nan Rosemary, van Hasselt, Hado

arXiv.org Artificial Intelligence

To generalize across tasks, an agent should acquire knowledge from past tasks that facilitates adaptation and exploration in future tasks. We focus on the problem of in-context adaptation and exploration, where an agent relies only on context, i.e., history of states, actions and/or rewards, rather than gradient-based updates. Posterior sampling (an extension of Thompson sampling) is a promising approach, but it requires Bayesian inference and dynamic programming, which often involve unknowns (e.g., a prior) and costly computations. To address these difficulties, we use a transformer to learn an inference process from training tasks and consider a hypothesis space of partial models, represented as small Markov decision processes that are cheap for dynamic programming. In our version of the Symbolic Alchemy benchmark, our method's adaptation speed and exploration-exploitation balance approach those of an exact posterior sampling oracle. We also show that even though partial models exclude relevant information from the environment, they can nevertheless lead to good policies.
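For readers unfamiliar with posterior sampling, the loop it refers to (sample a model from the posterior, solve it by dynamic programming, act, update the posterior) might look like the sketch below. It is an illustration under strong assumptions, not the paper's method: the hypothesis space is a hand-written pair of two-state MDPs, the prior is uniform, inference is an exact Bayesian update rather than a learned transformer, and every function name is hypothetical.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, iters=200):
    """P: (S, A, S) transition probabilities, R: (S, A) rewards. Returns a greedy policy."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)  # expected return of each (state, action) pair
        V = Q.max(axis=1)
    return Q.argmax(axis=1)


def posterior_update(log_post, models, s, a, s_next):
    """Exact Bayesian update over a finite set of candidate MDPs."""
    lik = np.array([P[s, a, s_next] for P, _ in models])
    log_post = log_post + np.log(lik + 1e-12)  # small floor avoids log(0)
    return log_post - np.logaddexp.reduce(log_post)


def make_toy_models():
    """Two hypothetical 2-state MDPs that disagree on which action moves the agent."""
    R = np.array([[0.0, 0.0], [1.0, 1.0]])  # reward 1 for acting in state 1
    stay_move = np.zeros((2, 2, 2))
    for s in range(2):
        stay_move[s, 0, s] = 1.0      # action 0 stays
        stay_move[s, 1, 1 - s] = 1.0  # action 1 moves
    move_stay = stay_move[:, ::-1, :].copy()  # same dynamics with the actions swapped
    return [(stay_move, R), (move_stay, R)]


def run_posterior_sampling(models, true_index, episodes=20, horizon=5, seed=0):
    rng = np.random.default_rng(seed)
    log_post = np.full(len(models), -np.log(len(models)))  # uniform prior
    P_true, _ = models[true_index]
    for _ in range(episodes):
        k = rng.choice(len(models), p=np.exp(log_post))  # sample one model
        policy = value_iteration(*models[k])             # solve it by dynamic programming
        s = 0
        for _ in range(horizon):
            a = policy[s]
            s_next = rng.choice(P_true.shape[2], p=P_true[s, a])
            log_post = posterior_update(log_post, models, s, a, s_next)
            s = s_next
    return np.exp(log_post)


if __name__ == "__main__":
    print(run_posterior_sampling(make_toy_models(), true_index=0))
```

Because each sampled model is tiny, the dynamic programming step is cheap, which is the property the abstract attributes to partial models; the costly parts in realistic settings are the prior and the inference step that this sketch replaces with an exact update.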


10 great games to play on iPhone

Washington Post - Technology News

Card of Darkness is a deceptively difficult card game from designer Zach Gage (who made the Wordle-like puzzle game Knotwords) and "Adventure Time" creator Pendleton Ward. Available through Apple Arcade, Card of Darkness tasks players with cutting a path across a grid-based game board populated by decks of cards. To make it to the exit, players need to exhaust the decks in front of them by either picking up all the item cards, which include weapons and potions, or fighting the cards in the pile that represent deadly enemies. Once you've picked up a card from a deck, you have to defeat the whole deck, which can lead to some complicated choices: Do I pick up a much-needed sword or health potion at the top of the deck, knowing I may be opening myself up to several enemy cards underneath?


How to Learn and Represent Abstractions: An Investigation using Symbolic Alchemy

AlKhamissi, Badr, Srinivasan, Akshay, Kurth-Nelson, Zeb, Ritter, Sam

arXiv.org Artificial Intelligence

Alchemy is a new meta-learning environment rich enough to contain interesting abstractions, yet simple enough to make fine-grained analysis tractable. Further, Alchemy provides an optional symbolic interface that enables meta-RL research without a large compute budget. In this work, we take the first steps toward using Symbolic Alchemy to identify design choices that enable deep-RL agents to learn various types of abstraction. Then, using a variety of behavioral and introspective analyses, we investigate how our trained agents use and represent abstract task variables, and find intriguing connections to the neuroscience of abstraction. We conclude by discussing the next steps for using meta-RL and Alchemy to better understand the representation of abstract variables in the brain.


Alchemy: A structured task distribution for meta-reinforcement learning

Wang, Jane X., King, Michael, Porcel, Nicolas, Kurth-Nelson, Zeb, Zhu, Tina, Deck, Charlie, Choy, Peter, Cassin, Mary, Reynolds, Malcolm, Song, Francis, Buttimore, Gavin, Reichert, David P., Rabinowitz, Neil, Matthey, Loic, Hassabis, Demis, Lerchner, Alexander, Botvinick, Matthew

arXiv.org Artificial Intelligence

There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, which combines structural richness with structural transparency. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as a public resource, together with a suite of analysis tools and sample agent trajectories.