prodigy
My Son's Math Homework Is Essentially Just Pokémon
Education games are taking over American classrooms. One afternoon earlier this year, my 11-year-old son was sitting at his laptop and working quietly on his math homework. At least, that's what he was supposed to be doing. When I glanced at his screen, equations were nowhere to be seen. He was controlling a monster in the midst of battle, casting magic spells to outduel an opposing player.
PRODIGY: Enabling In-context Learning Over Graphs
While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored. In this paper, we develop Pretraining Over Diverse In-Context Graph Systems (PRODIGY), the first pretraining framework that enables in-context learning over graphs.
'Infinite Jest' Is Back. Maybe Litbros Should Be, Too
The notoriously challenging book is being re-released for its 30th anniversary. Its fandom is annoying, sure--but at least they read. The host had been grilling Wallace, ostensibly invited on to discuss his own literary and journalistic output, on a range of topics: tennis, teaching, why women don't like Westerns, depression, and, yes, Anthony Minghella's Academy Award-winning epic war drama, which had by the time the interview aired already become a punch line. Watching the interview, it's clear Wallace, who died by suicide in 2008, bristles at being pressed to purvey rank punditry on the popular culture at large like some kind of dancing monkey. But the exercise revealed how Rose, and large swaths of American intellectual culture circa the late-1990s, thought of Wallace.
PRODIGY: Enabling In-context Learning Over Graphs
In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored.
Benchmarking Optimizers for Large Language Model Pretraining
Semenov, Andrei, Pagliardini, Matteo, Jaggi, Martin
The recent development of Large Language Models (LLMs) has been accompanied by an effervescence of novel ideas and methods to better optimize the loss of deep learning models. Claims from those methods are myriad: from faster convergence to removing reliance on certain hyperparameters. However, the diverse experimental protocols used to validate these claims make direct comparisons between methods challenging. This study presents a comprehensive evaluation of recent optimization techniques across standardized LLM pretraining scenarios, systematically varying model size, batch size, and training duration. Through careful tuning of each method, we provide guidance to practitioners on which optimizer is best suited for each scenario. For researchers, our work highlights promising directions for future optimization research. Finally, by releasing our code and making all experiments fully reproducible, we hope our efforts can help the development and rigorous benchmarking of future methods.
In 'Alien: Earth', the Future Is a Corporate Hellscape
Seventeen years ago, Noah Hawley became a father during the Great Recession. If you look at everything he's written since having children--including the TV series Fargo and Legion--Hawley says it all revolves around the same question every parent faces: "How are we supposed to raise these people in the world that we're living in?" Hawley's new series, Alien: Earth, which premieres August 12 on Hulu and FX, explores this question even more directly than his previous work. Set in 2120, two years before the original Alien, it imagines a future where the race for immortality has led to three competing technologies: synths (AI minds in synthetic bodies), cyborgs (humans with cybernetic enhancements), and hybrids (human minds downloaded into synthetic bodies). When a deep space research vessel, the USCSS Maginot, crashes into Earth carrying five captured alien species, a megacorporation called Prodigy sends six hybrids to investigate. The first-ever hybrid, Wendy, played by Sydney Chandler, was a terminally ill child before she was selected for the immortality experiment, just like the rest of Prodigy's hybrids, all six of whom wake up in super-strong, super-fast, synthetic adult bodies that will never age.
Revisiting Learning Rate Control
Henheik, Micha, Eimer, Theresa, Lindauer, Marius
The learning rate is one of the most important hyperparameters in deep learning, and how to control it is an active area within both AutoML and deep learning research. Approaches for learning rate control span from classic optimization to online scheduling based on gradient statistics. This paper compares paradigms to assess the current state of learning rate control. We find that methods from multi-fidelity hyperparameter optimization, fixed-hyperparameter schedules, and hyperparameter-free learning often perform very well on selected deep learning tasks but are not reliable across settings. This highlights the need for algorithm selection methods in learning rate control, which have been neglected so far by both the AutoML and deep learning communities. We also observe a trend of hyperparameter optimization approaches becoming less effective as models and tasks grow in complexity, even when combined with multi-fidelity approaches for more expensive model trainings. A focus on more relevant test tasks and new promising directions like finetunable methods and meta-learning will enable the AutoML community to significantly strengthen its impact on this crucial factor in deep learning.
How far away are truly hyperparameter-free learning algorithms?
Kasimbeg, Priya, Roulet, Vincent, Agarwal, Naman, Medapati, Sourabh, Pedregosa, Fabian, Agarwala, Atish, Dahl, George E.
Despite major advances in methodology, hyperparameter tuning remains a crucial (and expensive) part of the development of machine learning systems. Even ignoring architectural choices, deep neural networks have a large number of optimization and regularization hyperparameters that need to be tuned carefully per workload in order to obtain the best results. In a perfect world, training algorithms would not require workload-specific hyperparameter tuning, but would instead have default settings that performed well across many workloads. Recently, there has been a growing literature on optimization methods which attempt to reduce the number of hyperparameters -- particularly the learning rate and its accompanying schedule. Given these developments, how far away is the dream of neural network training algorithms that completely obviate the need for painful tuning? In this paper, we evaluate the potential of learning-rate-free methods as components of hyperparameter-free methods. We freeze their (non-learning rate) hyperparameters to default values, and score their performance using the recently-proposed AlgoPerf: Training Algorithms benchmark. We found that literature-supplied default settings performed poorly on the benchmark, so we performed a search for hyperparameter configurations that performed well across all workloads simultaneously. The best AlgoPerf-calibrated learning-rate-free methods had much improved performance but still lagged slightly behind a similarly calibrated NadamW baseline in overall benchmark score. Our results suggest that there is still much room for improvement for learning-rate-free methods, and that testing against a strong, workload-agnostic baseline is important to improve hyperparameter reduction techniques.
PRODIGY: Enabling In-context Learning Over Graphs
In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored. The key idea of our framework is to formulate in-context learning over graphs with a novel prompt graph representation, which connects prompt examples and queries. We then propose a graph neural network architecture over the prompt graph and a corresponding family of in-context pretraining objectives. With PRODIGY, the pretrained model can directly perform novel downstream classification tasks on unseen graphs via in-context learning.