Collaborating Authors

 prodigy


'Infinite Jest' Is Back. Maybe Litbros Should Be, Too

WIRED

The notoriously challenging book is being re-released for its 30th anniversary. Its fandom is annoying, sure--but at least they read. The host had been grilling Wallace, ostensibly invited on to discuss his own literary and journalistic output, on a range of topics: tennis, teaching, why women don't like Westerns, depression, and, yes, Anthony Minghella's Academy Award-winning epic war drama, which had already become a punch line by the time the interview aired. Watching the interview, it's clear Wallace, who died by suicide in 2008, bristles at being pressed to purvey rank punditry on the popular culture at large like some kind of dancing monkey. But the exercise revealed how Rose, and large swaths of American intellectual culture circa the late 1990s, thought of Wallace.


PRODIGY: Enabling In-context Learning Over Graphs

Neural Information Processing Systems

In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored.


PRODIGY: Enabling In-context Learning Over Graphs

Neural Information Processing Systems

While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored. In this paper, we develop Pretraining Over Diverse In-Context Graph Systems (PRODIGY), the first pretraining framework that enables in-context learning over graphs.


Benchmarking Optimizers for Large Language Model Pretraining

Semenov, Andrei, Pagliardini, Matteo, Jaggi, Martin

arXiv.org Artificial Intelligence

The recent development of Large Language Models (LLMs) has been accompanied by an effervescence of novel ideas and methods to better optimize the loss of deep learning models. Claims from those methods are myriad: from faster convergence to removing reliance on certain hyperparameters. However, the diverse experimental protocols used to validate these claims make direct comparisons between methods challenging. This study presents a comprehensive evaluation of recent optimization techniques across standardized LLM pretraining scenarios, systematically varying model size, batch size, and training duration. Through careful tuning of each method, we provide guidance to practitioners on which optimizer is best suited for each scenario. For researchers, our work highlights promising directions for future optimization research. Finally, by releasing our code and making all experiments fully reproducible, we hope our efforts can help the development and rigorous benchmarking of future methods.
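The core of such a benchmarking protocol is running each optimizer under an identical, fixed budget on the same task. A minimal sketch of that idea, on a toy 1-D quadratic loss with illustrative settings (the optimizers, step counts, and learning rate here are our assumptions, not the paper's actual protocol):

```python
# Standardized comparison: same start, same step budget, same loss.
def run(optimizer_step, steps=100, lr=0.1):
    """Run an optimizer from a fixed start on f(w) = (w - 3)^2."""
    w, state = 10.0, {}
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)          # df/dw
        w = optimizer_step(w, grad, lr, state)
    return (w - 3.0) ** 2               # final loss

def sgd(w, g, lr, state):
    return w - lr * g

def sgd_momentum(w, g, lr, state, beta=0.9):
    v = beta * state.get("v", 0.0) + g  # heavy-ball velocity
    state["v"] = v
    return w - lr * v

# Deterministic task and equal budgets make the comparison fair.
results = {name: run(step) for name, step in
           [("sgd", sgd), ("momentum", sgd_momentum)]}
```

Careful per-method tuning, as the paper stresses, would then sweep `lr` (and `beta`) per optimizer before comparing final losses.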


In 'Alien: Earth', the Future Is a Corporate Hellscape

WIRED

Seventeen years ago, Noah Hawley became a father during the Great Recession. If you look at everything he's written since having children--including the TV series Fargo and Legion--Hawley says it all revolves around the same question every parent faces: "How are we supposed to raise these people in the world that we're living in?" Hawley's new series, Alien: Earth, which premieres August 12 on Hulu and FX, explores this question even more directly than his previous work. Set in 2120, two years before the original Alien, it imagines a future where the race for immortality has led to three competing technologies: synths (AI minds in synthetic bodies), cyborgs (humans with cybernetic enhancements), and hybrids (human minds downloaded into synthetic bodies). When a deep space research vessel, the USCSS Maginot, crashes into Earth carrying five captured alien species, a megacorporation called Prodigy sends six hybrids to investigate. The first-ever hybrid, Wendy, played by Sydney Chandler, was a terminally ill child before she was selected for the immortality experiment, just like the rest of Prodigy's hybrids, all six of whom wake up in super-strong, super-fast, synthetic adult bodies that will never age.


Revisiting Learning Rate Control

Henheik, Micha, Eimer, Theresa, Lindauer, Marius

arXiv.org Artificial Intelligence

The learning rate is one of the most important hyperparameters in deep learning, and how to control it is an active area within both AutoML and deep learning research. Approaches for learning rate control span from classic optimization to online scheduling based on gradient statistics. This paper compares paradigms to assess the current state of learning rate control. We find that methods from multi-fidelity hyperparameter optimization, fixed-hyperparameter schedules, and hyperparameter-free learning often perform very well on selected deep learning tasks but are not reliable across settings. This highlights the need for algorithm selection methods in learning rate control, which have been neglected so far by both the AutoML and deep learning communities. We also observe a trend of hyperparameter optimization approaches becoming less effective as models and tasks grow in complexity, even when combined with multi-fidelity approaches for more expensive model trainings. A focus on more relevant test tasks and new promising directions like finetunable methods and meta-learning will enable the AutoML community to significantly strengthen its impact on this crucial factor in deep learning.
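Fixed-hyperparameter schedules of the kind this survey compares can be stated in a few lines. A sketch of two common ones, cosine decay and linear warmup plus cosine; the base rate and warmup fraction are illustrative defaults, not settings from the paper:

```python
import math

def cosine_schedule(step, total_steps, base_lr=0.1, min_lr=0.0):
    """Cosine decay from base_lr at step 0 to min_lr at total_steps."""
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

def warmup_cosine(step, total_steps, warmup=0.1, base_lr=0.1):
    """Linear warmup over the first `warmup` fraction, then cosine decay."""
    warm_steps = int(warmup * total_steps)
    if step < warm_steps:
        return base_lr * (step + 1) / warm_steps
    return cosine_schedule(step - warm_steps, total_steps - warm_steps, base_lr)
```

Such schedules have no per-step feedback; the gradient-statistics-based online controllers the paper also covers would instead adapt the rate from observed training signals.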


How far away are truly hyperparameter-free learning algorithms?

Kasimbeg, Priya, Roulet, Vincent, Agarwal, Naman, Medapati, Sourabh, Pedregosa, Fabian, Agarwala, Atish, Dahl, George E.

arXiv.org Artificial Intelligence

Despite major advances in methodology, hyperparameter tuning remains a crucial (and expensive) part of the development of machine learning systems. Even ignoring architectural choices, deep neural networks have a large number of optimization and regularization hyperparameters that need to be tuned carefully per workload in order to obtain the best results. In a perfect world, training algorithms would not require workload-specific hyperparameter tuning, but would instead have default settings that performed well across many workloads. Recently, there has been a growing literature on optimization methods which attempt to reduce the number of hyperparameters -- particularly the learning rate and its accompanying schedule. Given these developments, how far away is the dream of neural network training algorithms that completely obviate the need for painful tuning? In this paper, we evaluate the potential of learning-rate-free methods as components of hyperparameter-free methods. We freeze their (non-learning rate) hyperparameters to default values, and score their performance using the recently-proposed AlgoPerf: Training Algorithms benchmark. We found that literature-supplied default settings performed poorly on the benchmark, so we performed a search for hyperparameter configurations that performed well across all workloads simultaneously. The best AlgoPerf-calibrated learning-rate-free methods had much improved performance but still lagged slightly behind a similarly calibrated NadamW baseline in overall benchmark score. Our results suggest that there is still much room for improvement for learning-rate-free methods, and that testing against a strong, workload-agnostic baseline is important to improve hyperparameter reduction techniques.
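For intuition on what "learning-rate-free" means, a classic textbook example is the Polyak step size, which derives the step from the current loss gap rather than a tuned rate. This is an illustration only, not any specific method evaluated in the paper (and it assumes the optimal loss value is known):

```python
def polyak_step(w, grad, loss, f_star=0.0, eps=1e-12):
    """Polyak rule: w <- w - ((f(w) - f*) / ||g||^2) * g, no tuned lr."""
    return w - ((loss - f_star) / (grad * grad + eps)) * grad

# Minimize f(w) = (w - 3)^2, whose optimum f* is 0, from w = 10.
w = 10.0
for _ in range(50):
    loss = (w - 3.0) ** 2
    grad = 2.0 * (w - 3.0)
    w = polyak_step(w, grad, loss)
```

Modern learning-rate-free methods replace the unknown `f_star` with online estimates of problem constants, but the appeal is the same: one fewer hyperparameter to tune per workload.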


PRODIGY: Enabling In-context Learning Over Graphs

Neural Information Processing Systems

In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored. The key idea of our framework is to formulate in-context learning over graphs with a novel prompt graph representation, which connects prompt examples and queries. We then propose a graph neural network architecture over the prompt graph and a corresponding family of in-context pretraining objectives. With PRODIGY, the pretrained model can directly perform novel downstream classification tasks on unseen graphs via in-context learning.
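The prompt graph idea, connecting prompt examples and queries so a GNN can pass messages between them, can be sketched as an edge list. The node naming and structure below are our simplification for illustration, not PRODIGY's actual data representation:

```python
def build_prompt_graph(prompt_examples, queries):
    """prompt_examples: list of (node_id, label); queries: list of node_ids.
    Links each example node to its label node, and each query node to
    every candidate label node (the query's label is what gets predicted)."""
    edges = []
    labels = sorted({label for _, label in prompt_examples})
    for node, label in prompt_examples:
        edges.append((f"ex:{node}", f"label:{label}"))
    for q in queries:
        for label in labels:  # query is connected to all candidate labels
            edges.append((f"q:{q}", f"label:{label}"))
    return edges

# Two labeled prompt examples, one query node to classify.
g = build_prompt_graph([(0, "A"), (1, "B")], queries=[7])
```

A GNN over such a graph can then score each query-to-label edge, which is how a new classification task can be posed without updating any weights.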


GraphPrompter: Multi-stage Adaptive Prompt Optimization for Graph In-Context Learning

Lv, Rui, Zhang, Zaixi, Zhang, Kai, Liu, Qi, Gao, Weibo, Liu, Jiawei, Yan, Jiaxia, Yue, Linan, Yao, Fangzhou

arXiv.org Artificial Intelligence

Graph in-context learning, the ability to adapt pre-trained graph models to novel and diverse downstream graphs without updating any parameters, has gained much attention in the community. The key to graph in-context learning is to perform downstream graph tasks conditioned on chosen prompt examples. Existing methods randomly select subgraphs or edges as prompts, leading to noisy graph prompts and inferior model performance. Additionally, due to the gap between pre-training and testing graphs, when the number of classes in the testing graphs is much greater than in training, the in-context learning ability also deteriorates significantly. To tackle these challenges, we develop a multi-stage adaptive prompt optimization method, GraphPrompter, which optimizes the entire process of generating, selecting, and using graph prompts for better in-context learning capabilities. First, the Prompt Generator introduces a reconstruction layer to highlight the most informative edges and reduce irrelevant noise during graph prompt construction. In the selection stage, the Prompt Selector employs the k-nearest neighbors algorithm and pre-trained selection layers to dynamically choose appropriate samples and minimize the influence of irrelevant prompts. Finally, we leverage a Prompt Augmenter with a cache replacement strategy to enhance the generalization capability of the pre-trained model on new datasets. Extensive experiments show that GraphPrompter effectively enhances the in-context learning ability of graph models. One of the most fascinating properties of Large Language Models (LLMs) is their in-context learning capability [1], [2]: the ability of a pre-trained LLM to achieve competitive results on downstream tasks given only a few prompt examples during the prediction phase, without updating the model weights through fine-tuning.
Recently, there have been efforts to transfer this in-context learning capability from large language models to graph models [3]-[5]. Among these methods, Prodigy [3] and One For All (OFA) [5] stand out as the most effective frameworks, unifying diverse levels of graph-related tasks and achieving competitive in-context learning performance. Generally, the graph in-context learning architecture can be divided into two main parts: data/prompt graph construction and task graph prediction (see Figure 1 for an edge classification example).
Figure 1: Graph in-context learning (edge classification as an example) with random prompt selection.
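The k-nearest-neighbor selection step can be illustrated with plain cosine similarity: pick the k candidate prompts whose embeddings lie closest to the query embedding. This sketch omits the pre-trained selection layers the abstract mentions, and all embeddings here are made up for illustration:

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def select_prompts(query_emb, candidates, k=2):
    """candidates: list of (prompt_id, embedding). Return the k ids whose
    embeddings are most similar to the query, best first."""
    ranked = sorted(candidates,
                    key=lambda c: cosine(query_emb, c[1]), reverse=True)
    return [pid for pid, _ in ranked[:k]]

picked = select_prompts([1.0, 0.0],
                        [("a", [0.9, 0.1]), ("b", [0.0, 1.0]),
                         ("c", [1.0, 0.05])],
                        k=2)
```

Selecting by similarity rather than at random is exactly what distinguishes this stage from the noisy random-prompt baselines the abstract criticizes.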