Architecture-Aware Learning Curve Extrapolation via Graph Ordinary Differential Equation
Yanna Ding, Zijie Huang, Xiao Shou, Yihang Guo, Yizhou Sun, Jianxi Gao
Training neural architectures is a resource-intensive endeavor, often demanding considerable computational power and time. Researchers have developed various methodologies to predict the performance of neural networks early in the training process using learning curve data. Some methods (Domhan et al., 2015; Gargiani et al., 2019; Adriaensen et al., 2023) apply Bayesian inference to project these curves forward, while others employ time-series prediction techniques, such as LSTM networks. Despite their effectiveness, these approaches (Swersky et al., 2014; Baker et al., 2017) typically overlook the architectural features of networks, missing out on crucial insights that could be derived from the models' topology. We utilize a seq2seq variational autoencoder framework to analyze the initial stages of a learning curve and predict its future progression. This predictive capability is further enhanced by an architecture-aware component that produces a graph-level embedding from the architecture's topology, employing techniques like Graph Convolutional Networks (GCN; Kipf and Welling, 2016) and Differentiable Pooling (Ying et al., 2018). This integration not only improves the accuracy of learning curve extrapolations compared to existing methods but also significantly facilitates model ranking, potentially leading to more efficient use of computational resources, accelerated experimentation cycles, and faster progress in the field.
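
To make the described pipeline concrete, below is a minimal illustrative sketch in PyTorch of the kind of model the abstract outlines: a small GCN encoder turns an architecture graph into a graph-level embedding, which then conditions a seq2seq variational autoencoder that reads the observed prefix of a learning curve and autoregressively extrapolates future values. All class, function, and variable names (SimpleGCNLayer, ArchAwareCurveVAE, etc.) are assumptions introduced for exposition, mean pooling stands in for Differentiable Pooling to keep the example short, and this is not the authors' implementation.

# Minimal sketch (not the authors' code) of an architecture-aware learning
# curve extrapolator: a GCN graph encoder supplies a graph-level embedding
# that conditions a seq2seq variational autoencoder over learning curves.
# All module and variable names are illustrative assumptions.
import torch
import torch.nn as nn


class SimpleGCNLayer(nn.Module):
    """One graph convolution: symmetrically normalized adjacency, then a linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (n, n) adjacency with self-loops added by the caller
        deg = adj.sum(dim=-1).clamp(min=1.0)
        d_inv_sqrt = deg.pow(-0.5)
        norm_adj = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
        return torch.relu(self.lin(norm_adj @ x))


class ArchAwareCurveVAE(nn.Module):
    """Seq2seq VAE over learning curves, conditioned on a graph-level embedding."""
    def __init__(self, node_dim=8, graph_dim=32, hidden=64, latent=16):
        super().__init__()
        self.gcn1 = SimpleGCNLayer(node_dim, graph_dim)
        self.gcn2 = SimpleGCNLayer(graph_dim, graph_dim)
        self.encoder = nn.LSTM(1, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden + graph_dim, latent)
        self.to_logvar = nn.Linear(hidden + graph_dim, latent)
        self.decoder = nn.LSTM(1, hidden, batch_first=True)
        self.init_state = nn.Linear(latent + graph_dim, hidden)
        self.readout = nn.Linear(hidden, 1)

    def forward(self, curve_prefix, node_feats, adj, horizon):
        # curve_prefix: (batch, t_obs, 1) observed accuracy/loss values
        # node_feats:   (n_nodes, node_dim) features of one architecture graph
        h_nodes = self.gcn2(self.gcn1(node_feats, adj), adj)
        g = h_nodes.mean(dim=0)                       # mean pooling stands in for DiffPool
        g = g.expand(curve_prefix.size(0), -1)

        _, (h_enc, _) = self.encoder(curve_prefix)
        ctx = torch.cat([h_enc[-1], g], dim=-1)
        mu, logvar = self.to_mu(ctx), self.to_logvar(ctx)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick

        h0 = torch.tanh(self.init_state(torch.cat([z, g], dim=-1))).unsqueeze(0)
        state = (h0, torch.zeros_like(h0))
        preds, last = [], curve_prefix[:, -1:, :]
        for _ in range(horizon):                      # autoregressive extrapolation
            out, state = self.decoder(last, state)
            last = self.readout(out)
            preds.append(last)
        return torch.cat(preds, dim=1), mu, logvar

A model of this shape would be trained with a reconstruction loss on the extrapolated curve segment plus a KL term on (mu, logvar), in the usual VAE fashion; the graph embedding lets curves from different architectures share one extrapolator.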
Dec-22-2024