Ding, Yanna
Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection
Ma, Mingyu Derek, Ding, Yanna, Huang, Zijie, Gao, Jianxi, Sun, Yizhou, Wang, Wei
Generative language models rely on autoregressive decoding to produce the output sequence token by token. Many tasks, such as preference optimization, require the model to produce a task-level output consisting of multiple tokens directly, by selecting a candidate from a pool as the prediction. Determining a task-level prediction from candidates with the ordinary token-level decoding mechanism is hampered by time-consuming decoding and by the gradient interruption caused by discrete token selection. Existing works therefore use decoding-free candidate selection methods that estimate candidate probabilities from the initial output logits over the vocabulary. Although these estimation methods are widely used, they have not been systematically evaluated, especially on end tasks. We introduce an evaluation of a comprehensive collection of decoding-free candidate selection approaches on a diverse set of tasks, including five multiple-choice QA tasks with small candidate pools and four clinical decision tasks with massive candidate pools, some exceeding 10k options. We pair the estimation methods with a wide spectrum of foundation LMs covering different architectures, sizes, and training paradigms. The results and insights from our analysis inform future model design.
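A minimal sketch of the general idea (not any specific estimator from the paper): given a HuggingFace-style causal LM, one decoding-free approach scores each multi-token candidate by aggregating the first-step log-probabilities of its tokens from a single forward pass. The model name, prompt, candidates, and the sum aggregation below are all illustrative assumptions.

    # Sketch: score multi-token candidates from one forward pass's logits
    # over the vocabulary, skipping autoregressive decoding entirely.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; any causal LM works here
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "Patient presents with fever and cough. Diagnosis:"
    candidates = ["pneumonia", "heart failure", "sepsis"]

    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"))
        # Log-probabilities over the vocabulary at the first output position.
        first_step = out.logits[0, -1].log_softmax(-1)

    def score(candidate: str) -> float:
        # One simple estimator: sum the first-step log-probs of the
        # candidate's tokens (order-insensitive; a deliberate simplification).
        ids = tok(candidate, add_special_tokens=False).input_ids
        return first_step[ids].sum().item()

    print(max(candidates, key=score))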
Predicting Time Series of Networked Dynamical Systems without Knowing Topology
Ding, Yanna, Huang, Zijie, Magdon-Ismail, Malik, Gao, Jianxi
Many real-world complex systems, such as epidemic spreading networks and ecosystems, can be modeled as networked dynamical systems that produce multivariate time series. Learning the intrinsic dynamics from observational data is pivotal for forecasting system behavior and making informed decisions. However, existing methods for modeling networked time series often assume known topologies, whereas real-world networks are typically incomplete or inaccurate, with missing or spurious links that hinder precise predictions. Moreover, while networked time series often originate from diverse topologies, the ability of models to generalize across topologies has not been systematically evaluated. To address these gaps, we propose a novel framework for learning network dynamics directly from observed time-series data when prior knowledge of the graph topology or governing dynamical equations is absent. Our approach leverages continuous graph neural networks with an attention mechanism to construct a latent topology, enabling accurate prediction of future trajectories of network states. Extensive experiments on real and synthetic networks demonstrate that our model not only captures dynamics effectively without topology knowledge but also generalizes to unseen time series originating from diverse topologies.
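As an illustration of the idea described above (a hedged sketch, not the paper's implementation): attention over node states yields a soft latent adjacency, and the resulting message-passing function is integrated continuously over time. The module names, dimensions, and fixed-step Euler integrator are assumptions made for brevity; a proper ODE solver could replace the rollout.

    # Sketch: infer a latent topology via attention, then integrate
    # node dynamics continuously from observed initial states.
    import torch
    import torch.nn as nn

    class LatentGraphODE(nn.Module):
        def __init__(self, dim: int):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.msg = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (num_nodes, dim). Attention scores act as a soft,
            # learned adjacency in place of the unknown true topology.
            attn = torch.softmax(
                self.q(x) @ self.k(x).T / x.size(-1) ** 0.5, dim=-1
            )
            return attn @ torch.tanh(self.msg(x))  # dx/dt from neighbor messages

    def euler_rollout(f, x0, steps=10, dt=0.1):
        # Fixed-step Euler integration of dx/dt = f(x).
        xs, x = [x0], x0
        for _ in range(steps):
            x = x + dt * f(x)
            xs.append(x)
        return torch.stack(xs)  # (steps + 1, num_nodes, dim)

    traj = euler_rollout(LatentGraphODE(dim=8), torch.randn(5, 8))
    print(traj.shape)  # torch.Size([11, 5, 8])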
Architecture-Aware Learning Curve Extrapolation via Graph Ordinary Differential Equation
Ding, Yanna, Huang, Zijie, Shou, Xiao, Guo, Yihang, Sun, Yizhou, Gao, Jianxi
Training neural architectures is a resource-intensive endeavor, often demanding considerable computational power and time. Researchers have developed various methodologies to predict the performance of neural networks early in the training process using learning curve data. Some methods (Domhan et al., 2015; Gargiani et al., 2019; Adriaensen et al., 2023) apply Bayesian inference to project these curves forward, while others employ time-series prediction techniques, such as LSTM networks. Despite their effectiveness, these approaches (Swersky et al., 2014; Baker et al., 2017) typically overlook the architectural features of networks, missing out on crucial insights that could be derived from the models' topology. We utilize a seq2seq variational autoencoder framework to analyze the initial stages of a learning curve and predict its future progression. This predictive capability is further enhanced by an architecture-aware component that produces a graph-level embedding from the architecture's topology, employing techniques like Graph Convolutional Networks (GCN) (Kipf and Welling, 2016) and Differentiable Pooling (Ying et al., 2018). This integration not only improves the accuracy of learning curve extrapolations compared to existing methods but also significantly facilitates model ranking, potentially leading to more efficient use of computational resources, accelerated experimentation cycles, and faster progress in the field.
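A hedged sketch of the described pipeline, simplified in two ways: it omits the variational component of the autoencoder, and it substitutes a one-layer GCN-style aggregation with mean pooling for GCN plus Differentiable Pooling. An LSTM encodes the observed prefix of a learning curve, a graph-level embedding of the architecture conditions the hidden state, and a decoder extrapolates future curve values. All names and sizes are illustrative.

    # Sketch: extrapolate a learning curve, conditioned on an
    # architecture-graph embedding injected into the decoder state.
    import torch
    import torch.nn as nn

    class CurveExtrapolator(nn.Module):
        def __init__(self, hid=32, node_dim=8):
            super().__init__()
            self.enc = nn.LSTM(1, hid, batch_first=True)
            self.node_proj = nn.Linear(node_dim, hid)
            self.dec_cell = nn.LSTMCell(1, hid)
            self.out = nn.Linear(hid, 1)

        def graph_embed(self, feats, adj):
            # One-layer GCN-style aggregation + mean pooling: a stand-in
            # for the GCN and Differentiable Pooling components.
            h = torch.relu(self.node_proj(adj @ feats))
            return h.mean(0)

        def forward(self, curve_prefix, feats, adj, horizon=10):
            # curve_prefix: (batch, observed_epochs) of e.g. validation accuracy.
            _, (h, c) = self.enc(curve_prefix.unsqueeze(-1))
            h = h[0] + self.graph_embed(feats, adj)  # inject architecture info
            c = c[0]
            y = curve_prefix[:, -1:]                 # last observed value
            preds = []
            for _ in range(horizon):
                h, c = self.dec_cell(y, (h, c))
                y = self.out(h)
                preds.append(y)
            return torch.cat(preds, dim=1)           # (batch, horizon)

    model = CurveExtrapolator()
    curve = torch.rand(2, 15)                        # two curves, 15 observed epochs
    feats, adj = torch.randn(6, 8), torch.eye(6)     # toy architecture graph
    print(model(curve, feats, adj).shape)            # torch.Size([2, 10])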