The framework of formulating the feedback structure as feedback graphs in bandits has a long history (Mannor and Shamir, 2011; Alon et al., 2015, 2017; Lykouris et al.,
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities. Existing empirical studies have revealed a strong connection between these LLMs' impressive emergence abilities and their
SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks.