A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature
Faruk Alpay, Bugra Kilictas, Hamdi Alakkad
arXiv.org Artificial Intelligence
The number of publications on generative modelling has grown exponentially over the last decade, with dozens of new papers on large language models and autoregressive (AR) techniques appearing each week. This deluge renders manual literature reviews impractical and hampers reproducibility. Systematic literature review (SLR) pipelines such as PROMPTHEUS (Torres et al., 2024) and modular summarisation frameworks (Achkar et al., 2024) have shown that automation can reduce the burden on researchers; however, they are domain-agnostic and often separate extraction from experimental validation. Our goal is to advance this line of work by delivering a fully integrated pipeline focused on AR models that not only summarises research but also extracts the hyperparameters, architectures, and metrics needed to reproduce experiments. The challenges motivating our work are threefold. First, the "literature overload" problem means that even experts struggle to keep up with emergent models and techniques. Second, reproducibility remains an open concern in machine learning: a lack of transparent reporting of code and hyperparameters has led to irreproducible claims (Kapoor and Narayanan, 2022). Initiatives such as the NeurIPS reproducibility checklist encourage authors to document training settings and datasets (Pineau et al., 2021), yet many papers still omit critical information. Third, AR models themselves are evolving rapidly, from recurrent architectures such as LSTMs (Merity et al., 2017; Bengio et al., 2003) to Transformer-based systems (Vaswani et al., 2017) and emerging large language models (Touvron et al., 2023).
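The extraction step described above could be sketched as a lightweight pattern matcher over paper text. The field names, regular expressions, and sample sentence below are illustrative assumptions for this sketch, not the authors' actual pipeline:

```python
import re

# Hypothetical patterns for common AR-model hyperparameters.
# These are illustrative assumptions, not the paper's implementation.
PATTERNS = {
    "learning_rate": r"learning rate(?: of)?\s*=?\s*([0-9]*\.?[0-9]+(?:e-?[0-9]+)?)",
    "batch_size": r"batch size(?: of)?\s*=?\s*([0-9]+)",
    "num_layers": r"([0-9]+)[- ]layer",
}

def extract_hyperparameters(text: str) -> dict:
    """Return the first match for each known hyperparameter pattern."""
    found = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, text, flags=re.IGNORECASE)
        if m:
            found[name] = m.group(1)
    return found

sample = ("We train a 12-layer Transformer with a batch size of 256 "
          "and a learning rate of 3e-4 on WikiText-103.")
print(extract_hyperparameters(sample))
# → {'learning_rate': '3e-4', 'batch_size': '256', 'num_layers': '12'}
```

A production pipeline would likely pair such structured extraction with an LLM-based reader for fields that resist simple patterns, but regex anchors keep the reproducibility-critical values auditable.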
Aug-7-2025