A Reproducible, Scalable Pipeline for Synthesizing Autoregressive Model Literature
Faruk Alpay, Bugra Kilictas, Hamdi Alakkad
arXiv.org Artificial Intelligence
The number of publications on generative modelling has grown exponentially over the last decade, with dozens of new papers on large language models and autoregressive (AR) techniques appearing each week. This deluge renders manual literature reviews impractical and hampers reproducibility. Systematic literature review (SLR) pipelines such as PROMPTHEUS (Torres et al., 2024) and modular summarisation frameworks (Achkar et al., 2024) have shown that automation can reduce the burden on researchers; however, they are domain-agnostic and often separate extraction from experimental validation. Our goal is to advance this line of work by delivering a fully integrated pipeline focused on AR models that not only summarises research but also extracts the hyperparameters, architectures, and metrics needed to reproduce experiments. The challenges motivating our work are threefold. First, the "literature overload" problem means that even experts struggle to keep up with emergent models and techniques. Second, reproducibility remains an open concern in machine learning: a lack of transparent reporting of code and hyperparameters has led to irreproducible claims (Kapoor and Narayanan, 2022). Initiatives such as the NeurIPS reproducibility checklist encourage authors to document training settings and datasets (Pineau et al., 2021), yet many papers still omit critical information. Third, AR models themselves are evolving rapidly, from recurrent architectures such as LSTMs (Merity et al., 2017; Bengio et al., 2003) to Transformer-based systems (Vaswani et al., 2017) and emerging large language models (Touvron et al., 2023).
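The extraction step described above could be sketched as a lightweight pattern matcher over paper text. The field names, regular expressions, and sample sentence below are illustrative assumptions for this sketch, not the authors' actual pipeline:

```python
import re

# Hypothetical patterns for common AR-model hyperparameters.
# These are illustrative assumptions, not the paper's implementation.
PATTERNS = {
    "learning_rate": r"learning rate(?: of)?\s*=?\s*([0-9]*\.?[0-9]+(?:e-?[0-9]+)?)",
    "batch_size": r"batch size(?: of)?\s*=?\s*([0-9]+)",
    "num_layers": r"([0-9]+)[- ]layer",
}

def extract_hyperparameters(text: str) -> dict:
    """Return the first match for each known hyperparameter pattern."""
    found = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, text, flags=re.IGNORECASE)
        if m:
            found[name] = m.group(1)
    return found

sample = ("We train a 12-layer Transformer with a batch size of 256 "
          "and a learning rate of 3e-4 on WikiText-103.")
print(extract_hyperparameters(sample))
# → {'learning_rate': '3e-4', 'batch_size': '256', 'num_layers': '12'}
```

A production pipeline would likely pair such structured extraction with an LLM-based reader for fields that resist simple patterns, but regex anchors keep the reproducibility-critical values auditable.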
Aug-7-2025