Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, Aman Timalsina, David W. Romero, Quinn McIntyre, Beidi Chen, Atri Rudra, Ce Zhang, Christopher Ré, Stefano Ermon, Yoshua Bengio
arXiv.org Artificial Intelligence
Attention-free approaches such as long convolution sequence models (LCSMs), e.g., H3 [1] and Hyena [2], have shown promise in matching Transformer [3, 4] performance across a wide range of tasks, with sub-quadratic complexity in sequence length. Despite their improved efficiency when training on long sequences, LCSMs must still process the entire growing sequence at every step of auto-regressive generation, much like Transformers, unless the convolution filters are short or admit a low-dimensional state-space realization. In this work, we seek to improve LCSMs in both efficiency and quality. First, we study the inference stage and propose methods to enable a recurrent mode for auto-regressive generation. A recurrent mode requires a state that encodes the past of the process in fixed-dimensional memory, enabling constant per-step time and constant memory during generation. Then, we draw upon an analysis of pre-trained models to develop architectural enhancements for the Hyena block, simultaneously improving model quality and the efficiency of the distillation procedure.

Distilling fast recurrences. We introduce LaughingHyena, the first distillation approach for LCSMs that enables recurrent inference without impacting downstream quality. LaughingHyena seeks compact recurrences in the form of state-space models (SSMs) [5, 6] as the solution of a nonlinear interpolation problem involving the convolution filters of a pre-trained model. Since the total memory cost of SSMs grows only linearly with the state dimension d, our distillation procedure enables high-throughput generation by allowing large batches to be processed.
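To make the idea concrete, below is a minimal NumPy sketch (not the authors' code) of the two ingredients described in the abstract: approximating a long convolution filter h[t] by a small diagonal SSM, h[t] ≈ Σ_i r_i λ_i^t, and then running that SSM as a constant-memory recurrence at generation time. The pole grid `poles`, the filter `h`, and the state dimension `d` are illustrative assumptions; here only the residues are fit by linear least squares, whereas LaughingHyena solves a joint (nonlinear) interpolation over the filter parameters.

```python
# Sketch: distill a long convolution filter into a diagonal SSM, then apply it
# recurrently with a fixed-dimension state (constant per-step time and memory).
# Assumption: poles are fixed a priori and only residues are fit (least squares);
# the actual LaughingHyena distillation is a nonlinear interpolation problem.
import numpy as np

def fit_residues(h, poles):
    """Fit residues r so that h[t] ~= sum_i r_i * poles_i**t for t = 0..L-1."""
    L = len(h)
    V = np.stack([poles**t for t in range(L)])       # L x d Vandermonde-like matrix
    r, *_ = np.linalg.lstsq(V, h, rcond=None)        # linear least-squares fit
    return r

def recurrent_apply(r, poles, u):
    """Causal convolution y[t] = sum_{s<=t} h[t-s] u[s] via an SSM recurrence."""
    x = np.zeros(len(poles), dtype=complex)          # fixed-dimension state
    y = np.empty(len(u), dtype=complex)
    for t, u_t in enumerate(u):
        x = poles * x + u_t                          # x_t = diag(poles) x_{t-1} + u_t
        y[t] = r @ x                                 # y_t = r^T x_t
    return y.real

# Toy example: a decaying length-512 filter distilled to state dimension d = 8.
L, d = 512, 8
t = np.arange(L)
h = np.exp(-0.01 * t) * np.cos(0.05 * t)             # stand-in for a learned filter
poles = 0.99 * np.exp(2j * np.pi * np.arange(d) / d) # assumed fixed pole locations
r = fit_residues(h, poles)

u = np.random.randn(L)
y_conv = np.convolve(u, h)[:L]                       # reference: explicit convolution
y_ssm = recurrent_apply(r, poles, u)
print("max abs error vs. explicit convolution:", np.max(np.abs(y_conv - y_ssm)))
```

The key cost contrast is in `recurrent_apply`: each generation step touches only the d-dimensional state rather than the full history, which is what allows large batches to be kept in memory during auto-regressive decoding.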
Oct-28-2023