McIntyre, Quinn
Automating the Enterprise with Foundation Models
Wornow, Michael, Narayan, Avanika, Opsahl-Ong, Krista, McIntyre, Quinn, Shah, Nigam H., Re, Christopher
Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Despite being of interest to the data management community for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and a large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12-18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs). Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities, we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques. Code is available at: https://github.com/HazyResearch/eclair-agents
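To make the described setup concrete, the following is a minimal sketch of how a multimodal FM could drive a workflow end-to-end from a natural-language description alone. It is an illustration, not ECLAIR's actual interface: the names call_multimodal_fm, take_screenshot, execute_action, and Action are hypothetical placeholders; see the linked repository for the real system.

```python
# Minimal sketch of an FM-driven workflow-automation loop (illustrative only).
# call_multimodal_fm, take_screenshot, and execute_action are hypothetical stubs,
# not ECLAIR's API; wire them to your own FM client and UI automation layer.

from dataclasses import dataclass


@dataclass
class Action:
    kind: str    # e.g. "click", "type", or "done"
    target: str  # element description or text to type


def call_multimodal_fm(prompt: str, screenshot: bytes) -> Action:
    """Placeholder for a multimodal foundation-model call (e.g. GPT-4 with vision)."""
    raise NotImplementedError("plug in your own FM client here")


def take_screenshot() -> bytes:
    """Placeholder for capturing the current UI state as an image."""
    raise NotImplementedError


def execute_action(action: Action) -> None:
    """Placeholder for dispatching a click/keystroke to the UI."""
    raise NotImplementedError


def run_workflow(description: str, max_steps: int = 20) -> bool:
    """Drive a workflow from a natural-language description, one FM call per step."""
    history: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Workflow: {description}\n"
            f"Actions so far: {history}\n"
            "Given the attached screenshot, return the next UI action, "
            "or 'done' if the workflow is complete."
        )
        action = call_multimodal_fm(prompt, take_screenshot())
        if action.kind == "done":
            return True
        execute_action(action)
        history.append(f"{action.kind}({action.target})")
    return False  # step budget exhausted without completing the workflow
```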
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
Massaroli, Stefano, Poli, Michael, Fu, Daniel Y., Kumbong, Hermann, Parnichkun, Rom N., Timalsina, Aman, Romero, David W., McIntyre, Quinn, Chen, Beidi, Rudra, Atri, Zhang, Ce, Re, Christopher, Ermon, Stefano, Bengio, Yoshua
Attention-free approaches such as long convolution sequence models (LCSMs), e.g., H3 [1] and Hyena [2], have shown promise in matching Transformer [3, 4] performance across a wide range of tasks, with sub-quadratic complexity with respect to sequence length. Despite the improved efficiency during training on long sequences, unless the convolution filters are either short or admit a low-dimensional state-space realization, LCSMs still need to process the entire growing sequence at every step of auto-regressive generation, similarly to Transformers. In this work, we seek to refine LCSMs in both efficiency and quality. First, we study the inference stage and propose methods to enable a recurrent mode for auto-regressive generation. Recurrent modes prescribe the existence of a state encoding the past information of the process in a fixed-dimension memory, enabling constant per-step time and constant memory during generation. Then, we draw upon an analysis of pre-trained models to develop architectural enhancements for the Hyena block, simultaneously improving model quality and the efficiency of the distillation procedure. Distilling fast recurrences: we introduce LaughingHyena, the first distillation approach for LCSMs that enables recurrent inference without impacting downstream quality. LaughingHyena seeks compact recurrences in the form of state-space models (SSMs) [5, 6] as the solution of a nonlinear interpolation problem involving the convolution filters of a pre-trained model. Since the total memory cost of SSMs grows linearly in the state dimension d, our distillation procedure enables high-throughput generation by allowing large batches to be processed.
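To illustrate why a compact state-space recurrence pays off at inference time, the sketch below builds an arbitrary d-dimensional diagonal SSM, treats its impulse response as a long convolution filter, and checks that recurrent generation with a fixed-size state reproduces the full convolution. The parameters are random placeholders, not filters distilled from a pre-trained Hyena model, and the interpolation-based fitting step itself is not reproduced here.

```python
# Minimal numerical sketch: a d-dimensional SSM recurrence matches the output of
# convolving with its impulse response, while carrying only O(d) state per step.
# Parameters are illustrative, not distilled from a pre-trained Hyena filter.

import numpy as np

rng = np.random.default_rng(0)
d = 8    # state dimension of the compact recurrence
L = 256  # sequence length

# Diagonal stable SSM:  x_{t+1} = A x_t + B u_t,   y_t = C x_t + D u_t
A = np.diag(rng.uniform(0.5, 0.95, d))  # stable eigenvalues inside the unit circle
B = rng.standard_normal((d, 1))
C = rng.standard_normal((1, d))
D = rng.standard_normal()

# Equivalent convolution filter = impulse response: h[0] = D, h[k] = C A^{k-1} B
h = np.empty(L)
h[0] = D
Ak_B = B.copy()
for k in range(1, L):
    h[k] = (C @ Ak_B).item()
    Ak_B = A @ Ak_B

u = rng.standard_normal(L)

# Convolutional mode: each output touches the whole growing prefix (O(t) per step).
y_conv = np.array([np.dot(h[: t + 1][::-1], u[: t + 1]) for t in range(L)])

# Recurrent mode: only the d-dimensional state x is carried between steps.
x = np.zeros((d, 1))
y_rec = np.empty(L)
for t in range(L):
    y_rec[t] = (C @ x).item() + D * u[t]
    x = A @ x + B * u[t]

assert np.allclose(y_conv, y_rec, atol=1e-8)
print("recurrent and convolutional outputs match; per-step memory is O(d), not O(L)")
```

The recurrent loop carries only the d-dimensional state between steps, which is what gives constant per-step time and memory regardless of how long the generated sequence grows, and is why small d directly translates into larger feasible batch sizes during generation.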