Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models

Park, Hyunbyung; Lee, Sukyung; Gim, Gyoungjin; Kim, Yungi; Kim, Dahyun; Park, Chanjun

arXiv.org Artificial Intelligence 

To address the challenges of data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core. Its block-based interface makes it easy to add custom processors, so users can readily and efficiently build their own ETL pipelines. We hope that Dataverse will serve as a vital tool for LLM development, and we open-source the entire library to welcome community contributions. Additionally, we provide a concise, two-minute video demonstration of the system, illustrating its capabilities and implementation.
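As a rough illustration of what a block-based ETL design can look like, the sketch below registers small processing functions under names and chains them into a pipeline. It is not Dataverse's actual API: the `REGISTRY` dictionary, the `register` decorator, the block names, and `run_pipeline` are all hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Block = Callable[[List[Record]], List[Record]]

# Hypothetical registry of named blocks (an assumption, not Dataverse's API).
REGISTRY: Dict[str, Block] = {}


def register(name: str) -> Callable[[Block], Block]:
    """Register a function as a named block so pipelines can refer to it."""
    def decorator(fn: Block) -> Block:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("extract.inline")
def extract_inline(_: List[Record]) -> List[Record]:
    # Extract: stands in for reading from a real data source.
    return [{"text": "  Hello world  "}, {"text": "Hello world"}, {"text": ""}]


@register("transform.strip")
def strip_text(records: List[Record]) -> List[Record]:
    # Transform: trim surrounding whitespace.
    return [{**r, "text": r["text"].strip()} for r in records]


@register("transform.drop_empty")
def drop_empty(records: List[Record]) -> List[Record]:
    # Transform: discard records with no text.
    return [r for r in records if r["text"]]


@register("transform.dedup")
def dedup(records: List[Record]) -> List[Record]:
    # Transform: exact-match deduplication, keeping the first occurrence.
    seen = set()
    unique = []
    for r in records:
        if r["text"] not in seen:
            seen.add(r["text"])
            unique.append(r)
    return unique


@register("load.collect")
def load_collect(records: List[Record]) -> List[Record]:
    # Load: stands in for writing to storage; here it just returns a copy.
    return list(records)


def run_pipeline(block_names: List[str]) -> List[Record]:
    """Run the named blocks in order, feeding each block's output to the next."""
    data: List[Record] = []
    for name in block_names:
        data = REGISTRY[name](data)
    return data


result = run_pipeline([
    "extract.inline",
    "transform.strip",
    "transform.drop_empty",
    "transform.dedup",
    "load.collect",
])
```

In a design like this, adding a custom processor amounts to writing one more decorated function and listing its name in the pipeline, which is the kind of extensibility the abstract describes.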
