Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models

Park, Hyunbyung, Lee, Sukyung, Gim, Gyoungjin, Kim, Yungi, Kim, Dahyun, Park, Chanjun

Mar-28-2024–arXiv.org Artificial Intelligence

To address the challenges of processing data at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core. Its block-based interface makes it easy to add custom processors, so users can readily and efficiently build their own ETL pipelines. We hope that Dataverse will serve as a vital tool for LLM development, and we open-source the entire library to welcome community contributions. Additionally, we provide a concise, two-minute video demonstration of the system, illustrating its capabilities and implementation.
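
The block-based idea in the abstract can be illustrated with a small, generic sketch. This is not the Dataverse API: the registry, the `register` decorator, the block names, and `run_pipeline` are all hypothetical names chosen for illustration. It only shows how custom processors might be registered as named blocks and chained into a pipeline; the load stage is omitted for brevity.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Processor = Callable[[List[Record]], List[Record]]

# Hypothetical registry mapping block names to processors (not the Dataverse API).
REGISTRY: Dict[str, Processor] = {}


def register(name: str) -> Callable[[Processor], Processor]:
    """Register a processor under a block name so a pipeline can refer to it."""
    def decorator(func: Processor) -> Processor:
        REGISTRY[name] = func
        return func
    return decorator


@register("extract.inline")
def extract_inline(_: List[Record]) -> List[Record]:
    # Stand-in for a real data source: returns a few hard-coded records.
    return [{"text": " Hello  world "}, {"text": ""}, {"text": "Hello world"}]


@register("transform.normalize_whitespace")
def normalize_whitespace(records: List[Record]) -> List[Record]:
    # Collapse runs of whitespace and trim both ends.
    return [{**r, "text": " ".join(r["text"].split())} for r in records]


@register("transform.drop_empty")
def drop_empty(records: List[Record]) -> List[Record]:
    # Remove records whose text is empty.
    return [r for r in records if r["text"]]


@register("transform.deduplicate")
def deduplicate(records: List[Record]) -> List[Record]:
    # Exact-match deduplication that keeps the first occurrence.
    seen = set()
    unique = []
    for r in records:
        if r["text"] not in seen:
            seen.add(r["text"])
            unique.append(r)
    return unique


def run_pipeline(block_names: Iterable[str]) -> List[Record]:
    """Apply the named blocks in order, feeding each output to the next block."""
    records: List[Record] = []
    for name in block_names:
        records = REGISTRY[name](records)
    return records


result = run_pipeline([
    "extract.inline",
    "transform.normalize_whitespace",
    "transform.drop_empty",
    "transform.deduplicate",
])
```

Under these assumptions, adding a custom processor means writing one decorated function and adding its name to the list passed to `run_pipeline`.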

dataverse, etl pipeline, pipeline

Genre:
- Research Report (0.50)

Industry:
- Information Technology (0.73)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning > Information Fusion (0.91)
