Dataforge: A Data Agent Platform for Autonomous Data Engineering

Wang, Xinyuan; Fu, Yanjie

arXiv.org Artificial Intelligence 

B. Hierarchical Routing

After data cleaning, we adopt a hierarchical routing architecture that combines task-level and action-level reasoning to enable efficient and reliable decision-making. At the task level, a rule-based router quickly identifies the task type (classification, regression, or unsupervised learning) from table schema metadata such as data types, label structure, and feature distributions. This lightweight router relies on deterministic heuristics instead of large language models, which enables fast and reliable responses across diverse datasets. At the action level, a compact LLM-based planner refines the decision by selecting and planning the most suitable feature-level actions, such as differently ordered combinations of feature selection, transformation, or generation, for the identified task (e.g., a classification dataset). Since each router operates within a smaller, well-defined action space, this hierarchical routing approach not only accelerates processing but also avoids invalid or high-risk operations.

C. Dual Feedback Loops

We develop two collaborative feedback loops that transform the static workflow into an adaptive, self-correcting process, with the goal of autonomy and continual refinement.

1) Action Validation Loop for Safety: This feedback loop grounds actions to ensure operational safety before execution. Each planned action is first grounded through schema alignment, type checking, and logical consistency tests, such as detecting divisions by zero or invalid type conversions. Only actions that pass validation proceed to execution, which prevents runtime errors and maintains workflow integrity.
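As an illustration only, the rule-based task-level router and the pre-execution validation checks described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the platform's implementation: the names `route_task`, `Action`, and `validate_action`, the column-dictionary table representation, and the `max_classes` threshold are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# Assumption: a table is a plain mapping from column name to a list of values.
Table = Dict[str, List[Any]]


def _is_number(value: Any) -> bool:
    """True for ints and floats, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def route_task(table: Table, label: Optional[str], max_classes: int = 20) -> str:
    """Deterministic task-level routing from schema metadata (hypothetical rules)."""
    if label is None or label not in table:
        return "unsupervised"
    values = [v for v in table[label] if v is not None]
    if not values:
        return "unsupervised"
    # Heuristic: a numeric label with many distinct values suggests regression.
    if all(_is_number(v) for v in values) and len(set(values)) > max_classes:
        return "regression"
    return "classification"


@dataclass(frozen=True)
class Action:
    """A planned feature-level action (hypothetical structure)."""
    op: str                  # e.g. "divide"
    inputs: Tuple[str, ...]  # names of the input columns


def validate_action(table: Table, action: Action) -> List[str]:
    """Return validation errors; an empty list means the action may execute."""
    # 1. Schema alignment: every referenced column must exist.
    missing = [c for c in action.inputs if c not in table]
    if missing:
        return [f"unknown column: {c}" for c in missing]

    # 2. Type checking: arithmetic actions need numeric inputs.
    errors = [
        f"non-numeric column: {c}"
        for c in action.inputs
        if not all(_is_number(v) for v in table[c])
    ]
    if errors:
        return errors

    # 3. Logical consistency: reject divisions by zero before running them.
    if action.op == "divide":
        if len(action.inputs) != 2:
            errors.append("divide needs exactly two input columns")
        elif any(v == 0 for v in table[action.inputs[1]]):
            errors.append(f"division by zero in column: {action.inputs[1]}")
    return errors
```

In this sketch, an action is passed to execution only when `validate_action` returns an empty list; any returned messages could instead be fed back to the planner so that it proposes a corrected action.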
