Data Virtualization for Machine Learning
Khan, Saiful; Chakraborty, Joyraj; Beaucamp, Philip; Bhujel, Niraj; Chen, Min
arXiv.org Artificial Intelligence
Machine learning (ML) teams today often run multiple concurrent ML workflows for different applications. Each workflow typically involves many experiments, iterations, and collaborative activities, and commonly takes months, sometimes years, to progress from initial data wrangling to model deployment. Organizationally, this produces a large amount of intermediate data that must be stored, processed, and maintained. Data virtualization therefore becomes a critical technology in the infrastructure that serves ML workflows. In this paper, we present the design and implementation of a data virtualization service, focusing on its service architecture and service operations. The infrastructure currently supports six ML applications, each with more than one ML workflow, and the data virtualization service allows the number of applications and workflows to grow in the coming years.
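The abstract does not describe the service's internals. As a generic illustration of what data virtualization can mean for intermediate ML data, here is a minimal Python sketch. It assumes a design in which workflows refer to data only by logical, versioned names and a catalog resolves those names to physical storage. All class and method names (`VirtualCatalog`, `InMemoryBackend`, `LocalFileBackend`, `put`, `get`) are hypothetical and are not the authors' API or architecture.

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol

# Hypothetical sketch of a data virtualization layer; not the paper's implementation.


class StorageBackend(Protocol):
    """Any physical store that can read and write raw bytes by key."""

    def read(self, key: str) -> bytes: ...
    def write(self, key: str, data: bytes) -> None: ...


class InMemoryBackend:
    """Physical store kept in a dict (a stand-in for a fast cache)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def read(self, key: str) -> bytes:
        return self._blobs[key]

    def write(self, key: str, data: bytes) -> None:
        self._blobs[key] = data


class LocalFileBackend:
    """Physical store that keeps each blob as a file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        # Flatten the logical path into a single file name.
        return os.path.join(self.root, key.replace("/", "__"))

    def read(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()

    def write(self, key: str, data: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(data)


@dataclass
class Location:
    backend: str  # name of a registered backend
    key: str      # physical key inside that backend


@dataclass
class VirtualCatalog:
    """Maps logical dataset names to versioned physical locations."""

    backends: Dict[str, StorageBackend] = field(default_factory=dict)
    versions: Dict[str, List[Location]] = field(default_factory=dict)

    def register_backend(self, name: str, backend: StorageBackend) -> None:
        self.backends[name] = backend

    def put(self, logical_name: str, data: bytes, backend: str) -> int:
        """Store data in the chosen backend and record it as a new version."""
        history = self.versions.setdefault(logical_name, [])
        version = len(history) + 1
        key = f"{logical_name}/v{version}"
        self.backends[backend].write(key, data)
        history.append(Location(backend, key))
        return version

    def get(self, logical_name: str, version: Optional[int] = None) -> bytes:
        """Resolve a logical name (latest version by default) and read it."""
        history = self.versions[logical_name]
        if version is None:
            loc = history[-1]
        elif 1 <= version <= len(history):
            loc = history[version - 1]
        else:
            raise KeyError(f"{logical_name} has no version {version}")
        return self.backends[loc.backend].read(loc.key)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        catalog = VirtualCatalog()
        catalog.register_backend("cache", InMemoryBackend())
        catalog.register_backend("disk", LocalFileBackend(tmp))

        catalog.put("app1/wf1/features", b"raw-v1", backend="disk")
        catalog.put("app1/wf1/features", b"cleaned-v2", backend="cache")

        print(catalog.get("app1/wf1/features"))             # b'cleaned-v2'
        print(catalog.get("app1/wf1/features", version=1))  # b'raw-v1'
```

Because workflows call `get` with a logical name only, an intermediate dataset can move between stores without changes to workflow code, and keeping every version lets earlier experiments be revisited. A production service would also need access control, richer metadata, and a persistent catalog, none of which this sketch attempts.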
Sep-19-2025
- Country:
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (0.93)
- Technology:
- Information Technology
- Artificial Intelligence
- Data Science
- Data Mining (1.00)
- Data Quality (1.00)
- Databases (0.93)
- Information Management (1.00)
- Security & Privacy (0.93)
- Virtualization (1.00)