Asada, Yuki
The Tensor Data Platform: Towards an AI-centric Database System
Gandhi, Apurva, Asada, Yuki, Fu, Victor, Gemawat, Advitya, Zhang, Lihao, Sen, Rathijit, Curino, Carlo, Camacho-Rodríguez, Jesús, Interlandi, Matteo
Database engines have historically absorbed many of the innovations in data processing, adding features to process graph, XML, object-oriented, and text data, among many others. In this paper, we make the case that it is time to do the same for AI -- but with a twist! While existing approaches have tried to achieve this by integrating databases with external ML tools, we claim that achieving a truly AI-centric database requires moving the DBMS engine, at its core, from a relational to a tensor abstraction. This allows us to: (1) support multi-modal data processing over images, video, audio, and text, as well as relational data; (2) leverage the wellspring of innovation in hardware and runtimes for tensor computation; and (3) exploit automatic differentiation to enable a novel class of "trainable" queries that can learn to perform a task. To support the above scenarios, we introduce TDP: a system that builds upon our prior work mapping relational queries to tensors. Thanks to a tighter integration with the tensor runtime, TDP provides broader coverage of emerging scenarios that require access to multi-modal data and automatic differentiation.
Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem
Asada, Yuki, Fu, Victor, Gandhi, Apurva, Gemawat, Advitya, Zhang, Lihao, He, Dong, Gupta, Vivek, Nosakhare, Ehi, Banda, Dalitso, Sen, Rathijit, Interlandi, Matteo
We demonstrate the Tensor Query Processor (TQP): a query processor that automatically compiles relational operators into tensor programs. By leveraging tensor runtimes such as PyTorch, TQP is able to: (1) integrate with ML tools (e.g., Pandas for data ingestion, TensorBoard for visualization); (2) target different hardware (e.g., CPU, GPU) and software (e.g., browser) backends; and (3) accelerate, end to end, queries containing both relational and ML operators. TQP is generic enough to support the TPC-H benchmark, and it provides performance that is comparable to, and often better than, that of specialized CPU and GPU query processors.
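To make the idea of compiling relational operators into tensor programs concrete, the following is a minimal sketch and not TQP's actual implementation. It assumes a columnar layout with one 1-D array per column, and it uses NumPy as a stand-in for a tensor runtime such as PyTorch. The table `orders`, the helpers `filter_rows` and `group_sum`, and the query itself are hypothetical names introduced only for illustration.

```python
import numpy as np

# Hypothetical toy table, stored column-wise: one 1-D tensor per column.
orders = {
    "customer_id": np.array([0, 1, 0, 2, 1, 0]),
    "price": np.array([10.0, 25.0, 5.0, 50.0, 15.0, 20.0]),
}


def filter_rows(table, mask):
    """Selection (WHERE): apply the same boolean mask to every column."""
    return {name: column[mask] for name, column in table.items()}


def group_sum(keys, values, num_groups):
    """GROUP BY keys with SUM(values), expressed as a scatter-add.

    Assumes keys are integers in [0, num_groups), so the result can be a
    dense array indexed by group key.
    """
    out = np.zeros(num_groups, dtype=values.dtype)
    # Unbuffered in-place add, so repeated keys accumulate correctly.
    np.add.at(out, keys, values)
    return out


# SELECT customer_id, SUM(price) FROM orders
# WHERE price >= 10 GROUP BY customer_id
selected = filter_rows(orders, orders["price"] >= 10.0)
totals = group_sum(selected["customer_id"], selected["price"], num_groups=3)
print(totals)
```

Here the selection becomes a boolean mask and the grouped aggregation becomes a scatter-add, both of which are ordinary tensor operations. That is what would let a tensor runtime place the same query on different hardware backends; how TQP itself lowers each operator is not specified in the abstract above.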