SemBench: A Benchmark for Semantic Query Processing Engines
Lao, Jiale; Zimmerer, Andreas; Ovcharenko, Olga; Cong, Tianji; Russo, Matthew; Vitagliano, Gerardo; Cochez, Michael; Özcan, Fatma; Gupta, Gautam; Hottelier, Thibaud; Jagadish, H. V.; Kissel, Kris; Schelter, Sebastian; Kipf, Andreas; Trummer, Immanuel
arXiv.org Artificial Intelligence
We present a benchmark targeting a novel class of systems: semantic query processing engines. These systems rely inherently on the generative and reasoning capabilities of state-of-the-art large language models (LLMs). They extend SQL with semantic operators that are configured by natural-language instructions and evaluated via LLMs, enabling users to perform various operations on multimodal data. Our benchmark introduces diversity along three key dimensions: scenarios, modalities, and operators. The scenarios range from movie review analysis to medical question answering. Within these scenarios, we cover different data modalities, including images, audio, and text. Finally, the queries use a diverse set of operators: semantic filters, joins, mappings, ranking, and classification operators. Using the benchmark, we evaluated three academic systems (LOTUS, Palimpzest, and ThalamusDB) and one industrial system, Google BigQuery. Although these results reflect a snapshot of systems under continuous development, our study offers crucial insights into their current strengths and weaknesses and illuminates promising directions for future research.
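To make the idea of an LLM-evaluated semantic operator concrete, the following is a minimal Python sketch of two such operators, a semantic filter and a naive nested-loop semantic join, each evaluated by prompting a model once per row or row pair. Everything here is a hypothetical illustration under stated assumptions: the `LLM` callable type, the prompt wording, the function names (which only echo the operator vocabulary), and `stub_llm`, a keyword rule standing in for a real model. It does not reproduce the API or query syntax of LOTUS, Palimpzest, ThalamusDB, or BigQuery.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: a prompt goes in, the model's text answer comes out.
LLM = Callable[[str], str]


def _is_yes(answer: str) -> bool:
    """Interpret a free-text model answer as a boolean verdict."""
    return answer.strip().lower().startswith("yes")


def sem_filter(rows: List[str], instruction: str, llm: LLM) -> List[str]:
    """Hypothetical semantic filter: keep rows the model judges to satisfy the
    natural-language instruction. Issues one model call per input row."""
    return [
        row for row in rows
        if _is_yes(llm(f"{instruction}\nInput: {row}\nAnswer yes or no."))
    ]


def sem_join(left: List[str], right: List[str], instruction: str,
             llm: LLM) -> List[Tuple[str, str]]:
    """Hypothetical nested-loop semantic join: one model call per (left, right) pair."""
    pairs = []
    for l_row in left:
        for r_row in right:
            prompt = f"{instruction}\nLeft: {l_row}\nRight: {r_row}\nAnswer yes or no."
            if _is_yes(llm(prompt)):
                pairs.append((l_row, r_row))
    return pairs


class CallCounter:
    """Wraps any LLM callable and counts how many times it is invoked."""

    def __init__(self, llm: LLM) -> None:
        self.llm = llm
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return self.llm(prompt)


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model, using simple keyword rules.

    Filter prompts: answer "yes" if the input mentions "great".
    Join prompts: answer "yes" if the left value occurs in the right value.
    """
    fields = {}
    for line in prompt.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in ("Input", "Left", "Right"):
            fields[key] = value.lower()
    if "Input" in fields:
        return "yes" if "great" in fields["Input"] else "no"
    return "yes" if fields.get("Left", "") in fields.get("Right", "") else "no"


# Toy movie-review data (made up for illustration).
reviews = ["Inception: a great, layered thriller",
           "Cats: tedious and long",
           "Up: great opening"]
movies = ["Inception", "Up"]

llm = CallCounter(stub_llm)
positive = sem_filter(reviews, "Does this review express a positive opinion?", llm)
matches = sem_join(movies, positive, "Is this review about this movie?", llm)
print(positive)
print(matches)
print(llm.calls)  # 3 filter calls + 2 * 2 join calls
```

In this sketch the call counter makes the cost model visible: the filter costs one model call per row, while the naive join costs one call per candidate pair (here 3 + 2 × 2 = 7 calls in total), so join-heavy queries are where reducing model invocations would matter most.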
Nov-4-2025
- Country:
  - Africa > Kenya (0.04)
  - Asia > Japan
    - Honshū
      - Chūbu > Toyama Prefecture
        - Toyama (0.04)
      - Kantō > Kanagawa Prefecture
        - Yokohama (0.04)
  - Europe
    - Austria (0.04)
    - Germany > Bavaria
      - Middle Franconia > Nuremberg (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
  - North America
    - Canada > British Columbia
    - United States
      - California > San Francisco County
        - San Francisco (0.14)
      - Hawaii (0.04)
      - Illinois > Cook County
        - Chicago (0.04)
      - Massachusetts > Middlesex County
        - Cambridge (0.04)
      - Michigan (0.04)
      - New York > New York County
        - New York City (0.14)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Health & Medicine > Therapeutic Area (0.68)
  - Leisure & Entertainment (0.88)
  - Media > Film (0.49)
- Technology: