clickhouse


MERLIN: Multi-stagE query performance prediction for dynamic paRallel oLap pIpeliNe

Zhang, Kaixin, Wang, Hongzhi, Gu, Kunkai, Li, Ziqi, Zhao, Chunyu, Li, Yingze, Yan, Yu

arXiv.org Artificial Intelligence

High-performance OLAP database technology has emerged with the growing demand for massive data analysis. To achieve much higher performance, many DBMSs adopt sophisticated designs, including SIMD operators, parallel execution, and dynamic pipeline modification. However, these advanced OLAP query execution mechanisms still lack targeted Query Performance Prediction (QPP) methods, because most existing methods target conventional tree-shaped query plans and static serial executors. To address this problem, we propose MERLIN, a multi-stage query performance prediction method for high-performance OLAP DBMSs. MERLIN first establishes a resource cost model for each physical operator. It then constructs a DAG consisting of a data-flow tree backbone and the resource-competition relationships among concurrent operators. After a graph attention network (GAT) with an additional attention mechanism calibrates the costs, the cost vector tree is extracted and summarized by a temporal convolutional network (TCN), enabling effective query performance prediction. Experimental results demonstrate that MERLIN achieves higher performance prediction precision than existing methods.
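The DAG described above combines two kinds of edges: data-flow edges that form the plan tree, and resource-competition edges between operators that run at the same time. The sketch below shows one way such a graph could be assembled. It is not MERLIN's implementation: the `Operator` record, its `stage` field, and the rule "operators in the same stage run concurrently" are all hypothetical assumptions made for illustration, and the per-operator `cost` is simply carried along as a node feature for a downstream model such as a GAT.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class Operator:
    """A physical operator in a hypothetical plan representation."""
    op_id: int
    name: str
    parent: Optional[int]  # consumer operator's op_id, None for the plan root
    stage: int             # ASSUMPTION: operators sharing a stage run concurrently
    cost: float            # node feature from a per-operator cost model (not used for edges)


def build_plan_dag(ops):
    """Return (flow_edges, competition_edges) for a list of operators.

    flow_edges: directed child -> parent pairs forming the data-flow tree backbone.
    competition_edges: undirected pairs of operators assumed to compete for
    resources because they share an execution stage.
    """
    flow_edges = [(op.op_id, op.parent) for op in ops if op.parent is not None]

    # Group operator ids by stage, then connect every pair within a stage.
    by_stage = {}
    for op in ops:
        by_stage.setdefault(op.stage, []).append(op.op_id)
    competition_edges = []
    for members in by_stage.values():
        competition_edges.extend(combinations(sorted(members), 2))
    return flow_edges, competition_edges


# Usage: an aggregate over a filtered scan and a second scan.
ops = [
    Operator(0, "Aggregate", None, stage=1, cost=3.0),
    Operator(1, "Filter", 0, stage=0, cost=1.5),
    Operator(2, "Scan", 1, stage=0, cost=4.0),
    Operator(3, "Scan", 0, stage=0, cost=2.0),
]
flow, compete = build_plan_dag(ops)
print(flow)     # [(1, 0), (2, 1), (3, 0)]
print(compete)  # [(1, 2), (1, 3), (2, 3)]
```

Keeping the two edge sets separate makes it easy to give them different edge types or attention heads in a graph model, which matches the abstract's distinction between the tree backbone and the competition relationships.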


On-Premise AIOps Infrastructure for a Software Editor SME: An Experience Report

Bendimerad, Anes, Remil, Youcef, Mathonat, Romain, Kaytoue, Mehdi

arXiv.org Artificial Intelligence

Information Technology has become a critical component in various industries, leading to an increased focus on software maintenance and monitoring. Given the complexity of modern software systems, traditional maintenance approaches have become insufficient. The concept of AIOps has emerged to enhance predictive maintenance using Big Data and Machine Learning capabilities. However, exploiting AIOps requires addressing several challenges related to the complexity of data and incident management. Commercial solutions exist, but they may not be suitable for certain companies due to high costs, data governance issues, and limited coverage of private software. This paper investigates the feasibility of implementing on-premise AIOps solutions by leveraging open-source tools. We introduce a comprehensive AIOps infrastructure that we have successfully deployed in our company, and we provide the rationale behind the choices we made to build its various components. In particular, we explain our approach and criteria for selecting a data management system, and we describe how it was integrated. Our experience can benefit companies seeking to manage their software maintenance processes internally with a modern AIOps approach.


Putting a two-layered recommendation system into production

#artificialintelligence

Recommendation systems will always stay relevant: users want to see personalized content and the best of the catalog (in the case of our iFunny app, trending memes and jokes). Our team is testing dozens of hypotheses on how a smart feed can improve the user experience. This article describes how we implemented a second ranking level on top of the collaborative model, what difficulties we encountered, and how they affected the metrics. A matrix factorization such as implicit.ALS is usually used to improve the feed. This method produces an embedding for each user and each item, and the items whose embeddings are closest to the user's embedding (by cosine similarity) end up in the top recommendations.
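The two-level idea can be illustrated with plain Python: the first level picks candidates whose embeddings have the highest cosine similarity to the user's embedding, and the second level reorders those candidates with a separate score. This is a minimal sketch, not the iFunny pipeline: the embeddings are hand-written rather than learned by ALS, and `rerank_score` is a hypothetical stand-in for whatever trained ranking model sits on top of the collaborative level.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_items(user_vec, item_vecs, k):
    """First level: ids of the k items whose embeddings are closest to user_vec."""
    scored = [(cosine(user_vec, vec), item_id) for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


def two_level_feed(user_vec, item_vecs, rerank_score, n_candidates, k):
    """Second level: re-rank the first-level candidates with rerank_score
    (a hypothetical callable mapping item id -> score) and keep the top k."""
    candidates = top_k_items(user_vec, item_vecs, n_candidates)
    return sorted(candidates, key=rerank_score, reverse=True)[:k]


# Usage with toy 2-dimensional embeddings.
user = [1.0, 0.0]
items = {"meme_a": [0.9, 0.1], "meme_b": [0.0, 1.0], "meme_c": [0.7, 0.7]}
scores = {"meme_a": 0.2, "meme_b": 0.5, "meme_c": 0.8}
print(top_k_items(user, items, 2))                         # ['meme_a', 'meme_c']
print(two_level_feed(user, items, scores.get, 2, 2))       # ['meme_c', 'meme_a']
```

Restricting the second-level model to a short candidate list keeps it cheap, while still letting it override the purely collaborative order, as in the usage above where `meme_c` overtakes `meme_a`.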


4Bn rows/sec query benchmark: Clickhouse vs QuestDB vs Timescale

QuestDB 6.2, our previous minor release, introduced a JIT (just-in-time) compiler for SQL filters. As we mentioned last time, the next step was to parallelize query execution where suitable to improve execution time even further, and that's what we're going to discuss and benchmark today. QuestDB 6.3 enables JIT-compiled filters by default and, more notably, includes a parallel SQL filter execution optimization that reduces both cold and hot query execution times quite dramatically. Before diving into the implementation details and running some before/after benchmarks for QuestDB, we'll hold a friendly competition with two popular time series and analytical databases, TimescaleDB and ClickHouse. The purpose of the competition is nothing more than an attempt to understand whether our parallel filter execution is worth the hassle.
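The general shape of parallel filter execution is: split a column into chunks, evaluate the filter on each chunk in a separate worker, then concatenate the matching row ids in their original order. The sketch below shows only that decomposition and ordered merge. It is not QuestDB's implementation (which is JIT-compiled native code); the chunk size, worker count, and the `value > threshold` predicate are hypothetical choices. Note also that CPython threads do not speed up CPU-bound work like this because of the GIL, so the example illustrates structure, not performance.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_chunk(values, start, end, threshold):
    """Scan rows [start, end) and return the ids of rows with value > threshold."""
    return [i for i in range(start, end) if values[i] > threshold]


def parallel_filter(values, threshold, chunk_size=4, workers=4):
    """Split the column into fixed-size chunks, filter them concurrently,
    and concatenate the per-chunk results in the original row order."""
    bounds = [(s, min(s + chunk_size, len(values)))
              for s in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(filter_chunk, values, s, e, threshold)
                   for s, e in bounds]
        # Collect results in submission order so row ids stay sorted.
        results = [f.result() for f in futures]
    return [row for chunk in results for row in chunk]


# Usage: ids of rows whose value exceeds 4.
column = [5, 1, 9, 3, 7, 2, 8, 0, 6]
print(parallel_filter(column, 4))  # [0, 2, 4, 6, 8]
```

Reading futures in submission order, rather than as they complete, is what keeps the merged output identical to a serial scan.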


The Top 7 Databases for Machine Learning

One of the most common questions I get asked is, 'What is the best database for Machine Learning?' In reality, the answer I give is nearly always 'It depends', before I bombard the enquirer with a series of follow-up questions. But because 'It depends' is never fun in blog form, I have put together this list. Machine Learning has now penetrated every aspect of our lives, whether you realise it or not: from the video recommendations you see on YouTube and the systems keeping you safe while you bank or shop online, all the way to the processing of images on your smartphone.