We ran ClickBench on March 3rd to compare Databend's performance with that of other OLAP databases. ClickBench uses the HITS dataset, taken from a production environment, to measure database performance. The benchmark covers 50 databases and measures data import time plus 43 queries. A database's overall ranking is determined by its performance across all testing scenarios.
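To make "overall performance across all testing scenarios" concrete, here is a minimal sketch of one common way to fold per-query timings into a single score: take each query's time relative to the best known time, then take the geometric mean of those ratios. This is an illustration only, not necessarily ClickBench's exact formula; the function name `relative_score`, the `shift` constant, and the sample timings are all assumptions.

```python
import math


def relative_score(timings, best_timings, shift=0.01):
    """Geometric mean of per-query ratios against the best known time.

    `shift` (hypothetical, in seconds) keeps very fast queries from
    dominating the ratio. A score of 1.0 means "as fast as the best".
    """
    ratios = [(t + shift) / (b + shift) for t, b in zip(timings, best_timings)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Made-up timings in seconds for three queries.
best = [0.10, 0.50, 2.00]
system_a = [0.10, 0.50, 2.00]  # matches the best time on every query
system_b = [0.21, 1.01, 4.01]  # roughly twice as slow on every query

print(relative_score(system_a, best))
print(relative_score(system_b, best))
```

A geometric mean is a natural fit here because it treats a 2x slowdown on a fast query the same as a 2x slowdown on a slow one, so no single long-running query decides the ranking.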
Databend Ranked High
To demonstrate the real performance of Databend, we did not make any special optimizations for the testing scenarios. We used the default configuration without any parameter tuning, and we did not partition the data by a specific column during import or table creation. We also did not cache the original data or query results; only metadata and indexes were cached.
We submitted tests for the three most common types of Amazon EC2 instances:
- c6a.metal, 500GB gp2 (192 cores)
- c6a.4xlarge, 500GB gp2 (16 cores)
- c5.4xlarge, 500GB gp2 (16 cores)
We got good results. Databend's data load performance ranked first on all three instance types.
Databend also performed exceptionally in hot run queries:
What Made it Happen
Thanks to the new expression system in Databend, all operators are vectorized and support domain-based value inference. On top of this, we can apply a powerful constant folding framework to perform multi-level data pruning and skip as many unnecessary data blocks as possible.
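The idea of domain-based inference feeding constant folding and block pruning can be sketched as follows. This is a simplified illustration, not Databend's actual implementation: the `Domain` class, the helper functions, and the block statistics are all hypothetical. Each block carries the min/max range of a column; a predicate is evaluated over that range, and if it folds to a constant `False`, the block is skipped without being read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Known value range (inclusive) of a column within one block."""
    lo: int
    hi: int


def domain_gt(col, const):
    """Fold `col > const` over a domain: True, False, or None (unknown)."""
    if col.lo > const:
        return True
    if col.hi <= const:
        return False
    return None


def domain_lt(col, const):
    """Fold `col < const` over a domain: True, False, or None (unknown)."""
    if col.hi < const:
        return True
    if col.lo >= const:
        return False
    return None


def domain_and(a, b):
    """Three-valued AND: a constant False on either side decides the result."""
    if a is False or b is False:
        return False
    if a is True and b is True:
        return True
    return None


def prune_blocks(blocks, lower, upper):
    """Keep only blocks where `lower < col < upper` might hold for some row."""
    kept = []
    for name, dom in blocks:
        verdict = domain_and(domain_gt(dom, lower), domain_lt(dom, upper))
        if verdict is not False:  # folded to False means no row can match
            kept.append(name)
    return kept


# Hypothetical per-block min/max statistics for one column.
blocks = [
    ("block_0", Domain(0, 99)),
    ("block_1", Domain(100, 199)),
    ("block_2", Domain(200, 299)),
]

print(prune_blocks(blocks, 120, 150))  # ['block_1']
```

Because the same folding logic works at any level that exposes a value range, it can be applied repeatedly (for example per segment, then per block), which is one way to read "multi-level" pruning.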
In addition, the scheduling ability of the pipeline and the functionality of the aggregation operator have been further strengthened, allowing for efficient scheduling of CPU and IO resources, thereby achieving optimal performance.
The gap between the top three in the ClickBench list is not significant. Even in scenarios where it is not particularly strong, Databend can still achieve good results thanks to its high-performance compute. Because Databend ran with default configurations and DataCache disabled, the comparison has little significance in cold run scenarios. We could also optimize the table creation statement (e.g., partition by UserID on import to optimize Q17 and other scenarios that aggregate by UserID) or add some parameter tuning to further improve performance.
Databend's ultimate goal is to provide users with top performance and an easy-to-use product experience in common scenarios. Benchmarking mainly gives us a way to measure performance and improve product quality. ClickBench is highly representative, so we have integrated it into the performance testing CI for versions and PRs, making it easy for developers to spot performance regressions and improvements and to guide product development.
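As a rough illustration of what such a CI check might do, the sketch below compares per-query timings from a baseline run against a candidate run and labels each query. It is not Databend's actual CI code: the function `compare_runs`, the 10% threshold, and the sample timings are all assumptions made for the example.

```python
def compare_runs(baseline, candidate, threshold=0.10):
    """Label each query by its relative change in run time.

    `threshold` (hypothetical) is the fraction of the baseline time that
    a change must exceed before it counts as a regression or improvement.
    """
    report = {}
    for query, base_time in baseline.items():
        change = (candidate[query] - base_time) / base_time
        if change > threshold:
            report[query] = "regressed"
        elif change < -threshold:
            report[query] = "improved"
        else:
            report[query] = "unchanged"
    return report


# Made-up timings in seconds for a baseline build and a PR build.
baseline = {"Q1": 1.00, "Q2": 2.00, "Q3": 0.50}
candidate = {"Q1": 1.30, "Q2": 1.50, "Q3": 0.52}

for query, status in compare_runs(baseline, candidate).items():
    print(query, status)
```

A tolerance band like this keeps ordinary run-to-run noise from being reported as a regression; in practice the threshold would need tuning against how stable the benchmark machine is.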