Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
Stay informed about the latest features of Databend.
New Filter Execution Framework
In the new filter execution framework, Databend introduces a groundbreaking concept, defining it as the "Immutable Index".
🚀 The Immutable Index enables us to avoid generating a temporary selection buffer when encountering AND and OR operations. This not only reduces memory fragmentation but also eliminates the cyclic copying from the temporary selection to the final selection.
Tests indicate a reduction in query time through the implementation of this optimization.
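To make the idea of filtering without temporary selection buffers more concrete, here is a minimal Python sketch. It is not Databend's implementation (Databend is written in Rust, and the details of the Immutable Index are not described here); it only illustrates, under our own assumptions, how AND and OR can be evaluated on one shared selection vector by partitioning it in place. The names `partition`, `filter_and`, and `filter_or` are hypothetical.

```python
from typing import Callable, List

# A predicate takes a row index and reports whether that row passes.
Predicate = Callable[[int], bool]


def partition(sel: List[int], start: int, end: int, pred: Predicate) -> int:
    """Move the rows of sel[start:end] that satisfy pred to the front, in place.

    Returns mid such that sel[start:mid] pass and sel[mid:end] fail.
    No second buffer is allocated; rows are only swapped inside sel.
    """
    write = start
    for i in range(start, end):
        if pred(sel[i]):
            sel[write], sel[i] = sel[i], sel[write]
            write += 1
    return write


def filter_and(sel: List[int], start: int, end: int,
               left: Predicate, right: Predicate) -> int:
    """AND: the right side only looks at rows that already passed the left side."""
    mid = partition(sel, start, end, left)
    return partition(sel, start, mid, right)


def filter_or(sel: List[int], start: int, end: int,
              left: Predicate, right: Predicate) -> int:
    """OR: the right side only looks at rows that failed the left side.

    Rows passing the right side land directly behind the rows that passed
    the left side, so sel[start:result] is the final selection.
    """
    mid = partition(sel, start, end, left)
    return partition(sel, mid, end, right)


# Two illustrative columns (made-up data).
a = [5, 12, 7, 20, 3]
b = [1, 0, 1, 1, 0]

sel_and = list(range(len(a)))
n_and = filter_and(sel_and, 0, len(a), lambda r: a[r] > 6, lambda r: b[r] == 1)
rows_and = sorted(sel_and[:n_and])

sel_or = list(range(len(a)))
n_or = filter_or(sel_or, 0, len(a), lambda r: a[r] > 6, lambda r: b[r] == 1)
rows_or = sorted(sel_or[:n_or])
```

In this sketch the passing rows always occupy a prefix of the same vector, so nothing has to be copied from a temporary selection into a final one. The swaps may reorder rows, which is why the example sorts the result before using it.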
If you would like to learn more, please contact the Databend team or refer to the resources listed below:
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Optimize Query Performance
Databend enhances query performance by providing Aggregate Index, Cluster Key, and Virtual Column, allowing users to optimize for specific query scenarios.
- Aggregate Index can pre-aggregate data to speed up aggregation query operations, such as sum, average, max, and min. It is especially useful for scenarios that require frequent aggregation calculations.
- Cluster Key guides Databend on how to organize data at the storage level. Rows with similar key values are physically stored together, reducing the number of reads during queries and thus speeding up query performance.
- Virtual Column can extract nested fields from Variant data and store this data in separate storage files. It is very useful for optimizing complex computations and conditional queries, reducing the computational load at runtime.
By properly applying these tools, Databend can significantly improve the speed and efficiency of data retrieval, providing users with fast and flexible options for query performance optimization.
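As a rough illustration of why storing rows with similar key values together reduces reads, the following Python sketch models a table as fixed-size blocks and counts how many blocks a range query has to touch. It is a simplified model, not Databend code: the block size, the data, and the helper names `build_blocks` and `blocks_to_read` are assumptions made for this example, and pruning is modeled as a plain min/max overlap check per block.

```python
from typing import List

BLOCK_SIZE = 4  # rows per block; tiny value chosen only for illustration


def build_blocks(values: List[int], block_size: int = BLOCK_SIZE) -> List[List[int]]:
    """Split the values into consecutive blocks, in storage order."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def blocks_to_read(blocks: List[List[int]], lo: int, hi: int) -> int:
    """Count blocks whose [min, max] range overlaps the query range [lo, hi].

    A block whose range does not overlap can be skipped without reading it.
    """
    return sum(1 for block in blocks if min(block) <= hi and max(block) >= lo)


# Made-up key values in arrival order.
values = [17, 3, 42, 8, 25, 1, 39, 12, 30, 5, 21, 44, 9, 33, 15, 27]

unclustered = build_blocks(values)        # stored as they arrived
clustered = build_blocks(sorted(values))  # stored ordered by the key

# Query: key BETWEEN 20 AND 30
reads_unclustered = blocks_to_read(unclustered, 20, 30)
reads_clustered = blocks_to_read(clustered, 20, 30)
```

With this data, every unclustered block spans the query range and has to be read, while the clustered layout needs a single block. Real workloads will behave less cleanly, but the direction of the effect is the same.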
We have also made these improvements to Databend that we hope you will find helpful:
- Added support for spilling Top-N sorting.
- Supported the use of conditional statements to build directed acyclic graphs when creating background tasks.
- Added a new Binary data type.
- Added a new stream_status HTTP API to check the status of streams.
- Added support for defining default behavior with MISSING_FIELD_AS during Parquet load.
- Read Docs | Continuous Data Pipelines to learn how to use Stream and Pipeline for continuous data ingestion.
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Databend Roadmap for 2024 - Come & Join the Discussion!
In 2023, Databend scaled significantly. The largest single table in Databend handled hundreds of thousands of segments, tens of millions of blocks, and tens of trillions of records, encompassing 7 PB of raw data and over 300 TB of index data.
In 2024, our vision is Compute Where Data Lives: Swift, Smart, Seamless. Explore our ongoing journey and future plans for Databend. Join the discussion and contribute your ideas!
|Enhancements to Concurrency and Scheduler|Aiming for faster, more efficient task handling and improved system responsiveness.
|GEOMETRY Data type|
||Continuously optimizing for better performance benchmarks.
||Adding Python support for versatile data analysis alongside SQL.
|Unify Storage, Warehouse, and Compute|Creating a cohesive data platform for AI and cloud computing, provisioning CPU & GPU resources.
Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
You can check the changelog of Databend Nightly for details about our latest developments.
A total of 23 contributors participated
We are very grateful for the outstanding work of the contributors.