
This Week in Databend #126

Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .

What's New

Stay informed about the latest features of Databend.

New Filter Execution Framework

Databend's new filter execution framework introduces a concept called the "Immutable Index".

🚀 The Immutable Index lets us avoid generating a temporary selection buffer when evaluating AND and OR operations. This not only reduces memory fragmentation but also eliminates the repeated copying from the temporary selection to the final selection.

Tests indicate a reduction in query time through the implementation of this optimization.
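
To make the idea of filtering without temporary selection buffers more concrete, here is a minimal Python sketch. It is not Databend's Rust implementation and does not model the Immutable Index itself. It only illustrates, under simplifying assumptions, how AND and OR can reuse a single selection buffer by partitioning it in place. The names `partition`, `filter_and`, and `filter_or` are hypothetical.

```python
from typing import Callable, List

# A predicate takes a row index and reports whether that row passes.
Predicate = Callable[[int], bool]


def partition(sel: List[int], start: int, end: int, pred: Predicate) -> int:
    """Reorder sel[start:end] in place so passing row ids come first.

    Returns the split point: sel[start:split] passed, sel[split:end] did not.
    """
    split = start
    for i in range(start, end):
        if pred(sel[i]):
            sel[split], sel[i] = sel[i], sel[split]
            split += 1
    return split


def filter_and(num_rows: int, p1: Predicate, p2: Predicate) -> List[int]:
    """Rows passing p1 AND p2: p2 only visits rows that already passed p1."""
    sel = list(range(num_rows))
    n1 = partition(sel, 0, num_rows, p1)
    n2 = partition(sel, 0, n1, p2)
    return sel[:n2]


def filter_or(num_rows: int, p1: Predicate, p2: Predicate) -> List[int]:
    """Rows passing p1 OR p2: p2 only visits rows that failed p1."""
    sel = list(range(num_rows))
    n1 = partition(sel, 0, num_rows, p1)
    n2 = partition(sel, n1, num_rows, p2)
    # Sorting is only for readable output in this sketch.
    return sorted(sel[:n2])


# Hypothetical example columns.
a = [1, 5, 8, 3, 9]
b = [10, 2, 7, 4, 1]

and_rows = filter_and(len(a), lambda i: a[i] > 4, lambda i: b[i] > 5)
or_rows = filter_or(len(a), lambda i: a[i] > 4, lambda i: b[i] > 5)
print(and_rows, or_rows)
```

In both cases the second predicate works on a sub-range of the same buffer, so no intermediate selection has to be allocated and copied back into the final one.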

If you would like to learn more, please contact the Databend team.

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Optimize Query Performance

Databend enhances query performance by providing Aggregate Index, Cluster Key, and Virtual Column, allowing users to optimize for specific query scenarios.

  • Aggregate Index can pre-aggregate data to speed up aggregation query operations, such as sum, average, max, and min. It is especially useful for scenarios that require frequent aggregation calculations.
  • Cluster Key guides Databend on how to organize data at the storage level. Rows with similar key values are physically stored together, reducing the number of reads during queries and thus speeding up query performance.
  • Virtual Column can extract nested fields from Variant data and store them in separate storage files. It is very useful for optimizing complex computations and conditional queries, reducing the computational load at runtime.

By properly applying these tools, Databend can significantly improve the speed and efficiency of data retrieval, providing users with fast and flexible options for query performance optimization.
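
As a rough illustration of why storing rows with similar key values together reduces reads, the toy Python sketch below splits a column into fixed-size blocks, records each block's minimum and maximum, and counts how many blocks a range query would need to touch. The block size, the data, and the helper names `build_block_stats` and `blocks_to_read` are assumptions made for this example, and the pruning rule is a simplification rather than a description of Databend's storage engine.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical number of rows per block


def build_block_stats(values: List[int], block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Split values into fixed-size blocks and return each block's (min, max)."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block)) for block in blocks]


def blocks_to_read(stats: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count blocks whose [min, max] range overlaps the query range [lo, hi]."""
    return sum(1 for block_min, block_max in stats if block_max >= lo and block_min <= hi)


# The same twelve values, stored in arrival order and ordered by the key.
unclustered = [7, 1, 12, 5, 3, 10, 8, 2, 11, 4, 9, 6]
clustered = sorted(unclustered)

# Query: key BETWEEN 5 AND 6
unclustered_reads = blocks_to_read(build_block_stats(unclustered), 5, 6)
clustered_reads = blocks_to_read(build_block_stats(clustered), 5, 6)
print(unclustered_reads, clustered_reads)
```

With the unordered layout every block's range overlaps the query, so all three blocks must be read. Once the values are ordered by the key, only one block can contain matching rows.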

Highlights

We have also made these improvements to Databend that we hope you will find helpful:

  • Added support for spilling Top-N sorting.
  • Supported the use of conditional statements to build directed acyclic graphs when creating background tasks.
  • Added a new Binary data type.
  • Added a new stream_status HTTP API to check the status of streams.
  • Added support for defining default behavior with MISSING_FIELD_AS during Parquet load.
  • Read Docs | Continuous Data Pipelines to learn how to use Stream and Pipeline for continuous data ingestion.

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Databend Roadmap for 2024 - Come & Join the Discussion!

In 2023, Databend scaled significantly. The largest single table in Databend handled hundreds of thousands of segments, tens of millions of blocks, and tens of trillions of records, encompassing 7PB of raw data and over 300TB of index data.

In 2024, our vision is Compute Where Data Lives: Swift, Smart, Seamless. Explore our ongoing journey and future plans for Databend. Join the discussion and contribute your ideas!

Task | Status | Comments
Enhancements to Concurrency and Scheduler | Planned | Aiming for faster, more efficient task handling and improved system responsiveness.
GEOMETRY Data type | Planned |
TPC-DS Performance | In Progress | Continuously optimizing for better performance benchmarks.
Multi-Statement Transactions | Not Specified |
Stored Procedures (Python) | Not Specified | Adding Python support for versatile data analysis alongside SQL.
Unify Storage, Warehouse, and Compute | Not Specified | Creating a cohesive data platform for AI and cloud computing, provisioning CPU & GPU resources.

Issue #14167 | Databend Roadmap for 2024 (Discussion)

Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.268-nightly...v1.2.277-nightly


Contributors

A total of 23 contributors participated.

We are very grateful for the outstanding work of the contributors.