Skip to main content

This Week in Databend #115

Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .

What's New

Stay informed about the latest features of Databend.

AGGREGATING INDEX

Databend has recently introduced AGGREGATING INDEX to improve query performance, especially for aggregation queries involving MIN, MAX, and SUM. Aggregating Index leverage techniques like precomputing and storing query results separately to eliminate the need to scan the entire table, thus speeding up data retrieval.

In addition, this feature includes a refresh mechanism that allows you to update and persist the latest query results on demand, ensuring data accuracy and reliability by refreshing the results when needed. Databend recommends manually refreshing the aggregating index before executing relevant queries to retrieve the most up-to-date data; Databend Cloud supports auto-refreshing of aggregating index.

-- Create an aggregating index
CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg;

-- Refresh the aggregating index
REFRESH AGGREGATING INDEX my_agg_index;

The AGGREGATING INDEX requires Databend Enterprise Edition. Please contact the Databend team for upgrade information.

If you are interested in learning more, please check out the resources below:

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Visualizing the MERGE INTO Pipeline

Databend recently implemented the MERGE INTO statement to provide more comprehensive data management capabilities. For those interested in how it works under the hood, check out the pipeline visualization of MERGE INTO below.

                                                                                                                                               +-------------------+
+-----------------------------+ output_port_row_id | |
+-----------------------+ Matched | +------------------------>-ResizeProcessor(1)+---------------+
| +---+--------------->| MatchedSplitProcessor | | | |
| | | | +----------+ +-------------------+ |
+----------------------+ | +---+ +-----------------------------+ | |
| MergeIntoSource +---------->|MergeIntoSplitProcessor| output_port_updated |
+----------------------+ | +---+ +-----------------------------+ | +-------------------+ |
| | | NotMatched | | | | | |
| +---+--------------->| MergeIntoNotMatchedProcessor+----------+------------->-ResizeProcessor(1)+-----------+ |
+-----------------------+ | | | | | |
+-----------------------------+ +-------------------+ | |
| |
| |
| |
| |
| |
+-------------------------------------------------+ | |
| | | |
| | | |
+--------------------------+ +-------------------------+ | ++---------------------------+ | +--------------------------------------+ | |
+---------+ TransformSerializeSegment<--------+ TransformSerializeBlock <-----+---------+|TransformAddComputedColumns|<---------+-----+TransformResortAddOnWithoutSourceSchema<-+ |
| +--------------------------+ +-------------------------+ | ++---------------------------+ | +--------------------------------------+ |
| | | |
| | | |
| | | |
| | | |
| +---------------+ +------------------------------+ | ++---------------+ | +---------------+ |
+----------+ TransformDummy|<----------------+ AsyncAccumulatingTransformer <-+---------------+|TransformDummy |<---------------+---------------+TransformDummy <------------------+
| +---------------+ +------------------------------+ | ++---------------+ | +---------------+
| | |
| | If it includes 'computed', this section |
| | of code will be executed, otherwise it won't |
| | |
| -+-------------------------------------------------+
|
|
|
| +------------------+ +-----------------------+ +-----------+
+------->|ResizeProcessor(1)+----------->|TableMutationAggregator+------->|CommitSink |
+------------------+ +-----------------------+ +-----------+

If you are interested in learning more, please check out the resources below:

Highlights

We have also made these improvements to Databend that we hope you will find helpful:

  • MERGE INTO now supports for automatic recluster and compaction.
  • SQLsmith now covers DELETE, UPDATE, ALTER TABLE, and CAST.
  • Added semi-structured data processing functions json_each and json_array_elements.
  • Added time and date functions to_week_of_year and date_part. See Docs | Date & Time Functions for details.
  • Read Sending IoT Stream Data to Databend with LF Edge eKuiper to learn how Databend integrates with eKuiper to meet growing IoT data analytics demands.

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Enhancing Role-Based Access Control

Currently, Databend's access control system consists of Role-Based Access Control (RBAC) and Discretionary Access Control (DAC). However, there is still room for improvement to make it more comprehensive.

We plan to support more privilege checks on uncovered resources and provide privilege definition guidance in 2023 Q4.

Issue #13207 | Tracking: RBAC improvement plan in 2023 Q4

Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.147-nightly...v1.2.160-nightly


Contributors

A total of 29 contributors participated

We are very grateful for the outstanding work of the contributors.