
This Week in Databend #136

Databend is a modern cloud data warehouse that serves your massive-scale analytics needs at low cost and with low complexity. It is an open-source alternative to Snowflake and is also available in the cloud: https://app.databend.com.

What's New

Stay informed about the latest features of Databend.

Understanding Tasks and Notifications in Databend

Databend now supports a comprehensive mechanism for tasks and notifications.

A task executes the SQL statements you specify, either on a schedule or as part of a DAG (directed acyclic graph) of tasks, where it runs after the tasks it depends on. Notification integrations let you send notifications to external messaging services.
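To illustrate what running tasks as a DAG means, here is a minimal Python sketch. It is not Databend's scheduler. Each hypothetical task runs only after all of its predecessors have finished, and the ordering comes from the standard library's graphlib.TopologicalSorter (Python 3.9+). The task names, the predecessors mapping, and the use of Python callables as stand-ins for each task's SQL are all illustrative assumptions.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_dag(tasks: dict[str, Callable[[], None]],
            predecessors: dict[str, set[str]]) -> list[str]:
    """Run every task after all of its predecessors and return the run order.

    tasks: hypothetical task name -> zero-argument callable (stands in for the task's SQL)
    predecessors: task name -> names of tasks that must finish first
    Raises graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
    """
    graph = {name: predecessors.get(name, set()) for name in tasks}
    order = []
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()  # run the task body only once its predecessors are done
        order.append(name)
    return order


if __name__ == "__main__":
    log = []
    tasks = {n: (lambda n=n: log.append(n)) for n in ["report", "transform", "load"]}
    # A simple chain: load -> transform -> report
    print(run_dag(tasks, {"transform": {"load"}, "report": {"transform"}}))
    # prints ['load', 'transform', 'report']
```

A cycle such as two tasks that each wait for the other is rejected before any task runs, which is why the dependency graph must be acyclic.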

CREATE TASK IF NOT EXISTS mytask
WAREHOUSE = 'mywh'
SCHEDULE = 30 SECOND
ERROR_INTEGRATION = 'myerror'
AS
BEGIN
BEGIN;
INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);
DELETE FROM mytable WHERE ts < DATEADD(MINUTE, -5, CURRENT_TIMESTAMP());
COMMIT;
END;

The example above defines a task named mytask that runs every 30 seconds on the mywh warehouse. Each run executes a multi-statement transaction: an INSERT statement records the current timestamp in mytable, and a DELETE statement removes rows older than five minutes. If the task fails, it triggers the error integration named myerror.
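The following Python sketch mimics a single run of mytask, using the standard library's sqlite3 module as a stand-in database; it is not how Databend executes tasks. Both statements run in one transaction, so if either fails, neither takes effect, and a notify callback (a hypothetical stand-in for the myerror integration) receives the error. Storing timestamps as Unix seconds is a simplifying assumption of the sketch.

```python
import sqlite3
from typing import Callable

RETENTION_S = 5 * 60  # keep rows from the last five minutes, like DATEADD(MINUTE, -5, ...)


def run_task_once(conn: sqlite3.Connection, now_s: float,
                  notify: Callable[[str], None]) -> bool:
    """Simulate one run of the mytask body inside a single transaction.

    now_s: current time in Unix seconds (stands in for CURRENT_TIMESTAMP).
    notify: hypothetical stand-in for the myerror error integration.
    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute("INSERT INTO mytable(ts) VALUES (?)", (now_s,))
            conn.execute("DELETE FROM mytable WHERE ts < ?", (now_s - RETENTION_S,))
        return True
    except sqlite3.Error as exc:
        notify(f"mytask failed: {exc}")
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable(ts REAL)")
    for t in (0.0, 30.0, 400.0):  # three runs; the last one expires the first two rows
        run_task_once(conn, t, print)
    print(conn.execute("SELECT ts FROM mytable").fetchall())
```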

Tasks and notifications are available out of the box in Databend Cloud. To learn more, contact the Databend team or refer to the resources listed below:

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Databend vs. Snowflake: Data Ingestion Benchmark

We ran four benchmarks comparing Databend Cloud with Snowflake:

  • TPC-H SF100 Dataset Loading: Focuses on loading performance and cost for a large-scale dataset (100GB, ~600 million rows).
  • ClickBench Hits Dataset Loading: Tests efficiency in loading a wide-table dataset (76GB, ~100 million rows, 105 columns), emphasizing challenges associated with high column counts.
  • 1-Second Freshness: Measures the platforms' ability to ingest data within a strict 1-second freshness requirement.
  • 5-Second Freshness: Compares the platforms' data ingestion capabilities under a 5-second freshness constraint.
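The post does not spell out how freshness was measured. Under one simple, assumed definition, a record's freshness lag is the time from when it is produced until it becomes queryable, and a run meets an N-second target only if every record's lag is at most N seconds. The Python sketch below encodes that assumed definition; the function names and sample timings are hypothetical and are not benchmark data.

```python
Record = tuple[float, float]  # (produced_at, queryable_at), both in seconds


def max_lag(records: list[Record]) -> float:
    """Largest delay between a record being produced and becoming queryable.

    records must be non-empty.
    """
    return max(queryable - produced for produced, queryable in records)


def meets_freshness(records: list[Record], target_s: float) -> bool:
    """True if every record became queryable within target_s seconds."""
    return max_lag(records) <= target_s


if __name__ == "__main__":
    sample = [(0.0, 0.5), (1.0, 1.75), (2.0, 3.5)]  # made-up timings for illustration
    print(max_lag(sample))               # 1.5
    print(meets_freshness(sample, 1.0))  # False: one record lagged 1.5 s
    print(meets_freshness(sample, 5.0))  # True
```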

Data Loading Benchmark

[Figure: data loading benchmark]

Freshness Benchmark

[Figure: data freshness benchmark]

To learn more about the low-cost, high-performance data ingestion of Databend Cloud, see the following documentation.

Highlights

We have also made these improvements to Databend that we hope you will find helpful:

  • Added support for spilling in CROSS JOIN.
  • Added support for spilling in the new aggregate hash table.
  • Added support for refreshing inverted indexes.
  • Added more aliases for date and time functions to improve compatibility with data analysis tools.
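The notes above don't describe how spilling works inside Databend. As a rough illustration of the general technique only, here is a Python sketch of a GROUP BY ... SUM hash aggregation that spills partial results to temporary files once the in-memory hash table exceeds a group limit, then merges the spilled data one hash partition at a time. The max_groups and num_partitions parameters are hypothetical; Databend itself is written in Rust, and its actual spilling logic likely differs in many details.

```python
import pickle
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Hashable, Iterable


def spilling_sum(rows: Iterable[tuple[Hashable, int]], max_groups: int,
                 num_partitions: int = 4) -> tuple[dict, int]:
    """Compute GROUP BY key, SUM(value) with a bounded in-memory hash table.

    Returns (sums per key, number of spills). Whenever the table holds more
    than max_groups keys, its partial sums are appended to one temp file per
    hash partition and the table is cleared.
    """
    table: defaultdict = defaultdict(int)
    spills = 0
    with tempfile.TemporaryDirectory() as tmp:
        paths = [Path(tmp) / f"part{i}.pkl" for i in range(num_partitions)]

        def spill() -> None:
            nonlocal spills
            parts = [[] for _ in range(num_partitions)]
            for key, total in table.items():
                parts[hash(key) % num_partitions].append((key, total))
            for path, items in zip(paths, parts):
                with path.open("ab") as f:  # append one pickled chunk per spill
                    pickle.dump(items, f)
            table.clear()
            spills += 1

        for key, value in rows:
            table[key] += value
            if len(table) > max_groups:
                spill()

        if spills == 0:  # everything fit in memory: no merge needed
            return dict(table), 0

        spill()  # flush the remainder so all partial sums are on disk
        result = {}
        for path in paths:
            # A key always hashes to the same partition, so each partition
            # can be merged on its own with a small hash table.
            merged: defaultdict = defaultdict(int)
            with path.open("rb") as f:
                while True:
                    try:
                        chunk = pickle.load(f)
                    except EOFError:
                        break
                    for key, total in chunk:
                        merged[key] += total
            result.update(merged)
        return result, spills
```

Partitioning by key hash is the design choice that makes the merge cheap: no key can appear in two partition files, so each file is merged independently.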

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Adding Support for TOP K Syntax

Databend plans to support the SELECT TOP statement to pick the top K items from a result set.

The following SELECT TOP statement returns the first 4 rows of the result set ordered by c1:

SELECT TOP 4 c1 FROM testable ORDER BY c1;

This is equivalent to the following SELECT ... LIMIT statement:

SELECT c1 FROM testable ORDER BY c1 LIMIT 4;
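To make the equivalence concrete, the Python sketch below models column c1 as a plain list, which is an assumption for illustration. Keeping the first K values of the ordered result gives the same rows as ORDER BY c1 LIMIT K. The variant using heapq.nsmallest returns the same list while keeping only K candidates for small K, which is one common way engines avoid a full sort; it is not a statement about how Databend will implement TOP.

```python
import heapq


def top_k_sorted(values: list[int], k: int) -> list[int]:
    """ORDER BY c1 LIMIT k: sort all values, then keep the first k."""
    return sorted(values)[:k]


def top_k_heap(values: list[int], k: int) -> list[int]:
    """Same rows via heapq.nsmallest, which keeps only k candidates for small k."""
    return heapq.nsmallest(k, values)


if __name__ == "__main__":
    c1 = [7, 3, 9, 1, 4, 8, 2]
    print(top_k_sorted(c1, 4))  # [1, 2, 3, 4]
    print(top_k_heap(c1, 4))    # [1, 2, 3, 4]
```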

This is a good first issue for anyone interested in Rust and Databend who would like to get involved.

Issue #14972 | Feature: top k syntax support

Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

New Contributors

We always welcome everyone with open arms and can't wait to see how you'll help our community grow and thrive.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.371-nightly...v1.2.378-nightly


Contributors

A total of 24 contributors participated.

We are very grateful for the outstanding work of the contributors.