This Week in Databend #136

PsiACE, Mar 18, 2024

Databend is a modern cloud data warehouse that serves your massive-scale analytics needs at low cost and complexity. It is an open source alternative to Snowflake and is also available in the cloud: https://app.databend.com.

What's New

Stay informed about the latest features of Databend.

Understanding Tasks and Notifications in Databend

Databend now supports a comprehensive mechanism for tasks and notifications.

A task executes specified SQL statements either on a schedule or as part of a DAG of tasks. With notification integrations, notifications can be sent to external messaging services.

CREATE TASK IF NOT EXISTS mytask
    WAREHOUSE = 'mywh'
    SCHEDULE = 30 SECOND
    ERROR_INTEGRATION = 'myerror'
AS
BEGIN
    BEGIN;
    INSERT INTO mytable(ts) VALUES(CURRENT_TIMESTAMP);
    DELETE FROM mytable WHERE ts < DATEADD(MINUTE, -5, CURRENT_TIMESTAMP());
    COMMIT;
END;

The above example defines a task named mytask that runs every 30 seconds on the mywh compute cluster. The task executes a multi-statement transaction that includes an INSERT statement and a DELETE statement: it records the current timestamp and then removes rows older than five minutes. When the task fails, it triggers an error integration named myerror.
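The example above covers the scheduled case. For the DAG case, a minimal sketch might look like the following, assuming the AFTER clause of CREATE TASK is used to name the parent task. The task name mytask_report and the table mytable_stats are hypothetical and are not part of the original example.

```sql
-- Hypothetical child task: runs after mytask finishes instead of on its own schedule.
-- mytask_report and mytable_stats are illustrative names only.
CREATE TASK IF NOT EXISTS mytask_report
    WAREHOUSE = 'mywh'
    AFTER 'mytask'
    ERROR_INTEGRATION = 'myerror'
AS
    INSERT INTO mytable_stats(checked_at, row_count)
    SELECT CURRENT_TIMESTAMP(), COUNT(*) FROM mytable;
```

Because the child task names its parent rather than a schedule, the two tasks form a two-node DAG. Check the CREATE TASK reference for the exact clause order and available options before relying on this sketch.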

The mechanisms related to tasks and notifications are ready to use out of the box in Databend Cloud. If you would like to learn more, please contact the Databend team or refer to the Databend documentation on tasks and notifications.

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Databend vs. Snowflake: Data Ingestion Benchmark

We conducted four specific benchmarks to evaluate Databend Cloud versus Snowflake:

  • TPC-H SF100 Dataset Loading: Focuses on loading performance and cost for a large-scale dataset (100GB, ~600 million rows).
  • ClickBench Hits Dataset Loading: Tests efficiency in loading a wide-table dataset (76GB, ~100 million rows, 105 columns), emphasizing challenges associated with high column counts.
  • 1-Second Freshness: Measures the platforms' ability to ingest data within a strict 1-second freshness requirement.
  • 5-Second Freshness: Compares the platforms' data ingestion capabilities under a 5-second freshness constraint.

Data Loading Benchmark

[Figure: data loading benchmark results]

Freshness Benchmark

[Figure: data freshness benchmark results]

For details on the low-cost, high-performance data ingestion of Databend Cloud, see the accompanying benchmark documentation.

Highlights

We have also made these improvements to Databend that we hope you will find helpful:

  • Added support for spill in CROSS JOIN.
  • Added support for spill in the new aggregate hash table.
  • Added support for refreshing inverted indexes.
  • Added more function aliases for time- and date-related functions to support more data analysis tools.

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Adding support for TOP K Syntax

Databend plans to support the SELECT TOP statement to pick the top K items from a result set.

The following SELECT TOP statement selects the top 4 items from an ordered result set:

SELECT TOP 4 c1 FROM testable ORDER BY c1;

It is equivalent to the following SELECT ... LIMIT statement:

SELECT c1 FROM testable ORDER BY c1 LIMIT 4;
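To make the equivalence concrete, here is a small sketch using the LIMIT form, which works today. The sample rows are made up for illustration, and the proposed TOP form would be expected to return the same rows once it is implemented.

```sql
-- Hypothetical sample data for the testable table used above.
CREATE TABLE testable (c1 INT);
INSERT INTO testable VALUES (7), (3), (9), (1), (5), (8);

-- The four smallest values, in ascending order.
SELECT c1 FROM testable ORDER BY c1 LIMIT 4;
-- returns 1, 3, 5, 7

-- For the four largest values, reverse the sort order.
SELECT c1 FROM testable ORDER BY c1 DESC LIMIT 4;
-- returns 9, 8, 7, 5
```

Which rows count as the "top" depends entirely on the ORDER BY clause. Without one, neither form guarantees which K rows come back.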

This is a good first issue for anyone interested in Rust and Databend who wants to start contributing.

Issue #14972 | Feature: top k syntax support

Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.

New Contributors

We always welcome everyone with open arms and can't wait to see how you'll help our community grow and thrive.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.371-nightly...v1.2.378-nightly
