Blog

This Week in Databend #137

PsiACEMar 25, 2024
This Week in Databend #137

Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .

What's New

Stay informed about the latest features of Databend.

Adding Support for Query Matching Inverted Index

Inverted indexes are the most commonly used data structures in document retrieval systems, storing the mapping of where a word is located in a document or a set of documents in full text search.

Databend now supports the use of

match
to match inverted indexes in queries.

CREATE TABLE t (id int, content string)

INSERT INTO t VALUES
(1, 'The quick brown fox jumps over the lazy dog'),
(2, 'A picture is worth a thousand words'),
(3, 'The early bird catches the worm'),
(4, 'Actions speak louder than words'),
(5, 'Time flies like an arrow; fruit flies like a banana'),
(6, 'Beauty is in the eye of the beholder'),
(7, 'When life gives you lemons, make lemonade'),
(8, 'Put all your eggs in one basket'),
(9, 'You can not judge a book by its cover'),
(10, 'An apple a day keeps the doctor away')

CREATE INVERTED INDEX IF NOT EXISTS idx1 ON t(content)

REFRESH INVERTED INDEX idx1 ON t

SELECT id, score(), content FROM t WHERE match(content, 'word')
----
2 1.5948367 A picture is worth a thousand words
4 1.6550698 Actions speak louder than words

If you would like to learn more, please contact the Databend team or refer to the resources listed below:

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Leveraging TASK & STREAM to Capture and Track User Activity in Real-Time

Streams in Databend are dynamic real-time representations of table changes. Creating a stream can capture and track modifications to related tables for ongoing analysis.

A task encapsulates specific SQL statements intended to be executed at predefined time intervals, upon specific event triggers, or as part of a broader task sequence.

When creating a task, you can follow the workflow below:

The following example demonstrates how to combine TASK and STREAM to capture and track user activity in real-time, periodically synchronizing the

user_activity_profiles
table with data in
activities_stream
to ensure
user_activity_profiles
always accurately reflects the latest user activities

-- Define a task in Databend
CREATE TASK user_activity_task
WAREHOUSE = 'default'
SCHEDULE = 1 MINUTE
-- Trigger task when new data arrives in activities_stream
WHEN stream_status('activities_stream') AS
-- Insert new records into user_activity_profiles
INSERT INTO user_activity_profiles
SELECT
-- Join activities_stream with user_profiles based on user_id
a.user_id, p.username, p.location, a.activity, a.timestamp
FROM
activities_stream AS a
JOIN user_profiles AS p
ON a.user_id = p.user_id
-- Include only rows where the action is 'INSERT'
WHERE a.change$action = 'INSERT';

If you're interested, please read the documentation below to learn how to complete this task using Databend Cloud.

Highlights

We have also made these improvements to Databend that we hope you will find helpful:

  • Added support for
    show views
    and
    desc view
    .
  • Added
    is_error
    ,
    is_not_error
    ,
    error_or
    functions.
  • Added pagination for
    task_history
    .
  • Added support for the PRQL query language.

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Supporting
CHANGES
Clause

The

CHANGES
clause allows querying for change tracking metadata of tables or views within a specified time interval without the need to create streams with explicit transaction offsets.

Combining multiple queries can be used to retrieve change tracking metadata between different transaction boundaries.

SELECT ...
FROM ...
CHANGES ( INFORMATION => { DEFAULT | APPEND_ONLY } )
AT ( { TIMESTAMP => <timestamp> | OFFSET => <time_difference> | STREAM => '<name>' } )
[ END( { TIMESTAMP => <timestamp> | OFFSET => <time_difference> } ) ]
[ ... ]

Issue #15028 | Feature: support CHANGES clause

Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.

New Contributors

We always open arms to everyone and can't wait to see how you'll help our community grow and thrive.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.378-nightly...v1.2.386-nightly

Share this post

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!