Blog

This Week in Databend #72

PsiACEDec 14, 2022

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements

Multiple Catalogs

  • extends show databases SQL (#9152)

Stage

  • support select from URI (#9247)

Streaming Load

  • support
    file_format
    syntax in streaming load insert sql (#9063)

Planner

  • push down
    limit
    to
    union
    (#9210)

Query

  • use
    analyze table
    instead of
    optimize table statistic
    (#9143)
  • fast parse insert values (#9214)

Storage

  • use distinct count calculated by the xor hash function (#9159)
  • read_parquet
    read meta before read data (#9154)
  • push down filter to parquet reader (#9199)
  • prune row groups before reading (#9228)

Open Sharing

  • add prototype open sharing and add sharing stateful tests (#9177)

Code Refactoring 🎉

*

  • simplify the global data registry logic (#9187)

Storage

  • refactor deletion (#8824)

Build/Testing/CI Infra Changes 🔌

  • release databend deb package and databend with hive (#9138, #9241, etc.)

Bug Fixes 🔧

Format

  • support ASCII control code hex as format field delimiter (#9160)

Planner

  • prewhere_column empty and predicate is not const will return empty (#9116)
  • don't push down topk to Merge when it's child is Aggregate (#9183)
  • fix nullable column validity not equal (#9220)

Query

  • address unit test hang on test_insert (#9242)

Storage

  • too many io requests for read blocks during compact (#9128)
  • collect orphan snapshots (#9108)

What's On In Databend

Stay connected with the latest news about Databend.

Breaking Change: Unified File Format Options

To simplify, we're rolling out a set of unified file format options as follows for the COPY INTO command, the Streaming Load API, and all the other cases where users need to describe their file formats:

[ FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | XML} [ formatTypeOptions ] ) ]
  • Please note that the current format options starting with
    format_*
    will be deprecated.
  • ... FORMAT CSV ...
    will still be accepted by the ClickHouse handler.
  • Support for customized formats created by
    CREATE FILE FORMAT ...
    will be added in a future release:
    ... FILE_FORMAT = (format_name = 'MyCustomCSV') ....
    .

Learn More

Open Sharing

Open Sharing is a simple and secure data-sharing protocol designed for databend-query nodes running in a multi-cloud environment.

  • Simple & Free: Open Sharing is open-source and basically a RESTful API implementation.
  • Secure: Open Sharing verifies incoming requesters' identities and access permissions, and provides an audit log.
  • Multi-Cloud: Open Sharing supports a variety of public cloud platforms, including AWS, Azure, GCP, etc.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

We're about to run stage-related tests again using the Streaming Load API to move files to a stage instead of an AWS command like this:

aws --endpoint-url ${STORAGE_S3_ENDPOINT_URL} s3 cp s3://testbucket/admin/data/ontime_200.csv s3://testbucket/admin/stage/internal/s1/ontime_200.csv >/dev/null 2>&1

This is because Databend users do not need to take care of, or do not even know the stage paths that the AWS command requires.

Issue 8528: refactor stage related tests

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevilb41shBohuTANGChasen-ZhangClSlaiddantengsky
ariesdevilb41shBohuTANGChasen-ZhangClSlaiddantengsky
drmingdrmerhantmaclichuangmergify[bot]PsiACERinChanNOWWW
drmingdrmerhantmaclichuangmergify[bot]PsiACERinChanNOWWW
soyeric128sundy-liwubxXuanwoxudong963youngsofun
soyeric128sundy-liwubxXuanwoxudong963youngsofun
ZhiHanZzhyasszzzdong
ZhiHanZzhyasszzzdong
Share this post

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!