This Week in Databend #93

Databend is a modern cloud data warehouse that serves your massive-scale analytics needs at low cost and complexity. It is an open-source alternative to Snowflake, and is also available in the cloud: https://app.databend.com

The upgrade tool meta-upgrade-09 will no longer be available in the release package. If you're using Databend 0.9 or an earlier version, you can seek help from the community.

What's On In Databend

Stay connected with the latest news about Databend.

Databend's Segment Caching Mechanism Now Boasts Improved Memory Usage

Databend's segment caching mechanism has received a significant upgrade that reduces its memory usage to roughly 1.5/1000 (0.15%) of the previous level in a test scenario.

The upgrade introduces a different "representation" of cached segments, called CompactSegmentInfo. This representation consists mainly of two components:

  • The decoded min/max indexes and other statistical information.
  • The undecoded (and compressed) raw bytes of block-metas.

During segment pruning, segments that are pruned never need their block-metas decoded from the raw bytes. Segments that survive pruning have their raw bytes decoded on the fly for block pruning and scanning, and the decoded data is dropped once it is no longer needed.
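To make the mechanism concrete, here is a minimal Python sketch of the lazy-decoding pattern described above. It is purely illustrative: the helper names and the use of zlib/json stand in for Databend's actual Rust implementation.

import json
import zlib

class CompactSegmentInfo:
    # Illustrative stand-in: keep decoded statistics eagerly,
    # keep block-metas as compressed raw bytes, decode on demand.
    def __init__(self, min_max_stats, raw_block_metas):
        self.min_max_stats = min_max_stats       # small, always decoded
        self._raw_block_metas = raw_block_metas  # large, stays compressed

    def block_metas(self):
        # Decoded on the fly; the result is not cached, so it can be
        # dropped as soon as block pruning and scanning are finished.
        return json.loads(zlib.decompress(self._raw_block_metas))

def scan_candidates(segments, keep):
    for seg in segments:
        if not keep(seg.min_max_stats):
            continue                 # pruned: raw bytes never decoded
        yield seg.block_metas()      # decoded only for surviving segments

Because the bulky block-metas stay compressed while cached, the resident cache stays small, and decoding work is only paid for segments that survive pruning.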

If you are interested in learning more, please check out the resources listed below.

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Bind databend into Python

Databend now offers a Python binding that allows users to execute SQL queries against Databend using Python even without deploying a Databend instance.

To use this functionality, simply import SessionContext from the databend module and create an instance of it:

from databend import SessionContext

ctx = SessionContext()

You can then run SQL queries using the sql() method on your session context object:

df = ctx.sql("select number, number + 1, number::String as number_p_1 from numbers(8)")

The resulting DataFrame can be converted to PyArrow or Pandas format using the to_py_arrow() or to_pandas() methods respectively:

df.to_pandas() # Or, df.to_py_arrow()

Feel free to integrate it with your data science workflow.
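Putting the pieces together, here is a minimal end-to-end sketch; the pandas post-processing and the result column name are illustrative assumptions:

from databend import SessionContext

ctx = SessionContext()

# Run a query and hand the result off to pandas.
df = ctx.sql("select number, number + 1, number::String as number_p_1 from numbers(8)")
pdf = df.to_pandas()

print(pdf.head())
print(pdf["number"].sum())  # assuming the first column is named "number": 0 + 1 + ... + 7 == 28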

Highlights

Here are some noteworthy items; perhaps you can find something that interests you.

  • Read the two new tutorials added to Transform Data During Load to learn how to perform arithmetic operations during loading and how to load data into a table with additional columns.
  • Read Working with Stages to gain a deeper understanding of stages and learn how to manage and use them effectively.
  • Added functions: date_format, str_to_date, and str_to_timestamp; see the sketch after this list.
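As a quick illustration, the new date functions can be tried from the Python binding introduced above. The MySQL-style format specifiers below are assumptions; check the function documentation for the exact formats Databend accepts.

from databend import SessionContext

ctx = SessionContext()

# Format specifiers here are assumed to follow MySQL conventions.
df = ctx.sql(
    "select"
    "  date_format(to_date('2023-04-21'), '%Y/%m/%d') as formatted,"
    "  str_to_date('2023-04-21', '%Y-%m-%d') as d,"
    "  str_to_timestamp('2023-04-21 12:34:56', '%Y-%m-%d %H:%i:%s') as ts"
)
print(df.to_pandas())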

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Add open-sharing Binary to Databend Image

Open Sharing is a cheap and secure data sharing protocol for databend-query in multi-cloud environments. Databend provides a binary called open-sharing, which serves as a tenant-level sharing endpoint. You can read databend | sharing-endpoint - README.md to learn more.

To facilitate deploying open-sharing endpoint instances with K8s or Docker, it is recommended to add the binary to Databend's Docker image.

Issue #11182 | Feature: added open-sharing binary in the databend-query image

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

New Contributors

We always welcome everyone with open arms and can't wait to see how you'll help our community grow and thrive.

  • @Mehrbod2002 made their first contribution in #11367. Added validation for max_storage_io_requests.
  • @DongHaowen made their first contribution in #11362. Specified database in benchmark.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.1.30-nightly...v1.1.38-nightly


Contributors

A total of 24 contributors participated.

We are very grateful for the outstanding work of the contributors.