
This Week in Databend #107

PsiACE, Aug 20, 2023

Databend is a modern cloud data warehouse that serves your massive-scale analytics needs at low cost and with low complexity. It is an open-source alternative to Snowflake and is also available in the cloud: https://app.databend.com.

What's On In Databend

Stay connected with the latest news about Databend.

Understanding Connection Parameters

Connection parameters are a set of connection details required to establish a secure link to supported external storage services, such as Amazon S3. They are enclosed in parentheses and consist of key-value pairs separated by commas or spaces. They are commonly used when creating a stage, copying data into Databend, and querying staged files from external sources.

For example, the following statement creates an external stage on Amazon S3 with the connection parameters:

CREATE STAGE my_s3_stage
's3://load/files/'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-id>',
SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
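To make the "key-value pairs separated by commas or spaces" rule concrete, here is a minimal Python sketch of how such a clause could be tokenized. The function parse_connection_params is hypothetical and is not part of Databend; it also assumes, as in standard SQL string literals, that a doubled '' inside a value stands for one quote character.

```python
import re

# KEY = 'value' pairs. A doubled '' inside quotes is read as an escaped quote,
# following standard SQL string-literal rules (an assumption for this sketch).
_PAIR = re.compile(r"(\w+)\s*=\s*'((?:[^']|'')*)'")


def parse_connection_params(clause: str) -> dict[str, str]:
    """Hypothetical helper: extract key-value pairs from a CONNECTION = (...) clause.

    Pairs may be separated by commas, spaces, or newlines. Keys are upper-cased.
    """
    body = clause[clause.index("(") + 1 : clause.rindex(")")]
    # Anything other than separators left after removing the pairs is an error.
    leftover = _PAIR.sub("", body).strip(", \t\r\n")
    if leftover:
        raise ValueError(f"unexpected text in CONNECTION clause: {leftover!r}")
    return {
        key.upper(): value.replace("''", "'")
        for key, value in _PAIR.findall(body)
    }


example = """CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-id>',
SECRET_ACCESS_KEY = '<your-secret-access-key>'
)"""
print(parse_connection_params(example))
# {'ACCESS_KEY_ID': '<your-access-key-id>', 'SECRET_ACCESS_KEY': '<your-secret-access-key>'}
```

Because the sketch matches pairs rather than splitting on a fixed delimiter, the comma-separated form above and a space-separated form such as CONNECTION = (ACCESS_KEY_ID = 'a' SECRET_ACCESS_KEY = 'b') yield the same result.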

If you are interested in learning more, please check out the resources listed below.

Adding Storage Parameters for Hive Catalog

Over the past week, Databend added storage parameters to the Hive Catalog, so a Hive catalog can now be configured with its own storage service instead of relying on the storage backend of the default catalog.

The following example shows how to create a Hive Catalog using MinIO as the underlying storage service:

CREATE CATALOG hive_ctl
TYPE = HIVE
CONNECTION = (
ADDRESS = '127.0.0.1:9083'
URL = 's3://warehouse/'
AWS_KEY_ID = 'admin'
AWS_SECRET_KEY = 'password'
ENDPOINT_URL = 'http://localhost:9000/'
);
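The parameters in this example fall into two groups: ADDRESS points at the Hive Metastore, while URL, AWS_KEY_ID, AWS_SECRET_KEY, and ENDPOINT_URL describe the catalog's own S3-compatible storage (MinIO here). The Python sketch below models that split. All class and function names are hypothetical and do not reflect Databend's internal API; it only illustrates a catalog carrying its own storage settings rather than borrowing the default catalog's.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class S3StorageConfig:
    """Hypothetical storage settings for an S3-compatible service such as MinIO."""
    bucket: str
    root: str
    endpoint_url: str
    access_key_id: str
    secret_access_key: str


@dataclass(frozen=True)
class HiveCatalogConfig:
    """Hypothetical model: a Hive catalog that owns its storage configuration."""
    metastore_address: str
    storage: S3StorageConfig


def build_hive_catalog_config(params: dict[str, str]) -> HiveCatalogConfig:
    """Split CONNECTION parameters into a metastore address and storage settings."""
    url = urlparse(params["URL"])
    if url.scheme != "s3":
        raise ValueError(f"only s3:// URLs are handled in this sketch, got {params['URL']!r}")
    storage = S3StorageConfig(
        bucket=url.netloc,       # 's3://warehouse/' -> 'warehouse'
        root=url.path or "/",    # path inside the bucket
        endpoint_url=params["ENDPOINT_URL"],
        access_key_id=params["AWS_KEY_ID"],
        secret_access_key=params["AWS_SECRET_KEY"],
    )
    return HiveCatalogConfig(metastore_address=params["ADDRESS"], storage=storage)


params = {
    "ADDRESS": "127.0.0.1:9083",
    "URL": "s3://warehouse/",
    "AWS_KEY_ID": "admin",
    "AWS_SECRET_KEY": "password",
    "ENDPOINT_URL": "http://localhost:9000/",
}
cfg = build_hive_catalog_config(params)
print(cfg.metastore_address, cfg.storage.bucket, cfg.storage.endpoint_url)
# 127.0.0.1:9083 warehouse http://localhost:9000/
```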

If you are interested in learning more, please check out the resources listed below.

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Using gitoxide to Speed Up Git Dependency Downloads

gitoxide is a high-performance, modern Git implementation written in Rust. With cargo's unstable gitoxide feature, the gitoxide crate can replace git2 for various Git operations, making downloads of the crates.io index and git dependencies several times faster.

Databend has recently enabled this feature for cargo {build | clippy | test} in CI. You can also add the -Zgitoxide option to speed up builds during local development (like all unstable -Z flags, it requires a nightly toolchain):

cargo -Zgitoxide=fetch,shallow-index,shallow-deps build
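If you want to check the effect on your own machine, the small Python helper below builds the cargo command with or without the flag from the example above and times a run. The helper names are hypothetical, and actually running the commands assumes a nightly cargo is on your PATH.

```python
import subprocess
import sys
import time

# The gitoxide options from the example command above.
GITOXIDE_FLAG = "-Zgitoxide=fetch,shallow-index,shallow-deps"


def cargo_command(subcommand: str, use_gitoxide: bool = True) -> list[str]:
    """Build the argv for `cargo <subcommand>`, optionally enabling gitoxide."""
    argv = ["cargo"]
    if use_gitoxide:
        # Unstable -Z flags are only accepted by a nightly cargo.
        argv.append(GITOXIDE_FLAG)
    argv.append(subcommand)
    return argv


def timed_run(argv: list[str]) -> float:
    """Run a command, raise CalledProcessError if it fails, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Only prints the commands; call timed_run() yourself to execute them.
    for sub in sys.argv[1:] or ["build", "clippy", "test"]:
        print(" ".join(cargo_command(sub)))
```

For a fair comparison, time cargo_command("fetch", use_gitoxide=False) against cargo_command("fetch") starting from the same cache state, for example by using a separate, empty CARGO_HOME for each run. Results will vary with network and hardware.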

If you are interested in learning more, please check out the resources listed below.

Highlights

We have also made these improvements to Databend that we hope you will find helpful:

  • The VALUES clause can now be used on its own, without being combined with SELECT.
  • You can now set a default value when modifying the type of a column. See Docs | ALTER TABLE COLUMN for details.
  • Databend can now automatically recluster a table after write operations such as COPY INTO and REPLACE INTO.
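For readers unfamiliar with a standalone VALUES clause, the snippet below shows the general SQL form using SQLite through Python's built-in sqlite3 module. This is SQLite, not Databend, so details such as the generated column names (column1, column2, ...) are SQLite's and may differ in Databend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A standalone VALUES statement: it returns rows without any SELECT.
rows = conn.execute("VALUES (1, 'apple'), (2, 'banana')").fetchall()
print(rows)  # [(1, 'apple'), (2, 'banana')]

# The same row set used as a table source inside a query.
# SQLite names the generated columns column1, column2, ...
total = conn.execute(
    "SELECT SUM(column1) FROM (VALUES (1, 'apple'), (2, 'banana'))"
).fetchone()[0]
print(total)  # 3

conn.close()
```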

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Enhancing infer_schema for All File Locations

Currently, Databend can query files either directly by file location or from a stage:

select * from 'fs:///home/...';
select * from 's3://bucket/...';
select * from @stage;

However, the infer_schema function only works with staged files. For example:

select * from infer_schema(location=>'@stage/...');

Using infer_schema with any other file location causes a panic:

select * from infer_schema(location =>'fs:///home/...'); -- this will panic.

The proposed improvement is to extend infer_schema to support all types of file paths, not just staged files. This will make the system more consistent and the infer_schema function more useful.
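At its core, the change is a routing step: infer_schema needs to recognize the same location forms that the FROM clause already accepts, and report an error instead of panicking on anything it cannot handle. The Python sketch below illustrates that routing. Databend itself is written in Rust, and the function name, the scheme set, and the sample paths are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical subset of schemes; Databend supports more storage backends than these.
SUPPORTED_SCHEMES = {"fs", "s3"}


def classify_location(location: str) -> tuple[str, str]:
    """Classify a file location as a stage reference or a URI.

    Returns ("stage", "<stage name and path>") or ("<scheme>", location).
    Raises ValueError for unsupported input rather than crashing.
    """
    if location.startswith("@"):
        return ("stage", location[1:])
    scheme = urlparse(location).scheme
    if scheme in SUPPORTED_SCHEMES:
        return (scheme, location)
    raise ValueError(f"unsupported file location: {location!r}")


for loc in ("@my_stage/data.csv", "fs:///home/data.parquet", "s3://bucket/data.csv"):
    print(classify_location(loc))
# ('stage', 'my_stage/data.csv')
# ('fs', 'fs:///home/data.parquet')
# ('s3', 's3://bucket/data.csv')
```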

Issue #12458 | Feature: support normal file path

Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.62-nightly...v1.2.74-nightly
