Skip to main content

Databend Integrates PRQL: A Small Step for Modern Data Processing

Alt text

PRQL, pronounced as "Prequel", is a query language that stands alongside SQL. Its unique feature is the pipeline syntax, which makes querying relational databases more intuitive and efficient.

Databend Embraces PRQL

In the v1.2.380-nightly release, thanks to a significant PR by community contributor @ncuwaln, Databend has successfully integrated support for the PRQL language. This new feature further enhances the flexibility and user-friendliness of Databend queries.

Users only need to make a simple configuration change to enable the PRQL dialect:

set sql_dialect = 'prql';

Integration Details

Both Databend and PRQL are developed in Rust, which facilitated the integration of the PRQL compiler prqlc.

prqlc = "0.11.3"

This PR implemented support for PRQL in less than a hundred lines of code, with a new SQL dialect option prql.

Upon enabling this option, Databend calls prqlc to parse PRQL statements and convert them into SQL statements, which are then further processed by Databend's SQL parser.

let final_sql: String = match sql_dialect == Dialect::PRQL {
true => {
let options = prqlc::Options::default();
match prqlc::compile(sql, &options) {
Ok(res) => {
prql_converted = true;
res
}
Err(e) => {
warn!(
"Try convert prql to sql failed, still use raw sql to parse. error: {}",
e.to_string()
);
sql.to_string()
}
}
}
false => sql.to_string(),
};

It's worth mentioning that prqlc uses the chumsky library as its parser, while Databend uses nom-rule. Although they differ in technical implementation, both provide user-friendly error recovery information.

Error Message Comparison

Here are examples of error messages from the PRQL parser and the Databend parser:

PRQL Parser Error Message:

from tracks
select {album_id, name, unit_price}
sort {-unit_price, name}
grouq album_id (
aggregate {
track_count = count name,
album_price = sum unit_price
}
)

Error:
╭─[:4:1]

4 │ grouq album_id (
│ ──┬──
│ ╰──── Unknown name `grouq`
───╯

Databend Parser Error Message:

🐳 :) SELECT
album_id,
COUNT(*) AS track_count,
COALESCE(SUM(unit_price), 0) AS album_price
FROM
tracks
GROUP BX
album_id;
error: APIError: ResponseError with 1005: error:
--> SQL:7:7
|
1 | SELECT
| ------ while parsing `SELECT ...`
.
4 | COALESCE(SUM(unit_price), 0) AS album_price
5 | FROM
6 | tracks
7 | GROUP BX
| ^^ unexpected `BX`, expecting `BY`

How to Use PRQL

Users just need to make a simple configuration to switch to the PRQL dialect and start enjoying the intuitive, pipeline-style query experience of PRQL. Here's a simple query example:

🐳 :) set sql_dialect = 'prql';

🐳 :) from lineitem ## run tpch q1
filter `l_shipdate` <= '1998-09-02'
group {l_returnflag, l_linestatus} (
aggregate {
sum_qty = sum l_quantity,
sum_base_price = sum l_extendedprice,
sum_disc_price = sum l_extendedprice * (1 - l_discount),
sum_charge = sum l_extendedprice * (1 - l_discount) * (1 + l_tax),
avg_qty = average l_quantity,
avg_price = average l_extendedprice,
avg_disc = average l_discount,
count_order = count 0 ,
}
);

Through this example, we can see the powerful functionality and elegant syntax design of PRQL, which simplifies complex data queries into something simple and intuitive. Of course, users can switch back to the standard SQL dialect or any other supported dialects whenever needed:

🐳 :) set sql_dialect = 'postgresql';

With this integration, Databend embraces the cutting-edge query language PRQL, bringing a more efficient and enjoyable data processing experience to our users.