
Optimizing Databend Binary Builds with Profile-guided Optimization

Recently someone in the community suggested that we try profile-guided optimization (#9387). Let's see how we can use Rust to build a PGO-optimized Databend!


Profile-guided optimization (PGO) is a compiler optimization technique that collects data about a program's typical execution (for example, which branches it is likely to take) and then uses that data to guide optimizations such as inlining, conditional branches, machine code layout, and register allocation.

The motivation for this technique is that static analysis tries to improve code performance without ever executing the program. Without runtime information, the compiler cannot take the program's actual behavior into account, so these optimizations may not be fully effective.

PGO lets you collect data from realistic application scenarios, such as a production workload, so the optimizer can optimize hot code paths for speed and cold code paths for size, producing faster and smaller code.

rustc supports PGO by building instrumentation into the binaries, which then record profiling data at runtime for use in the final optimized compilation. The implementation relies entirely on LLVM.


Follow the workflow below to generate a PGO-optimized program:

  • Compile the program with instrumentation enabled.
  • Run the instrumented program to generate a .profraw file.
  • Convert the .profraw file into a .profdata file using LLVM's llvm-profdata tool.
  • Compile the program again with the profiling data.


The data collected during the run is eventually converted with llvm-profdata. To get this tool, install the llvm-tools-preview component via rustup, or use the llvm-profdata binary provided by a recent LLVM or Clang version.

rustup component add llvm-tools-preview

After the installation, you may need to add the directory containing llvm-profdata to your PATH.
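
One way to locate that directory is to search the active toolchain's sysroot. The sketch below assumes that the rustup component places llvm-profdata somewhere under the path printed by `rustc --print sysroot`; `find_tool_dir` is a hypothetical helper name, not part of rustup.

```shell
# find_tool_dir SYSROOT TOOL
# Hypothetical helper: prints the directory containing TOOL under SYSROOT,
# or returns non-zero if the tool is not found.
find_tool_dir() {
    tool_path=$(find "$1" -type f -name "$2" 2>/dev/null | head -n 1)
    [ -n "$tool_path" ] || return 1
    dirname "$tool_path"
}

# Example use (requires rustc on PATH):
#   export PATH="$(find_tool_dir "$(rustc --print sysroot)" llvm-profdata):$PATH"
```

If the helper prints nothing, the component is probably not installed for the active toolchain.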



The following procedure uses Databend's SQL logic tests purely to demonstrate how PGO works, so you may not see a performance improvement. For real gains, use a workload that is typical of your production environment.

The caveat, however, is that the sample of data fed to the program during the profiling stage must be statistically representative of the typical usage scenarios; otherwise, profile-guided feedback has the potential to harm the overall performance of the final build instead of improving it.

  1. Make sure there is no left-over profiling data from previous runs.

    rm -rf /tmp/pgo-data
  2. Build the instrumented binaries (with release profile), using the RUSTFLAGS environment variable in order to pass the PGO compiler flags to the compilation of all crates in the program.

    RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
    cargo build --release --target=x86_64-unknown-linux-gnu
  3. Run the instrumented binaries with a typical workload. We strongly recommend a workload that is statistically representative of real usage. This example runs the SQL logic tests for reference only.

  • Start a stand-alone Databend via a script, or a Databend cluster. Note that a production environment is more likely to run in cluster mode.

  • Import the dataset and run a typical query workload.

    BUILD_PROFILE=release ./scripts/ci/deploy/
    ulimit -n 10000;ulimit -s 16384; cargo run -p sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast
  4. Merge the .profraw files into a .profdata file with llvm-profdata.

    llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
  5. Use the .profdata file to guide optimizations. Note that both builds use the --release flag, because in practice we always run the release binary.

    RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata -Cllvm-args=-pgo-warn-missing-function" \
    cargo build --release --target=x86_64-unknown-linux-gnu
  6. Run the compiled program again with the previous workload and check the performance:

    BUILD_PROFILE=release ./scripts/ci/deploy/
    ulimit -n 10000;ulimit -s 16384; cargo run -p sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast
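
The procedure leaves two checks implicit: whether the instrumented run actually wrote .profraw files before the merge, and how to compare the two builds afterwards. A minimal sketch of both follows. `count_profraw` and `time_cmd` are hypothetical helper names, and `time_cmd` only has whole-second resolution, which should be enough for a test run that takes minutes.

```shell
# count_profraw DIR
# Prints how many .profraw files exist under DIR (0 if DIR is missing).
count_profraw() {
    find "$1" -type f -name '*.profraw' 2>/dev/null | wc -l | tr -d ' '
}

# time_cmd COMMAND [ARGS...]
# Runs COMMAND with its output discarded, prints the elapsed wall-clock
# seconds, and returns the command's exit status.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 && status=0 || status=$?
    end=$(date +%s)
    echo "$((end - start))"
    return "$status"
}

# Example use:
#   count_profraw /tmp/pgo-data      # expect a non-zero count before merging
#   time_cmd cargo run -p sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast
```

Timing the same command against the plain release build and the PGO build, ideally several times each, gives a rough comparison. Run the workload once beforehand so that cargo's build of the test runner is not included in the measurement.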