
Databend Monthly Report: October 2025 — Production-Ready Stored Procedures

DatabendLabs · Nov 7, 2025

Hey Databend community! 🚀

October was all about production automation. Stored procedures are now production-ready with experimental flags removed, and the SQL scripting engine gained dynamic schema support for complex ETL workflows. We also shipped inverted indexes for JSON search, intelligent memory spilling, and hardened operational tooling—all in eleven releases.

By the Numbers

22 new features, 19 bug fixes, 15 refactors, and 12 maintenance updates landed in October.

Monthly Highlights

🔥 Major New Features

  • Production-ready stored procedures — Dynamic schema introspection added and experimental flags removed; procedures now ship with first-class scripting support
  • ANY JOIN operations — INNER/LEFT/RIGHT ANY JOIN for efficient set membership checks and semi-join patterns
  • JSON full-text search — Inverted indexes now scan Variant inner fields for fast text search in semi-structured data
  • Copy history tracking — New `copy_history` system table provides an audit trail for all COPY operations
  • Vacuum protection — Irreversible DROP operations now require explicit confirmation to prevent accidental data loss
  • Math function expansion — Added `isnan()` and `isinf()` for IEEE 754 floating-point validation
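
ANY JOIN returns at most one matching row from the other side, so membership checks don't multiply rows the way a plain join does. A minimal sketch with illustrative table names (check the docs for the exact forms supported by your release):

```sql
-- "Which users have at least one event?" — LEFT ANY JOIN keeps every
-- user but stops after the first match, so a user with many events
-- still produces exactly one output row.
SELECT u.user_id, e.user_id IS NOT NULL AS has_activity
FROM users u
LEFT ANY JOIN events e ON u.user_id = e.user_id;
```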

⚡ Performance & Operations

  • Intelligent memory spilling — BackpressureSpiller prevents OOM by coordinating early spills under memory pressure
  • Pipeline optimization — Fixed max_threads calculation to properly size parallel execution based on pipe width
  • S3 intelligent tiering — Enhanced support for AWS S3 storage class transitions with reduced latency variance
  • Expression parsing — Faster SQL parsing for complex expressions
  • Statistics enhancements — `SHOW STATISTICS` now includes virtual column stats with min/max values for better query planning
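
The enhanced statistics can be inspected directly; with the October releases the listing should also cover virtual columns and their min/max values (the exact output layout depends on your version):

```sql
-- Inspect the statistics the planner uses, now including
-- virtual columns with min/max values.
SHOW STATISTICS;
```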

🔧 Meta-Service Improvements

  • Operational diagnostics — `databend-meta -V` shows detailed build info; new `metactl dump-raft-log-wal` for debugging
  • I/O timing — Tracks log entry application timing to identify performance bottlenecks
  • RPC consolidation — Unified timing metrics in RpcHandler for better observability
  • Atomic watch streams — Fixed race condition in watch stream registration for cluster stability

🐛 Stability & Reliability

  • CSV memory fixes — Large CSV COPY operations no longer OOM thanks to streaming improvements
  • JWT optimization — Smarter key lookup avoids unnecessary JWKS refresh requests
  • Parquet streaming — Fixed deserialization errors when base table columns change types
  • Grouping sets — Corrected shuffle mode handling in complex GROUP BY operations
  • Row access policies — Fixed parameter ordering and case sensitivity issues
  • NULL constraints — Proper validation during column modifications

🎯 Default Settings

  • Geo and Virtual Columns — Now enabled by default for all new deployments
  • MERGE INTO — Removed experimental flag; fully production-ready
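
With the experimental flag gone, MERGE INTO can go straight into upsert-style pipelines. A minimal sketch with illustrative table and column names:

```sql
-- Upsert staged rows into a target table in one statement:
-- update rows that already exist, insert the rest.
MERGE INTO target t
USING staging s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);
```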

🤝 Community

  • Welcome to new contributors @camilesing and @RiversJin — thanks for jumping in!

Spotlight: Stored Procedures for AI Training Automation

Stored procedures are the star of October. The SQL scripting engine now supports dynamic schema introspection, and experimental flags are gone—procedures work out of the box. This means you can automate complex data workflows entirely in SQL, no external tools needed.

Here's a real-world scenario: autonomous driving model training. Road test videos are extracted into frames and AI automatically annotates each frame (pedestrians, vehicles, weather, lighting). You need to filter frames by specific criteria—like "urban night scenes with pedestrians" or "rainy city footage"—to build targeted training datasets.

This walkthrough demonstrates `LET` variables, `IF` conditions, temporary tables, sub-queries, multi-table aggregations, and `RETURN TABLE`—all the core features for production automation.

Step-by-Step Tutorial

1. Create database and table

CREATE DATABASE IF NOT EXISTS av_training;
USE av_training;

-- Video frame annotations (AI-generated labels)
CREATE TABLE video_frames (
    video_id STRING,
    frame_id INT,
    scene_type STRING,       -- 'urban', 'highway'
    has_pedestrian BOOLEAN,
    has_vehicle BOOLEAN,
    lighting STRING,         -- 'day', 'night', 'dusk'
    weather STRING,          -- 'clear', 'rainy', 'foggy', 'sunny'
    pedestrian_count INT,
    confidence FLOAT,        -- AI annotation confidence
    capture_date DATE
);

2. Insert sample data

INSERT INTO video_frames VALUES
-- Urban night scenes with pedestrians
('V001', 1, 'urban', true, true, 'night', 'clear', 2, 0.95, '2025-10-20'),
('V001', 2, 'urban', true, false, 'night', 'clear', 1, 0.92, '2025-10-20'),

-- Highway day scene
('V002', 1, 'highway', false, true, 'day', 'sunny', 0, 0.98, '2025-10-21'),

-- Urban night rainy scenes
('V003', 1, 'urban', true, true, 'night', 'rainy', 1, 0.89, '2025-10-22'),
('V003', 2, 'urban', true, false, 'night', 'rainy', 2, 0.91, '2025-10-22'),

-- Highway night (no pedestrians)
('V004', 1, 'highway', false, true, 'night', 'clear', 0, 0.97, '2025-10-23'),

-- Urban day scenes
('V005', 1, 'urban', true, true, 'day', 'clear', 3, 0.88, '2025-10-24'),
('V005', 2, 'urban', true, false, 'day', 'foggy', 1, 0.93, '2025-10-24');

3. Create the frame filtering procedure

This procedure keeps the business logic simple: it always pulls high-confidence (≥ 0.85) pedestrian scenes, while letting you narrow by scene type and weather:

CREATE PROCEDURE filter_training_frames(
    scene_filter STRING,   -- 'urban', 'highway', 'ALL'
    weather_filter STRING, -- 'clear', 'rainy', 'ALL'
    date_from DATE,
    date_to DATE
)
RETURNS TABLE (
    scene_type STRING,
    lighting STRING,
    weather STRING,
    frame_count INT,
    pedestrians INT,
    avg_confidence DOUBLE,
    total_frames INT
)
LANGUAGE SQL
AS $$
BEGIN
    LET min_conf := 0.85;
    LET total_frames := 0;

    CREATE OR REPLACE TEMPORARY TABLE matched_frames AS
    SELECT
        video_id, frame_id, scene_type, lighting,
        weather, pedestrian_count, confidence
    FROM video_frames
    WHERE capture_date BETWEEN :date_from AND :date_to
      AND confidence >= :min_conf
      AND (:scene_filter = 'ALL' OR scene_type = :scene_filter)
      AND (:weather_filter = 'ALL' OR weather = :weather_filter)
      AND has_pedestrian = true;

    LET total_frames := (SELECT COUNT(*) FROM matched_frames);

    IF total_frames = 0 THEN
        RETURN TABLE(
            SELECT NULL::STRING AS scene_type,
                   NULL::STRING AS lighting,
                   NULL::STRING AS weather,
                   0::INT AS frame_count,
                   0::INT AS pedestrians,
                   0.0::DOUBLE AS avg_confidence,
                   0::INT AS total_frames
        );
    END IF;

    RETURN TABLE(
        SELECT
            scene_type,
            lighting,
            weather,
            COUNT(*) AS frame_count,
            SUM(pedestrian_count) AS pedestrians,
            AVG(confidence) AS avg_confidence,
            :total_frames AS total_frames
        FROM matched_frames
        GROUP BY scene_type, lighting, weather
        ORDER BY frame_count DESC
    );
END;
$$;

4. Run the procedure: Filter urban pedestrian scenes across any weather

CALL PROCEDURE filter_training_frames(
    'urban',            -- Urban scenes only
    'ALL',              -- Any weather
    DATE '2025-10-20',
    DATE '2025-10-24'
);

Expected Output:

scene_type | lighting | weather | frame_count | pedestrians | avg_confidence | total_frames
-----------|----------|---------|-------------|-------------|----------------|-------------
urban      | night    | rainy   | 2           | 3           | 0.90           | 6
urban      | night    | clear   | 2           | 3           | 0.94           | 6
urban      | day      | foggy   | 1           | 1           | 0.93           | 6
urban      | day      | clear   | 1           | 3           | 0.88           | 6

What happened:

  • Found 6 urban frames with pedestrians (confidence ≥ 0.85)
  • Grouped by lighting and weather conditions
  • 4 night frames + 2 day frames
  • Provides scene distribution for training data diversity
  • Set `weather_filter` to `'rainy'` (or any value) to zero in on specific conditions without touching the code
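
For example, reusing the same procedure to pull only rainy urban footage:

```sql
CALL PROCEDURE filter_training_frames(
    'urban',            -- Urban scenes only
    'rainy',            -- Rainy weather only
    DATE '2025-10-20',
    DATE '2025-10-24'
);
-- With the sample data above, this matches only the two
-- V003 night/rainy frames (confidence 0.89 and 0.91).
```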

Key Features Demonstrated

Core Capabilities

  • `LET` for counters/thresholds, early `IF` exits, temporary tables
  • `:variable` interpolation inside queries and aggregated `RETURN TABLE`

Production Use Cases

This pattern applies to:

  • AI/ML data preparation — Filter and analyze training datasets
  • Data quality checks — Validate and report on data issues
  • ETL pipelines — Multi-step transformations with intermediate staging
  • Business reporting — Parameterized reports with aggregations
  • Data validation — Conditional checks with early exits

Learn more: Stored Procedure & SQL Scripting Documentation

What This Means

October's focus on stored procedures marks a major milestone: Databend is now a complete data platform, not just a query engine. You can build, test, and deploy complex data pipelines entirely in SQL—no external orchestrators needed.

Combined with existing capabilities, this creates a powerful foundation:

  • Structured analytics — Fast columnar queries for traditional BI
  • Semi-structured search — Inverted indexes for JSON full-text search
  • Vector search — HNSW indexing for AI/ML workloads
  • Geospatial — Location analytics enabled by default
  • Automation — Production-ready stored procedures for ETL

All in one unified Rust engine. All Snowflake-compatible. All open source.

We're also heads-down on table branching: versioned tables that keep data pipelines clean by letting you branch, test, and merge just like code. Development has kicked off—if you need specific workflows supported, drop us an issue on GitHub so we can build it together.


Try it today: https://databend.com

The Databend Team
