
Databend Monthly Report: October 2025 — Production-Ready Stored Procedures

DatabendLabs · Nov 7, 2025

Hey Databend community! 🚀

October was all about production automation. Stored procedures are now production-ready with experimental flags removed, and the SQL scripting engine gained dynamic schema support for complex ETL workflows. We also shipped inverted indexes for JSON search, intelligent memory spilling, and hardened operational tooling—all in eleven releases.

By the Numbers

22 new features, 19 bug fixes, 15 refactors, and 12 maintenance updates landed in October.

Monthly Highlights

🔥 Major New Features

  • Production-ready stored procedures — Dynamic schema introspection added and experimental flags removed; procedures now ship with first-class scripting support
  • ANY JOIN operations — INNER/LEFT/RIGHT ANY JOIN for efficient set membership checks and semi-join patterns
  • JSON full-text search — Inverted indexes now scan Variant inner fields for fast text search in semi-structured data
  • Copy history tracking — New `copy_history` system table provides an audit trail for all COPY operations
  • Vacuum protection — Irreversible DROP operations now require explicit confirmation to prevent accidental data loss
  • Math function expansion — Added `isnan()` and `isinf()` for IEEE 754 floating-point validation
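
ANY JOIN returns at most one matching row from the other side, so membership checks don't multiply rows the way a plain join does. A minimal sketch with illustrative table names (check the docs for the exact forms supported by your release):

```sql
-- "Which users have at least one event?" — LEFT ANY JOIN keeps every
-- user but stops after the first match, so a user with many events
-- still produces exactly one output row.
SELECT u.user_id, e.user_id IS NOT NULL AS has_activity
FROM users u
LEFT ANY JOIN events e ON u.user_id = e.user_id;
```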

⚡ Performance & Operations

  • Intelligent memory spilling — BackpressureSpiller prevents OOM by coordinating early spills under memory pressure
  • Pipeline optimization — Fixed max_threads calculation to properly size parallel execution based on pipe width
  • S3 intelligent tiering — Enhanced support for AWS S3 storage class transitions with reduced latency variance
  • Expression parsing — Faster SQL parsing for complex expressions
  • Statistics enhancements — `SHOW STATISTICS` now includes virtual column stats with min/max values for better query planning
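
The enhanced statistics can be inspected directly; with the October releases the listing should also cover virtual columns and their min/max values (the exact output layout depends on your version):

```sql
-- Inspect the statistics the planner uses, now including
-- virtual columns with min/max values.
SHOW STATISTICS;
```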

🔧 Meta-Service Improvements

  • Operational diagnostics — `databend-meta -V` shows detailed build info; new `metactl dump-raft-log-wal` for debugging
  • I/O timing — Tracks log entry application timing to identify performance bottlenecks
  • RPC consolidation — Unified timing metrics in RpcHandler for better observability
  • Atomic watch streams — Fixed race condition in watch stream registration for cluster stability

🐛 Stability & Reliability

  • CSV memory fixes — Large CSV COPY operations no longer OOM thanks to streaming improvements
  • JWT optimization — Smarter key lookup avoids unnecessary JWKS refresh requests
  • Parquet streaming — Fixed deserialization errors when base table columns change types
  • Grouping sets — Corrected shuffle mode handling in complex GROUP BY operations
  • Row access policies — Fixed parameter ordering and case sensitivity issues
  • NULL constraints — Proper validation during column modifications

🎯 Default Settings

  • Geo and Virtual Columns — Now enabled by default for all new deployments
  • MERGE INTO — Removed experimental flag; fully production-ready
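
With the experimental flag gone, MERGE INTO can go straight into upsert-style pipelines. A minimal sketch with illustrative table and column names:

```sql
-- Upsert staged rows into a target table in one statement:
-- update rows that already exist, insert the rest.
MERGE INTO target t
USING staging s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);
```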

🤝 Community

  • Welcome to new contributors @camilesing and @RiversJin — thanks for jumping in!

Spotlight: Stored Procedures for AI Training Automation

Stored procedures are the star of October. The SQL scripting engine now supports dynamic schema introspection, and experimental flags are gone—procedures work out of the box. This means you can automate complex data workflows entirely in SQL, no external tools needed.

Here's a real-world scenario: autonomous driving model training. Road test videos are extracted into frames and AI automatically annotates each frame (pedestrians, vehicles, weather, lighting). You need to filter frames by specific criteria—like "urban night scenes with pedestrians" or "rainy city footage"—to build targeted training datasets.

This walkthrough demonstrates `LET` variables, `IF` conditions, temporary tables, sub-queries, multi-table aggregations, and `RETURN TABLE`—all the core features for production automation.

Step-by-Step Tutorial

1. Create database and table

CREATE DATABASE IF NOT EXISTS av_training;
USE av_training;

-- Video frame annotations (AI-generated labels)
CREATE TABLE video_frames (
    video_id STRING,
    frame_id INT,
    scene_type STRING,       -- 'urban', 'highway'
    has_pedestrian BOOLEAN,
    has_vehicle BOOLEAN,
    lighting STRING,         -- 'day', 'night', 'dusk'
    weather STRING,          -- 'clear', 'rainy', 'foggy', 'sunny'
    pedestrian_count INT,
    confidence FLOAT,        -- AI annotation confidence
    capture_date DATE
);

2. Insert sample data

INSERT INTO video_frames VALUES
-- Urban night scenes with pedestrians
('V001', 1, 'urban', true, true, 'night', 'clear', 2, 0.95, '2025-10-20'),
('V001', 2, 'urban', true, false, 'night', 'clear', 1, 0.92, '2025-10-20'),

-- Highway day scene
('V002', 1, 'highway', false, true, 'day', 'sunny', 0, 0.98, '2025-10-21'),

-- Urban night rainy scenes
('V003', 1, 'urban', true, true, 'night', 'rainy', 1, 0.89, '2025-10-22'),
('V003', 2, 'urban', true, false, 'night', 'rainy', 2, 0.91, '2025-10-22'),

-- Highway night (no pedestrians)
('V004', 1, 'highway', false, true, 'night', 'clear', 0, 0.97, '2025-10-23'),

-- Urban day scenes
('V005', 1, 'urban', true, true, 'day', 'clear', 3, 0.88, '2025-10-24'),
('V005', 2, 'urban', true, false, 'day', 'foggy', 1, 0.93, '2025-10-24');

3. Create the frame filtering procedure

This procedure keeps the business logic simple: it always pulls high-confidence (≥ 0.85) pedestrian scenes, while letting you narrow by scene type and weather:

CREATE PROCEDURE filter_training_frames(
    scene_filter STRING,   -- 'urban', 'highway', 'ALL'
    weather_filter STRING, -- 'clear', 'rainy', 'ALL'
    date_from DATE,
    date_to DATE
)
RETURNS TABLE (
    scene_type STRING,
    lighting STRING,
    weather STRING,
    frame_count INT,
    pedestrians INT,
    avg_confidence DOUBLE,
    total_frames INT
)
LANGUAGE SQL
AS $$
BEGIN
    LET min_conf := 0.85;
    LET total_frames := 0;

    CREATE OR REPLACE TEMPORARY TABLE matched_frames AS
    SELECT
        video_id, frame_id, scene_type, lighting,
        weather, pedestrian_count, confidence
    FROM video_frames
    WHERE capture_date BETWEEN :date_from AND :date_to
      AND confidence >= :min_conf
      AND (:scene_filter = 'ALL' OR scene_type = :scene_filter)
      AND (:weather_filter = 'ALL' OR weather = :weather_filter)
      AND has_pedestrian = true;

    LET total_frames := (SELECT COUNT(*) FROM matched_frames);

    IF total_frames = 0 THEN
        RETURN TABLE(
            SELECT NULL::STRING AS scene_type,
                   NULL::STRING AS lighting,
                   NULL::STRING AS weather,
                   0::INT AS frame_count,
                   0::INT AS pedestrians,
                   0.0::DOUBLE AS avg_confidence,
                   0::INT AS total_frames
        );
    END IF;

    RETURN TABLE(
        SELECT
            scene_type,
            lighting,
            weather,
            COUNT(*) AS frame_count,
            SUM(pedestrian_count) AS pedestrians,
            AVG(confidence) AS avg_confidence,
            :total_frames AS total_frames
        FROM matched_frames
        GROUP BY scene_type, lighting, weather
        ORDER BY frame_count DESC
    );
END;
$$;

4. Run the procedure: Filter urban pedestrian scenes across any weather

CALL PROCEDURE filter_training_frames(
    'urban',            -- Urban scenes only
    'ALL',              -- Any weather
    DATE '2025-10-20',
    DATE '2025-10-24'
);

Expected Output:

scene_type | lighting | weather | frame_count | pedestrians | avg_confidence | total_frames
-----------|----------|---------|-------------|-------------|----------------|-------------
urban      | night    | rainy   | 2           | 3           | 0.90           | 6
urban      | night    | clear   | 2           | 3           | 0.94           | 6
urban      | day      | foggy   | 1           | 1           | 0.93           | 6
urban      | day      | clear   | 1           | 3           | 0.88           | 6

What happened:

  • Found 6 urban frames with pedestrians (confidence ≥ 0.85)
  • Grouped by lighting and weather conditions
  • 4 night frames + 2 day frames
  • Provides scene distribution for training data diversity
  • Set `weather_filter` to `'rainy'` (or any value) to zero in on specific conditions without touching the code
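
For example, reusing the same procedure to pull only rainy urban footage:

```sql
CALL PROCEDURE filter_training_frames(
    'urban',            -- Urban scenes only
    'rainy',            -- Rainy weather only
    DATE '2025-10-20',
    DATE '2025-10-24'
);
-- With the sample data above, this matches only the two
-- V003 night/rainy frames (confidence 0.89 and 0.91).
```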

Key Features Demonstrated

Core Capabilities

  • `LET` for counters/thresholds, early `IF` exits, temporary tables
  • `:variable` interpolation inside queries and aggregated `RETURN TABLE`

Production Use Cases

This pattern applies to:

  • AI/ML data preparation — Filter and analyze training datasets
  • Data quality checks — Validate and report on data issues
  • ETL pipelines — Multi-step transformations with intermediate staging
  • Business reporting — Parameterized reports with aggregations
  • Data validation — Conditional checks with early exits

Learn more: Stored Procedure & SQL Scripting Documentation

What This Means

October's focus on stored procedures marks a major milestone: Databend is now a complete data platform, not just a query engine. You can build, test, and deploy complex data pipelines entirely in SQL—no external orchestrators needed.

Combined with existing capabilities, this creates a powerful foundation:

  • Structured analytics — Fast columnar queries for traditional BI
  • Semi-structured search — Inverted indexes for JSON full-text search
  • Vector search — HNSW indexing for AI/ML workloads
  • Geospatial — Location analytics enabled by default
  • Automation — Production-ready stored procedures for ETL

All in one unified Rust engine. All Snowflake-compatible. All open source.

We're also heads-down on table branching: versioned tables that keep data pipelines clean by letting you branch, test, and merge just like code. Development has kicked off—if you need specific workflows supported, drop us an issue on GitHub so we can build it together.


Try it today: https://databend.com

The Databend Team
