Databend Monthly Report: October 2025 — Production-Ready Stored Procedures
DatabendLabs · Nov 7, 2025
Hey Databend community! 🚀
October was all about production automation. Stored procedures are now production-ready with experimental flags removed, and the SQL scripting engine gained dynamic schema support for complex ETL workflows. We also shipped inverted indexes for JSON search, intelligent memory spilling, and hardened operational tooling—all in eleven releases.
By the Numbers
22 new features, 19 bug fixes, 15 refactors, and 12 maintenance updates landed in October.
Monthly Highlights
🔥 Major New Features
- Production-ready stored procedures — Dynamic schema introspection and experimental flags removed; procedures now ship with first-class scripting support
- ANY JOIN operations — INNER/LEFT/RIGHT ANY JOIN for efficient set membership checks and semi-join patterns
- JSON full-text search — Inverted indexes now scan Variant inner fields for fast text search in semi-structured data
- Copy history tracking — New `copy_history` system table provides an audit trail for all COPY operations
- Vacuum protection — Irreversible DROP operations now require explicit confirmation to prevent accidental data loss
- Math function expansion — Added `isnan()` and `isinf()` for IEEE 754 floating-point validation
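The ANY JOIN addition is easiest to see with a quick sketch. This is a hedged example, not official syntax documentation: the keyword placement follows the release notes' INNER/LEFT/RIGHT ANY JOIN naming, and the `orders` and `vip_users` tables are hypothetical.

```sql
-- ANY JOIN returns at most one match per left-side row, which makes it
-- a cheap set-membership / semi-join check: "is this user a VIP?"
-- without duplicating orders when vip_users has repeated user_ids.
SELECT o.order_id, o.user_id
FROM orders o
LEFT ANY JOIN vip_users v ON o.user_id = v.user_id
WHERE v.user_id IS NOT NULL;  -- keep only orders placed by VIP users
```

Compared with a plain LEFT JOIN plus DISTINCT, the engine can stop probing after the first match, which is the point of the feature.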
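The JSON full-text search item pairs an inverted index with a Variant column. A minimal sketch, assuming Databend's documented `CREATE INVERTED INDEX` DDL and `match()` predicate; the `events` table and the search term are hypothetical:

```sql
-- Hypothetical table holding semi-structured event payloads
CREATE TABLE events (id INT, payload VARIANT);

-- Inverted index over the Variant column; with this release,
-- the index scans the inner JSON fields, not just the raw text
CREATE INVERTED INDEX idx_payload ON events(payload);

-- Full-text search into the JSON payload
SELECT id FROM events WHERE match(payload, 'timeout');
```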
⚡ Performance & Operations
- Intelligent memory spilling — BackpressureSpiller prevents OOM by coordinating early spills under memory pressure
- Pipeline optimization — Fixed max_threads calculation to properly size parallel execution based on pipe width
- S3 intelligent tiering — Enhanced support for AWS S3 storage class transitions with reduced latency variance
- Expression parsing — Faster SQL parsing for complex expressions
- Statistics enhancements — `SHOW STATISTICS` now includes virtual column stats with min/max values for better query planning
🔧 Meta-Service Improvements
- Operational diagnostics — `databend-meta -V` shows detailed build info; new `metactl dump-raft-log-wal` for debugging
- I/O timing — Tracks log entry application timing to identify performance bottlenecks
- RPC consolidation — Unified timing metrics in RpcHandler for better observability
- Atomic watch streams — Fixed race condition in watch stream registration for cluster stability
🐛 Stability & Reliability
- CSV memory fixes — Large CSV COPY operations no longer OOM thanks to streaming improvements
- JWT optimization — Smarter key lookup avoids unnecessary JWKS refresh requests
- Parquet streaming — Fixed deserialization errors when base table columns change types
- Grouping sets — Corrected shuffle mode handling in complex GROUP BY operations
- Row access policies — Fixed parameter ordering and case sensitivity issues
- NULL constraints — Proper validation during column modifications
🎯 Default Settings
- Geo and Virtual Columns — Now enabled by default for all new deployments
- MERGE INTO — Removed experimental flag; fully production-ready
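With the experimental flag gone, MERGE INTO can be used for plain upserts. A minimal sketch using the standard matched/not-matched clauses; the `users` and `daily_updates` tables are hypothetical:

```sql
-- Upsert: apply daily_updates into users in one statement
MERGE INTO users AS t
USING daily_updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.email = s.email
WHEN NOT MATCHED THEN INSERT (id, email) VALUES (s.id, s.email);
```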
🤝 Community
- Welcome to new contributors @camilesing and @RiversJin — thanks for jumping in!
Spotlight: Stored Procedures for AI Training Automation
Stored procedures are the star of October. The SQL scripting engine now supports dynamic schema introspection, and experimental flags are gone—procedures work out of the box. This means you can automate complex data workflows entirely in SQL, no external tools needed.
Here's a real-world scenario: autonomous driving model training. Road test videos are extracted into frames and AI automatically annotates each frame (pedestrians, vehicles, weather, lighting). You need to filter frames by specific criteria—like "urban night scenes with pedestrians" or "rainy city footage"—to build targeted training datasets.
This walkthrough demonstrates `LET` variables, `IF` branching, and `RETURN TABLE` results.
Step-by-Step Tutorial
1. Create database and table
CREATE DATABASE IF NOT EXISTS av_training;
USE av_training;
-- Video frame annotations (AI-generated labels)
CREATE TABLE video_frames (
    video_id STRING,
    frame_id INT,
    scene_type STRING,     -- 'urban', 'highway'
    has_pedestrian BOOLEAN,
    has_vehicle BOOLEAN,
    lighting STRING,       -- 'day', 'night', 'dusk'
    weather STRING,        -- 'clear', 'rainy', 'foggy', 'sunny'
    pedestrian_count INT,
    confidence FLOAT,      -- AI annotation confidence
    capture_date DATE
);
2. Insert sample data
INSERT INTO video_frames VALUES
-- Urban night scenes with pedestrians
('V001', 1, 'urban', true, true, 'night', 'clear', 2, 0.95, '2025-10-20'),
('V001', 2, 'urban', true, false, 'night', 'clear', 1, 0.92, '2025-10-20'),
-- Highway day scene
('V002', 1, 'highway', false, true, 'day', 'sunny', 0, 0.98, '2025-10-21'),
-- Urban night rainy scenes
('V003', 1, 'urban', true, true, 'night', 'rainy', 1, 0.89, '2025-10-22'),
('V003', 2, 'urban', true, false, 'night', 'rainy', 2, 0.91, '2025-10-22'),
-- Highway night (no pedestrians)
('V004', 1, 'highway', false, true, 'night', 'clear', 0, 0.97, '2025-10-23'),
-- Urban day scenes
('V005', 1, 'urban', true, true, 'day', 'clear', 3, 0.88, '2025-10-24'),
('V005', 2, 'urban', true, false, 'day', 'foggy', 1, 0.93, '2025-10-24');
3. Create the frame filtering procedure
This procedure keeps the business logic simple: it always pulls high-confidence (≥ 0.85) pedestrian scenes, while letting you narrow by scene type and weather:
CREATE PROCEDURE filter_training_frames(
    scene_filter STRING,   -- 'urban', 'highway', 'ALL'
    weather_filter STRING, -- 'clear', 'rainy', 'ALL'
    date_from DATE,
    date_to DATE
)
RETURNS TABLE (
    scene_type STRING,
    lighting STRING,
    weather STRING,
    frame_count INT,
    pedestrians INT,
    avg_confidence DOUBLE,
    total_frames INT
)
LANGUAGE SQL
AS $$
BEGIN
    LET min_conf := 0.85;
    LET total_frames := 0;

    -- Stage matching frames in a temporary table
    CREATE OR REPLACE TEMPORARY TABLE matched_frames AS
    SELECT
        video_id, frame_id, scene_type, lighting,
        weather, pedestrian_count, confidence
    FROM video_frames
    WHERE capture_date BETWEEN :date_from AND :date_to
      AND confidence >= :min_conf
      AND (:scene_filter = 'ALL' OR scene_type = :scene_filter)
      AND (:weather_filter = 'ALL' OR weather = :weather_filter)
      AND has_pedestrian = true;

    LET total_frames := (SELECT COUNT(*) FROM matched_frames);

    -- Early exit: return a single placeholder row when nothing matched
    IF total_frames = 0 THEN
        RETURN TABLE(
            SELECT NULL::STRING AS scene_type,
                   NULL::STRING AS lighting,
                   NULL::STRING AS weather,
                   0::INT AS frame_count,
                   0::INT AS pedestrians,
                   0.0::DOUBLE AS avg_confidence,
                   0::INT AS total_frames
        );
    END IF;

    RETURN TABLE(
        SELECT
            scene_type,
            lighting,
            weather,
            COUNT(*) AS frame_count,
            SUM(pedestrian_count) AS pedestrians,
            AVG(confidence) AS avg_confidence,
            :total_frames AS total_frames
        FROM matched_frames
        GROUP BY scene_type, lighting, weather
        ORDER BY frame_count DESC
    );
END;
$$;
4. Run the procedure: Filter urban pedestrian scenes across any weather
CALL PROCEDURE filter_training_frames(
'urban', -- Urban scenes only
'ALL', -- Any weather
DATE '2025-10-20',
DATE '2025-10-24'
);
Expected Output:
| scene_type | lighting | weather | frame_count | pedestrians | avg_confidence | total_frames |
|------------|----------|---------|-------------|-------------|----------------|--------------|
| urban      | night    | rainy   | 2           | 3           | 0.90           | 6            |
| urban      | night    | clear   | 2           | 3           | 0.94           | 6            |
| urban      | day      | foggy   | 1           | 1           | 0.93           | 6            |
| urban      | day      | clear   | 1           | 3           | 0.88           | 6            |
What happened:
- Found 6 urban frames with pedestrians (confidence ≥ 0.85)
- Grouped by lighting and weather conditions
- 4 night frames + 2 day frames
- Provides scene distribution for training data diversity
- Set `weather_filter` to `'rainy'` (or any value) to zero in on specific conditions without touching the code
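For example, narrowing to rainy footage is just a different argument to the same procedure:

```sql
-- Same procedure, restricted to rainy urban scenes
CALL PROCEDURE filter_training_frames(
    'urban',
    'rainy',
    DATE '2025-10-20',
    DATE '2025-10-24'
);
-- With the sample data above, only the two V003 frames match:
-- urban | night | rainy | 2 | 3 | 0.90 | 2
```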
Key Features Demonstrated
Core Capabilities
- `LET` for counters/thresholds, `IF` for early exits, temporary tables for staging
- `:variable` interpolation inside queries and aggregated `RETURN TABLE` results
Production Use Cases
This pattern applies to:
- AI/ML data preparation — Filter and analyze training datasets
- Data quality checks — Validate and report on data issues
- ETL pipelines — Multi-step transformations with intermediate staging
- Business reporting — Parameterized reports with aggregations
- Data validation — Conditional checks with early exits
Learn more: Stored Procedure & SQL Scripting Documentation
What This Means
October's focus on stored procedures marks a major milestone: Databend is now a complete data platform, not just a query engine. You can build, test, and deploy complex data pipelines entirely in SQL—no external orchestrators needed.
Combined with existing capabilities, this creates a powerful foundation:
- Structured analytics — Fast columnar queries for traditional BI
- Semi-structured search — Inverted indexes for JSON full-text search
- Vector search — HNSW indexing for AI/ML workloads
- Geospatial — Location analytics enabled by default
- Automation — Production-ready stored procedures for ETL
All in one unified Rust engine. All Snowflake-compatible. All open source.
We're also heads-down on table branching: versioned tables that keep data pipelines clean by letting you branch, test, and merge just like code. Development has kicked off—if you need specific workflows supported, drop us an issue on GitHub so we can build it together.
Try it today: https://databend.com
The Databend Team