Blog

Databend Monthly Report (July 2025)

DatabendLabsAug 6, 2025

Hey Databend community! 🚀

July was a big month for us. We've been heads-down optimizing one thing: JSON query performance. The main story? We've turbocharged our Virtual Columns feature and the results are pretty impressive - 3x faster JSON queries with 26x less data scanning.

By the Numbers

This month: 25+ new features, 25+ bug fixes, 20+ performance optimizations, and 35+ other improvements. But honestly, the JSON performance work is what I'm most excited about.

Monthly Highlights Summary

🔥 Major Improvements

Virtual Columns performance breakthrough - 3x faster JSON queries, 26x less data scanning
Enhanced RBAC - now supports connection and sequence objects
Shuffle sort optimization - much better performance for large ordered datasets
Stream processing improvements - virtual columns now work with streaming data
Workload management - memory percentage quotas for better resource control

🛠 Developer Experience

40+ JSON functions - comprehensive toolkit for JSON operations
Multiple access patterns - both Snowflake and PostgreSQL syntax work
Enhanced UDFs - IMMUTABLE support for better performance
Better debugging - improved metactl tools

⚡ Performance & Infrastructure

Meta-service improvements - millisecond precision, better architecture
Query optimizations - including grouping sets rewrites
Memory management - percentage-based quotas

🐛 Stability

25+ bug fixes - JSON handling, decimal ops, query execution
Better error handling - edge cases and concurrent operations
Transaction reliability - temporary tables and system history

What's New and Game-Changing

🚀 Virtual Columns: Automatic JSON Indexing

So here's what we did. We took our existing Virtual Columns feature and completely optimized it for JSON workloads. Think of it as automatic JSON indexing - no manual intervention required.

Virtual Columns work like smart, automatic indexes for your JSON data. When you load JSON, we automatically:

Analyze access patterns in your JSON documents
Create optimized indexes for commonly-accessed paths
Route queries to these fast indexes instead of execute the jsonb function every time

It's like having a database admin who never sleeps, constantly creating the perfect indexes for your JSON queries.

Example JSON:

{
  "customer_id": 123,
  "order": {
    "items": [{ "name": "Shoes", "price": 59.99 }]
  },
  "user": { "name": "Alice", "email": "alice@example.com" }
}

Automatic indexes created for:

```
customer_id
```
→ Fast integer index
```
order.items[0].price
```
→ Fast numeric index
```
user.name
```
→ Fast string index

Zero configuration. Zero maintenance. Just faster queries.

📊 Real Numbers

Here's a real example from our testing:

Before (traditional JSON parsing):

SELECT data['account_balance'], data['address']['city']
FROM user_logs;
-- 3.76 seconds, 11.90 GB processed

After (automatic Virtual Column indexes):

-- Exact same query
SELECT data['account_balance'], data['address']['city']
FROM user_logs;
-- 1.32 seconds, 461 MB processed

That's 3x faster and 26x less data. And you don't change a single line of code.

🔧 All Your JSON Syntax Works

We support the syntax you're already using:

Snowflake style:

SELECT data['user']['name'] FROM customers;       -- brackets
SELECT data:user:name FROM customers;             -- colons
SELECT data['user'].profile.name FROM customers;  -- mixed

PostgreSQL style:

SELECT data->'user'->'name' FROM customers;       -- arrows

All of it gets the same automatic indexing performance boost.

What's Next: One Architecture, Continuous Evolution

July's JSON optimization is actually part of something much bigger we're building. We're working toward mastering all types of data with one unified architecture:

Our multi-modal data approach:

✅ Structured data - World-class columnar performance (already there)
✅ Semi-structured data - Automatic Virtual Column indexing (July milestone)
🚀 Unstructured data - Advanced vectorization with predicate pushdown (August preview)

🔮 August Preview: Unstructured Data Storage & Compute

While July was all about automatic indexing for semi-structured JSON, August is going to be about unstructured data. We're launching comprehensive capabilities for vector storage, compute, and query acceleration:

What's coming in August:

Vectorized storage - Native vector data types with optimized columnar storage
HNSW indexing - Hardware-accelerated similarity search with sub-second query times
Cross-modal SQL joins - Query across structured, semi-structured, and unstructured data in single queries

Imagine queries like this:

-- Find similar products using vector embeddings
-- AND analyze their sales performance from structured data
-- AND extract sentiment from JSON reviews
-- All in one SQL query

SELECT p.product_name,
       sales.total_revenue,
       reviews.data['sentiment']['score'] as sentiment,
       COSINE_SIMILARITY(p.embedding, :search_vector) as similarity
FROM products p
JOIN sales_data sales ON p.id = sales.product_id  -- structured
JOIN product_reviews reviews ON p.id = reviews.product_id  -- semi-structured JSON
WHERE COSINE_SIMILARITY(p.embedding, :search_vector) > 0.8  -- unstructured vectors
  AND reviews.data['sentiment']['score'] > 0.7
ORDER BY similarity DESC, total_revenue DESC;

This is one platform for the AI era - where structured analytics, JSON document queries, and vector similarity search all work together seamlessly. No data movement, no separate systems, just SQL that works across all your data types.

Early feedback from production users has been really encouraging. People are seeing immediate performance improvements on their JSON workloads while getting positioned for AI use cases down the road.

Check out what we're building at https://github.com/databendlabs/databend.

Thanks for following along!

The Databend Team

Share this post

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!

How We Eliminated Our AWS Lambda Bill in the S3 Data Lake with Databend

Databend Cloud Makes Data Easy for Everyone

Databend Monthly Report (July 2025)

DatabendLabsAug 6, 2025

By the Numbers

Monthly Highlights Summary

🔥 Major Improvements

🛠 Developer Experience

⚡ Performance & Infrastructure

🐛 Stability

What's New and Game-Changing

🚀 Virtual Columns: Automatic JSON Indexing

📊 Real Numbers

🔧 All Your JSON Syntax Works

What's Next: One Architecture, Continuous Evolution

🔮 August Preview: Unstructured Data Storage & Compute

Subscribe to our newsletter

Get Started

Talk to us

Products

Resources

Community

Company

Solutions

Partners