Databend Monthly Report (August 2025)

DatabendLabs, Sep 10, 2025
Hey Databend community! 🚀

August was all about vector indexing. With HNSW index acceleration, similarity search in our example query ran about 23x faster (8.2 seconds down to 0.35 seconds), which brings AI workloads within reach of production use. Combined with our existing structured and JSON capabilities, Databend now serves as a multi-modal data warehouse.

By the Numbers

15+ new features, 20+ bug fixes, 15+ performance optimizations. The highlight: vector indexing that makes AI applications blazing fast.

Monthly Highlights

🔥 Major Features

  • Vector Indexing with HNSW - 23x faster similarity search
  • Time slice functions - Advanced temporal data analysis
  • Enhanced JSON5 parsing - More flexible JSON processing

Performance & Stability

  • Stack overflow prevention - Fixed CTE and PhysicalPlan recursion issues
  • Memory management - Structured spill configuration for large operations
  • Meta service optimization - 40% reduction in service pressure
  • Vector index reliability - Fixed data loss during refresh operations

📊 23x Performance Improvement

Before:

SELECT title, cosine_distance(embedding, :query) as score
FROM documents ORDER BY score LIMIT 10;
-- 8.2 seconds, full table scan

After (with HNSW index):

-- Same query, no rewrite needed
SELECT title, cosine_distance(embedding, :query) as score
FROM documents ORDER BY score LIMIT 10;
-- 0.35 seconds, index-accelerated
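
The headline multiplier follows directly from the two timings quoted in the comments above. A quick check in Python, using only those two figures (the variable names are ours, for illustration):

```python
# Timings quoted in the query comments above, in seconds.
full_scan_s = 8.2
hnsw_index_s = 0.35

# Speedup is the ratio of the slower time to the faster time.
speedup = full_scan_s / hnsw_index_s
print(f"{speedup:.1f}x")  # prints 23.4x
```

The "23x" used throughout this post is that ratio rounded down.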

Vector Search Made Simple

Getting started with production-ready semantic search:

1. Create Table with Vector Index

CREATE TABLE products (
  id INT,
  name VARCHAR,
  embedding VECTOR(1024),
  -- Automatic 23x speedup with HNSW index
  VECTOR INDEX idx(embedding) distance='cosine'
);

2. Insert Data

INSERT INTO products VALUES
  (1, 'Wireless Headphones', [0.1, 0.2, ...]::VECTOR(1024)),
  (2, 'Bluetooth Speaker', [0.3, 0.1, ...]::VECTOR(1024));

3. Search by Similarity

-- Find similar products instantly
SELECT name FROM products
ORDER BY cosine_distance(embedding, :search_vector)
LIMIT 5;

That's it. Sub-second similarity search for millions of vectors.
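
To make the query's semantics concrete, here is a minimal Python sketch of what an exact `ORDER BY cosine_distance(...) LIMIT k` computes when no index is available. It is an illustration, not Databend's implementation. It assumes cosine distance is defined as 1 minus cosine similarity (the usual convention), uses made-up 3-dimensional embeddings in place of VECTOR(1024), and adds a hypothetical third product for contrast.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity (assumed convention for this sketch)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(rows, query, k):
    """Exact search: score every row, sort by distance, keep the k closest."""
    scored = sorted(rows, key=lambda row: cosine_distance(row[1], query))
    return [name for name, _ in scored[:k]]

# Hypothetical toy embeddings; real ones would have 1024 dimensions.
products = [
    ("Wireless Headphones", [0.9, 0.1, 0.0]),
    ("Bluetooth Speaker", [0.8, 0.3, 0.1]),
    ("Running Shoes", [0.0, 0.2, 0.9]),  # hypothetical extra row
]
search_vector = [1.0, 0.0, 0.0]

print(top_k(products, search_vector, 2))
# prints ['Wireless Headphones', 'Bluetooth Speaker']
```

The exact version has to score every row, which is the cost a full table scan pays. HNSW is an approximate, graph-based index that examines only a small part of the data per query, so it trades a little exactness for speed.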

Supported Distance Functions

  • Cosine - Best for text/semantic similarity
  • L2 - Best for image/visual similarity
  • L1 - Best for feature comparison
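
For readers who want the definitions behind these three metrics, the sketch below computes each one in plain Python. The function names are ours, chosen for illustration, and cosine distance is again assumed to be 1 minus cosine similarity. The example pair points in the same direction but differs in length, which shows why cosine suits embeddings where direction matters more than magnitude.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l1_distance(a, b):
    """Manhattan distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the length

print(cosine_distance(a, b))  # approximately 0: the directions match
print(l2_distance(a, b))      # sqrt(14), about 3.742
print(l1_distance(a, b))      # 6.0
```

Cosine treats the two vectors as nearly identical, while L2 and L1 report a clear gap because they are sensitive to magnitude.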

What This Enables: Multi-Modal Data Warehouse

Traditional approach: Multiple separate systems

  • Structured data → PostgreSQL
  • JSON documents → Elasticsearch
  • Vector embeddings → Pinecone
  • Complex data pipelines to connect everything

Databend approach: One platform for everything

  • Structured data - World-class columnar analytics
  • JSON documents - 3x faster with automatic virtual columns (July)
  • Vector embeddings - 23x faster with HNSW indexing (August)

All in one SQL platform. No data movement. No system integration complexity.

This means you can build complete AI applications - recommendation engines, semantic search, RAG systems - using just SQL queries that work across all your data types seamlessly.


Databend: The multi-modal data warehouse - built for the AI era.

Try it today: https://databend.com

The Databend Team
