Hey Databend community! 🚀
August was all about vector indexing. We delivered 23x faster similarity search with HNSW (Hierarchical Navigable Small World) indexes, making AI workloads production-ready. Combined with our existing structured and JSON capabilities, this makes Databend a complete multi-modal data warehouse.
By the Numbers
15+ new features, 20+ bug fixes, 15+ performance optimizations. The highlight: vector indexing that makes AI applications blazing fast.
Monthly Highlights
🔥 Major Features
- Vector Indexing with HNSW - 23x faster similarity search
- Time slice functions - Advanced temporal data analysis
- Enhanced JSON5 parsing - More flexible JSON processing
⚡ Performance & Stability
- Stack overflow prevention - Fixed CTE and PhysicalPlan recursion issues
- Memory management - Structured spill configuration for large operations
- Meta service optimization - 40% reduction in service pressure
- Vector index reliability - Fixed data loss during refresh operations
Game-Changing: Vector Search
📊 23x Performance Improvement
Before:
SELECT title, cosine_distance(embedding, :query) as score
FROM documents ORDER BY score LIMIT 10;
-- 8.2 seconds, full table scan
After (with HNSW index):
-- Same query, no SQL changes needed
SELECT title, cosine_distance(embedding, :query) as score
FROM documents ORDER BY score LIMIT 10;
-- 0.35 seconds, index-accelerated
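The two timings above work out to 8.2 / 0.35 ≈ 23.4, which is where the 23x figure comes from. To show why an index changes the cost, here is a toy Python sketch. It is not Databend's implementation. full_scan_top_k scores every row, as the "Before" query must. graph_search_top_k runs the best-first search that HNSW performs within each layer, but over a single-layer neighbor graph built by brute force. All function names are hypothetical, and m and ef are named after the HNSW paper's construction and search parameters.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the same metric as the ORDER BY above."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def full_scan_top_k(vectors, query, k):
    """'Before': score every row, then keep the k closest (O(N * d) work)."""
    scored = ((cosine_distance(v, query), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]


def build_graph(vectors, m):
    """Toy single-layer proximity graph: link each vector to its m nearest.

    Real HNSW inserts vectors incrementally and stacks several layers;
    this brute-force O(N^2) build only exists to make the search runnable.
    """
    graph = {i: [] for i in range(len(vectors))}
    for i, v in enumerate(vectors):
        nearest = sorted(
            (cosine_distance(v, w), j) for j, w in enumerate(vectors) if j != i
        )
        for _, j in nearest[:m]:
            # Store edges in both directions so the graph stays navigable.
            if j not in graph[i]:
                graph[i].append(j)
            if i not in graph[j]:
                graph[j].append(i)
    return graph


def graph_search_top_k(graph, vectors, query, k, ef=8, entry=0):
    """'After': best-first search that only scores the nodes it reaches.

    ef caps the number of results kept, like HNSW's search-time ef.
    """
    ef = max(ef, k)
    d0 = cosine_distance(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # every remaining candidate is worse than all kept results
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            dn = cosine_distance(vectors[nbr], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nbr))
                heapq.heappush(results, (-dn, nbr))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return [i for _, i in sorted((-nd, i) for nd, i in results)[:k]]


# 36 unit vectors spaced 10 degrees apart on a circle; query sits at 173 degrees.
vectors = [[math.cos(math.radians(a)), math.sin(math.radians(a))]
           for a in range(0, 360, 10)]
query = [math.cos(math.radians(173)), math.sin(math.radians(173))]
graph = build_graph(vectors, m=2)
print(full_scan_top_k(vectors, query, 3))            # [17, 18, 16]
print(graph_search_top_k(graph, vectors, query, 3))  # [17, 18, 16]
```

On this small ring both functions return the same top 3. On real data, graph search is approximate: it can occasionally miss a true neighbor, and a larger ef trades speed for recall.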
Vector Search Made Simple
Getting started with production-ready semantic search:
1. Create Table with Vector Index
CREATE TABLE products (
    id INT,
    name VARCHAR,
    embedding VECTOR(1024),
    -- HNSW index on embedding; queries that ORDER BY cosine_distance can use it
    VECTOR INDEX idx(embedding) distance='cosine'
);
2. Insert Data
-- '...' stands for the remaining values of each 1024-dimension embedding
INSERT INTO products VALUES
    (1, 'Wireless Headphones', [0.1, 0.2, ...]::VECTOR(1024)),
    (2, 'Bluetooth Speaker', [0.3, 0.1, ...]::VECTOR(1024));
3. Search
-- Find the 5 products closest to the search vector
SELECT name FROM products
ORDER BY cosine_distance(embedding, :search_vector)
LIMIT 5;
That's it. Sub-second similarity search for millions of vectors.
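Step 2 types the embeddings in by hand, with "..." standing in for the rest of the 1,024 values; in practice those values come from an embedding model. As a minimal sketch, assuming you build the statement yourself for quick tests, these hypothetical Python helpers (sql_string, vector_literal, insert_products_sql are illustrative names, not a Databend API) render each embedding in the [...]::VECTOR(1024) form used above and reject embeddings whose length does not match the column's declared dimension. A production loader would more likely go through a Databend client driver with parameter binding.

```python
import math


def sql_string(text):
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def vector_literal(values, dims=1024):
    """Render an embedding as '[...]::VECTOR(dims)', checking its length."""
    if len(values) != dims:
        raise ValueError(f"expected {dims} values, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return "[" + ", ".join(repr(float(v)) for v in values) + f"]::VECTOR({dims})"


def insert_products_sql(rows, dims=1024):
    """Build one multi-row INSERT for the products table from (id, name, embedding)."""
    tuples = ",\n".join(
        f"({int(pid)}, {sql_string(name)}, {vector_literal(emb, dims)})"
        for pid, name, emb in rows
    )
    return f"INSERT INTO products VALUES\n{tuples};"


# Tiny 3-dimension example so the output stays readable.
print(insert_products_sql([(1, "Wireless Headphones", [0.1, 0.2, 0.3])], dims=3))
```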
Supported Distance Functions
- Cosine - Common choice for text/semantic similarity
- L2 - Common choice for image/visual similarity
- L1 - Common choice for feature comparison
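For reference, here are the textbook definitions of the three metrics in plain Python. This is a sketch assuming Databend follows the standard definitions (cosine distance as 1 minus cosine similarity, L2 as unsquared Euclidean distance, L1 as Manhattan distance); the function names mirror the list above but are not Databend code. The example hints at one reason cosine is popular for semantic search: two vectors pointing the same way have cosine distance 0 even when their lengths differ, while L2 and L1 still report a gap.

```python
import math


def _check(a, b):
    """All three metrics compare vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def cosine_distance(a, b):
    """1 - cos(angle between a and b): depends on direction, not length."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norms


def l2_distance(a, b):
    """Euclidean distance: straight-line gap, sensitive to magnitude."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    """Manhattan distance: sum of per-dimension absolute differences."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [1.0, 2.0], [2.0, 4.0]  # same direction, b is twice as long
print(cosine_distance(a, b))   # 0.0 up to floating-point rounding
print(l2_distance(a, b))       # about 2.236 (the square root of 5)
print(l1_distance(a, b))       # 3.0
```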
What This Enables: Multi-Modal Data Warehouse
Traditional approach: Multiple separate systems
- Structured data → PostgreSQL
- JSON documents → Elasticsearch
- Vector embeddings → Pinecone
- Complex data pipelines to connect everything
Databend approach: One platform for everything
- Structured data - World-class columnar analytics
- JSON documents - 3x faster with automatic virtual columns (July)
- Vector embeddings - 23x faster with HNSW indexing (August)
All in one SQL platform. No data movement. No system integration complexity.
This means you can build complete AI applications - recommendation engines, semantic search, RAG systems - using just SQL queries that work across all your data types seamlessly.
Databend: The multi-modal data warehouse - built for the AI era.
Try it today: https://databend.com
The Databend Team