Hey there, Databend community! 🎉
June has been absolutely massive for us - and I mean that in the best possible way. We've been heads-down building some seriously powerful features that I think you're going to love. The big story this month? Enterprise-grade audit capabilities that'll make your compliance teams very, very happy.
By the Numbers
This month we shipped 45+ new features, fixed 30+ bugs, and delivered 15+ performance optimizations along with 30+ other improvements. But honestly, the raw numbers don't tell the whole story - the quality and impact of these changes are what really get me excited.
Monthly Highlights Summary
🔥 Major New Features
- Comprehensive audit trail system with access_history, login_history, and query_history tables
- Enhanced Decimal64 support for better precision in financial calculations
- Runtime filters in shuffle joins - significantly faster complex analytical queries
- Automatic CTE materialization - optimizer now intelligently caches common table expressions
- ASOF joins - perfect for time-series analysis and event correlation
- Python UDF improvements with imports and packages support
- Streaming load enhancements with placeholder support and better syntax
🛠 Developer Experience
- New SQL functions: regexp_split_to_table, bool_and, bool_or, age, trunc, and more
- "REPORT ISSUE" syntax for quick error debugging and troubleshooting
- Better workload management with max_concurrency quotas for workload groups
- ZIP compression support for improved storage efficiency
⚡ Performance & Infrastructure
- Meta-service improvements with better RPC error handling and observability
- Query pipeline refactoring using graphs for more efficient execution
- Operator caching for faster repeated operations
- Enhanced memory management and spill handling for large datasets
🐛 Stability Improvements
- 30+ bug fixes across query execution, storage, and meta-service components
- Better error handling for edge cases in aggregation and join operations
- Improved transaction handling for temporary tables and concurrent operations
🚀 AI & Vector Capabilities (Preview)
- Vector data type support - laying the foundation for AI-powered applications
- HNSW indexing - early implementation for similarity search
- 🚧 Note: These features are in active development and not yet production-ready
What's New and Game-Changing
🔐 Enterprise Audit Trail - The Star of the Show
Alright, let's talk about the elephant in the room - or should I say, the feature that's going to make your security and compliance teams do a little happy dance. We've built a comprehensive audit trail system that automatically tracks everything happening in your database.
This isn't just "we log some stuff" - this is enterprise-grade, compliance-ready, full-visibility auditing that works out of the box.
What Gets Tracked:
- Every single query - who ran what, when, and how long it took
- All data access - which tables, columns, and files were touched
- Authentication events - successful logins, failed attempts, the works
- Schema changes - DDL operations with full before/after details
- System events - because sometimes you need to know what happened under the hood
📊 The Five Pillars of Audit Excellence
We've organized everything into five specialized history tables, each designed for specific use cases:
Table | What It Does | Why You Need It |
---|---|---|
`system_history.query_history` | Complete SQL execution audit | Performance analysis, compliance tracking, usage monitoring |
`system_history.access_history` | Data access and modification logs | Data lineage, compliance reporting, change management |
`system_history.login_history` | User authentication tracking | Security auditing, failed login monitoring |
`system_history.profile_history` | Detailed query execution profiles | Performance optimization, resource planning |
`system_history.log_history` | Raw system logs and events | System troubleshooting, operational monitoring |
🚨 Real-World Security Scenarios
Let me show you how this actually works in practice:
Catching Suspicious Activity:
-- Spot those failed login attempts that might indicate trouble
SELECT event_time, user_name, client_ip, error_message
FROM system_history.login_history
WHERE event_type = 'LoginFailed'
ORDER BY event_time DESC;
Compliance Reporting Made Easy:
-- Track who accessed sensitive customer data in the last 7 days
SELECT query_id, query_start, user_name, base_objects_accessed
FROM system_history.access_history
WHERE base_objects_accessed LIKE '%customer_data%'
AND query_start >= TODAY() - INTERVAL 7 DAY;
Change Management Tracking:
-- Monitor all schema changes with full details
SELECT query_id, query_start, user_name, object_modified_by_ddl
FROM system_history.access_history
WHERE object_modified_by_ddl != '[]'
ORDER BY query_start DESC;
Complete Query Audit Trail:
-- Get comprehensive query execution details
SELECT query_id, sql_user, query_text, query_start_time,
query_duration_ms, client_address
FROM system_history.query_history
WHERE event_date >= TODAY() - INTERVAL 7 DAY
ORDER BY query_start_time DESC;
The best part? Once the history tables are enabled, this all happens automatically. No per-table setup, no performance impact on your regular queries, no forgetting to enable logging for that one critical table.
📋 Audit Use Cases That Matter
Security Monitoring:
- Track failed login attempts to identify potential security threats
- Monitor unusual access patterns or unauthorized data access attempts
- Investigate security incidents with complete authentication history
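To make the first bullet concrete, here is a minimal sketch that groups failed logins by user and client IP. It only uses columns that appear in the login_history example earlier in this post; the one-day window and the threshold of 5 attempts are arbitrary assumptions, so tune them to your own environment.

```sql
-- Sketch: users/IPs with repeated failed logins in the last day.
-- The 1-day window and the threshold of 5 are illustrative assumptions.
SELECT user_name,
       client_ip,
       COUNT(*)        AS failed_attempts,
       MAX(event_time) AS last_attempt
FROM system_history.login_history
WHERE event_type = 'LoginFailed'
  AND event_time >= NOW() - INTERVAL 1 DAY
GROUP BY user_name, client_ip
HAVING COUNT(*) >= 5
ORDER BY failed_attempts DESC;
```

A query like this could feed a scheduled alert, but how you wire that up depends on your own tooling.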
Compliance Reporting:
- Maintain complete audit trails for regulatory requirements (SOX, GDPR, HIPAA)
- Track who accessed what data and when for data governance
- Monitor DDL operations for change management compliance
Operational Intelligence:
- Analyze query performance and resource usage patterns
- Identify optimization opportunities and bottlenecks
- Monitor database activity for capacity planning
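For the performance side, a rough starting point might look like the sketch below. It aggregates the same query_history columns used in the audit trail example above; the 7-day window and the top-10 cutoff are assumptions for illustration only.

```sql
-- Sketch: which users ran the slowest queries over the last 7 days.
-- The window and the LIMIT are illustrative assumptions.
SELECT sql_user,
       COUNT(*)               AS query_count,
       AVG(query_duration_ms) AS avg_duration_ms,
       MAX(query_duration_ms) AS max_duration_ms
FROM system_history.query_history
WHERE event_date >= TODAY() - INTERVAL 7 DAY
GROUP BY sql_user
ORDER BY avg_duration_ms DESC
LIMIT 10;
```

From here you can drill into individual query_id values for the users at the top of the list.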
Configuration That Actually Makes Sense
Databend Cloud
✅ Automatically enabled - All system history tables are ready to use without any configuration.
Self-Hosted Databend
📝 Manual configuration required - To enable system history tables, you must configure all 5 tables in your `databend-query.toml`. You'll need to specify each table name with optional retention settings (default: 7 days).
Optional Custom Storage: By default, history tables use your main database storage. You can optionally configure separate S3 storage for audit data.
For complete configuration details and examples, see the System History Tables documentation.
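As a rough idea of what the self-hosted setup could look like, here is a sketch of the relevant part of `databend-query.toml`. The section and key names (`log.history`, `on`, `table_name`, `retention`), the retention unit of hours, and the `profile_history` and `log_history` table names are written from memory of the Databend documentation and should be treated as assumptions; check them against the System History Tables documentation before using them.

```toml
# Sketch only: key names and units are assumptions, verify against the docs.
[log.history]
on = true

# One entry per history table; retention is assumed to be in hours (168 = 7 days).
[[log.history.tables]]
table_name = "query_history"
retention = 168

[[log.history.tables]]
table_name = "access_history"
retention = 168

[[log.history.tables]]
table_name = "login_history"
retention = 168

[[log.history.tables]]
table_name = "profile_history"
retention = 168

[[log.history.tables]]
table_name = "log_history"
retention = 168
```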
Access Control & Security
We've built robust security into the audit system. System history tables are protected against unauthorized modifications - users can only SELECT or DROP them, never ALTER. To query audit data, users need appropriate SELECT permissions:
-- Example: Create an audit role for compliance team
CREATE ROLE audit_team;
GRANT SELECT ON system_history.* TO ROLE audit_team;
CREATE USER compliance_officer IDENTIFIED BY 'secure_password' WITH DEFAULT_ROLE='audit_team';
GRANT ROLE audit_team TO USER compliance_officer;
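After running statements like these, it can be worth confirming that the privileges landed where you expect. A minimal check, assuming the role and user names from the example above:

```sql
-- Sketch: inspect the privileges held by the audit role and the user.
SHOW GRANTS FOR ROLE audit_team;
SHOW GRANTS FOR compliance_officer;
```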
Looking Forward
June has been transformative for Databend in multiple ways. Our enterprise audit capabilities represent a major milestone for compliance-focused organizations, while our ongoing AI vector development signals our commitment to the future of data analytics.
We're building the unified data platform of tomorrow. Databend already excels at massive structured data workloads and offers best-in-class semi-structured data analytics - from JSON schema auto-detection to intelligent indexing that makes complex nested queries lightning-fast. But we're not stopping there.
The next frontier is unstructured data. We're actively developing capabilities to handle text, video, audio, and other unstructured formats natively within Databend. Imagine being able to query your video content for specific scenes, analyze audio transcripts, or perform sentiment analysis on text documents - all using familiar SQL syntax, all within the same platform that handles your traditional analytics.
This isn't just about adding features - it's about creating a truly unified data experience. No more ETL pipelines between different systems for different data types. No more choosing between analytics performance and AI capabilities. One platform, one query language, all your data.
We're building for the AI era. Traditional data warehouses handle historical analysis beautifully, but the next generation of applications needs to seamlessly blend structured analytics with AI workloads. Our investment in vector capabilities and unstructured data processing isn't just about following trends - it's about creating a platform where SQL becomes the universal language for all data types.
The response from production users has been fantastic, and it's driving our roadmap for the rest of 2025.
What's Next?
We're continuing to expand both our audit capabilities and AI foundations. Check out our progress at https://github.com/databendlabs/databend. Whether you need enterprise compliance features today or want to prepare for a future where all your data (structured, semi-structured, and unstructured) lives in one intelligent platform, that's exactly what we're building.
Thanks for being part of this journey! 🚀
The Databend Team