
Databend GIS Upgrade: Spatial Index Now Available, Spatial Query Performance Up to 8.3x Faster

baishen · Apr 22, 2026

The Databend spatial index is officially live! Built on R-Tree indexing and Hilbert-curve clustering, it significantly accelerates range scans and spatial JOINs, delivering speedups of up to 8.3x. This release completes a vital piece of the puzzle for large-scale GIS analytics, providing powerful native support for location-based services (LBS), logistics, and IoT workloads.

When working on GIS, LBS, logistics tracking, or IoT spatial analysis, everyone runs into the same problem:

Being able to store spatial data doesn't mean spatial queries are fast enough.

Especially as data volume grows, range filtering, proximity lookups, and spatial JOINs tend to slow down quickly. Without a dedicated spatial index, even the most powerful spatial functions struggle to support real production workloads.

Databend now officially ships spatial index support, providing native acceleration for large-scale spatial data queries. Built on the classic and efficient R-Tree structure, the index covers GEOMETRY data and significantly improves the execution performance of spatial filtering, region retrieval, proximity queries, and spatial JOINs.

In the SpatialBenchmark standard test suite, the Databend spatial index delivers up to an 8.3x performance improvement in typical scenarios.


Why Do GIS Workloads Require Spatial Indexes?

Spatial data is fundamentally different from ordinary structured data.

In practice, the most common operations are not simple equality lookups, but rather:

  • Querying objects within a given region
  • Querying objects within a certain distance of a point
  • Determining whether two spatial objects intersect or contain each other
  • Performing spatial join analysis across two spatial tables

Without index support, these queries typically fall back to full table scans. The result:

  • Slow queries: Response time climbs noticeably as data volume increases
  • High resource consumption: Spatial computation drives up CPU and I/O usage
  • Unable to support real-time workloads: Nearby search, trajectory analysis, and spatial joins can't run at low latency

Spatial indexes are not a nice-to-have — they are the critical step that takes GIS capability from "usable" to "production-ready."


Databend GIS: Filling the Critical Gap

Databend has long offered native support for both the GEOMETRY and GEOGRAPHY spatial data types, along with a comprehensive set of spatial functions.

For example:

  • ST_GeomFromText parses WKT text into geometry objects
  • ST_Distance computes the distance between geometries
  • ST_Area calculates the area of polygon objects
  • ST_Intersects checks spatial intersection
  • ST_Transform converts geometries to a specified spatial reference system

Together they cover the most common spatial processing needs.

These capabilities already support spatial data storage, parsing, transformation, and computation.

The newly released spatial index fills the remaining gap in Databend's GIS query performance, making large-scale spatial analysis genuinely viable in production.


How Does the Databend Spatial Index Accelerate Queries?

The core design of Databend's spatial index can be summarized in three layers:

1. Foundation: Bounding Box

A BBox (bounding box, or minimum bounding rectangle) is the fundamental data unit of Databend's spatial index. It represents a rectangular extent in 2D space with just four double-precision floating-point values: the left, bottom, right, and top boundary extremes of a geometry, always stored in the fixed order (minX, minY, maxX, maxY). This compact structure is cheap to compute and ideal for high-speed spatial filtering.

Whether the original geometry is a Point, LineString, or Polygon, this unified structure reduces spatial relationship checks between complex geometries to simple rectangle intersection and containment tests — requiring only a few numeric comparisons, at a fraction of the cost of operating directly on complex geometries.
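To make the BBox idea concrete, here is a minimal Python sketch of a box in the (minX, minY, maxX, maxY) layout. It is an illustration, not Databend's implementation, which the post does not show. It also assumes, purely for illustration, that a geometry can be treated as its list of (x, y) vertices. Building the box is a min/max over the vertices, and both relationship tests reduce to four comparisons.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in the fixed (minX, minY, maxX, maxY) order."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @staticmethod
    def of(vertices: Iterable[Point]) -> "BBox":
        """Smallest box covering every vertex of a Point, LineString or Polygon."""
        xs, ys = zip(*vertices)
        return BBox(min(xs), min(ys), max(xs), max(ys))

    def intersects(self, other: "BBox") -> bool:
        # Boxes overlap unless one lies entirely to one side of the other.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)

    def contains(self, other: "BBox") -> bool:
        # Every edge of `other` lies on or inside the matching edge of `self`.
        return (self.min_x <= other.min_x and self.min_y <= other.min_y
                and other.max_x <= self.max_x and other.max_y <= self.max_y)

    def union(self, other: "BBox") -> "BBox":
        """Smallest box covering both boxes."""
        return BBox(min(self.min_x, other.min_x), min(self.min_y, other.min_y),
                    max(self.max_x, other.max_x), max(self.max_y, other.max_y))

polygon = BBox.of([(-124, 37), (-124, 38), (-122, 38), (-122, 37), (-124, 37)])
point = BBox.of([(-122.4, 37.7)])
print(polygon.contains(point))  # True
```

A point's box has zero width and height, so the same two tests serve points, lines, and polygons alike.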

2. Spatial Index Filtering

Databend's spatial index uses a two-level filtering mechanism, combining coarse-grained Block-level filtering with fine-grained R-Tree index filtering to accelerate spatial queries efficiently.

For each data block, Databend maintains a BBox statistic: the bounding rectangle that covers every geometry object stored in that block. When a spatial query runs, the system first checks the query geometry against each block's BBox statistic and immediately discards blocks that cannot possibly match. This coarse-grained pass reads no index or data files, so it costs very little.
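The coarse, block-level pass can be sketched as follows. `Block`, `bbox_of`, and `prune_blocks` are hypothetical names, and geometries are simplified to vertex lists. The point of the sketch is that only the per-block statistic is consulted, so any block whose statistic misses the query box is skipped without reading its rows.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (minX, minY, maxX, maxY)
Point = Tuple[float, float]

def bbox_of(vertices: List[Point]) -> BBox:
    xs, ys = zip(*vertices)
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Block:
    block_id: int
    geoms: List[List[Point]]          # each geometry simplified to its vertex list
    stats: BBox = field(init=False)   # block-level BBox statistic

    def __post_init__(self) -> None:
        # The statistic is the union of every geometry's BBox, so it covers the whole block.
        boxes = [bbox_of(g) for g in self.geoms]
        self.stats = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                      max(b[2] for b in boxes), max(b[3] for b in boxes))

def prune_blocks(blocks: List[Block], query: BBox) -> List[int]:
    """Coarse pass: keep blocks whose statistic overlaps the query; geometries are never read."""
    return [b.block_id for b in blocks if intersects(b.stats, query)]

blocks = [
    Block(0, [[(-122.5, 37.7)], [(-122.3, 37.8)]]),  # points around San Francisco
    Block(1, [[(-74.0, 40.7)], [(-73.9, 40.8)]]),    # points around New York
]
region = bbox_of([(-124, 37), (-124, 38), (-122, 38), (-122, 37), (-124, 37)])
print(prune_blocks(blocks, region))  # [0]
```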

Blocks that survive the coarse pass then go through fine-grained matching. Each block builds an R-Tree index over the BBoxes of all its geometry objects. The R-Tree is a classic spatial index structure: a balanced, hierarchical tree, similar in shape to a B-Tree, in which every internal node stores a bounding box covering all entries beneath it, so a whole subtree can be skipped as soon as its box misses the query. With this two-level mechanism (coarse block filtering first, fine index matching second), Databend's spatial index significantly reduces I/O and computation at scale and greatly improves spatial query efficiency.
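Inside a surviving block, the R-Tree lets whole groups of entries be skipped at once. The sketch below builds a static R-Tree with Sort-Tile-Recursive (STR) bulk loading, a common packing method for read-mostly data. The post does not say how Databend builds its per-block trees, so STR, the fanout of 4, and all names here are assumptions. Matching on BBoxes only yields candidates: a row whose box overlaps the query can still fail the exact predicate, so an exact geometry check has to follow.

```python
import math
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (minX, minY, maxX, maxY)

def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union_all(boxes: List[BBox]) -> BBox:
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

class Node:
    def __init__(self, bbox: BBox, children: Optional[List["Node"]] = None,
                 entry_id: Optional[int] = None):
        self.bbox = bbox                 # covers everything beneath this node
        self.children = children or []   # empty for leaf entries
        self.entry_id = entry_id         # row id for leaf entries, else None

def _pack_level(nodes: List[Node], fanout: int) -> List[Node]:
    """One STR pass: group nodes into parents with at most `fanout` children."""
    n_parents = math.ceil(len(nodes) / fanout)
    slice_size = math.ceil(math.sqrt(n_parents)) * fanout
    # Sort by x-centre and cut into vertical slices...
    by_x = sorted(nodes, key=lambda n: n.bbox[0] + n.bbox[2])
    parents = []
    for i in range(0, len(by_x), slice_size):
        # ...then sort each slice by y-centre and cut it into runs of `fanout`.
        vslice = sorted(by_x[i:i + slice_size], key=lambda n: n.bbox[1] + n.bbox[3])
        for j in range(0, len(vslice), fanout):
            group = vslice[j:j + fanout]
            parents.append(Node(union_all([g.bbox for g in group]), children=group))
    return parents

def build_rtree(boxes: List[BBox], fanout: int = 4) -> Node:
    if not boxes:
        raise ValueError("cannot build an R-Tree over zero boxes")
    level = [Node(b, entry_id=i) for i, b in enumerate(boxes)]
    while len(level) > 1:
        level = _pack_level(level, fanout)
    return level[0]

def search(node: Node, query: BBox) -> List[int]:
    """Ids of entries whose BBox overlaps `query`; non-overlapping subtrees are skipped."""
    if not intersects(node.bbox, query):
        return []
    if node.entry_id is not None:
        return [node.entry_id]
    hits: List[int] = []
    for child in node.children:
        hits.extend(search(child, query))
    return hits

# 100 half-unit boxes on a 10 x 10 grid; entry id = x * 10 + y
boxes = [(x, y, x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
root = build_rtree(boxes)
print(sorted(search(root, (2.2, 3.2, 2.4, 3.4))))  # [23]
```

Because each level is packed from the level below it, all leaf entries sit at the same depth, which is what keeps the tree balanced.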

3. Hilbert Clustering for Optimal Spatial Data Layout

For the spatial index to achieve maximum filtering efficiency, geographically nearby data must be co-located in the same data block as much as possible. If spatial data is scattered randomly across blocks, each block's BBox stretches over nearly the whole data extent, the BBoxes of different blocks overlap heavily, and block-level filtering can eliminate almost nothing.

Databend supports combining CLUSTER BY with the ST_HILBERT(...) function to physically cluster spatial data for storage optimization. ST_HILBERT maps 2D geographic coordinates onto a Hilbert curve, a 1D ordered encoding in which geographically nearby data tends to receive nearby encoding values. CLUSTER BY then groups rows with similar values into the same block, co-locating spatially nearby records so that each block's BBox is more compact and filtering is more precise. For large-scale spatial tables, this is critically important.
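The locality effect behind ST_HILBERT can be shown with the classic Hilbert-curve encoding (the well-known `xy2d` routine). The post does not document ST_HILBERT's grid resolution or how it treats non-point geometries. The 16-bit grid order, the clamping, and the hypothetical `hilbert_key` helper are therefore assumptions, though the default bounds mirror the [-180, -90, 180, 90] extent used in the CREATE TABLE example. The demo loads a 16 x 16 lattice of points in random order and then sorts it by Hilbert key, passing bounds fitted to the sample data. It compares the total BBox area of 16-row blocks before and after the sort.

```python
import random
from typing import List, Tuple

def _rot(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve filling an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d

def hilbert_key(x: float, y: float,
                bounds=(-180.0, -90.0, 180.0, 90.0), order: int = 16) -> int:
    """Hypothetical stand-in for ST_HILBERT on a point: scale into a 2^order grid, then encode."""
    min_x, min_y, max_x, max_y = bounds
    n = 1 << order
    gx = min(n - 1, max(0, int((x - min_x) / (max_x - min_x) * n)))
    gy = min(n - 1, max(0, int((y - min_y) / (max_y - min_y) * n)))
    return xy2d(n, gx, gy)

def total_block_area(points: List[Tuple[float, float]], block_size: int) -> float:
    """Sum of BBox areas when points are cut into consecutive blocks of block_size rows."""
    total = 0.0
    for i in range(0, len(points), block_size):
        xs = [p[0] for p in points[i:i + block_size]]
        ys = [p[1] for p in points[i:i + block_size]]
        total += (max(xs) - min(xs)) * (max(ys) - min(ys))
    return total

# A 16 x 16 lattice of points near San Francisco, loaded in random order.
points = [(-122.5 + i * 0.01, 37.5 + j * 0.01) for i in range(16) for j in range(16)]
lattice_bounds = (-122.5, 37.5, -122.35, 37.65)
shuffled = points[:]
random.Random(42).shuffle(shuffled)
clustered = sorted(shuffled, key=lambda p: hilbert_key(p[0], p[1], bounds=lattice_bounds))
print("random order :", round(total_block_area(shuffled, 16), 4))
print("Hilbert order:", round(total_block_area(clustered, 16), 4))
```

After sorting, every block of 16 consecutive keys is a compact 4 x 4 patch of the lattice. A tighter per-block BBox is exactly what makes the block-level pruning step effective.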


Which Query Scenarios Benefit Directly?

Databend's spatial index currently accelerates the following four core functions automatically:

  • ST_Intersects
    — determines whether two geometry objects have an intersecting relationship; the most commonly used spatial filter function
  • ST_Contains
    — determines whether one geometry object completely contains another
  • ST_Within
    — determines whether one geometry object is completely inside another; the inverse of
    ST_Contains
  • ST_DWithin
    — determines whether the distance between two geometry objects is less than a specified threshold; commonly used for proximity search
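Each of the four predicates implies a simple condition on bounding boxes that every matching row must satisfy, and this is what lets a BBox index prune rows for all of them. The mapping below follows standard geometric reasoning rather than Databend's documented rewrite rules, and `bbox_prefilter` is a hypothetical helper. For ST_DWithin, the query box is grown by the distance on every side before the overlap test.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (minX, minY, maxX, maxY)

def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(outer: BBox, inner: BBox) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def expand(b: BBox, d: float) -> BBox:
    return (b[0] - d, b[1] - d, b[2] + d, b[3] + d)

def bbox_prefilter(predicate: str, row_box: BBox, query_box: BBox,
                   distance: float = 0.0) -> bool:
    """Necessary (not sufficient) BBox condition for predicate(row_geom, query_geom)."""
    if predicate == "ST_Intersects":
        return intersects(row_box, query_box)
    if predicate == "ST_Within":    # row inside query => row box inside query box
        return contains(query_box, row_box)
    if predicate == "ST_Contains":  # row contains query => row box covers query box
        return contains(row_box, query_box)
    if predicate == "ST_DWithin":   # within distance d => row box meets query box grown by d
        return intersects(row_box, expand(query_box, distance))
    raise ValueError(f"no BBox rule for {predicate}")

pickup = (-122.41, 37.71, -122.41, 37.71)  # BBox of a single pickup point
target = (-122.4, 37.7, -122.4, 37.7)      # BBox of POINT(-122.4 37.7)
print(bbox_prefilter("ST_DWithin", pickup, target, 0.05))  # True
print(bbox_prefilter("ST_Intersects", pickup, target))     # False
```

Rows that fail the prefilter can be skipped safely. Rows that pass must still be evaluated with the exact function.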

These functions cover the two most common categories of GIS queries.

Scenario 1: Spatial Filtering in WHERE Clauses

Examples:

  • Query stores, vehicles, or devices within a given region
  • Query trajectory points within a certain radius of a point
  • Query objects that fall inside a geofence

These queries are the foundational capability for LBS, local services, IoT, and similar workloads.

Scenario 2: Spatial JOIN Analysis

Examples:

  • Join order locations with service areas
  • Match trajectory points against administrative boundaries
  • Analyze device positions against building footprints or campus boundaries

In spatial JOIN scenarios, Databend's optimizer automatically leverages the index and runtime filters to reduce unnecessary data processing and improve large-scale spatial join efficiency.
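A spatial JOIN can reuse the same machinery. The sketch below is one plausible shape for an index- and runtime-filter-assisted join, and it assumes the build side (for example, building boundaries) is the smaller one. Probe-side blocks whose BBox statistic overlaps no build-side box are skipped entirely, and rows in the surviving blocks are paired by BBox overlap. The post does not describe how Databend actually forms its runtime filters, and all names here are hypothetical. The resulting pairs are candidates that still need the exact ST_Intersects check.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (minX, minY, maxX, maxY)

def intersects(a: BBox, b: BBox) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def point_box(x: float, y: float) -> BBox:
    return (x, y, x, y)

def join_candidates(build: List[Tuple[str, BBox]],
                    probe_blocks: List[Tuple[BBox, List[Tuple[int, BBox]]]]):
    """Return (candidate pairs, number of probe blocks skipped).

    build        -- (key, bbox) rows from the smaller side, e.g. building boundaries
    probe_blocks -- (block_stats, [(key, bbox), ...]) for the larger side, e.g. trips
    """
    pairs: List[Tuple[str, int]] = []
    skipped = 0
    for stats, rows in probe_blocks:
        # Runtime-filter step: keep only build rows that could touch this block at all.
        relevant = [(bk, bb) for bk, bb in build if intersects(stats, bb)]
        if not relevant:
            skipped += 1  # this block's rows are never read
            continue
        for rk, rb in rows:
            for bk, bb in relevant:
                if intersects(rb, bb):
                    pairs.append((bk, rk))  # candidate; exact predicate still required
    return pairs, skipped

buildings = [("b1", (-122.395, 37.794, -122.393, 37.796))]
trip_blocks = [
    ((-122.5, 37.7, -122.3, 37.8), [(1, point_box(-122.394, 37.795)),
                                    (2, point_box(-122.45, 37.75))]),
    ((-74.1, 40.6, -73.9, 40.8), [(3, point_box(-74.0, 40.7))]),
]
print(join_candidates(buildings, trip_blocks))  # ([('b1', 1)], 1)
```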


How to Use Databend Spatial Index

Databend supports defining a spatial index directly at table creation time, and it can be combined with CLUSTER BY ST_HILBERT(...).

Example:

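```sql
CREATE TABLE trip (
    t_tripkey INT64,
    t_pickuploc GEOMETRY,
    t_dropoffloc GEOMETRY,
    -- Spatial (R-Tree) index over both GEOMETRY columns
    SPATIAL INDEX idx_trip(t_pickuploc, t_dropoffloc)
)
-- Order rows along a Hilbert curve; the array is the coordinate extent
-- [minX, minY, maxX, maxY], here the full longitude/latitude range
CLUSTER BY (
    st_hilbert(t_pickuploc, [-180, -90, 180, 90]),
    st_hilbert(t_dropoffloc, [-180, -90, 180, 90])
);
```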

After loading data, run the following SQL to apply reclustering:

```sql
ALTER TABLE trip RECLUSTER FINAL;
```

This optimizes the block layout for spatial filtering.

Once the index is defined, queries that use the supported spatial functions will hit it automatically.

For example, query trip pickup points within a specified geographic polygon for region-based spatial filtering:

```sql
SELECT t_tripkey, t_pickuploc
FROM trip
WHERE ST_Within(
    t_pickuploc,
    TO_GEOMETRY('POLYGON((-124 37, -124 38, -122 38, -122 37, -124 37))')
);
```

Query trip origins within a certain distance of a given point, suitable for LBS proximity search:

```sql
SELECT t_tripkey,
       ST_Distance(t_pickuploc, TO_GEOMETRY('POINT(-122.4 37.7)')) AS distance
FROM trip
WHERE ST_DWithin(
    t_pickuploc,
    TO_GEOMETRY('POINT(-122.4 37.7)'),
    0.05
)
ORDER BY distance ASC
LIMIT 5;
```
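One detail in the proximity query is worth checking. For a GEOMETRY column, distances are computed in the units of the coordinates. If t_pickuploc stores longitude/latitude, as the query point suggests, the 0.05 threshold is presumably 0.05 degrees, and its ground length differs by direction. The sketch below estimates that length on a spherical Earth with the mean radius of 6371.0088 km, which is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation

def degree_span_km(delta_deg: float, latitude_deg: float):
    """Approximate ground length of delta_deg along a meridian and along a parallel."""
    north_south = math.radians(delta_deg) * EARTH_RADIUS_KM
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

ns_km, ew_km = degree_span_km(0.05, 37.7)
print(f"0.05 deg at lat 37.7: ~{ns_km:.2f} km north-south, ~{ew_km:.2f} km east-west")
```

At this latitude the "radius" therefore covers about 5.6 km north-south but only about 4.4 km east-west. If a fixed ground radius matters, consider storing projected coordinates (for example, transformed with ST_Transform into a metric projection) or choose the threshold with this asymmetry in mind.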

Join trips to buildings to find every trip whose pickup location intersects a building boundary (falls inside or on it):

```sql
SELECT b.b_name, t.t_tripkey
FROM building b
JOIN trip t
    ON ST_Intersects(t.t_pickuploc, b.b_boundary);
```

How Much Performance Improvement?

To validate the spatial index, we ran comparative tests using the SpatialBenchmark standard dataset.

The results are clear: the spatial index delivers significant speedups on all three typical GIS queries tested.

1. Nearby Trip Query: 5.5x Faster

Proximity query using ST_DWithin:

  • Without index: 1.328 seconds
  • With spatial index: 0.243 seconds
  • Improvement: 5.5x

2. Regional Trip Aggregation: 6.6x Faster

Spatial range filtering combined with aggregation:

  • Without index: 2.445 seconds
  • With spatial index: 0.368 seconds
  • Improvement: 6.6x

3. Spatial JOIN Aggregation: 8.3x Faster

Complex multi-table spatial join:

  • Without index: 2315.718 seconds
  • With spatial index: 279.571 seconds
  • Improvement: 8.3x
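Each speedup figure is the time without the index divided by the time with it. A quick check against the reported timings:

```python
# Timings in seconds as reported above: (without index, with spatial index)
timings = {
    "nearby trip query": (1.328, 0.243),
    "regional trip aggregation": (2.445, 0.368),
    "spatial JOIN aggregation": (2315.718, 279.571),
}
speedups = {name: round(without / with_index, 1)
            for name, (without, with_index) in timings.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor}x")
# nearby trip query: 5.5x / regional trip aggregation: 6.6x / spatial JOIN aggregation: 8.3x
```

For the JOIN, this means a drop from roughly 38.6 minutes to about 4.7 minutes.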

The gains are most pronounced in the complex spatial JOIN. This shows that Databend can handle not only simple spatial filtering but also large-scale spatial analysis workloads.


Business Scenarios That Benefit Directly

With spatial index now available, Databend is better positioned to support the following typical scenarios:

LBS and Local Services

  • Nearby store search
  • Service area matching
  • Location-based recommendations

Logistics and Trajectory Analysis

  • Vehicle trajectory point-in-region checks
  • Route range analysis
  • Delivery and regional correlation statistics

IoT and Spatiotemporal Analysis

  • Device geofence alerting
  • Sensor spatial aggregation analysis
  • Regional event statistics

GIS and Spatial Data Platforms

  • Administrative boundary joins
  • Building, road, and region analysis
  • Multi-source spatial data fusion and computation

Current Limitations

The Databend spatial index currently supports only the GEOMETRY type.

The GEOGRAPHY type does not yet support index acceleration. If you need index acceleration for GEOGRAPHY data, convert it to GEOMETRY before querying.

In other words, Databend is currently especially well-suited for:

  • Spatial workloads based on planar or projected coordinates
  • Large-scale Geometry data filtering and join analysis
  • Scenarios that combine OLAP analytics with GIS queries

Closing Thoughts

The real challenge of spatial data analysis is not just "can it compute" — it's "can it still compute fast at scale."

With the spatial index, Databend closes the critical gap in its GIS capabilities. Beyond supporting spatial data types and spatial functions, it now delivers high-performance query execution over massive spatial datasets.

For LBS, logistics, IoT, and urban governance workloads, this means Databend can take on an integrated role — from spatial data storage all the way through analytical querying.

If you're looking for a data platform that handles both modern analytics and GIS query workloads, Databend's spatial capabilities are worth a closer look.

Upgrade today to try the Databend spatial index and build faster, more efficient spatial data analytics applications.

Ready to try Table Branching and spatial indexes?

Get started in minutes with Databend Cloud—the agent-ready data warehouse for analytics, search, AI, and Python Sandbox—and receive $200 in free credits.

