Learning spatial SQL with PostgreSQL and PostGIS involves understanding both standard SQL and the specialized spatial extensions provided by PostGIS. This guide outlines the most important SQL terms, actions, and concepts you'll need to master for effective spatial data management and analysis.
Before diving into spatial-specific features, a solid understanding of core SQL is essential:
Common data types include INTEGER, TEXT, DATE, BOOLEAN, and JSON. These are used to store non-spatial attributes associated with your spatial data.
CREATE TABLE: Used to define new tables, including specifying column names, data types, and constraints. Example:
CREATE TABLE cities (
id SERIAL PRIMARY KEY,
name TEXT,
population INTEGER
);
ALTER TABLE: Used to modify existing table structures, such as adding or removing columns.
DROP TABLE: Used to delete tables.
INSERT: Used to add new rows of data into a table. Example:
INSERT INTO cities (name, population) VALUES ('San Francisco', 870000);
UPDATE: Used to modify existing data within a table. Example:
UPDATE cities SET population = 900000 WHERE name = 'San Francisco';
DELETE: Used to remove rows from a table. Example:
DELETE FROM cities WHERE population < 100000;
SELECT: Used to retrieve data from one or more tables. Example:
SELECT * FROM cities;
WHERE: Used to filter data based on specific conditions. Example:
SELECT name, population FROM cities WHERE population > 500000;
GROUP BY: Used to group rows that have the same values in specified columns into summary rows.
ORDER BY: Used to sort the result set in ascending or descending order.
JOIN: Used to combine rows from two or more tables based on a related column.
CREATE INDEX: Used to improve query performance, especially on frequently queried columns.
PostGIS introduces specialized data types for handling spatial data:
GEOMETRY: Represents planar spatial data, such as points, lines, and polygons. This is the most common type for general spatial data. Example:
CREATE TABLE points (
id SERIAL PRIMARY KEY,
geom GEOMETRY(Point, 4326)
);
GEOGRAPHY: Represents geodetic data, such as latitude and longitude coordinates on a spheroidal model of the Earth. Measurements are returned in meters, which makes this type more accurate for distances and areas over large regions. Example:
CREATE TABLE cities (
id SERIAL PRIMARY KEY,
location GEOGRAPHY(Point, 4326)
);
BOX2D and BOX3D: Represent bounding boxes, useful for spatial indexing and preliminary filtering of spatial data.
GEOMETRYCOLLECTION: Represents a collection of different geometry types.
PostGIS provides a rich set of functions for analyzing and manipulating spatial data. Here are some of the most important:
ST_Point(longitude, latitude, SRID): Creates a point geometry. The SRID argument requires PostGIS 3.2 or later. Example:
SELECT ST_Point(-71.060316, 48.432044, 4326);
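ST_MakePoint(x, y): An alternative to ST_Point that takes coordinates only. It does not accept an SRID (a third argument is read as a Z coordinate), so wrap it in ST_SetSRID when a spatial reference is needed.
ST_MakeLine(point1, point2): Creates a linestring geometry from two points; array and aggregate forms accept more.
ST_MakePolygon(linestring): Creates a polygon geometry from a closed linestring. ST_Polygon(linestring, SRID) does the same and also assigns an SRID.
ST_GeomFromText(WKT, SRID): Creates a geometry from Well-Known Text (WKT). Example: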
SELECT ST_GeomFromText('POINT(-0.138702 51.501220)', 4326);
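As a minimal sketch of building geometries from coordinates rather than from WKT, the statements below use ST_MakePoint (which takes coordinates only, so ST_SetSRID assigns the SRID), ST_MakeLine, and ST_MakePolygon. The coordinates and column aliases are illustrative assumptions, and ST_AsText is used only to print readable output.

```sql
-- Point from coordinates; ST_SetSRID attaches the spatial reference.
SELECT ST_AsText(
    ST_SetSRID(ST_MakePoint(-71.060316, 48.432044), 4326)
) AS point_wkt;

-- Line through two points.
SELECT ST_AsText(
    ST_MakeLine(ST_MakePoint(0, 0), ST_MakePoint(1, 1))
) AS line_wkt;

-- Polygon from a closed linestring (first and last vertex are identical).
SELECT ST_AsText(
    ST_MakePolygon(ST_GeomFromText('LINESTRING(0 0, 1 0, 1 1, 0 1, 0 0)'))
) AS polygon_wkt;
```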
ST_GeographyFromText(EWKT): Creates a geography from Well-Known Text (WKT). The SRID is read from the text and defaults to 4326 when omitted.
ST_Intersects(geometry1, geometry2): Checks if two geometries intersect. Example:
SELECT ST_Intersects(geom1, geom2) FROM spatial_table;
ST_DWithin(geometry1, geometry2, distance): Checks if two geometries are within a specified distance of each other, measured in the units of their spatial reference system. Example:
SELECT geom FROM geom_table WHERE ST_DWithin(geom, 'SRID=312;POINT(100000 200000)', 100);
ST_Contains(geometry1, geometry2): Checks if geometry1 completely contains geometry2. Example:
SELECT m.name, sum(ST_Length(r.geom)) / 1000 AS roads_km
FROM bc_roads AS r
JOIN bc_municipality AS m ON ST_Contains(m.geom, r.geom)
GROUP BY m.name;
ST_Within(geometry1, geometry2): Checks if geometry1 is completely within geometry2.
ST_Crosses(geometry1, geometry2): Checks if two geometries cross.
ST_Disjoint(geometry1, geometry2): Checks if two geometries are disjoint (do not intersect).
ST_Equals(geometry1, geometry2): Checks if two geometries are spatially equal.
ST_Overlaps(geometry1, geometry2): Checks if two geometries overlap.
ST_Touches(geometry1, geometry2): Checks if two geometries touch.
ST_Distance(geometry1, geometry2): Calculates the distance between two geometries. Example:
SELECT ST_Distance(
ST_GeomFromText('POINT(-72.1235 42.3521)', 4326),
ST_GeomFromText('POINT(-72.1260 42.45)', 4326)
);
See the PostGIS documentation on ST_Distance.
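One caveat is worth a short sketch: for geometries in SRID 4326, ST_Distance works on the plane, so the value returned by the example above is in degrees rather than meters. Assuming the same two points, casting each one to geography gives a geodetic distance in meters instead. The column aliases are illustrative.

```sql
-- Planar distance on SRID 4326 geometries: the result is in degrees.
SELECT ST_Distance(
    ST_GeomFromText('POINT(-72.1235 42.3521)', 4326),
    ST_GeomFromText('POINT(-72.1260 42.45)', 4326)
) AS distance_degrees;

-- Geodetic distance on geography values: the result is in meters.
SELECT ST_Distance(
    ST_GeomFromText('POINT(-72.1235 42.3521)', 4326)::geography,
    ST_GeomFromText('POINT(-72.1260 42.45)', 4326)::geography
) AS distance_meters;
```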
ST_Area(geometry): Calculates the area of a polygon. Example:
SELECT name, ST_Area(geom)/10000 AS hectares FROM bc_municipality ORDER BY hectares DESC LIMIT 1;
ST_Length(geometry): Calculates the length of a line.
ST_Buffer(geometry, distance): Creates a buffer around a geometry. Example:
SELECT ST_Buffer(geom, 10) FROM spatial_table;
ST_Union(geometry1, geometry2, ...): Combines multiple geometries into one.
ST_Transform(geometry, SRID): Transforms a geometry from one spatial reference system to another. Example:
SELECT ST_Transform(geom, 3857) FROM spatial_table;
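To make ST_Union, described above, more concrete, here is a small sketch with two hypothetical overlapping squares given as WKT with no SRID. Each square has area 4 and they share a 1 x 1 corner, so the merged shape should have area 7. The CTE name and column aliases are assumptions for illustration.

```sql
-- Two 2 x 2 squares that overlap in the unit square from (1 1) to (2 2).
WITH squares AS (
    SELECT
        ST_GeomFromText('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') AS a,
        ST_GeomFromText('POLYGON((1 1, 3 1, 3 3, 1 3, 1 1))') AS b
)
SELECT
    ST_Area(a)              AS area_a,      -- 4
    ST_Area(b)              AS area_b,      -- 4
    ST_Area(ST_Union(a, b)) AS area_union   -- 7 (4 + 4 - 1 shared)
FROM squares;
```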
ST_Simplify(geometry, tolerance): Simplifies a geometry by reducing the number of vertices.
ST_Intersection(geometry1, geometry2): Returns the intersection of two geometries.
ST_Difference(geometry1, geometry2): Returns the part of geometry1 that does not intersect geometry2.
ST_Split(geometry1, geometry2): Splits a geometry by another geometry.
ST_Collect(geometry1, geometry2, ...): Aggregates geometries into a single geometry collection.
ST_SetSRID(geometry, SRID): Assigns an SRID to a geometry without changing its coordinates.
Spatial indexes are crucial for optimizing the performance of spatial queries:
CREATE INDEX idx_geom ON spatial_table USING GIST(geom);
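As a rough sketch of how such an index is typically put to work, the statements below create a hypothetical table (poi_demo, invented for this example), index its geometry column, refresh planner statistics, and run a bounding-box filter. The && operator compares bounding boxes and is one of the operators a GiST index can serve; the envelope coordinates are arbitrary.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE IF NOT EXISTS poi_demo (
    id   SERIAL PRIMARY KEY,
    name TEXT,
    geom GEOMETRY(Point, 4326)
);

-- GiST index on the geometry column.
CREATE INDEX IF NOT EXISTS poi_demo_geom_idx ON poi_demo USING GIST (geom);

-- Refresh planner statistics for the table.
ANALYZE poi_demo;

-- Bounding-box filter: rows whose box overlaps the given envelope.
SELECT id, name
FROM poi_demo
WHERE geom && ST_MakeEnvelope(-72, 42, -71, 43, 4326);
```

On a very small table the planner may still choose a sequential scan, so the index only becomes visible in query plans once the table holds enough rows.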
Spatial queries combine standard SQL with PostGIS functions to analyze spatial relationships:
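-- Features intersecting a 1000 m buffer around a point (assumes geom uses SRID 4326).
SELECT * FROM spatial_table WHERE ST_Intersects(geom, ST_Buffer(ST_SetSRID(ST_MakePoint(-71.060316, 48.432044), 4326)::geography, 1000)::geometry);
-- Spatial join: pairs of features from two tables that intersect.
SELECT a.id, b.id
FROM table_a AS a
JOIN table_b AS b ON ST_Intersects(a.geom, b.geom);
-- Features within a given distance of a point (longitude, latitude, and distance are placeholders).
SELECT * FROM spatial_table WHERE ST_DWithin(geom, ST_Point(longitude, latitude, 4326), distance);
-- Polygons containing a point (lon and lat are placeholders).
SELECT id FROM polygons WHERE ST_Contains(geom, ST_SetSRID(ST_Point(lon, lat), 4326));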
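The queries above rely on tables that are not defined in this guide. As a self-contained sketch of a spatial join, the following uses two hypothetical temporary tables (demo_zones and demo_stops, both invented for illustration) and counts the points that fall inside each polygon.

```sql
-- Hypothetical tables for illustration only.
CREATE TEMP TABLE demo_zones (id INTEGER, name TEXT, geom GEOMETRY(Polygon, 4326));
CREATE TEMP TABLE demo_stops (id INTEGER, geom GEOMETRY(Point, 4326));

INSERT INTO demo_zones VALUES
    (1, 'west', ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))', 4326)),
    (2, 'east', ST_GeomFromText('POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))', 4326));

INSERT INTO demo_stops VALUES
    (1, ST_GeomFromText('POINT(0.5 0.5)', 4326)),
    (2, ST_GeomFromText('POINT(0.2 0.8)', 4326)),
    (3, ST_GeomFromText('POINT(1.5 0.5)', 4326));

-- Spatial join: count the stops that fall inside each zone.
-- Returns east = 1 and west = 2 for the rows inserted above.
SELECT z.name, count(s.id) AS stop_count
FROM demo_zones AS z
LEFT JOIN demo_stops AS s ON ST_Contains(z.geom, s.geom)
GROUP BY z.name
ORDER BY z.name;
```

The LEFT JOIN keeps zones with no stops in the result with a count of 0; an inner JOIN would drop them.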
Optimizing spatial queries is essential for efficient data analysis:
Use EXPLAIN ANALYZE to understand query execution plans and identify bottlenecks. Example:
EXPLAIN ANALYZE SELECT * FROM spatial_table WHERE ST_Intersects(geom, ST_Buffer(ST_SetSRID(ST_MakePoint(-71.060316, 48.432044), 4326)::geography, 1000)::geometry);
Use ST_Simplify to reduce the complexity of geometries.
Use CLUSTER to reorder physical rows according to an index.
Use VACUUM to clean up dead tuples and ANALYZE to update statistics used by the query planner.
Explore these advanced topics as you become more comfortable with PostGIS:
Raster functions such as ST_Value (retrieves the value of a raster at a specific point) and ST_AsRaster (converts geometries to raster format).
Geometries with Z (elevation) and M (measure) dimensions.
Extensions such as PostGIS_Tiger_Geocoder for geocoding and PostGIS Topology for maintaining topological relationships.
Engage with the community to learn best practices and solve problems.
By focusing on these key terms and actions, you'll build a strong foundation in PostgreSQL with PostGIS and be able to effectively handle spatial data and analysis. Remember to practice regularly and explore real-world datasets to deepen your knowledge.