PostGIS
I plan on using PostGIS to manage geography objects in PostgreSQL, so I want to review the docs here to learn best practices and more about the extension.
References
- PostGIS Documentation
- Geospatial Data and GeoJSON
- QGIS
- A desktop GIS viewer/editor for quickly looking at geographic data.
- Simple Features for SQL
- A model that is a non-topological way to store geospatial data in a database
- The GeoJSON Specification
- Link to Census Database Files
Review of Geospatial Data
Things to know about geospatial data:
- Geospatial data consists of two components, location and attribute.
- When you use Google Maps to search for a restaurant, you see a red marker on the screen that points to its location in latitude and longitude coordinates, such as
41.7620891, -72.6856295
. Attributes include additional information such as the restaurant name, its human-friendly street address, and guest review comments. All of these attributes add value to your location data.
- When you use Google Maps to search for a restaurant, you see a red marker on the screen that points to its location in latitude and longitude coordinates, such as
- Geospatial data can be raster or vector.
- In digital maps, raster data often appears as satellite and aerial images, and the quality depends on the resolution of the camera that captured them. Raster data is like a grid of cells.
- Vector data appears on digital maps as pictorial images of buildings, rivers, and regions. Vector maps can be created by humans or algorithms when they draw points, polylines, and polygons from raster satellite or aerial images, or from devices such as GPS trackers that record runs or hikes, or from other sources.
- Unlike raster maps, vector maps remain sharply focused at any zoom level, because every point and line is represented by latitude and longitude coordinates, which can be expressed with precise decimals.
- While raster data is generally limited to one value per cell (such as color for traditional satellite images, or height above sea level for digital elevation models), vector data can contain multiple attributes about each object (such as its name, street address, and comments).
- Moreover, vector map files tend to be smaller in size than raster ones.
Common Vector Data File Formats
GeoJSON
- GeoJSON is a popular map fata format, based on open-standard created in 2016, with file extensions that end with
.geojson
or.json
- It is strongly recommended that you create and edit map data in GeoJSON format. Storing and sharing your geospatial data in GeoJSON ensures that you and others will be able to use the file without installing bulky or expensive GIS desktop applications.
- In GeoJSON format, coordinates are ordered in longitude-latitude format, the same as X-Y coordinates in mathematics. Bu this is the opposite of Google Maps and some other web map tools, which place coordinate values in latitude-longitude format.
Shapefiles
- The shapefile format was created in the 1990s by Ersi, the company that develops ArcGIS software. Shapefiles typically appear in a folder of files with extensions such as
.shp
,.shx
, and.dbf
, and the folder may be compressed into a.zip
file. - Government agencies commonly distribute map data in shapefile format.
GPS Exchange Format (GPX)
- GPX (
.gpx
files) is an open standard and is based on the XML markup langauge. Like GeoJSON, you can inspect the contents of a GPX file in any simple text editor. A GPX file most likely contains a collection of timestamps and latitude/longitude coordinates.
Keyhole Markup Language (or KML)
- The KML format rose in popularity during the late 2000s, when it was developed for Google Earth. It was drooped by Google in late 2019. Sometimes
kml
formats are compressed tokmz
format.
MapInfo TAB
- The proprietary TAB format is created and supported by MapInfo and is designed to work well with MapInfo Pro GIS software.
Notes
PostGIS extends the capabilities of the PostgreSQL relational database by adding support for storing, indexing, and querying geospatial data.
PostGIS features include:
Spatial Data Storage: Store different types of spatial data such as points, lines, polygons, and multi-geometries, in both 2D and 3D data
Spatial Indexing: Quickly search and retrieve spatial data based on its location.
Spatial Functions: A wise range of spatial functions that allow you to filter and analyze spatial data, measuring distances and area, intersecting geometries, buffering, and more.
Geometry Processing: Tools for processing and manipulating geometry data, such as simplification, conversion, and generalization
Raster Data Support: Storage and processing of raster data, such as elevation data and weather data
Geocoding and Reverse Geocoding: Functions for geocoding and reverse geocoding
Integration: Access and work with PostGIS using third party tools such as QGIS, GeoServer, MapServer, ArcGIS, Tableau
Installation and Upgrade
- AWS RDS databases already have PostGIS installed, you just need to enable it in your database:
CREATE EXTENSION postgis;
- You can check what version of PostGIS you have installed and what versions are available with the following commands:
SELECT PostGIS_Full_Version();
SELECT * FROM pg_available_extensions WHERE name = 'postgis';
- Update PostGIS:
SELECT postgis_extensions_upgrade();
PostGIS Workshop
Introduction
What is a Spatial Database
- PostGIS is a spatial database. Spatial databases store and manipulate spatial objects like any other object in the database.
- Three aspects that associate spatial data with a database:
- Spatial data types refer to shapes such as point, line, and polygon
- Multi-dimensional spatial indexing is used for efficient processing of spatial operations
- Spatial functions, posed in SQL, are for querying of spatial properties and relationships
Spatial Data Types
- An ordinary database has strings, numbers, and dates. A spatial database adds additional (spatial) types for representing geographic features. These spatial data types abstract and encapsulate spatial structures such as boundary and dimensions. In many respects, spatial data types can be understood simply as shapes.
- In the diagram above, each sub-type inherits the structure (attributes) and the behavior (methods of functions) of its super-type.
Spatial Indexes and Bounding Boxes
- Real spatial databases provide a "spatial index" that answers the question
which objects are within this particular bounding box?
- A bounding box is the smallest rectangle - parallel to the coordinate axes - capable of containing a given feature.
- Indexes have to perform quickly in order to be useful. Instead of providing exact results, spatial indexes provide approximate results.
- The actual spatial indexes implemented by various databases vary widely. The most common implementations are the R-Tree and Quadtree (used in PostGIS), but there are also grid-based indexes and GeoHash Indexes implemented in other spatial indexes.
Spatial Functions
- For manipulating data during a query, an ordinary database provides functions such as concatenating strings, performing hash operations on strings, doing mathematics on numbers, and extracting information from dates.
- A spatial database provides a complete set of functions for analyzing geometric components, determining spatial relationships, and manipulating geometries. These spatial functions serve as the building box for any spatial project.
- The majority of spatial functions can be grouped into the following categories:
- Conversion: Functions that convert between geometries and external data formats.
- Management: Functions that manage information about spatial tables and PostGIS administration.
- Retrieval: Functions that retrieve properties and measurements of a Geometry.
- Comparison: Functions that compare two geometries with respect to their spatial relation.
- Generation: Functions that generate new geometries from others.
- PostGIS turns the PostgreSQL Database Management System into a spatial database by adding support for the three features: spatial types, spatial indexes, and spatial functions.
Loading Data
- A
shapefile
commonly refers to a collection of files with.shp
,.shx
,.dbf
, and other extensions on a common prefix name (e.g.,nyc_census_blocks
). The actual shapefile relates specifically to files with the.shp
extension. However, the.shp
file alone is incomplete for distribution without the required supporting files. - Mandatory files:
.shp
- shape format; the feature geometry itself.shx
- shape index format; a positional index of the feature geometry.dbf
- attribute format; columnar attributes for each shape, in dBase III
- Optional files:
.prj
- projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format
- The
shp2pgsql
converts Shape files into SQL. - An
SRID
stands forSpatial Reference Identifier
. It defines all the parameters of our data's geographic coordinate system and projection. An SRID is convenient because it packs all the information about a map projection (which can be quite complex) into a single number.
Geometries
Metadata Tables
- In conformance with the Simple Features for SQL (SFSQL) specification, PostGIS provides two tables to track and report on the geometry types available in a given database.
- The first table,
spatial_ref_sys
, defines all the spatial reference systems known to the database and will be described in greater detail later. - The second table (actually, a view),
geometry_columns
, provides a listing of all "features" (defined as an object with geometric attributes), and the basic details of those features.
- The first table,
-- By querying this table, GIS clients and libraries can determine what to expect when retrieving data and
SELECT * FROM geometry_columns;
|
|
|
|
|
|
|
---|---|---|---|---|---|---|
postgres | public | community_location_filter | geojson | 2 | 0 | GEOMETRY |
postgres | tiger | county | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | tiger | state | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | tiger | place | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | tiger | cousub | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | tiger | edges | the_geom | 2 | 4269 | MULTILINESTRING |
postgres | tiger | addrfeat | the_geom | 2 | 4269 | LINESTRING |
postgres | tiger | faces | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | tiger | zcta5 | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | tiger | tabblock20 | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | tiger | tract | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | tiger | tabblock | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | tiger | bg | the_geom | 2 | 4269 | MULTIPOLYGON |
postgres | public | census_location | feature_geometry | 2 | 0 | GEOMETRY |
- The
f_table_catalog
,f_table_schema
, andf_table_name
provide the fully qualified name of the feature table containing a given geometry. Because PostgreSQL doesn't make use of catalogs,f_table_catalog
will tend to be empty. f_geometry_column
is the name of the column that geometry containing column - for feature tables with multiple geometry columns, there will be one record for each.coord_dimension
andsrid
define the dimension of the geometry (2-, 3-, or 4-dimensional) and the Spatial Reference system identifier that refers to thespatial_ref_sys
table respectively.- The
type
column defined the type of geometry as described below.
Representing Real World Objects
- The Simple Features for SQL (SFSQL) specification, the original guiding standard for PostGIS development, defines how a real-world object is represented. By taking a continuous shape and digitizing it at a fixed resolution, we achieve a passable representation of the object.
- We can collection information about geometry objects by reading that geometry metadata:
ST_GeometryTyp(geometry) -- returns the type of the geometry
ST_NDims(geometry) -- returns the number of dimensions of the geometry
ST_SRID(geometry) -- returns the spatial reference identifer number of the geometry
Points
- A spatial point represents a single location on the Earth. The point is represented by a single coordinate (either, 2-, 3-, or 4- dimensions). Points are used to represent objects when the exact details, such as shape and size, are not important as the target scale.
- Specific Spatial Functions working with points:
ST_X(geometry) -- returns the X coordinate
ST_Y(geometry) -- returns the Y coordinate
Linestrings
- A linestring is a path between locations. It takes the form of an ordered series of two or points. Roads and rivers are typically represented as linestrings. A linestring is said to be closed if it starts and ends on the same point. It is said to be simple if it does not cross or touch itself (except at its endpoints if it is closed). A linestring can be both closed and simple.
- Specific Spatial Functions working with linestrings:
ST_Length(geometry) -- returns the length of the linestring
ST_StartPoint(geometry) -- returns the first coordinate as a point
ST_EndPoint(geometry) -- returns the last coordinate as a point
ST_NPoints(geometry) -- returns the number of coordinates in the linestring
Polygons
- A polygon is a representation of an area. The outer boundary of the polygon is represented by a ring. This ring is a linestring that is both closed and simple as defined above. Holes within the polygon are also represented by rings.
- Polygons are used to represent objects whose size and shape are important.
- GIS systems are relatively unique in allowing polygons to explicitly have holes.
- Specific Spatial Functions working with polygons:
ST_Area(geometry) -- returns the area of the polygons
ST_NRings(geometry) -- returns the number of rings (usually 1, more if there are holes)
ST_ExteriorRing(geometry) --returns the outer ring as a linestring
ST_InteriorRing(geometry,n) -- returns a specified interior ring as a linestring
ST_Perimeter(geometry) -- returns the length of all the rings
Collections
- There are four collection types, which group multiple simple geometries into sets.
- MultiPoint - a collection of points
- MultiLineString - a collection of linestrings
- MultiPolygon - a collection of polygons
- GeometryCollection - a heterogeneous collection of any geometry (including other collections)
- Collections are another concept that shows up in GIS software more than in generic software. They are useful for directly modeling real world objects as spatial objects.
- Specific Spatial Functions working with collections:
ST_NumGeometries(geometry) -- returns the number of parts in a collection
ST_GeometryN(geometry,n) --returns the specified part
ST_Area(geometry) -- returns the total area of all polygonal parts
ST_Length(geometry) -- returns the total length of ll linear parts
Geometry Input and Output
- Within the database, geometries are stored on disk in a format used by the PostGIS program. In order for external programs to insert and retrieve useful geometries, they need to be converted into a format that other applications can understand. Fortunately, PostGIS supports emitting and consuming geometries in a large number of formats:
- Well-known Text
ST_GeomFromText(text,srid)
returnsgeometry
ST_AsText(geometry)
returnstext
ST_AsEWKT(geometry)
returnstext
- Well-known binary
ST_GeomFromWKB(bytea)
returnsgeometry
ST_AsBinary(geometry)
returnsbytea
ST_AsEWKB(geometry)
returnsbytea
- Geographic Mark-up Language
ST_GeomFromGML(text)
returnsgeometry
ST_AsGML(geometry)
returnstext
- Keyhole Mark-up Language
ST_GeomFromKML(text)
returnsgeometry
ST_AsKML(geometry)
returnstext
- GeoJSON
ST_AsGeoJSON(geometry)
returnstext
- Scalable Vector Graphics
ST_AsSVG(geometry)
returnstext
- Most actual processes prefer WKB as the choice of data format.
- Other Ways to Create Geometries from Well-known Text or Similar Formatted Inputs:
-- Using ST_GeomFromText with the SRID parameter
SELECT ST_GeomFromText('POINT(2 2)',4326);
-- Using ST_GeomFromText without the SRID parameter
SELECT ST_SetSRID(ST_GeomFromText('POINT(2 2)'),4326);
-- Using a ST_Make* function
SELECT ST_SetSRID(ST_MakePoint(2, 2), 4326);
-- Using PostgreSQL casting syntax and ISO WKT
SELECT ST_SetSRID('POINT(2 2)'::geometry, 4326);
-- Using PostgreSQL casting syntax and extended WKT
SELECT 'SRID=4326;POINT(2 2)'::geometry;
Spatial Relationships
- Spatial databases are powerful because they not only store geometry, they also have the ability to compare relationships between geometries.
ST_Equals
- tests the spatial equality of two geometries
- It returns
TRUE
if two geometries have the same type and have identical; coordinate values, i.e. fi the second shape is equal to the first shape
ST_Intersects
, ST_Disjoint
, ST_Crosses
, and ST_Overlaps
- These functions test whether interiors of the geometries intersect
ST_Intersects(geometry A, geometry B)
- returns TRUE if the two shapes have any space in common, i.e. if their boundaries or interiors intersect
ST_Disjoint(geometry A, geometry B)
- if two geometries are disjoint, they do not intersect, and vice-versa. In fact, it is often more efficient to test "not intersects" rather than to test "disjoint" because the intersects tests can be spatially indexed
ST_Crosses(geometry A, geometry B)
- For multipoint/polygon, multipoint/linestring, linestring/linestring, linestring/polygon, and linestring/MultiPolygon comparisons, returns TRUE if the intersection results in a geometry whose dimension is now less than the maximum dimension of the two source geometries and the intersection set is interior to both source geometries
ST_Overlaps(geometry A, geometry B)
- compares two geometries of the same dimension and returns TRUE if their intersection results in a geometry different from both but of the same dimension
ST_Touches
- Tests whether two geometries touch at their boundaries, but do not intersect in their interiors
ST_Touches(geometry A, geometry B)
returnsTRUE
if ether of the geometries' boundaries intersect or if only one of the geometry's interiors intersects the other's boundary
ST_Within
and ST_Contains
ST_Within
andST_Contains
test whether one geometry is fully within the other.
ST_Within(geometry A, geometry B)
returnsTRUE
if the first geometry is completely within the second geometry.ST_Within
tests for the exact opposite result ofST_Contains
ST_Contains(geometry A, geometry B)
returnsTRUE
if the second geometry is completely contained by the first geometry
ST_Distance
and ST_DWithin
- An extremely common GIS question is
find all the stuff within distance X of this other stuff
- The
ST_Distance(geometry A, geometry B)
calculates the shortest distance between two geometries and returns it as a float. This is useful for actually reporting back the distance between objects. - For testing whether two objects are within a distance of one another, the
ST_DWithin
function provides an index-accelerated true/false test. - Spatial joins are the bread and butter of spatial databases. They allow you to combine information from different tables by using spatial relationships such as the join key.
Spatial Indexing
- Indexing speeds up searching by organizing the data into a search tree which can be quickly traversed to find a particular record. They are one of the greatest assets of PostGIS.
- The larger the table, the larger the relative speed improvement of an indexed query will be.
How Spatial Indexes Work
- Standard database indexes create a hierarchal tree based on the values of the column being indexed. Spatial indexes are a little different - they are unable to index the geometric features themselves and instead index the bounding boxes of the features.
The way the database efficiently answers the question “what lines intersect the yellow star” is to first answer the question “what boxes intersect the yellow box” using the index (which is very fast) and then do an exact calculation of “what lines intersect the yellow star” only for those features returned by the first test.
- For a large table, this “two pass” system of evaluating the approximate index first, then carrying out an exact test can radically reduce the amount of calculations necessary to answer a query.
- Both PostGIS and Oracle Spatial share the same "R-Tree" spatial index structure. R-Trees break up data into rectangles, and sub-rectangles, and sub-sub rectangles, etc. It is a self-tuning index structure that automatically handles variable data density, differing amounts of object overlap, and object size.
Index Only Queries
- Most of the commonly used functions in PostGIS (
ST_Contains
,ST_Intersects
,ST_DWithin
, etc.) include an index filter automatically, but some do not. - To do a bounding-box search using the index (and no filtering), make use of the
&&
operator. For geometries, the&&
operator meansbounding box overlap or touch
in the same way that for numbers the=
operator meansvalues are the same
. - Running an indexed query is not always faster.
- It is wise to use the
ANALYZE
command after bulk data loads and deletes in your table. This force the statistics system to gather data for all your indexed columns.
Projecting Data
- The earth is not flat, and there is no simple way of putting it down on a flat paper map. Common to all projects is that they transform the (spherical) world onto a flat Cartesian coordinate system, and which projection to choose depends on how you will be using the data.
- PostGIS includes built-in support for changing the projection of data, using the
ST_Transform(geometry,srid)
function. For managing the spatial reference identifiers on geometries, PostGIS provides theST_SRID(geometry)
andST_SRID(geometry,sid)
functions.
Geography
- Unlike coordinates in Mercator, UTM, or Stateplane, geographic coordinates are not Cartesian coordinates. Geographic coordinates do not represent a linear distance from an origin as plotted on a plane. Rather, these spherical coordinates describe angular coordinates on a globe. In spherical coordinates a point is specified by the angle of rotation from a reference meridian (longitude), and the handle from the equator (latitude).
- All return values from
geography
calculations are in meters. - In order to load geometry data into a geography table, the geometry first needs to be projected into EPSG:4326 (longitude/latitude), then it needs to be changed into geography. The
ST_Transform(geometry,srid)
function converts coordinates to geographics and theGeography(geometry)
function of the::geography
suffixcasts
to geography.
- Geographics are universally accepted coordinates - everyone understands what latitude/longitude mean, but very few people understand what UTM coordinates mean. Why not use geography all the time?
- First, as noted earlier, there are far fewer functions available (right now) that directly support the geography type. You may spend a lot of time working around geography type limitations.
- Second, the calculations on a sphere are computationally far more expensive than Cartesian calculations. For example, the Cartesian formula for distance (Pythagoras) involves one call to sqrt(). The spherical formula for distance (Haversine) involves two sqrt() calls, an arctan() call, four sin() calls and two cos() calls. Trigonometric functions are very costly, and spherical calculations involve a lot of them.
- If your data is geographically compact, use the geometry type with a Cartesian projection that makes sense with your data.
- If you need to measure distance with a dataset that is geographically dispersed, use the geography type.
Geometry Constructing Functions
Geometry constructing functions
take geometries as inputs and output new shapes.ST_Centroid(geometry)
returns a point that is approximately on the center of mass of the input argument.ST_PointOnSurface(geometry)
returns a point that is guaranteed to be inside the input argument.ST_Buffer(geometry,distance)
takes in a buffer distance and geometry type and outputs a polygon with a boundary the buffer distance away from the input geometry.- The
ST_Intersection(geometry A, geometry B)
function returns the spatial area (or line, or point) that both arguments have in common. - The
ST_Union([geometry])
function takes inputs and removes common lines.
Validity
In 90% of the cases the answer to the question,why is my query giving me a 'TypologyException
' errorisone or more of the inputs are invalid. Which begs the question: what does it mean to be invalid, and why should we care?
- Validity is most important for polygons, which define bounded areas and require a good deal of structure. Lines are very simple and cannot be invalid, nor can points.
- Rules for valid polygons"
- Polygon rings must close
- Rings that define holes should be inside rings that define exterior boundaries
- Rings may not self-intersect (they may nether touch nor cross themselves)
- Rings may not touch other rings, except at a point
- Elements of multi-polygons may bot touch each other.
- The
ST_IsValid(geometry)
function tells us whether a polygon is valid. - The
ST_IsValidReason(geometry)
function can tell us the source of the validity.
Dimensionally Extended 9-Intersection Model
- The
Dimensionally Extended 9-Intersection model
(DE9IM) is a framework for modelling how two spatial objects interact. - Every spatial object has:
- An interior
- The interior is the part of the line bounded by the ends for a line. For a polygon this is obvious.
- A boundary
- An exterior
- For a line, the exterior is everything in the plane not within the bounds of the line.
- The relationships between any pair of special features can be characterized using the dimensionality of the nine possible intersections between the interiors/boundaries/exteriors of a pair of objects.
Clustering on Indices
- One way to speed up access to data is to ensure that records which are likely to be retrieved together in the same result are located in similar physical locations on the hard disk platters. This is called
clustering
. - The right clustering scheme to use can be tricky, but a general rule applies: indexes define a natural ordering scheme for data which is similar to the access pattern that will be used in retrieving the data.
- Clustering based on spatial index makes sense for spatial data that is going to be accessed with spatial queries: similar things tend to have similar locations.
-- Cluster the blocks based on their spatial index
CLUSTER nyc_census_blocks USING nyc_census_blocks_geom_idx;
- Clustering is still necessary in cache databases. Keeping records that are
near each other
in spacenear each other
in memory increases the odds that related records will move up the serversmemory cache hierarchy
together, and thus make memory accesses faster.
3D
- PostGIS supports additional dimensions on all geometry types, a
Z
dimension to add height information and aM
dimension for additional dimensional information (commonly time, or road-mile, or upstream-distance information) for each coordinate. - Adding the extra dimension results in three extra possible geometry types for each geometry primitive:
- Point (a 2-D type) is joined by
PointZ
,PointM
andPointZM
types. - Linestring (a 2-D type) is joined by
LinestringZ
,LinestringM
andLinestringZM
types. - Polygon (a 2-D type) is joined by
PolygonZ
,PolygonM
andPolygonZM
types.
- Point (a 2-D type) is joined by
- Functions built to calculate relationships between 3-D objects:
ST_3DClosestPoint
- Returns the 3-dimensional point on g1 that is closest to g2. This is the first point of the 3D shortest line.
ST_3DDistance
- For geometry type Returns the 3-dimensional cartesian minimum distance (based on spatial ref) between two geometries in projected units.
ST_3DDWithin
- For 3d (z) geometry type Returns true if two geometries 3d distance is within number of units.
ST_3DDFullyWithin
- Returns true if all of the 3D geometries are within the specified distance of one another.
ST_3DIntersects
- Returns TRUE if the Geometries “spatially intersect” in 3d - only for points and linestrings
ST_3DLongestLine
- Returns the 3-dimensional longest line between two geometries
ST_3DMaxDistance
- For geometry type Returns the 3-dimensional cartesian maximum distance (based on spatial ref) between two geometries in projected units.
ST_3DShortestLine
- Returns the 3-dimensional shortest line between two geometries
Nearest-Neighbor Searching
- A frequently posed spatial query is:
what is the nearest <candidate feature> to <query feature>?
- Unlike a distance search, the
nearest neighbor
search doesn't include any measurement restricting how far away candidate geometries might be, features of any distance away will be accepted, as long as they are the nearest. - PostgreSQL solves the nearest neighbor problem by introducing an
order by distance
(<->
) operator that induces the database to use an index to speed up a sorted return set. With anorder by distance
operator in place, a nearest neighbor query can return theN nearest features
just by adding an ordering and limiting the result set to N entries. - The order by distance operator works on geography and geometry types.
- Returns the same as
ST_Distance
for geometry - Returns the spherical distance for geography
- Returns the same as
-- Plug the geometry into a nearest-neighbor query
SELECT streets.gid, streets.name,
ST_Transform(streets.geom, 4326),
streets.geom <-> 'SRID=26918;POINT(583571.9 4506714.3)'::geometry AS dist
FROM
nyc_streets streets
ORDER BY
dist
LIMIT 3;
Rasters
- PostGIS supports another kind of spatial data type called a raster. Raster data, like geometry data, uses Cartesian coordinates and a spatial reference system. Raster data is represented as an n-dimensional matrix consisting of pixels and bands. The bands defines the number of matrices you have. Each pixel stores a value corresponding to each band. So a 3-banded raster such as an RGB image, would have 3 values for each pixel corresponding to the Red-Green-Blue bands.
- A raster is a matrix, pinned on a coordinate system, that has values that can represent anything you want them to represent.
- Since rasters live in cartesian space, rasters can interact with geometries. PostGIS offers many functions that take as input both raters and geometries.
Topology
- PostGIS supports SQL/MM SQL-MM 3 Top-Geo and Top-Net 3 specifications via an extension called
postgis_topology
. You can learn about the functions and types provided by the extension in Manual: PostGIS Topology . Thepostgis_topology
extension includes another kind or core spatial type, called a topogeometry. In addition to the topogeometry spatial type, you will find functions for building topologies and populating topologies.
Tracking Edit History Using Triggers
- A common requirement for production databases is the ability to track history: how has the data changed between two dates, who made the changes, and where did they occur? Using the database and the trigger system, it's possible to add history tracking to any table, while maintaining simple
direct edit
access to the primary table.
Basic PostgreSQL Tuning
- In order to ensure the database will run properly for many different environments, the default configuration is very conservative and not terribly appropriate for high-performance production databases. Add the fact that geospatial databases have different usage patterns , and the data tend to consist of fewer, much larger records than non-geospatial databases, and you can see that the default configuration will not be totally appropriate for our purposes.
- All of these configuration parameters can be edited in the
postgresql.conf
database configuration file. This is a regular text file and the changes will not take effect until the server is restarted. - Once changes are made, reload configuration
shared_buffers
- Sets the amount of memory the database server uses for shared memory buffers. They are shared amongst back-end processes.
- Default value: typically 32MB
- Recommended Value: about 75% of database memory up to a max of about 2GB
effective_cache_size
- The
effective_cache_size
is approximately the amount of memory on the machine, less theshared_buffers
, less thework_mem
times the expected number of connections, less any memory required for any other processes running on the machine, less about 1GB for other random operating system needs. - Default Value: typically 4GB
- Recommended Value: any amount of
free
memory expected to be around ordinary operating conditions
work_mem
- Defines the amount of memory that internal sorting operations, indexing operations, and hash tables can consume before the database switches to on-disk files. This value defines the available memory for each operation; complex queries may have several sort or hash operations running in parallel, and each connected session may be executing a query.
As such you must consider how many connections and the complexity of expected queries before increasing this value. The benefit to increasing is that the processing of more of these operations, including ORDER BY, and DISTINCT clauses, merge and hash joins, hash-based aggregation and hash-based processing of subqueries, can be accomplished without incurring disk writes. The cost of increasing is memory that will be used per connection, which can be quite high with production levels of connections.
- Default Value: 1MB
- Recommended Value: 32MB
maintenance_work_mem
- Defines the amount of memory used for maintenance operations, including vacuuming, index, and foreign key creation.
- Default Value: 16MB
- Recommended Value: 128MB
wal_buffers
- Sets the amount of memory used for write-ahead-log data.
- Default Value: 64kB
- Recommended Value: 1MB
checkpoint_segments
- This value sets the maximum number of log file segments (typically 16MB_ that can be filled between automatic WAL checkpoints.
- Default Value: 3
- Recommended Value: 6
random_page_cost
- This is a unit-less value that represents the cost of a random page access from disk.
- Default Value: 4.0
- Recommended Value: 2.0
seq_page_cost
- This is the parameter that controls the cost of a sequential page access.
- Default Value: 1.0
- Recommended Value: 1.0
PostgreSQL Schemas
- Schemas are like folders, and they can hold tables, views, functions, and other relations. Each database starts out with one schema, the
public
schema. - Inside that schema, the default install of PostGIS creates the
geometry_columns
,geography_columns
, andspatial_ref_sys
metadata relations, as well as the types and functions used by PostGIS.
- Inside that schema, the default install of PostGIS creates the
- Reasons for using schemas:
- Data that is managed in a schema is easier to apply bulk actions to
- Users can be restricted in their work to single schemas to allow isolation of analytical and test tables from production tables.
Comments
There are currently no comments to show for this article.