Spatial analysis involves working with geographically referenced data (with coordinates) and is increasingly important across disciplines. R provides a powerful, open-source environment for spatial analysis, leveraging its data manipulation, modeling and visualization tools for geographic data. Major advantages include a vast ecosystem of spatial packages and statistical methods, while a drawback is less built-in GUI interactivity compared to desktop GIS software.
In R, spatial data are typically of two kinds: vector (points, lines, polygons with attributes) and raster (gridded data values). R’s modern spatial ecosystem centers on the sf (Simple Features) and terra packages (and their predecessors sp and raster). These provide classes and functions to represent, read, write, and manipulate spatial objects. Spatial analysis in R ranges from mapping and geoprocessing to advanced statistical modeling (e.g. spatial regression, kriging, point-pattern analysis).
Key R Packages for Spatial Data Analysis
R’s spatial functionality is organized into packages, many maintained by the r-spatial community. Below are some of the most important:
- sf: Implements the “Simple Features” standard for vector (point/line/polygon) data. It “provides simple features access for R” by representing geometries in a
data.frame
(or tibble) with a geometry column. It supports all SF geometry types, CRS transformations, and interfaces to GEOS (geometric operations), GDAL (file I/O), and Proj (projections). Most modern vector GIS work in R uses sf. (Predecessor: the sp package, which defined SpatialPoints/DataFrames etc., andrgdal
/rgeos
for I/O/operations; sf largely supersedes these.) - terra (and raster): Packages for raster (gridded) data. terra is the current package (successor to raster) for spatial (raster) data, designed for large datasets and speed. It has functions to create, read, manipulate, and write rasters. The
SpatRaster
class represents multi-layer grids; theSpatVector
class handles vector data too. According to CRAN, “terra: Spatial Data Analysis – methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data… processing of very large files is supported… ‘terra’ replaces the ‘raster’ package (‘terra’ can do more, and it is faster and easier to use)”. terra provides operations like raster algebra, focal/zonal statistics, resampling, and I/O with GDAL. (The older raster package had similar functionality, but terra is now recommended.) - tmap: A high-level mapping package for thematic maps. It “offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps”. tmap’s syntax is similar to ggplot2 and supports both static (“plot”) and interactive (“view”) modes. It works with sf, sp, and raster data.
- spatstat: A comprehensive toolbox for spatial point pattern analysis. As CRAN describes, “spatstat: Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests – a comprehensive open-source toolbox for analysing spatial point patterns”. It focuses on 2-D point processes (with support for 3-D and spatio-temporal patterns) and provides hundreds of functions for exploratory analysis (e.g. K-functions, intensity estimation), model-fitting (Poisson, Gibbs, cluster processes), and inference on point data.
- gstat: For geostatistical modeling and kriging. Implements variogram estimation and (co)kriging, including spatio-temporal kriging and (co-)simulation. The CRAN description highlights “variogram modelling; simple, ordinary and universal point or block (co)kriging; spatio-temporal kriging; sequential Gaussian or indicator (co)simulation; variogram and variogram map plotting… supports sf and stars”. In short, gstat lets you fit spatial variograms and make gridded predictions (kriging) of continuous fields.
- spdep / spatialreg: Functions for spatial weights and spatial regression (autocorrelation). spdep provides tools to define spatial neighbors/weights from polygon contiguity or point distances, and to compute spatial autocorrelation statistics (Moran’s I, Geary’s C, etc.). It also offers local indicators (LISA), spatial regressions (via
spatialreg
), and related tests. For example,poly2nb()
andnb2listw()
build adjacency lists and weight matrices;moran.test()
andspautolm()
perform autocorrelation tests and spatial regression. - R-INLA: A framework for Bayesian spatial/spatio-temporal models. The R-INLA package uses Integrated Nested Laplace Approximations for latent Gaussian models (a class covering many spatial models). It is widely used for modeling spatial fields and disease maps, offering much faster inference than classic MCMC for complex hierarchical models. For instance, INLA with the SPDE approach allows Gaussian process models over continuous domains (useful in geostatistics and spatial epidemiology). (Usage: define meshes, latent fields, and fit Bayesian spatial models – an advanced topic.)
- Other packages: Many other specialized spatial packages exist. For network data, sfnetworks provides a “tidy” graph structure for spatial networks (integrating
sf
andtidygraph
). leaflet is popular for interactive web mapping. rgeos, rgdal, maptools, etc., are older packages (largely replaced by sf). stars handles spatio-temporal arrays (multidimensional rasters). The spData package offers example datasets for learning (points, polygons, rasters).
Common Spatial Data Types in R
Spatial data typically fall into four categories, each handled by specific R classes:
- Vector data (points, lines, polygons): Represent geographic features like country borders, roads, or observation points. In R, vector data are handled by sf (
sf
objects) or older sp (SpatialPoints/DataFrame
,SpatialPolygons
, etc.). Each feature has attributes (columns) and geometry. For example, an sf data frame might store cities with point geometries and population attributes. Operations include joins, overlays, buffering, intersection, etc. (R’ssf
functions likest_read()
,st_intersection()
,st_buffer()
, etc., operate on these.) - Raster data (grids/pixels): Represent continuous surfaces (e.g. elevation, satellite imagery, climate grids). A raster is “a grid of equal size cells … commonly used to represent spatially continuous data,” where each cell has one or more values. R raster data are stored in SpatRaster (terra) or RasterLayer/Stack (raster) objects. Each layer is a 2D matrix of values with an associated extent and CRS. R can read GeoTIFF, NetCDF, and many raster formats. Typical raster operations include resampling, reclassification, zonal statistics, map algebra, and visualization.
- Point pattern data: A special case of vector data, representing observed event locations (e.g. tree positions, disease cases, crime incidents). These are often handled as point pattern objects in spatstat (class
ppp
) or simply as an sf point set. spatstat provides dedicated analysis (e.g. estimating intensity, testing complete spatial randomness, fitting point process models). - Network data: Geospatial networks (e.g. roads, utility lines) can be represented as graph objects with spatial geometries. Packages like sfnetworks combine
sf
andigraph
to represent edges and nodes (sfnetwork
class). Other packages (e.g. stplanr, dodgr) also handle routing and connectivity. Network data allow analyses like shortest paths, connectivity, and network statistics.
R supports data in many file formats: shapefiles, GeoJSON, GeoPackage for vectors; GeoTIFF, NetCDF for rasters; and databases (PostGIS) via sf
. Coordinate reference systems (CRS) are managed by the sf and terra packages (through PROJ). The spatial objects are typically data frames (vectors) or arrays (rasters) with attached georeferencing.
Example Spatial Analysis Workflows in R
Below are illustrative workflows showing how spatial tasks are performed in R. (Code assumes necessary packages are installed.)
Vector Data: Reading and Mapping
library(sf)
# Read example vector dataset (North Carolina counties included in sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
# Simple attribute and geometry summary
print(nc)
# Plot map of North Carolina, coloring by area
plot(nc["AREA"], main="NC County Areas", col=terrain.colors(10))
This code loads an sf object (nc
) with polygon geometries. We then plot the geometry colored by the AREA
attribute. Typical next steps might include spatial joins (e.g. joining to point data), computing centroids (st_centroid
), or subsetting by spatial conditions (e.g. st_within
).
Raster Data: Reading and Arithmetic
library(terra)
# Read example raster (elevation data from terra package)
r <- rast(system.file("ex/elev.tif", package="terra"))
plot(r, main="Elevation (meters)")
# Example: compute a derived raster (e.g. difference or normalized index)
# Here, we simulate two layers for illustration
r2 <- r * 1.2 # pretend second band (e.g., a scaled version)
diff <- r2 - r
plot(diff, main="Differenced Raster")
This shows reading a raster file into a SpatRaster
and plotting it. In practice, one might compute indices (e.g., NDVI from red/infrared bands), reproject to another CRS (project()
), crop/mask (crop()
), or perform algebraic operations on layers.
Spatial Overlay / Join Example
library(sf)
# Suppose we have a point dataset of cities and a polygon dataset of regions
cities <- st_as_sf(data.frame(
city=c("A","B","C"), lat=c(35,36,35.5), lon=c(-80,-79,-78),
stringsAsFactors=FALSE), coords=c("lon","lat"), crs=4326)
regions <- st_make_grid(st_bbox(c(xmin=-81, xmax=-77, ymin=34, ymax=37)), n=2)
regions <- st_sf(id=1:4, geometry=regions)
# Spatial join: count cities in each region
cities$dummy <- 1
counts <- aggregate(dummy ~ id, data=st_join(cities, regions), FUN=sum)
print(counts)
plot(regions["id"], main="Regions with City Counts")
This (contrived) example creates point and polygon data and uses st_join
to overlay them, aggregating city counts by region. R’s sf functions (st_join
, aggregate
) make overlay analysis straightforward.
Point Pattern Analysis (spatstat)
# Generate a random point pattern in a unit square
library(spatstat)
pp <- rpoispp(lambda=100) # 100 points per unit area
plot(pp, main="Simulated Point Pattern")
# Estimate and plot Ripley's K-function
K <- Kest(pp)
plot(K, main="Estimated K-function", legendargs=list(x="topright"))
This uses spatstat to analyze a point process: generating 100 points and computing Ripley’s K-function. spatstat’s Kest
, Gest
, etc., help diagnose clustering vs dispersion in point data.
Spatial Autocorrelation (spdep)
library(spdep)
data(columbus, package="spData") # example areal data (Ohio region)
# Build adjacency (queen contiguity) and weight matrix
nb <- poly2nb(columbus, queen=TRUE)
lw <- nb2listw(nb, style="W")
# Moran's I test for spatial autocorrelation in crime rates
moran.test(columbus$CRIME, lw)
Here, spdep is used on an example dataset (columbus
) to test for spatial autocorrelation. The output of moran.test
shows a Moran’s I statistic and p-value, indicating if values are clustered.
Use Cases Across Domains
Spatial R is used in virtually every field involving geography. For example:
- Environmental science: Mapping and modeling phenomena like land cover, habitat suitability, climate variables, or pollution. R’s raster packages allow processing satellite imagery (e.g. computing NDVI from multispectral data). Species distribution modeling often uses spatial covariates via terra and statistical models (e.g. GLMs or machine learning with caret, mgcv, etc. on spatial predictors). Climate data (temperature, precipitation) are handled as rasters, and geostatistical interpolation (kriging with gstat) is common.
- Urban planning and transportation: Analyzing city spatial patterns (zoning, land use, traffic), service-area mapping, and network analysis. For instance, planners might use sf and tmap or ggplot2 to visualize demographic data by census tract, or sfnetworks to model road networks. Accessibility analysis (e.g. distance to parks or hospitals) often uses spatial joins and network distances (via stplanr or dodgr).
- Public health and epidemiology: Disease mapping and exposure studies frequently use spatial R. A classic example is John Snow’s cholera map (1854) which highlighted clustering of cases around a water pump. Modern analyses use R to map incidence rates, identify clusters (e.g. with Kulldorff’s scan statistic), and fit Bayesian disease models (R-INLA or CARBayes). R’s leaflet or shiny are also used for interactive disease dashboards. Bayesian spatial models (using R-INLA or spatialreg) account for spatial dependence in health data