The best way to get an introduction to the package ‘spatialwarnings’ is through our freely-available paper, which describes the scientific background behind early-warning signals, and how they can be computed with the package. The purpose of this document is to provide a more technical introduction, and answer more advanced questions/issues. Please note that if you make a significant use of our package in your work, we would appreciate that you cite the above paper.

This document assumes some basic familiarity with R code. All the required files and source code needed to build this document are available there.

A 5-minutes introduction to `spatialwarnings`

As ecological systems degrade along gradients, several “Early-Warning signals” have been suggested in the literature to monitor such degradation. In practice, they are indicators that measure a property of an ecological system, whose trend along a gradient (or in time) is expected to change before an abrupt shift in ecosystem condition. For example, the spatial variability (e.g. variation in vegetation cover) of a given ecosystem is expected to rise before some regime shifts.

A variety of such indicators have been suggested in the literature: the package spatialwarnings focuses on computing those that are based on spatial patterns. As a result the base type of data used by spatialwarnings are matrices (2D arrays of numbers), that typically come from raster data (e.g. aerial images). For example, using the package, we can compute the spatial variance of the following matrix, which has been suggested as a widely-applicable EWS (Guttal et al. 2009):


# Load the package and enable parallel processing
library(spatialwarnings) 
plan(multicore)

data(serengeti) # get more information on this dataset by using ?serengeti

# Extract and display matrix from one of the example datasets
example_matrix <- serengeti[[1]]
display_matrix(example_matrix)

Here, the function raw_cg_variance() can be used to compute the variance on the coarse-grained matrix:

# This function returns the coarse-grained spatial variance when fed a matrix.
# See ?raw_cg_variance for more details about its computation.
raw_cg_variance(example_matrix)
#>   variance 
#> 0.01057542

When working with indicators, we are often interested in how they change along a gradient rather than in a single value. A typical use-case is to have several aerial images (matrices), each corresponding to a given location along a gradient. The usual task is to compute indicator values on each image, and display how their values change along the gradient.

spatialwarnings provides all of what is needed for such task. By storing matrices in a list, indicators can be computed on all of them at once. In the following example, we compute the spatial variance on a list of matrices (R object serengeti), which are aerial snapshots taken along a spatial gradient of rainfall in Serengeti, Tanzania:

trend <- compute_indicator(serengeti, raw_cg_variance)
trend
#> Spatial Early-Warning: raw_cg_variance 
#> 
#>  13 matrices (size: 250x250)
#> 
#>  Mat. # variance
#>       1    0.011
#>       2    0.012
#>       3    0.015
#>       4    0.022
#>       5    0.024
#>       6    0.031
#>       7    0.034
#>       8    0.041
#>       9    0.040
#>      10    0.040
#>      11    0.054
#>      12    0.074
#>      13    0.131
#> 
#> The following methods are available: 
#> as.data.frame indictest plot summary

The package has computed the spatial variance for all matrices and displayed a text summary. We can now plot the trend along the gradient by calling the generic function plot() on the returned object. Here the example dataset is along a gradient of increasing rainfall, so we will use the rainfall values (contained in the R object serengeti.rain) as our x-axis in a trend plot:

plot(trend, along = serengeti.rain)

Because the tree cover changes along the gradient, this may affect the indicator trends, which may then not reflect the proximity to the shift, but rather changes in cover. Using spatialwarnings, we assess whether the indicator values deviate from their null expectation (their expected value given the vegation cover). This is done by randomizing the spatial structure of the matrix, and recomputing indicator values on such matrix:

variance_test <- indictest(trend)
variance_test
#> Spatial Early-Warning: raw_cg_variance 
#> 
#>  13 matrices (size: 250x250)
#> 
#>  Mat # variance P>null    
#>      1    0.011      0 ***
#>      2    0.012      0 ***
#>      3    0.015      0 ***
#>      4    0.022      0 ***
#>      5    0.024      0 ***
#>      6    0.031      0 ***
#>      7    0.034      0 ***
#>      8    0.041      0 ***
#>      9    0.040      0 ***
#>     10    0.040      0 ***
#>     11    0.054      0 ***
#>     12    0.074      0 ***
#>     13    0.131      0 ***
#> 
#>  Null model: perm (999 null matrices used)
#>  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
#> 
#> The following methods are available:
#> as.data.frame plot summary

Here all observed values of spatial variance are above their null distribution (P-value indistiguishable from zero). Again, we can plot trends and investigate how these null values compare to the indicator trends:

plot(variance_test)

The grey ribbon indicates the 0.05 and 0.95 quantiles of the null distribution (they appear as a thin grey line on the graph).

spatialwarnings allows computing any indicator following such three-step workflow. For convenience, some indicators are already included and can be computed altogether at once. For example, the spatial variance above is part of the so-called “generic indicators”, which also includes lag-1 spatial autocorrelation and skewness. We can compute and display all of them at once using the generic_sews function:


# Compute all generic indicators at once
generic_trend <- generic_sews(serengeti)

# Here we adjust and customize the plot using ggplot2 directives 
plot(generic_trend, along = serengeti.rain) + 
  geom_vline(xintercept = 733, color = "red", linetype = "dashed") +
  labs(x = "Annual rainfall",
       y = "Mean cover/indicator value",
       title = paste("Early warning signals of a shift in tree cover in", 
                     "Serengeti, Tanzania (Eby et al. 2016)"), 
       subtitle = paste("Grey ribbons indicate the 5-95% quantiles of the null", 
                        "distribution"))

Which indicators are available in `spatialwarnings`?

spatialwarnings implements the following indicators found in the literature:

lag-1 autocorrelation (Moran’s I, Dakos et al. 2010)
Variance (Guttal et al. 2008)
Skewness (Guttal et al. 2008)
SDR Ratio (Kéfi et al. 2014)
Patch-size distribution shape (Kéfi et al. 2011)
Power-law range (Berdugo et al. 2017)
Flowlength (Rodriguez et al. 2017)
Kolmogorov complexity (Dakos et al. 2017)

A complete list of the related functions can be found by accessing the package help main page, using ?spatialwarnings.

Do not hesitate to get in touch through Github if you want to see this list grow! Implementing your own indicator is also relatively easy, allowing you to easily try out new spatial metrics (see further down in this document).

What do the reported P-values mean?

The choice made in spatialwarnings is to report P-values that are low when the indicator value is above the null distribution. This is motivated by the fact that most indicators tend to have increasing values near a possible transition. For example, a P-value of 0.01 means that the observed indicator value is above 99% of the values observed in the null distribution.

However, using this convention means that indicators which are expected to show significant deviation when below their corresponding null distribution will show apparently non-significant P-values (i.e. close to one).

spatialwarnings is slow !

While a best effort is made to make spatialwarnings as fast as possible, it relies heavily on permutating matrices and recomputing indicators, which can take a while to compute on personal computers. You can enable parallel processing to alleviate this. In that case, when computing indicators on lists of matrices, spatialwarnings will use one core per matrix. Parallel processing can be enabled using the function plan() (provided by the future framework):

plan(multisession)

Note that the future package provides many other ways to parallelize computations (e.g. across multiple computers in a cluster). For more information, please refer to the official manual.

How can I deal with missing values in the data?

Put simply, there is no single way to deal with missing values (NAs), this is why it’s up to the user of the package to deal with those before computing indicators. It depends on the meaning of those NAs.

If the matrix is has continuous values, then NAs generally represent missing measurements. In this case, interpolation may be an option, but computing indicators on interpolated data may be lead to wrong conclusions (because it increases autocorrelation in the data, which is by itself an indicator !), see e.g. this blog post.

If the matrix has discrete (TRUE/FALSE) values, then NAs may represent either (i) missing measurements, and in this case you are in the above case or (ii) a pixel that has been observed, but where the value represents the non-focal state (i.e. what should be in fact, a FALSE value). In this case, it should be safe to replace the NAs in the matrix by FALSE values.

For example, a matrix could contain TRUE values where the pixel contains forest and NA if it contains any other thing(buildings, grassland, etc.). If you are interested in the spatial dynamics of forest patches, then it should be safe to replace those NAs by FALSE values and compute indicators.

How can I add another indicator?

‘spatialwarnings’ provides the function create_indicator, which allows the user to define new indicators, and use them as if they were *_sews functions (such as generic_sews, which is used in the introduction of this document). This way we only focus on defining the spatial metric, and ‘spatialwarnings’ handles the rest for us such as testing significance or plotting results.

The process is in two steps, we first define in R a function that computes the metric. This function should take a matrix as input, and return a single, numerical value. Here, we define a function that returns the maximum patch size in a matrix.

# This function takes a matrix and a returns a single value.
get_maxpatchsize <- function(mat) {
  max(patchsizes(mat))
}

We now apply this function to a matrix or a set of matrices by passing our data and the above function to compute_indicator():

# Create the indicator function
maxpatch_indic <- compute_indicator(serengeti, get_maxpatchsize)

# Display the results as a plot
plot(maxpatch_indic, along = serengeti.rain)

Trends and significance of custom indicators can be also assessed using the generic functions plot() and indictest():

# Apply this function on the serengeti dataset. We reduce the number of
# permutations when testing significance to speed up computations. 
maxpatch_indic <- indictest(maxpatch_indic, nulln = 99)

# Display the trends, now with significance ribbons
plot(maxpatch_indic, along = serengeti.rain)

Please note that it is good practice to have an indicator function that returns values with names. Otherwise ‘spatialwarnings’ assigns default names to the returned values, such as ‘indic_1’, which appears in the previous plot. As we often work with several spatial metrics at once, this can get quickly confusing. In the snippet below we rewrite our function to return named values: these names will be used in text summaries and plots.


# This function takes a matrix and a returns a single value, but this time 
# it adds names to the returned value.
get_maxpatchsize <- function(mat) {
  c(maxpatchsize = max(patchsizes(mat)))
}
maxpatch_indic <- compute_indicator(serengeti, get_maxpatchsize)

# Display the results as a plot. The header now displays the name of the value
plot(maxpatch_indic, along = serengeti.rain)

Technical considerations

Using parameters in indicator functions

If your spatial metric requires parameters for its computation, compute_indicator() will take care of handling them for you:


# Return the estimated truncation of the patch-size distribution 
psd_quantile <- function(mat, log, q) { 
  psd <- patchsizes(mat)
  if ( log ) { 
    psd <- log10(psd)
  }
  a <- quantile(psd, q)
  names(a) <- paste0("psd_quantile_", q)
  return(a)
}

quantiles_indic <- compute_indicator(forestgap, psd_quantile, 
                                     log = TRUE, q = 0.95)

summary(quantiles_indic)
#> Spatial Early-Warning: psd_quantile 
#> 
#>  9 matrices (size: 400x400)
#> 
#>  Mat. # psd_quantile_0.95
#>       1              2.84
#>       2              0.37
#>       3              0.70
#>       4              1.40
#>       5              1.43
#>       6              1.00
#>       7              0.70
#>       8              0.48
#>       9              0.30
#> 
#> The following methods are available: 
#>

Returning multiple values

Indicator functions can also return multiple values at once, for example, the following indicator function will work with ‘spatialwarnings’:


multi_indic_function <- function(mat) { 
  c(mean     = mean(mat), 
    sd       = sd(coarse_grain(mat, subsize = 4)), 
    coef_var = sd(coarse_grain(mat, subsize = 4)) / mean(mat))
}

multi_indics <- compute_indicator(forestgap, multi_indic_function)
summary(multi_indics)
#> Spatial Early-Warning: multi_indic_function 
#> 
#>  9 matrices (size: 400x400)
#> 
#>  Mat. #  mean    sd coef_var
#>       1 0.918 0.077    0.084
#>       2 0.805 0.111    0.137
#>       3 0.689 0.127    0.184
#>       4 0.581 0.133    0.229
#>       5 0.465 0.133    0.287
#>       6 0.344 0.126    0.367
#>       7 0.230 0.108    0.470
#>       8 0.111 0.080    0.720
#>       9 0.053 0.056    1.053
#> 
#> The following methods are available: 
#> as.data.frame indictest plot summary

If you want to distribute a new indicator with the package, do not hesitate to open a new topic on our Github tracker!

Can I use `Raster*` objects with ‘spatialwarnings’?

Raster* objects are used by the package raster to encapsulate raster data, for example data read from Geotiff images. These objects can contain a single band (RasterLayer objects, with one value per pixel) or multiple bands (RasterStack/RasterBrick objects, with several values per pixel). Each band of a raster object is, by definition, a matrix of values.

Base spatialwarnings cannot use RasterLayer objects directly, but a companion package spatialwarningsGis (available there) provides extensions to work with these objects. You can install the package using devtools:

devtools::install_github('spatial-ews/spatialwarningsGis')
#> Skipping install of 'spatialwarningsGis' from a github remote, the SHA1 (d2a14dda) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(spatialwarningsGis)

You can now use raster objects transparently with spatialwarnings, they will be converted to matrices internally and without extra work for the user. A workflow with RasterLayer object could then look like the following:


# It makes no sense computing indicators on those images, they just are
# example geotiffs
example_files <- 
  paste0("https://download.osgeo.org/geotiff/samples/spot/chicago/", 
         c("SP27GTIF.TIF","UTM2GTIF.TIF"))

# Read the list of files using functions from the package 'raster'
list_of_raster_objects <- lapply(example_files, function(f) {
  tmpfile <- tempfile()
  download.file(f, destfile = tmpfile)
  raster(tmpfile)
})

# ...and use them in spatialwarnings
generic_sews(list_of_raster_objects)
#> Warning in fun(mat, ...): Input matrix has continous values but will be coarse-
#> grained anyway. Set subsize=1 to disable coarse graining.

#> Warning in fun(mat, ...): Input matrix has continous values but will be coarse-
#> grained anyway. Set subsize=1 to disable coarse graining.
#> Spatial Early-Warning: Generic indicators 
#> 
#>  2 matrices (size: 929x699)
#> 
#>  Mat. # variance skewness moran mean
#>       1     1177     0.68  0.64  115
#>       2     1175     0.68  0.63  115
#> 
#> The following methods are available: 
#>

Right now some metadata of the raster object is lost in the process (extent, resolution, etc.). This information could be used by ‘spatialwarnings’ in a future release to adjust output (for example, adjust distance units when displaying the r-spectrum).

Please note that spatialwarnings will not use the multi-band objects RasterStack/RasterBrick as the implemented indicators only work with univariate values. It is unknown which band should be picked in multi-band objects to compute the indicators. You need to extract RasterLayers from these objects first, then use spatialwarnings.

Additional support for other formats representing raster data will be added to spatialwarningsGis.

How can I identify the periodicity in data?

EWS available in spatialwarnings assume that stress does not affect the periodicity of the spatial structure, as changes in periodicity may mask or alter trends in indicator values. Before computing the EWS, it is therefore needed to check for any possible periodicity in the input data. In this section, we contrast two images to show how the r-spectrum can be used to highlight periodicity (or its absence).


# Read images
image_periodic <- readPNG('./patterned_bush_niger.png')
image_nonper   <- readPNG('./non_patterned_spain.png')

# The images have three bands (R/G/B). We transform them into a black
# and white image using principal component analysis (PCA)
summarise_pca <- function(arr) {
  values <- matrix(as.vector(arr), ncol = 3)
  pca <- prcomp(values)
  values <- pca[["x"]][ ,1]
  matrix(values, ncol = ncol(arr), nrow = nrow(arr))
}

# Transform into one-band (black and white image) and compute r-spectrum.
image_periodic_pca <- summarise_pca(image_periodic)
rspec_periodic <- rspectrum(image_periodic_pca)

image_nonper_pca <- - summarise_pca(image_nonper)
rspec_nonper <- rspectrum(image_nonper_pca)

The periodic image below (tiger bush in Niger ¹) displays a typical banded regular pattern (a). Its scale (in number of pixels) is reflected by a hump in the r-spectrum around a size of ~ 15px (b). This corresponds to the width of one vegetation band in the image (~30m in reality).


img <- ggplot(mat2longdf(image_periodic_pca)) +
        geom_raster(aes(x = x, y = y, fill = value)) +
        scale_fill_gradient(low = "black", high = "white") +
        scale_y_reverse() +
        theme_minimal() +
        theme(legend.position = "none") +
        labs(x = "Pixel coordinate",
            y = "Pixel coordinate",
            title = paste0(lettercount(), ") Image of periodic vegetation"))

rsp <- ggplot(rspec_periodic) +
         geom_point(aes(x = dist, y = rspec), pch = 20) +
         scale_x + scale_y +
         theme_minimal() +
         labs(x = "Distance (number of pixels)",
             y = "r-spectrum value (power)",
             title = paste0(lettercount(), ") r-spectrum of periodic image"))

gridExtra::grid.arrange(img, rsp, ncol = 2)

This image displays strong periodicity and the indicators in spatialwarnings should be used with care to draw conclusions from it.

In contrast, an aperiodic image displays no such hump in its r-spectrum. The image below (c, arid grassland in Spain) is aperiodic, as shown by the decreasing r-spectrum (d). Indicators available in spatialwarnings may (in principle) be used on this image.


img <- ggplot(mat2longdf(image_nonper_pca)) +
        geom_raster(aes(x = x, y = y, fill = value)) +
        scale_fill_gradient(low = "black", high = "white") +
        scale_y_reverse() +
        theme_minimal() +
        theme(legend.position = "none") +
        labs(x = "Pixel coordinate",
            y = "Pixel coordinate",
            title = paste0(lettercount(), ") Image of non-periodic vegetation"))

rsp <- ggplot(rspec_nonper) +
         geom_point(aes(x = dist, y = rspec), pch = 20) +
         scale_x + scale_y +
         theme_minimal() +
         labs(x = "Distance (number of pixels)",
             y = "r-spectrum value (power)",
             title = paste0(lettercount(), ") r-spectrum of non-periodic image"))

gridExtra::grid.arrange(img, rsp, ncol = 2)

Converting multi-band images

Raster images from remote sensing often come with multiple bands, because of the way images are stored digitally (with red/blue/green channels), but also because multiple sensors can be used to capture a single image (e.g. a combination of visible/near-infrared sensors). These images are sometimes represented as arrays in R (matrices that are extended with a third dimension, “depth”).

As spatialwarnings computes indicators on matrix objects, these arrays need to be first transformed into matrices (effectively arrays with depth one). This requires collapsing the multi-dimensional data of each pixel to a single value. In addition, some indicators (e.g. patch-based indicators) rely on classified, TRUE/FALSE data that define which pixels belong to a patch and which do not. The classification algorithm employed is likely to be very system-dependent, but we provide a possible example below.

In what follows, we use a small example to show how to carry these two processes and apply the spatialwarnings package on images. Using an aerial image stored as an R/G/B PNG image, we classify each pixel as vegetated/non-vegetated or summarize the multi-channel pixel values to a single numeric value. We then compute indicators on the resulting matrix.

Note that in many cases, other methods can be used to derive univariate indices from remote sensing data that may be more appropriate. For example, a typical one-dimensional index for vegetation data is NDVI (Normalized Difference Vegetation Index), which measures the extent of green vegetation in a given pixel and is obtained as the normalized difference between the red a near-infrared bands of a multi-band image.

Classifying an RGB image to compute patch-based indicators

We first load the required packages and read the image in R.


# Read image
datadir <- "../figs/images/"
img1 <- readPNG('./crau_quercus_encroachment.png')

# Display image
grid::grid.raster(img1)

The vegetation in this image displays some patchiness, that can be characterized by spatialwarnings. img1 is a raster image (represented as an array in R) with three (red/blue/green) channels that need to be be classified into TRUE (pixel is in a vegetation patch) or FALSE values in order to compute patch-based compute indicators.

# Convert the image to a data.frame
img1_tab <- data.frame(expand.grid(x = seq.int(nrow(img1)),
                                   y = seq.int(ncol(img1))),
           as.data.frame(matrix(img1, ncol = 3)))
names(img1_tab) <- c('x', 'y', 'red', 'green', 'blue')

# A very bright object is present in the image, that distorts the results
# further down and needs to be removed. We set its pixel values to NAs.
luminance <- with(img1_tab, sqrt( 0.299*red^2 + 0.587*green^2 + 0.114*blue^2 ))
img1_tab[ ,c('red', 'green', 'blue')] <-
  img1_tab[ ,c('red', 'green', 'blue')] * ifelse(luminance > .5, NA, 1)

We can display the distribution of each channel values:

# fix level order so colors match in graph
df <- gather(img1_tab, channel, value, red, green, blue)
df[ ,'channel'] <- factor(df[ ,'channel'], levels = c("red", "green", "blue"),
                          ordered = TRUE)
ggplot( df ) +
  geom_density(aes(x = value, color = channel)) +
  theme_minimal() +
  labs(caption = "Distribution of channel values") +
  scale_color_manual(values = c('red', 'green', 'blue', 'black'))
#> Warning: Removed 564 rows containing non-finite values (stat_density).

Each individual channel can be also represented as a monochrome image:

ggplot(gather(img1_tab, channel, value, red, green, blue)) +
  geom_raster(aes(x = x, y = y, fill = value)) +
  facet_grid( ~ channel, labeller = label_both) +
  coord_fixed() +
  theme_minimal() +
  scale_fill_gradient(low = "#000000", high = "#FFFFFF") +
  labs(caption = "Monochrome representation of the three channels of the RGB image")

We can classify this image using an unsupervised k-means classification algorithm on the pixel data. Many other classification algorithms exist and may show better accuracy, but k-means is simple, generic and fast.


km <- kmeans(na.omit(img1_tab[ ,c('red', 'green', 'blue')]),
             centers = 2)
img1_tab[ ,'clust'] <- NA
img1_tab[!is.na(img1_tab[ ,'red']), 'clust'] <- km[['cluster']]

# The number identifying each cluster is random in k-means: we make sure that
# cluster 2 always has greener values, and thus always corresponds to potential
# vegetation patches.
img1_tab[ ,'clust'] <- with(img1_tab, as.integer(reorder(as.factor(clust), -green)))

ggplot(img1_tab) +
  geom_raster(aes(x = x, y = y,
                  fill = as.factor(clust))) +
  coord_fixed() +
  theme_minimal() +
  scale_fill_manual(name = "Vegetation",
                    values = c('#F4EAA4', '#0A8E0B'))
#> Warning: Removed 188 rows containing missing values (geom_raster).

Cluster 2 roughly identifies vegetation patches, but many single-pixel values are misclassified by the k-means algorithm. However, they most likely pertain to the nearest patch: we apply a smoothing filter to get rid of them.


clust_filt <- with(img1_tab,
                   gblur(matrix(!is.na(clust) & clust == 2,
                                nrow = max(x), ncol = max(y)),
                                sigma = 1.5)) > .2

img1_tab[ ,'clust_filt'] <- as.vector(clust_filt)

ggplot(img1_tab) +
  geom_raster(aes(x = x, y = y,
                  fill = as.factor(clust_filt))) +
  coord_fixed() +
  theme_minimal() +
  scale_fill_manual(name = "Vegetation",
                    values = c('#F4EAA4', '#0A8E0B'))

We can then apply the patch-based indicators on the resulting classified matrix.


psd_indic <- patchdistr_sews(clust_filt, fit_lnorm = TRUE)

summary(psd_indic)
#> Spatial Early-Warning: Patch-based indicators
#> 
#>   N(uniq.) Cover Percl.Full Percl.Empt  Type PLR
#>  1074(262)  0.52      FALSE       TRUE lnorm 57%
#> 
#> The following methods are available: 
#> 

plot_distr(psd_indic, best_only = FALSE) +
  labs(title = "Results of patch-size distribution fitting")

Note that this brief demonstration uses k-means, but many classification methods exist, either supervised or unsupervised (Liu and Mason, 2016) and may show better results depending on the focal system.

Reducing the dimensionality of pixel data

Indicators require each pixel to have a single, uni-dimensional value. One option is to use the values of each channel independently (e.g. using red values only), but these may not represent very well the biological variable of interest in each pixel.

Similar to classifications, supervised and unsupervised methods exist to summarize three-dimensional information to a single value. A simple generic unsupervised method is Principal Component Analysis (see e.g. Liu and Mason, 2016 for more details about its application on raster images), which we use here as example.

We carry out a PCA on the pixel data:


# We use the red/green/blue information and combine them using PCA
NApixel <- is.na(img1_tab[ ,'red'])
pca <- prcomp(img1_tab[!NApixel, c('red', 'green', 'blue')])

summary(pca)
#> Importance of components:
#>                           PC1     PC2     PC3
#> Standard deviation     0.1704 0.01893 0.01193
#> Proportion of Variance 0.9831 0.01212 0.00482
#> Cumulative Proportion  0.9831 0.99518 1.00000

# Import data back into data.frame
img1_tab[ ,'pca1'] <- NA
img1_tab[!NApixel,'pca1'] <- predict(pca)[ ,1]

ggplot(img1_tab) +
  geom_raster(aes(x = x, y = y, fill = pca1)) +
  coord_fixed() +
  theme_minimal()

Results from PCA show that the first axis explains 98% of the variance in RGB value, so it summarizes very well the variations in pixel values. We can then apply indicators on these univariate values (we use here the example of spectral indicators).

Note that because there are some missing values (0.03% of all pixels) in the input image (the pixels where a very bright object is present), we need to interpolate them before computing the indicators. Note that interpolation increases the autocorrelation in a given matrix and indicators should be interpreted with caution if a large amount of pixels have been interpolated.


# Convert to matrix format then compute indicators. Because there are NA
# values in the input image, we need to interpolate these values. We fill the
# pixels with NAs with the mean value of their direct neighbors.
pca_mat <- with(img1_tab, matrix(pca1, ncol = max(y), nrow = max(x)))
while ( any( is.na(pca_mat) ) ) {
  for (i in seq.int(nrow(pca_mat))) {
    if ( any(is.na(pca_mat[i, ])) ) {
      for (j in seq.int(ncol(pca_mat))) {

        # Bound check on the matrix
        i2 <- min(i, 1, nrow(pca_mat))
        j2 <- min(j, 1, ncol(pca_mat))
        if ( is.na(pca_mat[i, j]) ) {
          pca_mat[i, j] <- mean(pca_mat[(i2-1):(i2+1),
                                        (j2-1):(j2+1)])
        }
      }
    }
  }
}

We can then compute the indicators on the resulting matrix:


# Compute spectral indicators
ic <- spectral_sews(pca_mat,
                    sdr_low_range = c(0, .2),
                    sdr_high_range = c(.8, 1))
#> Warning in spectral_sews(pca_mat, sdr_low_range = c(0, 0.2), sdr_high_range = c(0.8, : The matrix is not square: only a squared subset of the left ", 
#>               "part will be used.
ic <- indictest(ic, nulln = 499)

# Display textual summary
summary(ic)
#> Spatial Early-Warning: Spectrum-based indicators 
#> 
#>  1 matrix (size: 810x810)
#> 
#>  Mat #  sdr P>null    
#>      1 4130      0 ***
#> 
#>  Null model: perm (499 null matrices used)
#>  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
#> 
#> The following methods are available:
#>

Note that while PCA may summarize well arbitrary variations in R/G/B pixel values, this does not mean that the resulting pixel values always capture well the variations in the system state. Before interpreting indicators, it is necessary to understand well how the numerical values in the image reflect the state of the ecological system. The use of indices known to reflect a specific aspect of the system (e.g. NDVI reflecting the amount of green vegetation cover) may be often more informative than using a generic algorithm to collapse multiple channels into one.

Using advanced null models

To test the significance of indicator values, by default spatialwarnings will take the original matrix and derive many ‘null’ matrices from it. A typical null matrix is one that has the same average, but a completely random spatial structure. For example, such a matrix can be obtained by shuffling all cells in the original matrix. By recomputing the indicators on a set of such random matrices, we get a distribution of ‘null’ values to which we can compare the observed indicator value. This way we check if the spatial pattern we observe is consistent with the null hypothesis of no spatial structure, or if there is something significant in the landscape.

In spatialwarnings, the default method used to produce null matrices is to reshuffle the cells randomly in the matrix (method perm in the arguments of indictest, see ?indictest). This will maintain the average value of the matrix (i.e. the proportion of TRUE cells when a matrix is made of TRUE/FALSE values), but remove any spatial structure. It is the default because it is reasonably fast.

Another method (method intercept) will fit an intercept-only generalized linear model using glm() to the values of the matrix, then use this model to predict null values in each cell. For example, if the matrix has continuous values, this method will fit a normal distribution to those values, and draw random values from this estimated distribution to create new null matrices.

For an example, let’s consider the following matrix:

indic_example <- generic_sews(serengeti[[13]])
display_matrix(indic_example)

We use the ‘intercept’ method and extract a null matrix from the indictest object:


test_example <- indictest(indic_example, nulln = 49, 
                          null_method = "intercept")
#> Warning in null_control_set_args(input, null_control, null_method): indictest:
#> using a binomial() family with default options
display_matrix( test_example$get_nullmat() )

This method fully randomizes the spatial structure, but the cover is preserved. Note that this is is very similar to shuffling the position of the cells in the matrix. The difference is that the intercept method will preserve the cover on average (for a large number of null matrices), while the perm method will preserve the cover exactly.

When fitting the generalized linear model, spatialwarnings tries to make a sensible default choice for the glm() family. It is crucial that this argument is well-chosen, as otherwise the fitted model may be inappropriate. By default, spatialwarnings uses a binomial() model ‘family’ (family argument in glm) for logical matrices, and a gaussian() family for real-valued matrices. A warning is produced when this automatic choice is made, and specifying explicitely the model family will remove the warning:


test_example <- indictest(indic_example, nulln = 49, 
                          null_method = "intercept", 
                          null_control = list(family = binomial()))

These two null models assume that all the spatial patterning found in a matrix is related to the proximity to a possible regime shift, i.e. without influence of any external driver. In other words, the null model assumes that the probability of finding a given value in the null matrix is constant over the whole matrix. This may be an unrealistic assumption for many real-world cases.

For example, let’s consider a case where vegetation cannot grow in some part of a landscape because of higher elevation, or the presence of a water body. In this case, spatial autocorrelation increases because vegetation is clustered into the remaining areas, where plants can grow. Such changes in correlation will change indicator values in ways that are unrelated to regime shifts, and lead to mis-interpretation. It is thus critical for correct interpretation of patterns to make sure no unaccounted-for variable drives changes in indicator values.

There is no “one size fits all” method to take care of such confounding environmental driver. Ideally, one would estimate the probability of observing a given value in a cell (e.g. vegetation cover), given the environmental drivers in that cell, but without the effect of the biological processes producing the regime shifts. This requires having detailed spatial data, i.e. not only the matrix with observed values, but also covariates driving these observed values (e.g. rainfall maps). This is not impossible to do but it is very much case-specific.

A possible null model of intermediate complexity, that does not need external data, is assuming that any large-scale changes in values are driven by environmental factors, while small-scale changes are related to biological processes (and thus reflect better the changes in indicator values related to regime shifts). To do so, we can estimate changes in matrix values over large scales, and use those estimated probabilities to produce the null matrices.

Let’s consider the previous matrix (describing the vegetation cover in serengeti):

display_matrix(indic_example)

It looks like overall, there is more vegetation in the bottom part of the matrix than in its upper part. In other words, a cell has a higher probability of being covered by forest when it is on the bottom side of the matrix. Such large-scale pattern could be related to a change in rainfall or altitude, rather than proximity to a regime shift. It would be good to take it into account to make sure it does not drive the trends in autocorrelation.

We can do so by fitting a smoothed version of the landscape over large scales, using gam(). This model will allow the average value of a cell to change depending on the position of the cell in the matrix. Here we use this null model and draw a random, ‘null’ matrix:


test_example <- indictest(indic_example, nulln = 99, 
                          null_method = "smooth")
#> Warning in null_control_set_args(input, null_control, null_method): indictest:
#> using a binomial() family with default options
display_matrix( test_example$get_nullmat() )

As expected, the model picks up the fact that the bottom side of the landscape tends to have more forest: in those areas, the null matrix has a higher probability of being covered.

Let’s compare the three null models using the ‘generic indicators’:


indics <- generic_sews(serengeti)

control <- list(family = binomial())

# Compute and export the values 
all_methods <- ldply(c("perm", "intercept", "smooth"), function(method) { 
  test <- indictest(indics, nulln = 99, 
                    null_method = method, 
                    null_control = control)
  data.frame(method = method, as.data.frame(test))
})

# Display the results
all_methods[ ,"x"] <- serengeti.rain[all_methods[ ,"matrixn"]]
ggplot(all_methods, aes(x = x)) + 
  geom_line(aes(y = null_mean, group = method), alpha = .2) + 
  geom_ribbon(aes(ymin = null_qinf, ymax = null_qsup, fill = method), 
              alpha = .8) + 
  geom_point(aes(y = value), size = 1) + 
  geom_line(aes(y = value)) + 
  facet_wrap( ~ indic, scales = "free_y") + 
  theme_minimal() + 
  labs(x = "Rainfall (mm)", 
       y = "Indicator value")

In the above figure, the observed indicator values are represented with the black line. The null values are represented by the ribbons, with the color indicating the null method used. The ‘intercept’ and ‘perm’ values give very similar results. The ‘smooth’ method produces values that are closer to the observed indicator value, i.e. it is more conservative than the shuffling or intercept-only model. This is expected as the smooth method produces null matrices that are closer to the original matrix.

When running analyses, it is probably worth trying these different sets of null models to make up your mind about how strongly values deviate from random expectations.

References

Dakos, V., and F. Soler-Toscano. 2017. Measuring complexity to infer changes in the dynamics of ecological systems under stress. Ecological Complexity 32:144–155.

Guttal, Vishwesha, and C. Jayaprakash. 2009. “Spatial Variance and Spatial Skewness: Leading Indicators of Regime Shifts in Spatial Ecological Systems.” Theoretical Ecology 2 (1): 3–12. https://doi.org/10.1007/s12080-008-0033-1.

Liu and Mason, 2016. Image Processing and GIS for Remote Sensing: Techniques and Applications. John Wiley & Sons, Ltd.

https://www.google.com/maps/place/12%C2%B030'47.3%22N+3%C2%B000'49.9%22E/@12.5131562,3.0116833,905m/data=!3m2!1e3!4b1!4m5!3m4!1s0x0:0x0!8m2!3d12.513151!4d3.013872?hl=en]↩

spatialwarnings - User Guide

2020-07-15

A 5-minutes introduction to `spatialwarnings`

Which indicators are available in `spatialwarnings`?

What do the reported P-values mean?

spatialwarnings is slow !

How can I deal with missing values in the data?

How can I add another indicator?

Technical considerations

Using parameters in indicator functions

Returning multiple values

Can I use `Raster*` objects with ‘spatialwarnings’?

How can I identify the periodicity in data?

Converting multi-band images

Classifying an RGB image to compute patch-based indicators

Reducing the dimensionality of pixel data

Using advanced null models

References

spatialwarnings - User Guide

2020-07-15

A 5-minutes introduction to spatialwarnings

Which indicators are available in spatialwarnings?

What do the reported P-values mean?

spatialwarnings is slow !

How can I deal with missing values in the data?

How can I add another indicator?

Technical considerations

Using parameters in indicator functions

Returning multiple values

Can I use Raster* objects with ‘spatialwarnings’?

How can I identify the periodicity in data?

Converting multi-band images

Classifying an RGB image to compute patch-based indicators

Reducing the dimensionality of pixel data

Using advanced null models

References

A 5-minutes introduction to `spatialwarnings`

Which indicators are available in `spatialwarnings`?

Can I use `Raster*` objects with ‘spatialwarnings’?