MarineSPEED: Spatial cross-validation and variograms

As one of the downloads of the MarineSPEED dataset, I’m trying to create a set of spatially disjoint subsamples of the original dataset in order to reduce spatial sorting bias (Hijmans 2012). One possible approach is an adaptation of the one used by SDMtoolbox for spatial jackknifing:

  1. create Voronoi polygons for all points
  2. spatially cluster the points
  3. merge the Voronoi polygons of the same cluster
  4. clip the merged Voronoi polygons with some buffer around the points
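
The four steps above can be sketched in R with the sf package. This is only a minimal illustration under stated assumptions, not the SDMtoolbox implementation: the points are simulated on a planar grid with coordinates in km, k-means stands in for the unspecified clustering method, and the number of clusters `k` and the 200 km buffer are illustrative values.

```r
library(sf)

set.seed(1)
# Hypothetical occurrence points in a planar coordinate system (units: km)
xy <- cbind(x = runif(150, 0, 3000), y = runif(150, 0, 3000))
pts <- st_as_sf(as.data.frame(xy), coords = c("x", "y"))

# 1. Voronoi polygons for all points
vor <- st_collection_extract(st_voronoi(st_union(st_geometry(pts))), "POLYGON")

# 2. Spatially cluster the points (k-means is an assumption, any method works)
k <- 5
pts$cluster <- kmeans(xy, centers = k)$cluster

# st_voronoi does not keep the input order, so look up the cell of each point
idx <- unlist(st_intersects(st_geometry(pts), vor))
cells <- vor[idx]  # cells[i] is the Voronoi cell containing point i

# 3. Merge the Voronoi polygons of the same cluster
merged <- do.call(c, lapply(seq_len(k), function(cl) {
  st_union(cells[pts$cluster == cl])
}))

# 4. Clip the merged polygons with a buffer around the points
buffer <- st_union(st_buffer(st_geometry(pts), dist = 200))
folds <- st_intersection(merged, buffer)
```

Each element of `folds` is then the area assigned to one spatial fold. The lookup in `idx` assumes that no two points coincide and that no point lies exactly on a cell boundary.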

For this last step I need to define the buffer distance. I decided to fit variograms to a selection of the Bio-ORACLE and MARSPEC rasters and use the range as an indication of the buffer distance.


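| Raster                                 | Range in km (cutoff=1000) | Range in km (cutoff=2000) |
|----------------------------------------|---------------------------|---------------------------|
| calcite                                | 100                       | 201                       |
| ph                                     | 604                       | 790                       |
| sst                                    | 1400                      | 2717                      |
| chlorophyll                            | 300                       | 325                       |
| chlorophyll range                      | 40                        | 810                       |
| mean cloud fraction                    | 700                       | 1172                      |
| mean diffuse attenuation               | 255                       | 433                       |
| dissolved oxygen                       | 907                       | 1919                      |
| nitrate                                | 1402                      | 2254                      |
| photosynthetically available radiation | 755                       | 1556                      |
| phosphate                              | 1360                      | 2154                      |
| salinity                               | 536                       | 749                       |
| silicate                               | 240                       | 2184                      |
| sst range                              | 666                       | 1238                      |
| bathymetry                             | 325                       | 530                       |
| salinity variance                      | 390                       | 223                       |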

I fitted Gaussian variogram models with the gstat package to 20000 random sample points from the rasters. The cutoff was set to 1000 and, in a second run, to 2000. Two starting models were tried, vgm(1,"Gau",100,1) and vgm(1,"Gau",1000,1), as the fit didn’t always converge when trying only one starting model. The Gaussian model was used because, based on visual inspection, it seemed to fit better and it yielded models with shorter ranges, which gives us a more conservative estimate of the range. Note that the cutoff has a big influence on the model results, especially for models with range values of around 1000 and more. For example, refitting the model for “dissolved oxygen” with a cutoff of 500 gives a range of 448, compared to a range of 907 when the cutoff is set to 1000.
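
The fitting procedure described above could look roughly like the following gstat sketch. The helper `fit_range`, the `samples` data frame and the simulated field are hypothetical stand-ins for the sampled raster values. Picking the non-singular fit with the lowest `SSErr` is also an assumption, since the text does not say how the two starting models were compared. Coordinates are taken to be planar and in km.

```r
library(gstat)

# Hypothetical helper: fit a Gaussian variogram from two starting models and
# return the range of the best non-singular fit (NA if neither fit succeeds).
fit_range <- function(samples, cutoff) {
  v <- variogram(value ~ 1, locations = ~x + y, data = samples, cutoff = cutoff)
  starts <- list(vgm(1, "Gau", 100, 1), vgm(1, "Gau", 1000, 1))
  fits <- lapply(starts, function(start) {
    tryCatch(suppressWarnings(fit.variogram(v, start)), error = function(e) NULL)
  })
  ok <- Filter(function(f) !is.null(f) && !isTRUE(attr(f, "singular")), fits)
  if (length(ok) == 0) return(NA_real_)
  # Sum of squared errors of each fit; treat a missing value as worst case
  sse <- vapply(ok, function(f) {
    s <- attr(f, "SSErr")
    if (is.null(s)) Inf else s
  }, numeric(1))
  best <- ok[[which.min(sse)]]
  best$range[best$model == "Gau"]
}

# Simulated stand-in for random sample points taken from a raster
set.seed(42)
n <- 2000
samples <- data.frame(x = runif(n, 0, 3000), y = runif(n, 0, 3000))
samples$value <- sin(samples$x / 400) + cos(samples$y / 400) + rnorm(n, sd = 0.3)

r1000 <- fit_range(samples, cutoff = 1000)
r2000 <- fit_range(samples, cutoff = 2000)
```

Comparing `r1000` and `r2000` for the same data shows how sensitive the fitted range is to the cutoff.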

From these results we can conclude that the range of spatial autocorrelation is very large for most rasters and that a buffer size of 200 km for the spatial cross-validation is a reasonable value: only 2 of 17 rasters have a smaller range at a cutoff of 1000, and no rasters have a smaller range when the cutoff is set to 2000.
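
As a quick check of these counts, the ranges from the table can be compared against the buffer size in base R. The data frame below simply transcribes the 16 rows listed in the table; the name `buffer_km` is illustrative.

```r
ranges <- data.frame(
  raster = c("calcite", "ph", "sst", "chlorophyll", "chlorophyll range",
             "mean cloud fraction", "mean diffuse attenuation",
             "dissolved oxygen", "nitrate",
             "photosynthetically available radiation", "phosphate",
             "salinity", "silicate", "sst range", "bathymetry",
             "salinity variance"),
  cutoff_1000 = c(100, 604, 1400, 300, 40, 700, 255, 907, 1402, 755, 1360,
                  536, 240, 666, 325, 390),
  cutoff_2000 = c(201, 790, 2717, 325, 810, 1172, 433, 1919, 2254, 1556,
                  2154, 749, 2184, 1238, 530, 223)
)

buffer_km <- 200

# Rasters whose fitted range is smaller than the buffer
below_1000 <- ranges$raster[ranges$cutoff_1000 < buffer_km]
below_2000 <- ranges$raster[ranges$cutoff_2000 < buffer_km]

print(below_1000)          # "calcite" "chlorophyll range"
print(length(below_2000))  # 0
```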