As one of the downloads of the MarineSPEED dataset, I’m trying to create a set of spatially disjoint subsamples of the original dataset in order to reduce spatial sorting bias (Hijmans 2012). One possible approach is an adaptation of the one used by SDMtoolbox for spatial jackknifing:

- create Voronoi polygons for all points
- spatially cluster the points
- merge the Voronoi polygons of points in the same cluster
- clip the merged Voronoi polygons with a buffer around the points
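
The four steps above boil down to a membership rule for any location: it belongs to the cluster of its nearest point, provided that point lies within the buffer distance. The Python sketch below illustrates that rule without building any polygons. It is only an illustration under stated assumptions, not the SDMtoolbox implementation: the function names `haversine_km` and `region_of` are hypothetical, the cluster labels are assumed to come from a separate clustering step, and great-circle distances stand in for the polygon geometry.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for great-circle distances


def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def region_of(location, points, cluster_ids, buffer_km):
    """Return the cluster whose clipped region contains `location`, or None.

    A location is in the Voronoi cell of its nearest point; merging cells per
    cluster and clipping with a buffer keeps it only if that nearest point is
    at most `buffer_km` away.
    """
    distances = [haversine_km(location, p) for p in points]
    nearest = min(range(len(points)), key=distances.__getitem__)
    if distances[nearest] > buffer_km:
        return None  # outside every buffered region
    return cluster_ids[nearest]


# Hypothetical example: two clusters of two points each along the equator.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
cluster_ids = [0, 0, 1, 1]

for loc in [(0.4, 0.0), (9.0, 0.0), (5.0, 0.0)]:
    print(loc, region_of(loc, points, cluster_ids, buffer_km=200))
```

With a 200 km buffer the first location falls in cluster 0, the second in cluster 1, and the third belongs to no region because its nearest point is more than 200 km away.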

For this last step I need to define the buffer distance. I decided to fit variograms to a selection of the Bio-ORACLE and MARSPEC rasters and use the range as an indication of the buffer distance.


| Raster | Range (cutoff=1000) | Range (cutoff=2000) |
| --- | --- | --- |
| calcite | 100 | 201 |
| ph | 604 | 790 |
| sst | 1400 | 2717 |
| chlorophyll | 300 | 325 |
| chlorophyll range | 40 | 810 |
| mean cloud fraction | 700 | 1172 |
| mean diffuse attenuation | 255 | 433 |
| dissolved oxygen | 907 | 1919 |
| nitrate | 1402 | 2254 |
| photosynthetically available radiation | 755 | 1556 |
| phosphate | 1360 | 2154 |
| salinity | 536 | 749 |
| silicate | 240 | 2184 |
| sst range | 666 | 1238 |
| bathymetry | 325 | 530 |
| salinity variance | 390 | 223 |

I fitted Gaussian variogram models with the gstat package to 20000 random sample points from each raster. The cutoff was set to 1000 in one run and to 2000 in another. The two starting models tried were vgm(1, "Gau", 100, 1) and vgm(1, "Gau", 1000, 1), as the fit didn’t always converge when trying only one starting model. The Gaussian model was used because, based on visual inspection, it seemed to fit better and produced models with shorter ranges, which gives us a more conservative estimate of the range. Note that the cutoff has a big influence on the model results, especially for models with range values of around 1000 and more. For example, refitting the model for “dissolved oxygen” with a cutoff of 500 gives a range of 448, compared to a range of 907 when the cutoff is set to 1000.
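
To make the role of the range parameter concrete, here is a small Python sketch of the Gaussian variogram curve. It assumes the parameterisation that, as far as I know, gstat uses for "Gau", namely nugget + psill * (1 - exp(-(h / range)^2)), with the vgm arguments read as psill, model, range, nugget. The function name `gaussian_variogram` is hypothetical and this is not the fitting code itself.

```python
import math


def gaussian_variogram(h, psill, vrange, nugget=0.0):
    """Semivariance at lag distance h for a Gaussian variogram model.

    Assumed form: nugget + psill * (1 - exp(-(h / vrange)^2)); the
    semivariance at lag 0 is 0 by definition.
    """
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-(h / vrange) ** 2))


# The two starting models from the text, evaluated at a few lag distances.
for start_range in (100, 1000):
    values = [gaussian_variogram(h, psill=1, vrange=start_range, nugget=1)
              for h in (50, 100, 500, 1000)]
    print(start_range, [round(v, 3) for v in values])

# Under this form the curve reaches about 95% of the partial sill at
# sqrt(3) times the range parameter.
fraction = gaussian_variogram(math.sqrt(3) * 100, psill=1, vrange=100)
print(round(fraction, 3))
```

Under this assumption the fitted range is a shape parameter rather than the distance at which the curve flattens, so the distance at which autocorrelation practically vanishes is somewhat larger than the reported range.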

From these results we can conclude that the range of spatial autocorrelation is very large for most rasters and that a buffer size of 200 km for the spatial cross-validation is a reasonable value, with only 2 of 17 rasters having a smaller range at a cutoff of 1000 and no rasters having a smaller range when the cutoff is set to 2000.
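
The comparison against the 200 km buffer can be checked with a few lines of Python. The sketch below simply copies the range values of the rasters listed in the table into a dictionary (the variable and function names are hypothetical) and lists the rasters whose fitted range falls below the buffer for each cutoff.

```python
BUFFER_KM = 200

# raster name -> (range at cutoff=1000, range at cutoff=2000), from the table
ranges = {
    "calcite": (100, 201),
    "ph": (604, 790),
    "sst": (1400, 2717),
    "chlorophyll": (300, 325),
    "chlorophyll range": (40, 810),
    "mean cloud fraction": (700, 1172),
    "mean diffuse attenuation": (255, 433),
    "dissolved oxygen": (907, 1919),
    "nitrate": (1402, 2254),
    "photosynthetically available radiation": (755, 1556),
    "phosphate": (1360, 2154),
    "salinity": (536, 749),
    "silicate": (240, 2184),
    "sst range": (666, 1238),
    "bathymetry": (325, 530),
    "salinity variance": (390, 223),
}


def below_buffer(column):
    """Names of rasters whose range in the given column is below the buffer."""
    return [name for name, values in ranges.items() if values[column] < BUFFER_KM]


below_1000 = below_buffer(0)  # cutoff = 1000
below_2000 = below_buffer(1)  # cutoff = 2000
print(below_1000)
print(below_2000)
```

For the listed rows, only calcite and chlorophyll range fall below 200 at a cutoff of 1000, and none do at a cutoff of 2000.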