MarineSPEED: slow data preparation and improving the UI

Distribution records

Only the records for 190 species got processed last night. With another 300+ species to go, performance needs to improve. The bottleneck is looking up the environmental data, so I tried to speed up that part of the process.

Old version:

 add_environment <- function(row, data, layers) {
   print("add environment")
   environment <- extract(layers, data[,xycols])
   data <- cbind(data, environment)
   write.csv2(data, paste0("3_environment/", row$ScientificName, " ", row$AphiaID, ".csv"), row.names = FALSE)
   data
 }

New version:

 add_environment <- function(row, data, layers) {
   print("add environment")
   # map every record to its raster cell number
   cells <- cellFromXY(layers, data[,lonlat])
   # look up each distinct cell only once
   unique_cells <- unique(cells)
   matched_cells <- match(cells, unique_cells)
   environment <- extract(layers, unique_cells)
   # expand the per-cell values back to one row per record
   data <- cbind(data, environment[matched_cells, , drop = FALSE])
   write.csv2(data, paste0("3_environment/", row$ScientificName, " ", row$AphiaID, ".csv"), row.names = FALSE)
   data
 }

But it didn’t improve performance and may even have slowed everything down. It’s hard to test, though, because there are huge caching effects.
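To make the idea behind the new version concrete, here is a minimal base R sketch of the same unique/match pattern. The `lookup_cells` function is a made-up stand-in for the expensive lookup (it is not `extract` from the raster package), and the cell numbers are random toy data. The sketch only shows that looking up the distinct cells and expanding them again gives the same rows as looking up every record directly.

```r
# Hypothetical stand-in for an expensive per-cell lookup (not raster::extract).
lookup_cells <- function(cells) {
  cbind(temperature = cells * 0.5, salinity = 30 + cells %% 7)
}

set.seed(42)
cells <- sample(1:50, 1000, replace = TRUE)  # toy data: many records share a cell

# Direct lookup: one query per record.
direct <- lookup_cells(cells)

# Deduplicated lookup: one query per distinct cell, then expand back to records.
unique_cells <- unique(cells)
matched_cells <- match(cells, unique_cells)
deduplicated <- lookup_cells(unique_cells)[matched_cells, , drop = FALSE]

same_result <- identical(direct, deduplicated)
n_queries <- c(direct = length(cells), deduplicated = length(unique_cells))
```

The pattern can only save lookups when many records fall in the same cell. If most records sit in distinct cells, the extra `unique` and `match` calls are pure overhead.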

I’ve finished writing the species info files with traits from WoRMS. I’m not sure the traits will be useful to include in the modeling, but we’ll see.

Shiny UI

I’ve also been working on the MarineSPEED UI. There were some bugs, which are now gone, and I’ve added a Leaflet map with species points, but this is rather slow with large numbers of records (45000 records). So I’ve filtered out records in order to have fewer than 2000 points per species, which still gives a good idea of where the points are without slowing down the map.
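One way to thin the points for the map is a plain random subsample, sketched below in base R. This is only an illustration: the `thin_records` helper, the column names and the exact cap are assumptions for the example, and the filtering used in the viewer may work differently.

```r
# Hypothetical helper: keep at most max_points randomly chosen rows.
thin_records <- function(records, max_points = 2000) {
  if (nrow(records) <= max_points) {
    return(records)  # small species are left untouched
  }
  keep <- sort(sample.int(nrow(records), max_points))
  records[keep, , drop = FALSE]
}

set.seed(1)
# Toy data standing in for the records of one species.
records <- data.frame(
  longitude = runif(45000, -180, 180),
  latitude = runif(45000, -90, 90)
)
map_points <- thin_records(records)
small_species <- thin_records(records[1:100, ])
```

A random subsample keeps dense areas dense, so the overall pattern stays visible. Thinning by grid cell would be an alternative if an even spread of points matters more.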

Useful leaflet help of the day: http://stackoverflow.com/questions/32107667/clearshapes-not-working-leaflet-for-r/32118413

Next week

  • Create a poster for the VLIZ Marine Scientist conference
  • MarineSPEED viewer:
    • add some overview information
    • create 5-fold/10-fold random cross validation sets of the data (filtered with environmental data)
    • add download links for the data
    • deploy the viewer and link marinespeed.org to it
    • add a short about page
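The random cross-validation sets from the list above could be built along these lines. This is a minimal base R sketch under assumptions: `assign_folds` is a hypothetical helper, the `occurrences` data frame is toy data, and the real sets would be made from the records already filtered with environmental data.

```r
# Hypothetical helper: assign each of n rows to one of k folds at random,
# with fold sizes that differ by at most one.
assign_folds <- function(n, k = 5) {
  sample(rep_len(seq_len(k), n))
}

set.seed(123)
occurrences <- data.frame(id = 1:23)  # toy stand-in for one species' records
occurrences$fold <- assign_folds(nrow(occurrences), k = 5)

# Fold 1 as the test set; the remaining folds form the training set.
test_set <- occurrences[occurrences$fold == 1, ]
training_set <- occurrences[occurrences$fold != 1, ]
```

Storing the fold number as a column keeps a single file per species, and the same column works for 10-fold sets by calling the helper with `k = 10`.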