Introduction

This package adds resampling methods to the {mlr3} framework that are suited to spatial, temporal and spatiotemporal data. These methods can help reduce the influence of autocorrelation on performance estimates obtained via cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on “Spatiotemporal Analysis”.

After loading the package via library("mlr3spatiotempcv"), the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user.

In mlr3, dictionaries provide an overview of the available methods. The following sections show which dictionaries are extended with new entries.

Task Type

Additional task types:

  • TaskClassifST

  • TaskRegrST

mlr_reflections$task_types
#>       type          package          task        learner        prediction
#> 1: classif             mlr3   TaskClassif LearnerClassif PredictionClassif
#> 2: classif mlr3spatiotempcv TaskClassifST LearnerClassif PredictionClassif
#> 3:    regr             mlr3      TaskRegr    LearnerRegr    PredictionRegr
#> 4:    regr mlr3spatiotempcv    TaskRegrST    LearnerRegr    PredictionRegr
#>           measure
#> 1: MeasureClassif
#> 2: MeasureClassif
#> 3:    MeasureRegr
#> 4:    MeasureRegr

Task Column Roles

Additional column roles:

  • coordinates

mlr_reflections$task_col_roles
#> $regr
#> [1] "feature" "target"  "name"    "order"   "stratum" "group"   "weight" 
#> 
#> $classif
#> [1] "feature" "target"  "name"    "order"   "stratum" "group"   "weight" 
#> 
#> $classif_st
#> [1] "feature"     "target"      "name"        "order"       "stratum"    
#> [6] "group"       "weight"      "coordinates"
#> 
#> $regr_st
#> [1] "feature"     "target"      "name"        "order"       "stratum"    
#> [6] "group"       "weight"      "coordinates"

Resampling Methods

Additional resampling methods:

  • spcv_block

  • spcv_buffer

  • spcv_coords

  • spcv_env

  • sptcv_cluto

  • sptcv_cstf

and their respective repeated versions.

as.data.table(mlr_resamplings)
#>                      key                                  params iters
#>  1:            bootstrap                           repeats,ratio    30
#>  2:               custom                                             0
#>  3:                   cv                                   folds    10
#>  4:              holdout                                   ratio     1
#>  5:             insample                                             1
#>  6:                  loo                                            NA
#>  7:          repeated_cv                           repeats,folds   100
#>  8:  repeated_spcv_block folds,repeats,rows,cols,range,selection    10
#>  9: repeated_spcv_coords                           folds,repeats    10
#> 10:    repeated_spcv_env                  folds,repeats,features    10
#> 11: repeated_sptcv_cluto                           folds,repeats    10
#> 12:  repeated_sptcv_cstf                           folds,repeats    10
#> 13:           spcv_block         folds,rows,cols,range,selection    10
#> 14:          spcv_buffer               theRange,spDataType,addBG     0
#> 15:          spcv_coords                                   folds    10
#> 16:             spcv_env                          folds,features    10
#> 17:          sptcv_cluto                                   folds    10
#> 18:           sptcv_cstf                                   folds    10
#> 19:          subsampling                           repeats,ratio    30
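
Any of the keys listed above can be passed to rsmp() to construct the corresponding resampling object. The following is a minimal sketch rather than part of the original article: it assumes the "ecuador" example task that ships with {mlr3spatiotempcv}, and the number of folds and the seed are arbitrary choices made for illustration.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Assumption: "ecuador" is an example task provided by mlr3spatiotempcv
task = tsk("ecuador")

# Retrieve a spatial resampling method from the dictionary by its key
resampling = rsmp("spcv_coords", folds = 5)

# Assign the observations of the task to folds
set.seed(42)
resampling$instantiate(task)

# Row ids used for training and testing in the first iteration
train_ids = resampling$train_set(1)
test_ids = resampling$test_set(1)

# Number of row ids shared by both sets
n_shared = length(intersect(train_ids, test_ids))
n_shared
```

The repeated variants are constructed the same way and additionally accept a repeats argument, for example rsmp("repeated_spcv_coords", folds = 5, repeats = 2).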

Example Tasks

Additional example tasks:

Upstream Packages and Scientific References

The following table lists all methods implemented in {mlr3spatiotempcv}, their upstream R package and scientific references.

Literature                            Package     Reference                mlr3 Sugar
Spatial Buffering                     blockCV     Valavi et al. (2018)     rsmp("spcv_buffer")
Spatial Blocking                      blockCV     Valavi et al. (2018)     rsmp("spcv_block")
Spatial CV                            sperrorest  Brenning (2012)          rsmp("spcv_coords")
Environmental Blocking                blockCV     Valavi et al. (2018)     rsmp("spcv_env")
Leave-Location-and-Time-Out           CAST        Meyer et al. (2018)      rsmp("sptcv_cstf")
Spatiotemporal Clustering             skmeans     Zhao and Karypis (2002)  rsmp("sptcv_cluto")

Repeated Spatial Blocking             blockCV     Valavi et al. (2018)     rsmp("repeated_spcv_block")
Repeated Spatial CV                   sperrorest  Brenning (2012)          rsmp("repeated_spcv_coords")
Repeated Env Blocking                 blockCV     Valavi et al. (2018)     rsmp("repeated_spcv_env")
Repeated Leave-Location-and-Time-Out  CAST        Meyer et al. (2018)      rsmp("repeated_sptcv_cstf")
Repeated Spatiotemporal Clustering    skmeans     Zhao and Karypis (2002)  rsmp("repeated_sptcv_cluto")
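
As a rough illustration of how the sugar calls in the table fit into a performance estimation, the sketch below runs the same learner once with a spatial and once with a non-spatial repeated cross-validation. It is not part of the original article: the "ecuador" task, the "classif.rpart" learner, the "classif.ce" measure, and the fold and repeat counts are assumptions chosen only to keep the example small.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Assumptions: example task from mlr3spatiotempcv, built-in rpart learner
task = tsk("ecuador")
learner = lrn("classif.rpart")
measure = msr("classif.ce")

set.seed(1)

# Spatial partitioning via the sugar call listed in the table
rr_spatial = resample(task, learner,
  rsmp("repeated_spcv_coords", folds = 4, repeats = 2))

# Random partitioning from mlr3 itself, for comparison
rr_random = resample(task, learner,
  rsmp("repeated_cv", folds = 4, repeats = 2))

# Mean classification error across all iterations
ce_spatial = rr_spatial$aggregate(measure)
ce_random = rr_random$aggregate(measure)
c(spatial = unname(ce_spatial), random = unname(ce_random))
```

Comparing the two aggregated errors gives a first impression of how much the choice of partitioning affects the estimate for a given dataset.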

References

Brenning, Alexander. 2012. “Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest.” In 2012 IEEE International Geoscience and Remote Sensing Symposium. IEEE. https://doi.org/10.1109/igarss.2012.6352393.

Meyer, Hanna, Christoph Reudenbach, Tomislav Hengl, Marwan Katurji, and Thomas Nauss. 2018. “Improving Performance of Spatio-Temporal Machine Learning Models Using Forward Feature Selection and Target-Oriented Validation.” Environmental Modelling & Software 101 (March): 1–9. https://doi.org/10.1016/j.envsoft.2017.12.001.

Valavi, Roozbeh, Jane Elith, Jose J. Lahoz-Monfort, and Gurutzeta Guillera-Arroita. 2018. “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv, June. https://doi.org/10.1101/357798.

Zhao, Ying, and George Karypis. 2002. “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” 11th Conference of Information and Knowledge Management (CIKM), 515–24. http://glaros.dtc.umn.edu/gkhome/node/167.