(blockCV) Repeated "environmental blocking" resampling
Source: R/ResamplingRepeatedSpCVEnv.R
mlr_resamplings_repeated_spcv_env.Rd
Splits data by clustering in the feature space.
See the upstream implementation at blockCV::cv_cluster() and Valavi et al. (2018) for further information.
Details
Useful when the data should be split according to environmental information that is present in the features. Several features can be combined for clustering.
Passing raster images directly, as in blockCV::cv_cluster(), is not supported. See mlr3spatial and its raster DataBackends for such support in mlr3.
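To illustrate what "clustering in the feature space" means, here is a minimal base R sketch. It is not the blockCV or mlr3spatiotempcv implementation: `env_folds()` is a hypothetical helper, the toy columns `elevation` and `rainfall` are invented, and k-means on standardised features is an assumption made for illustration.

```r
# Hypothetical helper, not part of mlr3spatiotempcv or blockCV.
# Assigns each observation to a fold by k-means clustering on the chosen features.
env_folds = function(data, features, folds) {
  # Standardise so that no single feature dominates the distance computation
  x = scale(as.matrix(data[, features, drop = FALSE]))
  # One cluster per fold; several random starts reduce the risk of a poor solution
  km = kmeans(x, centers = folds, nstart = 10)
  km$cluster
}

set.seed(42)
# Invented toy data with three distinct environmental regimes
toy = data.frame(
  elevation = c(rnorm(20, 100, 10), rnorm(20, 800, 10), rnorm(20, 1500, 10)),
  rainfall  = c(rnorm(20, 300, 20), rnorm(20, 900, 20), rnorm(20, 2000, 20))
)

fold = env_folds(toy, features = c("elevation", "rainfall"), folds = 3)
table(fold)
```

Observations with similar environmental conditions end up in the same fold, so each test set differs from its training set in feature space rather than being a random subset.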
Parameters
folds
(integer(1))
Number of folds.
features
(character())
The features to use for clustering.
repeats
(integer(1))
Number of repeats.
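As a sketch of how these parameters might be set together, the snippet below restricts clustering to two features. It assumes that mlr3spatiotempcv is installed and that `dem` and `slope` are feature names of the `ecuador` task; the variable name `rrcv_sel` is only illustrative.

```r
# Assumes mlr3spatiotempcv is installed; blockCV and sf are needed to instantiate.
if (requireNamespace("mlr3spatiotempcv", quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)

  # Cluster on two features only ("dem" and "slope" are assumed feature names)
  rrcv_sel = rsmp("repeated_spcv_env", folds = 3, repeats = 2,
    features = c("dem", "slope"))

  # Inspect the stored parameter values
  rrcv_sel$param_set$values

  if (requireNamespace("blockCV", quietly = TRUE) &&
      requireNamespace("sf", quietly = TRUE)) {
    # Materialize the splits for the example task
    rrcv_sel$instantiate(tsk("ecuador"))
    rrcv_sel$iters
  }
}
```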
References
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi:10.1101/357798.
Super class
mlr3::Resampling -> ResamplingRepeatedSpCVEnv
Active bindings
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
Methods
Method new()
Create an "Environmental Block" repeated resampling instance.
For a list of available arguments, please see blockCV::cv_cluster.
Usage
ResamplingRepeatedSpCVEnv$new(id = "repeated_spcv_env")
Method folds()
Translates iteration numbers to fold numbers.
Arguments
iters
integer()
Iteration number.
Method repeats()
Translates iteration numbers to repetition numbers.
Arguments
iters
integer()
Iteration number.
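The translation done by folds() and repeats() can be sketched with integer arithmetic. The snippet assumes that iterations are numbered repetition by repetition, so that there are folds times repeats iterations in total. `iter_to_fold()` and `iter_to_rep()` are hypothetical stand-alone helpers, not the class methods, which read the number of folds from the param_set.

```r
# Hypothetical stand-alone helpers (not the class methods)
iter_to_fold = function(iters, folds) ((iters - 1L) %% folds) + 1L
iter_to_rep  = function(iters, folds) ((iters - 1L) %/% folds) + 1L

folds = 4L
repeats = 2L
iters = seq_len(folds * repeats)

# One row per iteration with the repetition and fold it maps to
data.frame(
  iter = iters,
  rep  = iter_to_rep(iters, folds),
  fold = iter_to_fold(iters, folds)
)
```

Under this assumption, with 4 folds and 2 repeats, iteration 5 is the first fold of the second repetition.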
Method instantiate()
Materializes fixed training and test splits for a given task.
Arguments
task
Task
A task to instantiate.
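After instantiation, the stored table has one row per observation and repetition, with columns row_id, rep and fold (see the example output below). The following base R sketch shows how train and test sets could be read off such a table. The toy table and the helper `get_sets()` are invented for illustration and are not part of the package.

```r
# Toy table in the same shape as the stored instance: 6 observations, 2 repetitions
instance = data.frame(
  row_id = rep(1:6, times = 2),
  rep    = rep(1:2, each = 6),
  fold   = c(1, 1, 2, 2, 3, 3,   # fold assignment in repetition 1
             2, 3, 1, 3, 1, 2)   # fold assignment in repetition 2
)

# Hypothetical helper: the test set is the requested fold within one repetition,
# the training set is every other observation of that repetition
get_sets = function(instance, rep, fold) {
  in_rep = instance[instance$rep == rep, ]
  list(
    test  = in_rep$row_id[in_rep$fold == fold],
    train = in_rep$row_id[in_rep$fold != fold]
  )
}

sets = get_sets(instance, rep = 2, fold = 2)
sets$test
sets$train
intersect(sets$train, sets$test)
```

Because each observation belongs to exactly one fold per repetition, the train and test sets of an iteration never overlap.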
Examples
# \donttest{
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  task = tsk("ecuador")

  # Instantiate Resampling
  rrcv = rsmp("repeated_spcv_env", folds = 4, repeats = 2)
  rrcv$instantiate(task)

  # Individual sets:
  rrcv$train_set(1)
  rrcv$test_set(1)
  intersect(rrcv$train_set(1), rrcv$test_set(1))

  # Internal storage:
  rrcv$instance
}
#> row_id rep fold
#> <int> <int> <int>
#> 1: 1 1 4
#> 2: 2 1 4
#> 3: 3 1 1
#> 4: 4 1 4
#> 5: 5 1 4
#> ---
#> 1498: 747 2 4
#> 1499: 748 2 4
#> 1500: 749 2 4
#> 1501: 750 2 4
#> 1502: 751 2 4
# }