(CAST) Repeated spatiotemporal "leave-location-and-time-out" resampling
Source: R/ResamplingRepeatedSptCVCstf.R
mlr_resamplings_repeated_sptcv_cstf.Rd
Splits data using Leave-Location-Out (LLO), Leave-Time-Out (LTO) and
Leave-Location-and-Time-Out (LLTO) partitioning.
See the upstream implementation at CreateSpacetimeFolds()
(package CAST) and Meyer et al. (2018) for further information.
Details
LLO predicts on unknown locations, i.e. complete locations are left out of the
training sets.
The "space" role in Task$col_roles identifies spatial units.
If stratify is TRUE, the target distribution is similar in each fold.
This is useful for land cover classification when the observations
are polygons.
In this case, LLO with stratification should be used to hold back complete
polygons and have a similar target distribution in each fold.
LTO leaves out complete temporal units, which are identified by the
"time" role in Task$col_roles.
LLTO leaves out spatial and temporal units.
See the examples.
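The three partitioning schemes can be illustrated with a minimal base-R sketch. This is not the CAST or mlr3 implementation; the toy data, the helper `assign_unit_folds()` and all variable names are hypothetical, and the LLTO rule shown (train only on rows that share neither a location nor a date with the test set) is an assumption about how the scheme is usually described.

```r
# Toy data (hypothetical): 6 locations, each observed on 4 dates.
set.seed(1)
obs = expand.grid(
  location = paste0("loc", 1:6),
  date = paste0("day", 1:4),
  stringsAsFactors = FALSE
)

# Assign whole units (not single rows) to k folds, so that all rows of a
# unit always end up in the same fold.
assign_unit_folds = function(units, k) {
  u = unique(units)
  fold = sample(rep(seq_len(k), length.out = length(u)))
  fold[match(units, u)]
}

k = 2
space_fold = assign_unit_folds(obs$location, k)
time_fold = assign_unit_folds(obs$date, k)

i = 1  # fold to hold out

# LLO: hold out every row of the locations in fold i.
llo_test = which(space_fold == i)
llo_train = which(space_fold != i)

# LTO: hold out every row of the dates in fold i.
lto_test = which(time_fold == i)
lto_train = which(time_fold != i)

# LLTO: test on held-out locations at held-out dates; train only on rows
# that share neither a location nor a date with the test set.
llto_test = which(space_fold == i & time_fold == i)
llto_train = which(space_fold != i & time_fold != i)

# No location is shared between the LLO training and test sets.
intersect(obs$location[llo_train], obs$location[llo_test])
```

Under this rule LLTO discards rows that match the test set in only one of the two dimensions, so its training sets are smaller than those of LLO or LTO.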
Parameters
folds
(integer(1))
Number of folds.
stratify
If TRUE, stratify on the target column.
repeats
(integer(1))
Number of repeats.
References
Zhao Y, Karypis G (2002). “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” 11th Conference of Information and Knowledge Management (CIKM), 515-524. doi:10.1145/584792.584877.
Super class
mlr3::Resampling -> ResamplingRepeatedSptCVCstf
Active bindings
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
Methods
Method new()
Create a "Spacetime Folds" resampling instance.
Usage
ResamplingRepeatedSptCVCstf$new(id = "repeated_sptcv_cstf")
Method folds()
Translates iteration numbers to fold numbers.
Arguments
iters
integer()
Iteration number.
Method repeats()
Translates iteration numbers to repetition numbers.
Arguments
iters
integer()
Iteration number.
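The translation done by folds() and repeats() can be sketched in base R. The sketch assumes that iterations are numbered repetition by repetition, as in repeated cross-validation in mlr3 generally; the helpers `fold_of()` and `repeat_of()` are hypothetical names for illustration, not part of the class.

```r
folds = 5L
repeats = 2L

# Total number of resampling iterations (what the iters binding reports).
iters = seq_len(folds * repeats)

# Assumed layout: iterations 1..folds belong to repetition 1,
# iterations (folds + 1)..(2 * folds) to repetition 2, and so on.
fold_of = function(iters, folds) ((iters - 1L) %% folds) + 1L
repeat_of = function(iters, folds) ((iters - 1L) %/% folds) + 1L

# One row per iteration with its repetition and fold number.
mapping = data.frame(
  iter = iters,
  rep = repeat_of(iters, folds),
  fold = fold_of(iters, folds)
)
mapping
```

With 5 folds and 2 repeats, iteration 6 is therefore the first fold of the second repetition.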
Method instantiate()
Materializes fixed training and test splits for a given task.
Arguments
task
mlr3::Task
A task to instantiate.
Examples
library(mlr3)
library(mlr3spatiotempcv) # registers the "cookfarm_mlr3" task and this resampling
task = tsk("cookfarm_mlr3")
task$set_col_roles("SOURCEID", roles = "space")
task$set_col_roles("Date", roles = "time")
# Instantiate Resampling
rcv = rsmp("repeated_sptcv_cstf", folds = 5, repeats = 2)
rcv$instantiate(task)
### Individual sets:
# rcv$train_set(1)
# rcv$test_set(1)
# Check that no observations are in both sets
intersect(rcv$train_set(1), rcv$test_set(1)) # good!
#> integer(0)
# Internal storage:
# rcv$instance # table