(blockCV) Spatial buffering resampling
Source: R/ResamplingSpCVBuffer.R
mlr_resamplings_spcv_buffer.Rd
This resampling generates spatially separated training and test folds by placing buffers of the specified distance (size parameter) around each observation point.
This approach is a form of leave-one-out cross-validation. Each fold is generated by excluding the observations that lie within the specified distance of the test point (ideally the range of spatial autocorrelation, see blockCV::cv_spatial_autocor()). With this method, the test point never directly abuts a training sample (e.g. a presence or absence; 0s and 1s). For more information, see the Details section.
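The buffered leave-one-out idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the blockCV implementation: coordinates are assumed to be projected (so Euclidean distance is meaningful), size is assumed to be in the same units, and a point at exactly size from the test point is treated as inside the buffer. The helper name buffer_loo_folds() is hypothetical.

```r
# Hypothetical helper: buffered leave-one-out folds on projected coordinates.
# Assumption: `size` uses the same units as `coords`; points at distance
# <= size from the test point are treated as inside the buffer.
buffer_loo_folds = function(coords, size) {
  # Pairwise Euclidean distances between all observation points
  d = as.matrix(dist(coords))
  lapply(seq_len(nrow(coords)), function(i) {
    # Point i is the single test point; only points strictly farther
    # than `size` away from it are kept for training
    list(test = i, train = unname(which(d[i, ] > size)))
  })
}

coords = cbind(x = c(0, 5, 20, 40), y = c(0, 0, 0, 0))
folds = buffer_loo_folds(coords, size = 10)
str(folds[[1]])
```

Each of the four points yields one fold. In the first fold, the second point (5 units away) falls inside the 10-unit buffer, so it is dropped from training rather than moved to the test set.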
Details
When working with presence-background (presence and pseudo-absence) species distribution data (specified with presence_bg = TRUE), only presence records are used to define the folds (recommended). Consider a target presence point. The buffer is defined around this target point using the specified range (size). By default, the test fold comprises only the target presence point (all background points within the buffer are also added when add_bg = TRUE).
Any non-target presence points inside the buffer are excluded from both the training and the test set.
All points (presence and background) outside the buffer are used for the training set.
The method cycles through all the presence data, so the number of folds is equal to the number of presence points in the dataset.
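The presence-background rules above can be sketched in base R. This is an illustrative sketch, not blockCV's code: it assumes a 0/1 vector pa (1 = presence, 0 = background), projected coordinates, and that a point at distance <= size lies inside the buffer. The helper name buffer_folds_pb() is hypothetical.

```r
# Hypothetical helper: presence-background buffer folds.
# Assumptions: `pa` is 1 for presence and 0 for background; coordinates are
# projected; a point at distance <= size lies inside the buffer.
buffer_folds_pb = function(coords, pa, size, add_bg = FALSE) {
  d = as.matrix(dist(coords))
  # Only presence points define folds: one fold per presence
  lapply(which(pa == 1), function(i) {
    inside = d[i, ] <= size
    test = i
    if (add_bg) {
      # Background points inside the buffer are added to the test set
      test = c(i, which(inside & pa == 0))
    }
    # Training: every point (presence or background) outside the buffer.
    # Non-target presences inside the buffer end up in neither set.
    list(test = unname(test), train = unname(which(!inside)))
  })
}

coords = cbind(x = c(0, 3, 6, 30, 32), y = 0)
pa = c(1, 0, 1, 1, 0)
folds = buffer_folds_pb(coords, pa, size = 6)
folds_bg = buffer_folds_pb(coords, pa, size = 6, add_bg = TRUE)
```

With three presences there are three folds. In the first fold, the presence at x = 6 lies inside the buffer and is used neither for training nor for testing. The background point at x = 3 only joins the test set when add_bg = TRUE.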
For presence-absence data (and all other types of data), folds are created based on all records, both presences and absences. As above, a target observation (presence or absence) forms a test point, all presence and absence points other than the target point within the buffer are ignored, and the training set comprises all presences and absences outside the buffer. Apart from the folds, the number of training-presence, training-absence, testing-presence and testing-absence records is stored and returned in the records table. If column = NULL and presence_bg = FALSE, the procedure is the same as for presence-absence data. All other data types (continuous, count or multi-class responses) should be used with presence_bg = FALSE.
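The per-fold record counts described above can be illustrated with a small base R sketch for presence-absence data. It is not blockCV's implementation: the helper name buffer_records() and the column names train_0, train_1, test_0 and test_1 are hypothetical and may differ from the actual records table; coordinates are assumed to be projected, with size in the same units.

```r
# Hypothetical helper: per-fold record counts for buffered LOO on
# presence-absence data (pa: 1 = presence, 0 = absence). The column names
# are illustrative and may differ from blockCV's own records table.
buffer_records = function(coords, pa, size) {
  d = as.matrix(dist(coords))
  rows = lapply(seq_along(pa), function(i) {
    # Every record (presence or absence) is a test point in turn
    train = which(d[i, ] > size)
    data.frame(
      train_0 = sum(pa[train] == 0), # training absences
      train_1 = sum(pa[train] == 1), # training presences
      test_0 = sum(pa[i] == 0),      # test absences (0 or 1)
      test_1 = sum(pa[i] == 1)       # test presences (0 or 1)
    )
  })
  do.call(rbind, rows)
}

coords = cbind(x = c(0, 3, 20, 25), y = 0)
pa = c(1, 0, 0, 1)
records = buffer_records(coords, pa, size = 5)
records
```

Because every record is a test point in turn, the table has one row per record, and each row has exactly one test observation (either a presence or an absence).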
mlr3spatiotempcv notes
The 'Description' and 'Details' fields are inherited from the respective upstream function. For a list of available arguments, please see blockCV::cv_buffer().
blockCV >= 3.0.0 changed the argument names of the implementation. For backward compatibility, mlr3spatiotempcv still uses the old ones.
The following list shows the mapping between blockCV < 3.0.0 and blockCV >= 3.0.0 argument names:
- theRange -> size
- addBG -> add_bg
- spDataType (character vector) -> presence_bg (boolean)
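The mapping above can be expressed as a small translation step. This is a hypothetical helper, not part of mlr3spatiotempcv or blockCV; it assumes that the old spDataType argument took the values "PB" (presence-background) or "PA" (presence-absence), so that presence_bg is TRUE only for "PB".

```r
# Hypothetical helper: maps blockCV < 3.0.0 argument names (as still used by
# mlr3spatiotempcv) to their blockCV >= 3.0.0 equivalents.
translate_blockcv_args = function(args) {
  renames = c(theRange = "size", addBG = "add_bg")
  out = args
  # Rename the arguments that only changed their name
  hit = names(out) %in% names(renames)
  names(out)[hit] = renames[names(out)[hit]]
  # Assumption: spDataType was "PB" (presence-background) or "PA"
  # (presence-absence); the boolean presence_bg is TRUE only for "PB"
  if (!is.null(out$spDataType)) {
    out$presence_bg = identical(out$spDataType, "PB")
    out$spDataType = NULL
  }
  out
}

new_args = translate_blockcv_args(list(theRange = 10000, spDataType = "PB", addBG = TRUE))
str(new_args)
```

Only spDataType needs more than a rename, because the old character value is collapsed into a single boolean.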
References
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi:10.1101/357798.
Super class
mlr3::Resampling -> ResamplingSpCVBuffer
Active bindings
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
Methods
Method new()
Create a spatial buffering resampling instance.
For a list of available arguments, please see blockCV::cv_buffer().
Usage
ResamplingSpCVBuffer$new(id = "spcv_buffer")
Method instantiate()
Materializes fixed training and test splits for a given task.
Arguments
task
Task
A task to instantiate.
Examples
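if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  # mlr3spatiotempcv provides tsk("ecuador") and rsmp("spcv_buffer")
  library(mlr3spatiotempcv)
  task = tsk("ecuador")
  # Instantiate the resampling (old blockCV < 3.0.0 argument name theRange,
  # given in the units of the task's coordinate reference system)
  rcv = rsmp("spcv_buffer", theRange = 10000)
  rcv$instantiate(task)
  # Individual sets of the first fold:
  rcv$train_set(1)
  rcv$test_set(1)
  # Training and test sets do not overlap:
  intersect(rcv$train_set(1), rcv$test_set(1))
  # Internal storage:
  # rcv$instance
}
#> numeric(0)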