29 Developing new functions in SITS
29.1 General principles
New functions that build on the sits API should follow the general principles below.
The target audience for sits is the community of remote sensing experts with an Earth Sciences background who want to use state-of-the-art data analysis methods with minimal investment in programming skills. The design of the sits API considers the typical workflow for land classification using satellite image time series and thus provides a clear and direct set of functions that are easy to learn and master.
For this reason, we welcome contributors who provide useful additions to the existing API, such as new ML/DL classification algorithms. If you want to propose a new API function, please raise an issue stating your rationale before making a pull request.
Most functions in sits use the S3 programming model, with a strong emphasis on generic methods which are specialized depending on the input data type. See, for example, the implementation of the sits_bands() function.
Please do not include contributed code using the S4 programming model. Doing so would break the structure and the logic of existing code. Convert your code from S4 to S3.
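As a minimal sketch of this style, the base R code below defines a hypothetical generic, band_names(), which is not part of the sits API. It dispatches on the class of its input, much as sits_bands() is specialized for different data types. The object layouts are simplified stand-ins, not the real sits structures.

```r
# Hypothetical generic: dispatches on the class of `x`.
band_names <- function(x) {
    UseMethod("band_names", x)
}
# Method for a simplified time series object: bands are the columns
# of the first time series, except the date index.
band_names.sits <- function(x) {
    setdiff(names(x$time_series[[1]]), "Index")
}
# Method for a simplified cube object: bands are listed in file_info.
band_names.raster_cube <- function(x) {
    sort(unique(x$file_info[[1]]$band))
}
# Fallback for unsupported inputs.
band_names.default <- function(x) {
    stop("band_names() does not support objects of class ", class(x)[1])
}

# Stand-in objects built by hand for illustration only.
samples <- structure(
    list(time_series = list(data.frame(
        Index = as.Date("2020-01-01"), NDVI = 0.5, EVI = 0.3
    ))),
    class = "sits"
)
cube <- structure(
    list(file_info = list(data.frame(
        band = c("B08", "B04", "B08"), stringsAsFactors = FALSE
    ))),
    class = "raster_cube"
)
band_names(samples)
band_names(cube)
```

Adding support for a new input type then means adding one more method, without touching the existing ones.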
Use generic functions as much as possible, as they improve modularity and maintenance. If your code has decision points using if-else clauses, such as "if A, do X; else do Y", consider using generic functions.
Functions that use the torch package use the R6 model to be compatible with that package. See, for example, the code in sits_tempcnn.R and api_torch.R. Converting PyTorch code to R and including it in sits is straightforward. Please see the Technical Annex of the sits on-line book.
The sits code relies on the packages of the tidyverse to work with tables and lists. We use dplyr and tidyr for data selection and wrangling, purrr and slider for loops on lists and tables, and lubridate to handle dates and times.
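The advice on replacing if-else clauses with generic functions can be sketched as follows. The function names (scale_if_else, scale_values) and the two scaling methods are hypothetical and chosen only for illustration. Turning a character value into a class and then calling UseMethod() is one common way to do this in base R; it is shown here as an assumption about style, not as the exact pattern used inside sits.

```r
# Version with a decision point inside the function body.
scale_if_else <- function(values, method) {
    if (method == "minmax") {
        (values - min(values)) / (max(values) - min(values))
    } else if (method == "zscore") {
        (values - mean(values)) / stats::sd(values)
    } else {
        stop("unknown method: ", method)
    }
}

# Same behaviour using S3 dispatch: the method name becomes a class.
scale_values <- function(values, method) {
    dispatch_key <- structure(list(), class = method)
    UseMethod("scale_values", dispatch_key)
}
# Each branch is now a separate method receiving the original arguments.
scale_values.minmax <- function(values, method) {
    (values - min(values)) / (max(values) - min(values))
}
scale_values.zscore <- function(values, method) {
    (values - mean(values)) / stats::sd(values)
}
scale_values.default <- function(values, method) {
    stop("unknown method: ", method)
}

x <- c(1, 2, 3)
scale_values(x, "minmax")
scale_values(x, "zscore")
```

A new scaling method can be added as a new scale_values.<name> function, leaving the existing branches untouched.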
29.2 Adherence to the sits data types
The sits package is built on top of three data types: time series tibbles, data cubes, and models. Most sits functions take one or more of these types as inputs and return one of them. The time series tibble contains data and metadata. The first six columns contain the metadata: spatial and temporal information, the label assigned to the sample, and the data cube from which the data has been extracted. The time_series column contains the time series data for each spatiotemporal location. All time series tibbles are objects of class sits.
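To make this layout concrete, here is a sketch in base R of a table with six metadata columns and a nested time_series column. The column names and all values are illustrative assumptions. Real sits objects are tibbles created by the package, so this only mirrors their shape.

```r
# One row per sample; all values are made up for illustration.
samples <- data.frame(
    longitude  = -55.2,
    latitude   = -11.6,
    start_date = as.Date("2020-01-01"),
    end_date   = as.Date("2020-12-31"),
    label      = "Forest",
    cube       = "EXAMPLE-CUBE",
    stringsAsFactors = FALSE
)
# The seventh column is a list: each element holds one time series.
samples$time_series <- list(
    data.frame(
        Index = as.Date(c("2020-01-01", "2020-01-17")),
        NDVI  = c(0.81, 0.79)
    )
)
# Prepend the "sits" class so S3 methods can dispatch on it.
class(samples) <- c("sits", class(samples))

names(samples)[1:6]            # metadata columns
samples$time_series[[1]]$NDVI  # values of one band for the first sample
```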
The cube data type is designed to store metadata about image files. In principle, images which are part of a data cube share the same geographical region, have the same bands, and have been regularized to fit into a pre-defined temporal interval. Data cubes in sits are organized by tiles. A tile is an element of a satellite’s mission reference system, for example MGRS for Sentinel-2 and WRS2 for Landsat. A cube is a tibble where each row contains information about data covering one tile. Each row of the cube tibble contains a column named file_info; this column contains a list that stores a tibble.
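The nesting of file_info inside a cube can be sketched in base R as below. The tile names, the columns inside file_info (band, date, path), and the file paths are hypothetical; the sketch only shows the idea of one row per tile with a nested table of files.

```r
# One row per tile; tile identifiers are made up.
cube <- data.frame(
    tile = c("20LKP", "20LLP"),
    stringsAsFactors = FALSE
)
# file_info is a list column: one nested table of image files per tile.
cube$file_info <- list(
    data.frame(
        band = c("B04", "B08"),
        date = as.Date(c("2020-01-01", "2020-01-01")),
        path = c("/data/20LKP_B04.tif", "/data/20LKP_B08.tif"),
        stringsAsFactors = FALSE
    ),
    data.frame(
        band = "B04",
        date = as.Date("2020-01-01"),
        path = "/data/20LLP_B04.tif",
        stringsAsFactors = FALSE
    )
)
class(cube) <- c("raster_cube", class(cube))

# Number of files registered for each tile.
files_per_tile <- vapply(cube$file_info, nrow, integer(1))
files_per_tile
```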
The cube data type is specialised into raster_cube (ARD images), vector_cube (ARD cube with segmentation vectors), probs_cube (probabilities produced by classification algorithms on raster data), probs_vector_cube (probabilities generated by vector classification of segments), uncertainty_cube (cubes with uncertainty information), and class_cube (labelled maps). See the code in sits_plot.R as an example of specialisation of plot to handle different classes of raster data.
All ML/DL models in sits which are the result of sits_train belong to the ml_model class. In addition, models are assigned a second class, which is specific to each algorithm for ML models (e.g., rfor_model, svm_model) and shared by all torch-based DL models (torch_model). The class information is used for plotting models and for establishing if a model can run on GPUs.
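A short sketch of how such class information can drive a decision is given below. The helper can_use_gpu() is hypothetical, and the model objects are empty stand-ins. The order of the classes in the class vector is also an assumption.

```r
# Stand-in objects; real models are returned by sits_train().
rf_model <- structure(list(), class = c("rfor_model", "ml_model"))
dl_model <- structure(list(), class = c("torch_model", "ml_model"))

# Hypothetical helper: only torch-based models are treated as GPU-capable.
can_use_gpu <- function(model) {
    stopifnot(inherits(model, "ml_model"))
    inherits(model, "torch_model")
}

can_use_gpu(rf_model)  # FALSE
can_use_gpu(dl_model)  # TRUE
```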
29.3 Literal values, error messages, and testing
The internal sits code contains no literal values; these are all stored in the YAML configuration files ./inst/extdata/config.yml and ./inst/extdata/config_internals.yml. The first file contains configuration parameters that are relevant to users, related to visualisation and plotting; the second contains parameters that are relevant only for developers. These values are accessible using the .conf function. For example, the value of the default size for plotting COG files is accessed using the command .conf("plot", "max_size").
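The idea of keeping literal values in configuration and reading them by key can be sketched in base R as below. The accessor conf_get() and the values in the list are hypothetical. The nested list stands in for a parsed YAML file, and the actual .conf function in sits may behave differently.

```r
# Stand-in for a parsed YAML file: a nested named list (values made up).
config <- list(
    plot = list(max_size = 1200, font_size = 10)
)

# Hypothetical accessor: walks the nested list one key at a time.
conf_get <- function(...) {
    keys <- c(...)
    value <- config
    for (key in keys) {
        if (!is.list(value) || !key %in% names(value)) {
            stop("unknown configuration key: ", paste(keys, collapse = "/"))
        }
        value <- value[[key]]
    }
    value
}

conf_get("plot", "max_size")
```

With this approach a default can be changed in one place, and a misspelled key fails loudly instead of silently returning NULL.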
Error messages are also stored outside of the code, in the YAML configuration file ./inst/extdata/config_messages.yml. These values are accessible using the .conf function. For example, the error associated with an invalid NA value for an input parameter is accessible using the call .conf("messages", ".check_na_parameter").
We strive for high code coverage (> 90%). Every parameter of every sits function (including internal ones) is checked for consistency. Please see api_check.R.
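As a sketch of parameter checking with externally stored messages, the base R code below validates a numeric argument before it is used. All names (messages, check_that, check_num_parameter, sample_fraction) and the message texts are hypothetical. The real checks live in api_check.R and may be organised differently.

```r
# Stand-in for messages loaded from a YAML file (texts made up).
messages <- list(
    check_type  = "parameter must be numeric",
    check_na    = "parameter must not contain NA values",
    check_range = "parameter is outside the allowed range"
)

# Fail with the message stored under `key` when the condition is not met.
check_that <- function(ok, key, param) {
    if (!isTRUE(ok)) {
        stop(param, ": ", messages[[key]], call. = FALSE)
    }
    invisible(TRUE)
}

# Check type, missing values, and range of a numeric parameter.
check_num_parameter <- function(x, lower, upper, param) {
    check_that(is.numeric(x), "check_type", param)
    check_that(!anyNA(x), "check_na", param)
    check_that(all(x >= lower & x <= upper), "check_range", param)
    invisible(x)
}

# A function that validates its input before doing any work.
sample_fraction <- function(n, frac) {
    check_num_parameter(frac, lower = 0, upper = 1, param = "frac")
    floor(n * frac)
}

sample_fraction(100, 0.25)
```

Because each failure produces a distinct, predictable message, unit tests can assert on both valid results and the error raised for each kind of invalid input.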