29  Developing new functions in SITS

29.1 General principles

New functions that build on the sits API should follow the general principles below.

  • The target audience for sits is the community of remote sensing experts with an Earth Sciences background who want to use state-of-the-art data analysis methods with minimal investment in programming skills. The design of the sits API follows the typical workflow for land classification using satellite image time series and thus provides a clear and direct set of functions that are easy to learn and master.

  • For this reason, we welcome contributors who provide useful additions to the existing API, such as new ML/DL classification algorithms. If you want to propose a new API function, please raise an issue stating your rationale before making a pull request.

  • Most functions in sits use the S3 programming model, with a strong emphasis on generic methods which are specialized depending on the input data type. See, for example, the implementation of the sits_bands() function.

  • Please do not include contributed code using the S4 programming model. Doing so would break the structure and the logic of existing code. Convert your code from S4 to S3.

  • Use generic functions as much as possible, as they improve modularity and maintenance. If your code has decision points using if-else clauses (such as "if A, do X; else do Y"), consider using generic functions instead.

  • Functions that use the torch package use the R6 model to be compatible with that package. See, for example, the code in sits_tempcnn.R and api_torch.R. Converting PyTorch code to R and including it in sits is straightforward. Please see the Technical Annex of the sits on-line book.

  • The sits code relies on the packages of the tidyverse to work with tables and lists. We use dplyr and tidyr for data selection and wrangling, purrr and slider for loops on lists and tables, and lubridate to handle dates and times.
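The advice on replacing if-else decision points with generic functions can be sketched as follows. This is a minimal illustration in base R, not the actual implementation of sits_bands(): the generic my_bands(), the toy objects, and their internal layout are all hypothetical and exist only for this example.

```r
# Hypothetical generic (not part of sits): dispatches on the class of 'x'
# instead of branching with if-else inside a single function.
my_bands <- function(x) {
  UseMethod("my_bands", x)
}

# Method for time series objects: bands are the value columns
# of the nested table, excluding the date column.
my_bands.sits <- function(x) {
  setdiff(names(x$time_series[[1]]), "Index")
}

# Method for data cubes: bands are listed in the nested file table.
my_bands.raster_cube <- function(x) {
  unique(x$file_info[[1]]$band)
}

# Fallback: fail clearly for unsupported input types.
my_bands.default <- function(x) {
  stop("my_bands: unsupported input of class ", class(x)[[1]])
}

# Toy objects standing in for a time series table and a data cube.
samples <- structure(
  list(time_series = list(data.frame(
    Index = as.Date(c("2020-01-01", "2020-01-17")),
    NDVI  = c(0.61, 0.65),
    EVI   = c(0.42, 0.47)
  ))),
  class = c("sits", "list")
)
cube <- structure(
  list(file_info = list(data.frame(
    band = c("B04", "B08", "B04"),
    date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-17"))
  ))),
  class = c("raster_cube", "list")
)

my_bands(samples)
my_bands(cube)
```

Supporting a new input type then only requires adding a new method; the existing methods and their callers stay untouched.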

29.2 Adherence to the sits data types

The sits package is built on top of three data types: time series tibbles, data cubes, and models. Most sits functions take one or more of these types as inputs and return one of them. The time series tibble contains data and metadata. The first six columns contain the metadata: spatial and temporal information, the label assigned to the sample, and the data cube from which the data has been extracted. The time_series column contains the time series data for each spatiotemporal location. All time series tibbles are objects of class sits.
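The layout described above can be sketched with a one-row table built by hand. This is only an illustration assuming the tibble package is installed: the metadata column names (longitude, latitude, start_date, end_date, label, cube) and the Index column of the nested table are assumptions about the sits convention, and all values are made up.

```r
library(tibble)

# Values for one location: one row per date, one column per band.
# The column name "Index" for the dates is an assumption.
ts <- tibble(
  Index = as.Date(c("2020-01-01", "2020-01-17", "2020-02-02")),
  NDVI  = c(0.61, 0.65, 0.72)
)

# Six metadata columns followed by the nested time_series column.
# Column names and values are illustrative assumptions.
samples <- tibble(
  longitude   = -55.23,
  latitude    = -11.51,
  start_date  = as.Date("2020-01-01"),
  end_date    = as.Date("2020-02-02"),
  label       = "Forest",
  cube        = "hypothetical_cube",
  time_series = list(ts)
)

# Prepend the "sits" class so that S3 methods can dispatch on it.
class(samples) <- c("sits", class(samples))

inherits(samples, "sits")
names(samples)
samples$time_series[[1]]
```

In practice such tables are produced by sits functions rather than assembled by hand; the sketch only shows where metadata and time series values live.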

The cube data type is designed to store metadata about image files. In principle, images which are part of a data cube share the same geographical region, have the same bands, and have been regularized to fit into a pre-defined temporal interval. Data cubes in sits are organized by tiles. A tile is an element of a satellite’s mission reference system, for example MGRS for Sentinel-2 and WRS2 for Landsat. A cube is a tibble where each row contains information about data covering one tile. Each row of the cube tibble contains a column named file_info; this column contains a list that stores a tibble.
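The nesting of file_info inside a cube row can be sketched as below, assuming the tibble package is installed. Only the file_info column name comes from the text; the tile column, the columns inside file_info, the tile identifier, and the file paths are hypothetical placeholders.

```r
library(tibble)

# Per-tile table of image files, one row per band and date.
# Column names and paths are hypothetical.
files <- tibble(
  band = c("B04", "B08"),
  date = as.Date(c("2020-01-01", "2020-01-01")),
  path = c("/hypothetical/B04_2020-01-01.tif",
           "/hypothetical/B08_2020-01-01.tif")
)

# One row per tile; file_info is a list column holding the table above.
cube <- tibble(
  tile      = "20LKP",
  file_info = list(files)
)
class(cube) <- c("raster_cube", class(cube))

# Access the files of the first tile.
cube$file_info[[1]]
```

Keeping the file table nested in a list column lets each tile carry a different number of files while the cube itself stays one row per tile.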

The cube data type is specialised into raster_cube (ARD images), vector_cube (ARD cube with segmentation vectors), probs_cube (probabilities produced by classification algorithms on raster data), probs_vector_cube (probabilities generated by vector classification of segments), uncertainty_cube (cubes with uncertainty information), and class_cube (labelled maps). See the code in sits_plot.R as an example of how plot is specialised to handle different classes of raster data.

All ML/DL models in sits which are the result of sits_train belong to the ml_model class. In addition, models are assigned a second class, which is unique to each ML model (e.g., rfor_model, svm_model) and shared by all torch-based DL models (torch_model). The class information is used for plotting models and for establishing whether a model can run on GPUs.
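The two-level class scheme for models can be sketched in base R as follows. The class names ml_model, rfor_model, and torch_model come from the text; the constructor new_model(), the helper can_use_gpu(), the order of the class vector, and the use of a plain list to hold the model are hypothetical choices made for this illustration.

```r
# Hypothetical constructor: attaches a specific class plus the shared
# "ml_model" class. The real sits objects may be structured differently.
new_model <- function(fit, specific_class) {
  structure(list(fit = fit), class = c(specific_class, "ml_model"))
}

rf_model <- new_model(fit = NULL, specific_class = "rfor_model")
dl_model <- new_model(fit = NULL, specific_class = "torch_model")

# Hypothetical helper: only torch-based models are treated as GPU-capable.
can_use_gpu <- function(model) {
  stopifnot(inherits(model, "ml_model"))
  inherits(model, "torch_model")
}

can_use_gpu(rf_model)
can_use_gpu(dl_model)
```

Because the decision rests on the class vector alone, a new torch-based algorithm becomes GPU-capable in this sketch simply by carrying the torch_model class.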

29.3 Literal values, error messages, and testing

The internal sits code contains no literal values; they are all stored in the YAML configuration files ./inst/extdata/config.yml and ./inst/extdata/config_internals.yml. The first file contains configuration parameters that are relevant to users, related to visualisation and plotting; the second contains parameters that are relevant only for developers. These values are accessible using the .conf function. For example, the value of the default size for plotting COG files is accessed using the command .conf("plot", "max_size").

Error messages are also stored outside of the code, in the YAML configuration file ./inst/extdata/config_messages.yml. These values are accessible using the .conf function. For example, the error associated with an invalid NA value for an input parameter is accessible using the function .conf("messages", ".check_na_parameter").
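The idea of looking up values and messages by a sequence of keys can be sketched in base R. This is not the sits .conf function: the accessor conf_get() is hypothetical, a nested list stands in for the parsed YAML files, and the stored values are made up. Only the key names plot, max_size, messages, and .check_na_parameter come from the text.

```r
# Stand-in for the parsed YAML configuration (values are made up).
config <- list(
  plot = list(max_size = 1000000),
  messages = list(
    .check_na_parameter = "NA value not allowed for this parameter"
  )
)

# Hypothetical accessor: walks the nested list one key at a time
# and fails clearly when a key does not exist.
conf_get <- function(...) {
  keys <- c(...)
  value <- config
  for (key in keys) {
    if (!is.list(value) || !(key %in% names(value))) {
      stop("unknown configuration key: ", paste(keys, collapse = " > "))
    }
    value <- value[[key]]
  }
  value
}

conf_get("plot", "max_size")
conf_get("messages", ".check_na_parameter")
```

Keeping values and messages in one place means a default or a wording can change without touching the code that uses it.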

We strive for high code coverage (> 90%). Every parameter of every sits function (including internal ones) is checked for consistency. Please see api_check.R.
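A parameter check of the kind described above might look like the following base R sketch. The functions check_num_parameter() and smooth_window() are hypothetical and are not taken from api_check.R; they only illustrate validating an argument at the start of a function and failing with a clear message.

```r
# Hypothetical checker: accepts a single non-NA number within [min, max].
check_num_parameter <- function(x, min = -Inf, max = Inf, name = "x") {
  ok <- is.numeric(x) && length(x) == 1 && !is.na(x) && x >= min && x <= max
  if (!ok) {
    stop("invalid value for parameter '", name, "'", call. = FALSE)
  }
  invisible(x)
}

# Hypothetical function that validates its input before doing any work.
smooth_window <- function(window_size) {
  check_num_parameter(window_size, min = 1, max = 99, name = "window_size")
  window_size
}

smooth_window(5)
```

Each failure path of such a checker is a natural unit test, which helps keep coverage high.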