Earth observation data cubes

Analysis-ready data collections

Analysis Ready Data (CEOS-ARD) are satellite data that have been processed to meet the ARD standards defined by the Committee on Earth Observation Satellites (CEOS). ARD data simplify and accelerate the analysis of Earth observation data by providing consistent and high-quality images that are standardized across different sensors and platforms. ARD image processing includes geometric, radiometric, and atmospheric corrections. Images are georeferenced, meaning they are accurately aligned with a coordinate system. Optical ARD images include cloud and shadow masking information. These masks indicate which pixels are affected by clouds or cloud shadows. For optical sensors, CEOS-ARD images are converted to surface reflectance values, which represent the fraction of light that is reflected by the surface. This makes the data more comparable across different times and locations. For SAR images, CEOS-ARD specification require images to undergo Radiometric Terrain Correction (RTC) and be provided in the gamma nought (\(\gamma_0\)) backscatter values. This value which mitigates the variations from diverse observation geometries and is recommended for most land applications.

ARD images are available from various satellite platforms, including Landsat, Sentinel, and commercial satellites. This provides a wide range of spatial, spectral, and temporal resolutions to suit different applications. They are organized as a collection of files, where each pixel contains a single value for each spectral band for a given date. These collections are available in cloud services such as Brazil Data Cube, Digital Earth Africa, and Microsoft’s Planetary Computer. In general, the timelines of the images in an ARD collection are different. Images may still contain cloudy or missing pixels, and bands for the images in the collection may have different resolutions. Figure 1 shows an example of the Landsat ARD image collection.

Figure 1: ARD image collection (source: USGS).

Regular Earth observation data cubes

Machine learning and deep learning (ML/DL) classification algorithms require the input data to be consistent. The dimensionality of the data used for training the model has to be the same as that of the data to be classified. There should be no gaps and no missing values. Thus, to use ML/DL algorithms for remote sensing data, ARD image collections should be converted to regular data cubes.

A regular EO data cube is a partition of the Earth’s surface which covers the area in periodic intervals. Regular data cubes are derived by ARD images by filling coverage gaps, accounting for cloud coverage, and reprocessing irregular temporal coverage to regular periods. Also, regular data cubes can cover more than a single coordinate projection zone, so as to be defined in large areas.

In sits, regular data cubes have the following properties:

  1. A regular data cube is a set of three-dimensional arrays with two spatial dimensions and one temporal dimension. Each array contains cells which have the same spatial resolution, the same temporal duration, and the same set of attributes (spectral bands and/or indices).
  2. The number of dimensions is fixed (2D + time), while the number of attributes is not limited.
  3. The spatial dimensions are associated to a coordinate projection system, such as the MGRS grid (Military Grid Reference System). A tile of the grid corresponds to a unique zone of the coordinate system. A data cube may span various tiles and projection zones.
  4. The temporal dimension is a set of continuous and equally spaced intervals.
  5. For each position in space, the cube should provide a multi-attribute time series. For each time interval, the cube should provide a multi-attribute 2D image (see Figure 2).
Figure 2: Conceptual view of a data cube.

The definition of data cubes in sits differs from an earlier proposal by Appel and Pebesma [1] which has been used in the R package stars and in the openEO system. Appel and Pebesma consider that a data cube will only cover a single tile of a coordinate system (e.g., MGRS). In sits, this restriction has been removed and data cubes many span multiple tiles.

Currently, the only cloud service that provides regular data cubes by default is the Brazil Data Cube (BDC). ARD collections available in other cloud services are not regular in space and time. Bands may have different resolutions, images may not cover the entire timeline, and time intervals may be irregular. For this reason, subsets of these collections need to be converted into regular data cubes before further processing. To produce data cubes for machine learning data analysis, this part of the book describes the steps involved in producing and using regular data cubes:

  • Obtaining data from ARD image collections
  • Producing regular data cubes from single- and multi-source data
  • Recovering data cubes from local files
  • Performing operations on data cubes

References

[1]
M. Appel and E. Pebesma, “On-Demand Processing of Data Cubes from Satellite Image Collections with the gdalcubes Library,” Data, vol. 4, no. 3, 2019, doi: 10.3390/data4030092.