This dataset was generated to learn the radiative heating process using machine learning. More specifically, the data captures the inputs and outputs of the ecRAD radiation scheme (Hogan & Bozzo 2018) using the Tripleclouds solver (Shonk & Hogan 2008). The goal is to learn a more efficient version of the radiation scheme using neural networks or other machine learning algorithms.
This dataset was generated as part of the MAELSTROM project, which seeks to create machine learning tools across applications in weather and climate science. The project will create benchmark datasets and solutions, develop software workflows and explore the best hardware for each project.
Further details can be found in the MAELSTROM documentation for this dataset, part 3.3. This includes motivation, background reading and descriptions of the variables in the dataset.
The dataset can be downloaded using the Python tool CliMetLab, which utilises xarray.
!pip install climetlab climetlab_maelstrom_radiation
import climetlab as cml
cml_ds = cml.load_dataset("maelstrom-radiation", subset="tier-1")
ds = cml_ds.to_xarray()
This will download a small dataset with which to explore the data structure. Using subset="2020" will download the entire suggested training dataset. Warning: this is very large, O(1 TB).
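As a minimal sketch of how the data structure might be explored, the helper below lists each variable with its dimensions, shape and dtype. The function name `summarise_variables` is hypothetical and not part of CliMetLab or xarray; it only assumes the object passed in exposes the standard xarray `Dataset.data_vars` mapping.

```python
def summarise_variables(ds):
    """Return one summary line per data variable in an xarray-like Dataset.

    Hypothetical helper: it only reads `ds.data_vars` and, for each
    variable, its `dims`, `shape` and `dtype` attributes.
    """
    lines = []
    for name, var in ds.data_vars.items():
        dims = ", ".join(str(d) for d in var.dims)
        lines.append(
            f"{name}: dims=({dims}) shape={tuple(var.shape)} dtype={var.dtype}"
        )
    return lines
```

With the `ds` object returned by `cml_ds.to_xarray()`, calling `print("\n".join(summarise_variables(ds)))` should give a quick overview of the tier-1 subset before committing to the full download.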
For machine learning purposes, it is suggested to use the "maelstrom-radiation-tf" dataset, which provides access to the data as a pre-shuffled TensorFlow Dataset.