Michael Buchwitz, 20-Nov-2024

This notebook can be run on free online platforms, such as Binder, Kaggle and Colab, or it can be accessed from GitHub. The links to run this notebook in these environments are provided here, but please note they are not supported by ECMWF.
Learning objectives 🎯
This Jupyter Notebook illustrates how to access and use a satellite-derived Greenhouse Gas (GHG) atmospheric carbon dioxide (CO2) Level 2 data product as generated via the Copernicus Climate Change Service (C3S) and made available via the Copernicus Climate Data Store (CDS, https://cds.climate.copernicus.eu/).
This JN shows how to download a data product from the CDS, explains how to access the main variables and how to use them for interesting applications. We focus on two use cases related to the spatial and temporal variation of atmospheric CO2 concentrations and their observational coverage.
The first use case is related to the latitudinal distribution of XCO2. We show how XCO2 averages and standard deviations per latitude band can be computed and plotted. For this we use two days of observations, one in April and one in September. We explain that the observed latitudinal distributions are closely related to the seasonal cycle of CO2 due to uptake and release of CO2 by vegetation.
For the second use case we show how a map of the spatial distribution of the individual ground pixel XCO2 observations can be generated. We show that the spatial coverage of the daily observations is very sparse due to strict quality filtering of the individual XCO2 retrievals. Most applications therefore require appropriate spatio-temporal averaging (see, for example, the use of C3S GHG XCO2 and XCH4 data for assessments such as Copernicus European State of the Climate (ESOTC) as shown on the GHG concentration climate indicators website https://climate.copernicus.eu/climate-indicators/greenhouse-gas-concentrations).
Prepare your environment
Set up CDSAPI and your credentials
The code below will ensure that the cdsapi package is installed. If you have not set up your ~/.cdsapirc file with your credentials, you can replace None with your credentials, which can be found on the how to api page (you will need to log in to see your credentials).
!pip install -q cdsapi
# If you have already setup your .cdsapirc file you can leave this as None
cdsapi_key = None
cdsapi_url = None
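Alternatively, the `~/.cdsapirc` file can be written programmatically. The sketch below writes to a local file named `cdsapirc_example` (an assumption, so the sketch does not touch your home directory); for a real setup you would write to `Path.home() / ".cdsapirc"` and replace the placeholder key with your own:

```python
from pathlib import Path

# Placeholder credentials -- replace the key with your own from the
# CDS "how to api" page; the URL is the current CDS API endpoint.
rc_text = "url: https://cds.climate.copernicus.eu/api\nkey: YOUR-KEY-HERE\n"

# Local file for illustration; in practice use Path.home() / ".cdsapirc"
rc_path = Path("cdsapirc_example")
rc_path.write_text(rc_text)
print(rc_path.read_text())
```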
(Install and) Import libraries
We will be working with data in NetCDF format. To best handle this data we will use libraries for working with multidimensional arrays, in particular Xarray. We will also need libraries for plotting and viewing data, in this case we will use Matplotlib and Cartopy.
# Import libraries needed for the Jupyter notebook
# Note that lines starting with "#" are comment lines
# Libraries for working with data, especially multidimensional arrays
import numpy as np
import pandas as pd
import xarray as xr
# Library to work with zip-archives and operating system (OS) functions and pattern expansion
import zipfile
import os
from pathlib import Path
# Libraries for plotting and visualising data
import matplotlib.pyplot as plt
import matplotlib as mplt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
# Libraries for style parameters
from pylab import rcParams
import seaborn as sns
# Disable warnings for data download via API
import urllib3
urllib3.disable_warnings()
print('* ... finished importing modules')
Specify data directory
# Directory to store data
# Please ensure that data_dir is a location where you have write permissions
DATADIR = './data_dir/'
# Create this directory if it doesn't exist
os.makedirs(DATADIR, exist_ok=True)
Explore data
Here we illustrate the use of a GHG Level 2 (L2) product using product XCO2_EMMA as an example. A CO2 L2 data product contains CO2 information for individual satellite ground pixels (also called footprints or soundings). XCO2 is the column-averaged dry-air mole fraction of atmospheric CO2 in parts per million (ppm). EMMA is the name of the multi-satellite XCO2 (and XCH4) merging algorithm developed to generate product XCO2_EMMA using as input individual Level 2 XCO2 products from different satellite sensors (here: SCIAMACHY/ENVISAT, GOSAT, GOSAT-2 and OCO-2; see, e.g., Reuter et al., https://amt.copernicus.org/articles/13/789/2020/, 2020). The satellite-derived atmospheric CO2 data products can be found here.
Within C3S, other L2 products have also been generated and made available via the CDS. These products are (i) XCO2 from individual satellite sensors, (ii) XCH4 products, where XCH4 is the column-averaged dry-air mole fraction of CH4 in parts per billion (ppb), and (iii) mid-tropospheric CO2 and CH4 mixing ratio products. Detailed information on all these products is available via the CDS. In addition to L2 products, Level 3 (L3) products are also available. An L3 product is obtained by spatio-temporally averaging a corresponding L2 product. How to access and use an L3 GHG product is shown in a separate Jupyter Notebook (JN).
For this JN we use XCO2_EMMA version 4.5, which covers the period 2003 - 2022.
Search for the data
Having selected the correct dataset, we now need to specify what product type, variables, temporal and geographic coverage we are interested in. For this JN we want to access data product XCO2_EMMA, version 4.5, for these two days:
- File 1: 15-April-2020
- File 2: 15-September-2020
These parameters can all be selected in the “Download data” tab. In this tab a form appears in which we will select the following parameters to download:
:::{dropdown} Parameters of data to download
- Processing level: Level 2
- Variable: Column-average dry-air mole fraction of atmospheric Carbon Dioxide (XCO2) and related variables
- Sensor and algorithm: MERGED and EMMA
- Year: 2020
- Month: April and September
- Day: 15
- Version: 4.5
:::
At the end of the download form, select “Show API request”. This will reveal a block of code, which you can simply copy and paste into a cell of your Jupyter Notebook (see cell below). Having copied the API request into the cell below, running this will retrieve and download the data you requested into your local directory.
:::{warning}
Please remember to accept the terms and conditions of the dataset, at the bottom of the CDS download form!
:::
Download the data
Check if the two data files have already been downloaded
product_id = 'XCO2_EMMA' # keep
# Select one version (by commenting in/out):
#product_version = 'v4.4'; product_version_str2 = '4_4'
product_version = 'v4.5'; product_version_str2 = '4_5'
# We want to get 2 daily data files:
year_1 = 2020; day_1 = 15; month_1 = 4
year_2 = 2020; day_2 = 15; month_2 = 9
product_str = product_id+' '+product_version
print('* selected product: ', product_str)
# Generate corresponding zero-padded date strings (format YYYYMMDD):
date_1_str = f'{year_1:04d}{month_1:02d}{day_1:02d}'
date_2_str = f'{year_2:04d}{month_2:02d}{day_2:02d}'
print('* Selected day 1: ', date_1_str)
print('* Selected day 2: ', date_2_str)
# Names of the two desired data files:
wanted_file_1 = date_1_str+'-C3S-L2_XCO2-GHG_PRODUCTS-MERGED-MERGED-EMMA-DAILY-'+product_version+'.nc'
wanted_file_2 = date_2_str+'-C3S-L2_XCO2-GHG_PRODUCTS-MERGED-MERGED-EMMA-DAILY-'+product_version+'.nc'
# Note: DATADIR is a string, so we use os.path.join (the "/" operator
# only works for pathlib.Path objects):
wanted_path_file_1 = os.path.join(DATADIR, wanted_file_1)
wanted_path_file_2 = os.path.join(DATADIR, wanted_file_2)
print('* Wanted file 1: ', wanted_path_file_1)
print('* Wanted file 2: ', wanted_path_file_2)
# Check if desired data files have already been downloaded:
product_already_downloaded = 'no'
if os.path.exists(wanted_path_file_1) and os.path.exists(wanted_path_file_2):
    product_already_downloaded = 'yes'
print('* Product files already downloaded?: ', product_already_downloaded)
If they haven’t, download the data
With the API request copied into the cells below, running these cells will retrieve and download the data you requested into your local directory.
if product_already_downloaded == 'no':
print('* downloading data via cdsapi ...')
# The following code has been generated (apart from minor modifications)
# via "Show API request code" as explained above:
import cdsapi
dataset = "satellite-carbon-dioxide"
request = {
"processing_level": ["level_2"],
"variable": "xco2",
"sensor_and_algorithm": "merged_emma",
"year": ["2020"],
"month": ["04", "09"],
"day": ["15"],
#"version": ["4_5"]
"version": [product_version_str2]
}
client = cdsapi.Client()
# Generates download.zip file in current directory:
client.retrieve(dataset, request).download('download.zip')
# Unzip:
path_to_zip_file = 'download.zip'
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
zip_ref.extractall(DATADIR)
print('* ... finished downloading data via cdsapi')
print('* data are stored in directory: ', DATADIR)
else:
print('* no data download via cdsapi as files already exist')
Inspect data
Now we read the two data files:
# Read wanted files:
wanted_files_OK = 'no'
print('* trying to read file 1: ', wanted_path_file_1)
if os.path.exists(wanted_path_file_1):
    print('* reading file 1 ...')
    ds_1 = xr.open_dataset(wanted_path_file_1)
    print('* trying to read file 2: ', wanted_path_file_2)
    if os.path.exists(wanted_path_file_2):
        print('* reading file 2 ...')
        ds_2 = xr.open_dataset(wanted_path_file_2)
        wanted_files_OK = 'yes'
    else:
        print('* ERROR: file 2 does not exist !?')
else:
    print('* ERROR: file 1 does not exist !?')
# Read variables of interest:
if wanted_files_OK == 'yes':
print('* ... reading files OK !')
f1_xco2 = ds_1['xco2'] # XCO2 in ppm
f1_xco2_qf = ds_1['xco2_quality_flag'] # 0 = good
f1_lat = ds_1['latitude'] # latitude [deg]
f1_lon = ds_1['longitude'] # longitude [deg]
f2_xco2 = ds_2['xco2'] # XCO2 in ppm
f2_xco2_qf = ds_2['xco2_quality_flag'] # 0 = good
f2_lat = ds_2['latitude'] # latitude [deg]
f2_lon = ds_2['longitude'] # longitude [deg]
    # Filter by quality flag (0 = good):
    idgd_1 = (f1_xco2_qf.values == 0).nonzero()  # indices of good-quality data, day 1
    idgd_2 = (f2_xco2_qf.values == 0).nonzero()  # indices of good-quality data, day 2
    g_1_xco2 = f1_xco2[idgd_1]  # subset of good data, day 1
    g_1_lat = f1_lat[idgd_1]
    g_1_lon = f1_lon[idgd_1]
    g_2_xco2 = f2_xco2[idgd_2]  # subset of good data, day 2
    g_2_lat = f2_lat[idgd_2]
    g_2_lon = f2_lon[idgd_2]
else:
print('* ERROR: Problem when reading input files !?')
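To see this access pattern without downloading anything, the per-sounding structure of an L2 file can be mimicked with a tiny synthetic `xarray.Dataset`. The variable names match the product; the dimension name `sounding_dim` and all values are made up for illustration:

```python
import numpy as np
import xarray as xr

# Tiny synthetic L2-like dataset: one value per satellite sounding
n = 5
ds = xr.Dataset(
    {
        "xco2": ("sounding_dim", 410.0 + np.arange(n, dtype=float)),
        "xco2_quality_flag": ("sounding_dim", np.array([0, 0, 1, 0, 1])),
        "latitude": ("sounding_dim", np.linspace(-60.0, 60.0, n)),
        "longitude": ("sounding_dim", np.linspace(-10.0, 30.0, n)),
    }
)

# Same quality filtering as above: keep soundings with flag == 0
idgd = (ds["xco2_quality_flag"].values == 0).nonzero()
good_xco2 = ds["xco2"][idgd]
print(len(good_xco2))  # 3 good soundings (flags 0, 0, 0)
```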
Setting default plot style parameters
Here we set some parameters which determine the style of the generated figures.
# The following style parameters will be used for all plots in this use case.
rcParams['figure.figsize'] = [15,5]
rcParams['figure.dpi'] = 350
#rcParams['font.family'] = 'serif'
rcParams['font.family'] = 'Arial'
#rcParams['font.serif'] = mplt.rcParamsDefault['font.serif']
#rcParams['mathtext.rm'] = 'serif:light'
#rcParams['mathtext.it'] = 'serif:italic'
#rcParams['mathtext.bf'] = 'serif:bold'
rcParams['mathtext.default'] = 'regular'
plt.rc('font', size=17) # controls default text sizes
plt.rc('axes', titlesize=17) # fontsize of the axes title
plt.rc('axes', labelsize=17) # fontsize of the x and y labels
plt.rc('xtick', labelsize=15) # fontsize of the tick labels
plt.rc('ytick', labelsize=15) # fontsize of the tick labels
plt.rc('legend', fontsize=10) # fontsize of the legend
plt.rc('figure', titlesize=18) # fontsize of the figure title
projection = ccrs.PlateCarree()
mplt.rc('xtick', labelsize=9)
mplt.rc('ytick', labelsize=9)
print('* ... finished setting plot style parameters')
Application 1: the latitudinal distribution of XCO2
Here our intention is to plot XCO2 as a function of latitude. For this we first need to define the latitude bands.
Definition of latitude bands
print('* Computing latitude band center coordinates')
lat_band_width = 10.0 # Width of single latitude band in deg
d_lat = lat_band_width*0.5
lat_band_min = -90.0 + d_lat # Center of southernmost latitude band
lat_band_max = 90.0 - d_lat # Center of northernmost latitude band
n_lat_bands = int((lat_band_max - lat_band_min) / lat_band_width)+1
lat_band_center = np.zeros(n_lat_bands)
for ii in range(n_lat_bands): # Generate array of latitude band centers
lat_band_center[ii] = lat_band_min + ii*lat_band_width
print('* lat_band_center: ', lat_band_center)
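The loop above can also be collapsed into a single `np.arange` call. A quick self-contained check with the same band definition:

```python
import numpy as np

lat_band_width = 10.0
d_lat = lat_band_width * 0.5
# Band centers -85, -75, ..., +85: 18 bands of 10 degrees each
lat_band_center = np.arange(-90.0 + d_lat, 90.0, lat_band_width)
print(len(lat_band_center), lat_band_center[0], lat_band_center[-1])  # 18 -85.0 85.0
```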
Computing XCO2 vs latitude for the two selected days
Now we compute for each latitude band the mean value and the standard deviation of XCO2 for both days:
# For file 1 data:
if wanted_files_OK == 'yes':
print('* Computing XCO2 as a function of latitude for the data from file 1')
# Relevant input data:
lat_1 = g_1_lat
ghg_1 = g_1_xco2
# Init arrays:
mean_ghg_1 = lat_band_center.copy() * 0.0
std_ghg_1 = lat_band_center.copy() * 0.0 -999 # < 0 for too few data
# Compute mean and standard deviation for each latitude band:
for ii in range(n_lat_bands):
act_lat = lat_band_center[ii]
idg = np.where((lat_1 >= act_lat-d_lat) & (lat_1 < act_lat+d_lat))
sel_1 = ghg_1[idg]; len_sel_1 = len(sel_1)
#print('* len_sel_1: ', len_sel_1)
if len_sel_1 > 1:
mean_ghg_1[ii] = np.mean(sel_1)
std_ghg_1[ii] = np.std(sel_1)
idg = (std_ghg_1 > 0.0)
lb_1_xco2_mean = mean_ghg_1[idg]
lb_1_xco2_std = std_ghg_1[idg]
lb_1_lat = lat_band_center[idg]
print('* lb_1_xco2_mean: ', lb_1_xco2_mean)
# For file 2 data:
if wanted_files_OK == 'yes':
print('* Computing XCO2 as a function of latitude for the data from file 2')
# Relevant input data:
lat_2 = g_2_lat
ghg_2 = g_2_xco2
# Init arrays:
mean_ghg_2 = lat_band_center.copy() * 0.0
std_ghg_2 = lat_band_center.copy() * 0.0 -999 # < 0 for too few data
# Compute mean and standard deviation for each latitude band:
for ii in range(n_lat_bands):
act_lat = lat_band_center[ii]
idg = np.where((lat_2 >= act_lat-d_lat) & (lat_2 < act_lat+d_lat))
sel_2 = ghg_2[idg]; len_sel_2 = len(sel_2)
#print('* len_sel_2: ', len_sel_2)
if len_sel_2 > 1:
mean_ghg_2[ii] = np.mean(sel_2)
std_ghg_2[ii] = np.std(sel_2)
idg = (std_ghg_2 > 0.0)
lb_2_xco2_mean = mean_ghg_2[idg]
lb_2_xco2_std = std_ghg_2[idg]
lb_2_lat = lat_band_center[idg]
print('* lb_2_xco2_mean: ', lb_2_xco2_mean)
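The per-band loops above can also be written compactly with `pandas.cut` and `groupby`. Below is a sketch on synthetic data; in the notebook the real inputs would be `g_1_lat` / `g_1_xco2` (or the day-2 equivalents), and the synthetic latitude gradient is purely illustrative:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the good-quality retrievals:
rng = np.random.default_rng(0)
lat = rng.uniform(-90.0, 90.0, 1000)
xco2 = 410.0 + 0.02 * lat + rng.normal(0.0, 1.0, 1000)  # made-up gradient

edges = np.arange(-90.0, 91.0, 10.0)  # 10-degree band edges
df = pd.DataFrame({"lat": lat, "xco2": xco2})
df["band"] = pd.cut(df["lat"], bins=edges)

# Mean, standard deviation and count per latitude band in one call:
stats = df.groupby("band", observed=True)["xco2"].agg(["mean", "std", "count"])
print(stats)
```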
Generation of XCO2 vs latitude plot
Now we generate the desired plot:
# Generation of xy plot showing XCO2 vs latitude
if wanted_files_OK == 'yes':
# Plot data:
figsize = (8,6) # Figure size
fig = plt.figure(figsize=figsize)
pos = [0.09,0.11,0.88,0.83] # position (left,bottom,width,height) in page coordinates
ax = fig.add_axes(pos)
xmin = 404.0 # x axis range of plot
xmax = 420.0
ymin = -60.0 # y axis range of plot
ymax = 80.0
ax.axis([xmin, xmax, ymin, ymax])
file_1_date2 = date_1_str
file_2_date2 = date_2_str
# -----------------------------------
# plot day 1 data:
ax.plot(lb_1_xco2_mean, lb_1_lat, linewidth=6.0, color='red', zorder=39, label = 'Mean+/-Stddev '+file_1_date2)
    if 1 == 0:  # change to "1 == 1" to draw the stddev as thin lines instead of transparent bars
ax.plot(lb_1_xco2_mean-lb_1_xco2_std, lb_1_lat, linewidth=2.0, color='red', zorder=38)
ax.plot(lb_1_xco2_mean+lb_1_xco2_std, lb_1_lat, linewidth=2.0, color='red', zorder=38)
else:
nn_1 = len(lb_1_xco2_mean)
for ii in range(nn_1):
yy = lb_1_lat[ii]
x1 = lb_1_xco2_mean[ii]-lb_1_xco2_std[ii]
x2 = lb_1_xco2_mean[ii]+lb_1_xco2_std[ii]
ax.plot([x1, x2], [yy, yy], linewidth=20.0, color='red', zorder=38, alpha=0.3)
# -----------------------------------
# plot day 2 data:
ax.plot(lb_2_xco2_mean, lb_2_lat, linewidth=6.0, color='black', zorder=49, label = 'Mean+/-Stddev '+file_2_date2)
    if 1 == 0:  # change to "1 == 1" to draw the stddev as thin lines instead of transparent bars
ax.plot(lb_2_xco2_mean-lb_2_xco2_std, lb_2_lat, linewidth=2.0, color='black', zorder=48)
ax.plot(lb_2_xco2_mean+lb_2_xco2_std, lb_2_lat, linewidth=2.0, color='black', zorder=48)
else:
nn_2 = len(lb_2_xco2_mean)
for ii in range(nn_2):
yy = lb_2_lat[ii]
x1 = lb_2_xco2_mean[ii]-lb_2_xco2_std[ii]
x2 = lb_2_xco2_mean[ii]+lb_2_xco2_std[ii]
ax.plot([x1, x2], [yy, yy], linewidth=15.0, color='black', zorder=48, alpha=0.3)
# -----------------------------------
title = product_str
x_label = 'XCO$_2$ [ppm]'
y_label = 'Latitude [deg]'
plot_title = product_str+': XCO$_2$ vs latitude'
plt.title(plot_title, fontsize=20)
plt.xlabel(x_label, fontsize=16); plt.ylabel(y_label, fontsize=16)
plt.legend(loc='upper left')
plot_type = 'png'
if plot_type == 'png':
o_file_plot = product_id+'_'+product_version+'_latitude.png'
print('* generating: ', o_file_plot)
plt.savefig(o_file_plot, dpi=600)
else:
plt.show()
else:
print('* ERROR: Cannot generate XCO2 vs latitude plot!?')
The figure above shows the mean XCO2 as a function of latitude (thick lines) and the corresponding variation (computed as the standard deviation of the individual XCO2 retrievals) within each latitude band (semi-transparent horizontal bars) for 15-April-2020 (red) and 15-September-2020 (black). As can be seen, in September XCO2 shows little variation with latitude, whereas in April XCO2 is significantly higher over the northern hemisphere (NH) than over the southern hemisphere (SH). This is due to the seasonal cycle of CO2, which primarily results from the regular uptake and release of atmospheric CO2 by growing and decaying vegetation (photosynthesis and respiration). Vegetation uptake reduces the CO2 concentration over the NH during the growing season (spring and summer) compared to the dormant season (winter). Over the SH there is less vegetation and therefore CO2 is relatively constant (apart from the general increase due to CO2 emissions from burning fossil fuels).
Application 2: showing daily data on a global map
Here we show how to generate a plot showing the spatial distribution of the data. As can be seen, we use only three variables: XCO2, latitude and longitude. We define a small region in terms of latitude and longitude corner coordinates and select only data in this region for the plot.
# Generate map:
print('* Generating XCO2 map ...')
if wanted_files_OK == 'yes':
# Relevant input data:
lat = g_2_lat.values
lon = g_2_lon.values
ghg = g_2_xco2.values
file_2_date2 = date_2_str
plot_title = product_str+' - '+file_2_date2
# Define spatial region of interest:
lonmin = -12.0 # Longitude range
lonmax = 35.0
latmin = 32.0 # Latitude range
latmax = 65.0
# Select data to be shown on map:
idg = np.nonzero((lat > latmin) & (lat < latmax) & (lon > lonmin) & (lon < lonmax))
lat_sel = lat[idg]
lon_sel = lon[idg]
ghg_sel = ghg[idg]
n_data = len(ghg_sel)
    print('* Number of ground pixels for map: ', n_data)
rmin = np.min(ghg_sel) # Color bar / scale range
rmax = np.max(ghg_sel)
projection = ccrs.PlateCarree()
figsize = (9,7)
fig = plt.figure(figsize=figsize)
pos = [0.05,0.00,0.95,0.90] # pos (l,b,w,h) in page coord
ax1 = fig.add_axes(pos, projection=projection)
ax1.set_extent([lonmin, lonmax, latmin, latmax], crs=projection)
ax1.add_feature(cfeature.OCEAN, color='powderblue')
ax1.add_feature(cfeature.COASTLINE)
ax1.add_feature(cfeature.BORDERS, alpha=0.5)
#x_label = 'Latitude [deg]'
#y_label = 'Longitude [deg]'
ax1.set_title(plot_title, fontsize=18)
drawmeridians_label = True
gl = ax1.gridlines(crs=projection, draw_labels=drawmeridians_label, linewidth=1, color='gray', alpha=0.2)
gl.top_labels = False
plt.scatter(lon_sel, lat_sel, c=ghg_sel, s=20, zorder=10, alpha=0.8, cmap='viridis', vmin=rmin, vmax=rmax)
#anchor=(0.5, 1.0) # default
anchor=(0.5, 1.7) # shift upwards
cb = plt.colorbar(label='XCO$_2$ [ppm]', location='bottom', extend='both', shrink=0.5, anchor=anchor)
#cb = plt.colorbar(label='XCO$_2$ [ppm]', location='right', shrink=0.3)
plot_type = 'png'
if plot_type == 'png':
o_file_plot = product_id+'_'+product_version+'_map.png'
print('* generating: ', o_file_plot)
plt.savefig(o_file_plot, dpi=600)
else:
plt.show()
else:
print('* ERROR: Cannot generate XCO2 map plot!?')
The figure above shows the locations of the individual ground pixel observations and their corresponding XCO2 values, using all “good” retrievals over Europe and the surrounding area for 15-September-2020. As can be seen, the spatial coverage of the daily data is very sparse. This is because only data of the highest quality are included in the product file. Strict quality filtering is important to meet the demanding accuracy and precision requirements for satellite XCO2 observations.
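The spatio-temporal averaging needed to overcome this sparseness can be sketched as a simple gridding step: bin the soundings into latitude/longitude cells and average per cell. The data below are synthetic (real inputs would be the filtered `lon`, `lat` and `ghg` arrays from above), and a real L3 product uses more careful weighting and error propagation:

```python
import numpy as np

# Synthetic soundings standing in for the quality-filtered retrievals:
rng = np.random.default_rng(1)
lon = rng.uniform(-180.0, 180.0, 5000)
lat = rng.uniform(-90.0, 90.0, 5000)
xco2 = 410.0 + rng.normal(0.0, 1.0, 5000)

# 5-degree grid: per-cell sum divided by per-cell count gives the cell mean
lon_edges = np.arange(-180.0, 181.0, 5.0)
lat_edges = np.arange(-90.0, 91.0, 5.0)
sums, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges], weights=xco2)
counts, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges])
cell_mean = np.where(counts > 0, sums / np.maximum(counts, 1.0), np.nan)
print(cell_mean.shape)  # (72, 36) grid cells
```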