pyku.drs

Data Reference Syntax (DRS) module

Resources:

The CMIP5 CORDEX standard is ambiguous about variable names. Appendix A of the NetCDF header examples excludes the variable name from both mandatory and optional metadata. VariableName is listed in the Controlled Vocabulary, which suggests it could be a mandatory global attribute, but no standard key is defined. Consequently, most existing datasets omit the variable name. For this reason, ‘variable_name’ is automatically derived from the dataset’s climate data, even though the standard does not explicitly require it.

In contrast, newer standards like obs4mips explicitly include the variable name in global attributes with the key ‘variable_id’.
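The lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions, not pyku's implementation: the helper resolve_variable_name is hypothetical, the global attributes are modelled as a plain dict, and data_var_names is assumed to already contain only climate variables (no grid mapping or bounds variables). Whether pyku itself prefers ‘variable_id’ when it is present is also an assumption.

```python
def resolve_variable_name(global_attrs, data_var_names):
    """Hypothetical helper (not part of pyku): pick the variable name facet."""
    # obs4mips-style metadata states the variable explicitly.
    if "variable_id" in global_attrs:
        return global_attrs["variable_id"]
    # CMIP5 CORDEX defines no key, so fall back to the single climate variable.
    if len(data_var_names) == 0:
        raise ValueError("The dataset contains no climate variable.")
    if len(data_var_names) > 1:
        raise ValueError("The dataset contains more than one climate variable.")
    return data_var_names[0]


cordex_name = resolve_variable_name({}, ["tas"])
obs4mips_name = resolve_variable_name({"variable_id": "pr"}, ["pr"])
```

The two ValueError cases mirror the errors documented for the functions in this module when a dataset holds no climate variable or more than one.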

pyku.drs.cmorize(ds, global_metadata={}, area_def=None)

CMORize dataset. The dataset shall contain only one climate variable.

Parameters:
Returns:

CMORized dataset.

Return type:

xarray.Dataset

Raises:
  • ValueError – If the file contains no climate variable.

  • ValueError – If the file contains more than one climate variable.

pyku.drs.drs_filename(ds, varname=None, standard=None, version=None)

Generate a filename based on the metadata in the given dataset and the standard from the ‘drs.yaml’ configuration file from pyku. Only one climate variable is supported.

Parameters:
  • ds (xarray.Dataset) – The input dataset containing the required metadata.

  • standard (str) – The DRS standard to apply. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.drs.list_drs_standards().

  • version (str, optional) – An optional version string. If provided, it will be used to create an additional directory between the CMOR path and the CMOR filename. While not CMOR-compliant, this practice is commonly used at DWD.

Returns:

The generated filename.

Return type:

str

Raises:
  • ValueError – If the file contains no climate variable.

  • ValueError – If the file contains more than one climate variable.

  • ValueError – If the standard does not exist.

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.drs_filename(standard='cordex')
   ...: 
Out[1]: 'output/EUR-11/SMHI/MPI-M-MPI-ESM-LR/historical/r1i1p1/SMHI-RCA4/v1a/mon/tas/tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_SMHI-RCA4_v1a_mon_19700116-20051216.nc'
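The path in the example above can be reproduced with plain string formatting. The following is a minimal sketch, not pyku's implementation: the two patterns are the cordex parent_pattern and stem_pattern printed elsewhere in this reference, the facet values are read off the example output, and the helper compose_drs_filename, the '/' joining and the '.nc' suffix are assumptions made for illustration.

```python
# Patterns as printed for the 'cordex' standard in this reference.
PARENT_PATTERN = (
    "{product}/{CORDEX_domain}/{institute_id}/{driving_model_id}/"
    "{driving_experiment_name}/{driving_model_ensemble_member}/"
    "{model_id}/{rcm_version_id}/{frequency}/{variable_name}"
)
STEM_PATTERN = (
    "{variable_name}_{CORDEX_domain}_{driving_model_id}_"
    "{driving_experiment_name}_{driving_model_ensemble_member}_"
    "{model_id}_{rcm_version_id}_{frequency}_{start_time}-{end_time}"
)


def compose_drs_filename(facets, version=None):
    """Hypothetical helper: fill both patterns and join them into one path."""
    parts = [PARENT_PATTERN.format(**facets)]
    if version is not None:
        # Optional, non-CMOR-conform directory between path and file name.
        parts.append(version)
    parts.append(STEM_PATTERN.format(**facets) + ".nc")
    return "/".join(parts)


# Facet values read off the documented example output.
facets = {
    "product": "output",
    "CORDEX_domain": "EUR-11",
    "institute_id": "SMHI",
    "driving_model_id": "MPI-M-MPI-ESM-LR",
    "driving_experiment_name": "historical",
    "driving_model_ensemble_member": "r1i1p1",
    "model_id": "SMHI-RCA4",
    "rcm_version_id": "v1a",
    "frequency": "mon",
    "variable_name": "tas",
    "start_time": "19700116",
    "end_time": "20051216",
}
path = compose_drs_filename(facets)
```

Passing a version string to this helper places that directory between the parent path and the file name, which mirrors the layout described for the version parameter.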
pyku.drs.drs_parent(ds, varname=None, standard=None, version=None)

Generate a directory according to the metadata in the given dataset and the standard from the ‘drs.yaml’ configuration file from pyku. Only one climate variable is supported.

Parameters:
  • ds (xarray.Dataset) – The input Dataset.

  • standard (str) – The DRS standard. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.drs.list_drs_standards().

  • version (str, optional) – An optional version string. If provided, an additional directory is created between the CMOR path and the CMOR filename. For the CMIP5 CORDEX standard, this practice is not CMOR-conform and is discouraged. Note: In ‘obs4mips’, the version is part of the standard and should be read from the global attributes.

Returns:

The generated filename parent.

Return type:

str

Raises:
  • ValueError – If the file contains no climate variable.

  • ValueError – If the file contains more than one climate variable.

  • KeyError – If the standard does not exist.

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.drs_parent(standard='cordex')
   ...: 
Out[1]: 'output/EUR-11/SMHI/MPI-M-MPI-ESM-LR/historical/r1i1p1/SMHI-RCA4/v1a/mon/tas'
pyku.drs.drs_stem(ds, varname=None, standard=None)

Generate a file stem (basename) according to the metadata in the given dataset and the standard from the ‘drs.yaml’ configuration file from pyku. Only one climate variable is supported.

Parameters:
  • ds (xarray.Dataset) – The input Dataset.

  • standard (str) – The DRS standard to apply. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.drs.list_drs_standards().

Returns:

File base name

Return type:

str

Raises:
  • ValueError – If the file contains no climate variable.

  • ValueError – If the file contains more than one climate variable.

  • ValueError – If the standard does not exist.

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.drs_stem(standard='cordex')
   ...: 
Out[1]: 'tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_SMHI-RCA4_v1a_mon_19700116-20051216.nc'
pyku.drs.get_cmor_varname(da)

Infer CMOR variable name

Parameters:

da (xarray.DataArray) – The input data array.

Returns:

CMOR-conform variable name inferred from the data

Return type:

str

pyku.drs.get_facets_from_file_parent(filename, standard, has_version=False)

Read facets from file path.

Parameters:
  • filename (str) – The filename with or without path.

  • standard (str) – One of the standards defined in pyku (for example cordex, obs4mips). All standards can be listed with pyku.drs.list_drs_standards().

  • has_version (bool) – Whether the directory ends with a non-CMOR-conform version. For example, a file directory of the form /path/to/1hr/tas/v20230630/ is a non-CMOR-conform path that includes a version number. Defaults to False.

Note

The available standards are defined in the dictionary pyku.drs.drs_data and can be listed with pyku.drs.list_drs_standards():

In [1]: import pyku.drs as drs
   ...: list(drs.list_drs_standards())
   ...: 
Out[1]: 
['cmip6',
 'cmip5',
 'cordex',
 'cordex_adjust',
 'cordex_interp',
 'cordex_adjust_interp',
 'cordex_cmip6',
 'hyras',
 'sk_raw',
 'sk_cmor',
 'reanalysis',
 'cosmo-rea6',
 'eobs',
 'obs4mips',
 'seamless']

For example, the parent pattern for the cordex standard can be obtained with:

In [2]: print(drs.drs_data.get('standards')\
   ...:                   .get('cordex')\
   ...:                   .get('parent_pattern'))
   ...: 
{product}/{CORDEX_domain}/{institute_id}/{driving_model_id}/{driving_experiment_name}/{driving_model_ensemble_member}/{model_id}/{rcm_version_id}/{frequency}/{variable_name}

Example

In [3]: import pyku.drs as drs
   ...: 
   ...: filename = '/path/to/DATA/CMOR/OUT/DWD-CPS/\
   ...: output/GER-0275/CLMcom-DWD/ECMWF-ERA5/evaluation/r1i1p1/\
   ...: CLMcom-DWD-CCLM5-0-16/x0n1-v1/1hr/tas/v20230630/\
   ...: tas_GER-0275_ECMWF-ERA5_evaluation_r1i1p1_\
   ...: CLMcom-DWD-CCLM5-0-16_x0n1-v1_1hr_\
   ...: 202201010000-202212312300.nc'
   ...: 
   ...: drs.get_facets_from_file_parent(
   ...:     filename,
   ...:     standard='cordex',
   ...:     has_version=True
   ...: )
   ...: 
Out[3]: 
{'product': 'output',
 'CORDEX_domain': 'GER-0275',
 'institute_id': 'CLMcom-DWD',
 'driving_model_id': 'ECMWF-ERA5',
 'driving_experiment_name': 'evaluation',
 'driving_model_ensemble_member': 'r1i1p1',
 'model_id': 'CLMcom-DWD-CCLM5-0-16',
 'rcm_version_id': 'x0n1-v1',
 'frequency': '1hr',
 'variable_name': 'tas',
 'version': 'v20230630'}
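The mapping from directory levels to facets shown above can be sketched with the standard library alone. This is a simplified illustration, not pyku's implementation: the helper parse_parent_facets is hypothetical, the pattern is the cordex parent_pattern printed in the note above, and the sketch assumes that each facet occupies exactly one directory level and that the facets are the last directories of the path.

```python
from pathlib import PurePosixPath

# Parent pattern as printed for the 'cordex' standard in this reference.
PARENT_PATTERN = (
    "{product}/{CORDEX_domain}/{institute_id}/{driving_model_id}/"
    "{driving_experiment_name}/{driving_model_ensemble_member}/"
    "{model_id}/{rcm_version_id}/{frequency}/{variable_name}"
)


def parse_parent_facets(filename, pattern=PARENT_PATTERN, has_version=False):
    """Hypothetical helper: map trailing directories onto the pattern keys."""
    keys = [part.strip("{}") for part in pattern.split("/")]
    if has_version:
        # The non-CMOR-conform version directory comes last.
        keys.append("version")
    directories = PurePosixPath(filename).parent.parts
    if len(directories) < len(keys):
        raise ValueError("Path has fewer directory levels than the pattern.")
    # Only the trailing directories belong to the DRS structure.
    return dict(zip(keys, directories[-len(keys):]))


filename = (
    "/path/to/DATA/CMOR/OUT/DWD-CPS/"
    "output/GER-0275/CLMcom-DWD/ECMWF-ERA5/evaluation/r1i1p1/"
    "CLMcom-DWD-CCLM5-0-16/x0n1-v1/1hr/tas/v20230630/"
    "tas_GER-0275_ECMWF-ERA5_evaluation_r1i1p1_"
    "CLMcom-DWD-CCLM5-0-16_x0n1-v1_1hr_"
    "202201010000-202212312300.nc"
)
parent_facets = parse_parent_facets(filename, has_version=True)
```

Counting directories from the end is what lets the leading part of the path (here /path/to/DATA/CMOR/OUT/DWD-CPS) be arbitrary.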
pyku.drs.get_facets_from_file_stem(filename, standard)

Read facets from file stem

Parameters:
  • filename (string) – The filename with or without path.

  • standard (str) – The DRS standard. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.drs.list_drs_standards().

Note

The available standards are stored in the dictionary pyku.drs.drs_data:

In [1]: import pyku.drs as drs
   ...: list(drs.drs_data.get('standards').keys())
   ...: 
Out[1]: 
['cmip6',
 'cmip5',
 'cordex',
 'cordex_adjust',
 'cordex_interp',
 'cordex_adjust_interp',
 'cordex_cmip6',
 'hyras',
 'sk_raw',
 'sk_cmor',
 'reanalysis',
 'cosmo-rea6',
 'eobs',
 'obs4mips',
 'seamless']

For example, the stem pattern for the cordex standard can be obtained with:

In [2]: print(drs.drs_data.get('standards')\
   ...:                   .get('cordex')\
   ...:                   .get('stem_pattern'))
   ...: 
{variable_name}_{CORDEX_domain}_{driving_model_id}_{driving_experiment_name}_{driving_model_ensemble_member}_{model_id}_{rcm_version_id}_{frequency}_{start_time}-{end_time}

Example

In [3]: import pyku.drs as drs
   ...: 
   ...: filename = '/path/to/DATA/CLM/CMOR/OUT/DWD-CPS/\
   ...: output/GER-0275/CLMcom-DWD/ECMWF-ERA5/evaluation/r1i1p1/\
   ...: CLMcom-DWD-CCLM5-0-16/x0n1-v1/1hr/tas/v20230630/\
   ...: tas_GER-0275_ECMWF-ERA5_evaluation_r1i1p1_\
   ...: CLMcom-DWD-CCLM5-0-16_x0n1-v1_1hr_\
   ...: 202201010000-202212312300.nc'
   ...: 
   ...: drs.get_facets_from_file_stem(filename, standard='cordex')
   ...: 
Out[3]: 
{'variable_name': 'tas',
 'CORDEX_domain': 'GER-0275',
 'driving_model_id': 'ECMWF-ERA5',
 'driving_experiment_name': 'evaluation',
 'driving_model_ensemble_member': 'r1i1p1',
 'model_id': 'CLMcom-DWD-CCLM5-0-16',
 'rcm_version_id': 'x0n1-v1',
 'frequency': '1hr',
 'start_time': '202201010000',
 'end_time': '202212312300'}
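One way to see how a stem pattern yields these facets is to turn the pattern into a regular expression with one named group per facet. The sketch below is an illustration under stated assumptions, not pyku's implementation: the helpers pattern_to_regex and parse_stem_facets are hypothetical, the pattern is the cordex stem_pattern printed in the note above, and facet values are assumed never to contain an underscore.

```python
import re
from pathlib import PurePosixPath

# Stem pattern as printed for the 'cordex' standard in this reference.
STEM_PATTERN = (
    "{variable_name}_{CORDEX_domain}_{driving_model_id}_"
    "{driving_experiment_name}_{driving_model_ensemble_member}_"
    "{model_id}_{rcm_version_id}_{frequency}_{start_time}-{end_time}"
)


def pattern_to_regex(pattern):
    """Hypothetical helper: one named group per {facet}, literals escaped."""
    pieces = []
    for token in re.split(r"(\{\w+\})", pattern):
        if token.startswith("{") and token.endswith("}"):
            # Assumption: a facet value never contains an underscore.
            pieces.append(f"(?P<{token[1:-1]}>[^_]+)")
        else:
            pieces.append(re.escape(token))
    return re.compile("".join(pieces))


def parse_stem_facets(filename, pattern=STEM_PATTERN):
    """Hypothetical helper: read facets from the file name without suffix."""
    stem = PurePosixPath(filename).stem
    match = pattern_to_regex(pattern).fullmatch(stem)
    if match is None:
        raise ValueError(f"File stem does not match the pattern: {stem}")
    return match.groupdict()


filename = (
    "/path/to/DATA/CLM/CMOR/OUT/DWD-CPS/"
    "output/GER-0275/CLMcom-DWD/ECMWF-ERA5/evaluation/r1i1p1/"
    "CLMcom-DWD-CCLM5-0-16/x0n1-v1/1hr/tas/v20230630/"
    "tas_GER-0275_ECMWF-ERA5_evaluation_r1i1p1_"
    "CLMcom-DWD-CCLM5-0-16_x0n1-v1_1hr_"
    "202201010000-202212312300.nc"
)
stem_facets = parse_stem_facets(filename)
```

Hyphens inside facet values such as GER-0275 are unproblematic here because the underscore is the only separator between facets; only the final start_time-end_time pair is split on a hyphen.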
pyku.drs.has_cmor_time_labels(ds, var=None)

Check whether the time labels conform to the CMOR convention.

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • var (str) – The input variable.

Returns:

Whether the time labels are CMOR-conform

Return type:

bool

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.has_cmor_time_labels(var='tas')
   ...: 
Out[1]: True
pyku.drs.list_drs_standards()

List available DRS standards included in pyku

Returns:

List of DRS standards.

Return type:

List(str)

pyku.drs.to_cmor_attrs(ds)

Set dataset attributes to CMOR standard.

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

Dataset with CMOR-conform attributes

Return type:

xarray.Dataset

pyku.drs.to_cmor_units(ds)

Convert CMOR variables to CMOR-conform units.

Parameters:

ds (xarray.Dataset) – The input data.

Returns:

The data with CMOR-conform units.

Return type:

xarray.Dataset

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.to_cmor_units()['tas'].attrs
   ...: 
Out[1]: 
{'cell_methods': 'time: mean',
 'grid_mapping': 'rotated_pole',
 'long_name': 'Near-Surface Air Temperature',
 'standard_name': 'air_temperature',
 'units': 'K'}
pyku.drs.to_cmor_varnames(ds)

Convert variables to CMOR-conform variables.

Parameters:

ds (xarray.Dataset) – The input data.

Returns:

Data with CMOR-conform variable names

Return type:

xarray.Dataset

pyku.drs.to_drs_netcdfs(ds, base_dir=None, standard='cordex', var=None, version=None, dry_run=False, encoding='auto', overwrite=False, complevel=None, **kwargs)

Write CMOR-conform NetCDF files.

Parameters:
  • ds (xarray.Dataset) – Dataset with CMOR-conform metadata.

  • base_dir (str) – Output base directory. Files are written to ‘base_dir/cmor_path/cmor_filename.nc’.

  • standard (str) – The DRS standard to apply. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.drs.list_drs_standards().

  • var (str) – Variable to be CMORized. Only one variable per file is currently supported.

  • version (str, optional) – If provided, an additional directory is created between the CMOR path and the CMOR filename: /cmor/path/version/cmor_filename.nc. This practice is not CMOR-conform and is discouraged, although it is widespread at DWD.

  • dry_run (bool, optional) – Whether to perform a dry run without writing any data.

  • encoding (dict, optional) – Deprecated. The encoding is now set from the drs.yaml file. Formerly, optional encoding parameters used when writing the NetCDF files; for details, see xarray.Dataset.to_netcdf().

  • overwrite (bool, optional) – Whether existing files should be overwritten. Defaults to False.
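The output layout described by base_dir and version can be sketched with pathlib. This is an illustration only, not pyku's implementation: the helper plan_output_file is hypothetical, the CMOR path and file name are taken as given strings, and treating an existing file as "not written" when overwrite is False is an assumption about behaviour that this reference does not spell out.

```python
from pathlib import Path


def plan_output_file(base_dir, cmor_path, cmor_filename,
                     version=None, overwrite=False):
    """Hypothetical helper: compute the target file and whether to write it."""
    target_dir = Path(base_dir) / cmor_path
    if version is not None:
        # Non-CMOR-conform directory between the CMOR path and the file name.
        target_dir = target_dir / version
    target = target_dir / cmor_filename
    # Assumption: an existing file is left alone unless overwrite is True.
    would_write = overwrite or not target.exists()
    return target, would_write


target, would_write = plan_output_file(
    "sketch_base_dir_that_does_not_exist",
    "output/EUR-11/SMHI/MPI-M-MPI-ESM-LR/historical/r1i1p1/SMHI-RCA4/v1a/mon/tas",
    "tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_SMHI-RCA4_v1a_mon_19700116-20051216.nc",
    version="v20230630",
)
```

Computing the target without touching the file system is also roughly what a dry run has to do, which is why the helper returns the decision instead of writing anything.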