pyku.drs
Data Reference Syntax (DRS) module
Resources:
http://is-enes-data.github.io/cordex_archive_specifications.pdf
http://is-enes-data.github.io/CORDEX_variables_requirement_table.pdf
IS-ENES-Data/IS-ENES-Data.github.io CORDEX_adjust_drs.pdf
The CMIP5 CORDEX standard is ambiguous regarding variable names. Appendix A of the NetCDF header examples excludes the variable name from both mandatory and optional metadata. Although VariableName is listed in the Controlled Vocabulary, which suggests it could be a mandatory global attribute, no standard key is defined for it. Consequently, most existing datasets omit the variable name. For this reason, ‘variable_name’ is derived automatically from the dataset’s climate data, even though the standard does not explicitly require it.
In contrast, newer standards like obs4mips explicitly include the variable name in global attributes with the key ‘variable_id’.
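The difference between the two conventions can be sketched with plain dictionaries. The helper `resolve_variable_name` and the data layout are assumptions made for illustration, not pyku's implementation: the helper prefers an explicit ‘variable_id’ global attribute and otherwise falls back to the name of the single data variable.

```python
def resolve_variable_name(global_attrs, data_var_names):
    """Return the variable name for a dataset described by plain objects.

    Hypothetical helper: prefer the obs4mips-style 'variable_id' global
    attribute, otherwise fall back to the only data variable.
    """
    if "variable_id" in global_attrs:
        return global_attrs["variable_id"]
    if len(data_var_names) != 1:
        raise ValueError(
            "Cannot derive the variable name: expected exactly one data variable."
        )
    return data_var_names[0]


# CORDEX-like metadata: no global attribute carries the variable name.
cordex_name = resolve_variable_name({"CORDEX_domain": "EUR-11"}, ["tas"])

# obs4mips-like metadata: the name is stated explicitly.
obs4mips_name = resolve_variable_name({"variable_id": "pr"}, ["pr"])
```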
- pyku.drs.cmorize(ds, global_metadata={}, area_def=None)
CMORize a dataset. The dataset shall contain only one climate variable.
- Parameters:
ds (xarray.Dataset) – The input dataset.
global_metadata (dict) – The dictionary of global metadata.
area_def (pyresample.geometry.AreaDefinition, str) – (Deprecated) Output area definition.
- Returns:
CMORized dataset.
- Return type:
xarray.Dataset
- Raises:
ValueError – If the file contains no climate variable.
ValueError – If the file contains more than one climate variable.
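The two error conditions can be illustrated without xarray. This is a minimal sketch, not pyku's code: the function name and the rule used to recognise a climate variable (it has a `time` dimension and is not a `_bnds` variable) are assumptions for this example.

```python
def select_climate_variable(var_dims):
    """Return the single climate variable from a {name: dims} mapping.

    Assumption for this sketch: a climate variable has a 'time' dimension
    and is not a bounds variable (name ending in '_bnds').
    """
    candidates = [
        name
        for name, dims in var_dims.items()
        if "time" in dims and not name.endswith("_bnds")
    ]
    if not candidates:
        raise ValueError("The dataset contains no climate variable.")
    if len(candidates) > 1:
        raise ValueError(
            f"The dataset contains more than one climate variable: {candidates}"
        )
    return candidates[0]


# Variable names and dimensions as they might appear in a CORDEX-like file.
variables = {
    "tas": ("time", "rlat", "rlon"),
    "time_bnds": ("time", "bnds"),
    "rotated_pole": (),
}
selected = select_climate_variable(variables)
```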
- pyku.drs.drs_filename(ds, varname=None, standard=None, version=None)
Generate a filename based on the metadata in the given dataset and the standard from the ‘drs.yaml’ configuration file from pyku. Only one climate variable is supported.
- Parameters:
ds (xarray.Dataset) – The input dataset containing the required metadata.
standard (str) – The DRS standard to apply. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.list_drs_standards().
version (str, optional) – An optional version string. If provided, it is used to create an additional directory between the CMOR path and the CMOR filename. While not CMOR-compliant, this practice is commonly used at DWD.
- Returns:
The generated filename.
- Return type:
str
- Raises:
ValueError – If the file contains no climate variable.
ValueError – If the file contains more than one climate variable.
ValueError – If the standard does not exist.
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.drs_filename(standard='cordex')
   ...:
Out[1]: 'output/EUR-11/SMHI/MPI-M-MPI-ESM-LR/historical/r1i1p1/SMHI-RCA4/v1a/mon/tas/tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_SMHI-RCA4_v1a_mon_19700116-20051216.nc'
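The result above is what one gets by filling the ‘cordex’ `parent_pattern` and `stem_pattern` (as printed from `pyku.drs.drs_data` elsewhere on this page) with the dataset's facets. The following is a minimal sketch of that idea using only `str.format`; `build_drs_path` and the `STANDARDS` dictionary are hypothetical stand-ins, and the facet values are read off the output above rather than from a real dataset.

```python
from pathlib import PurePosixPath

# Patterns copied from the 'cordex' entry printed from pyku.drs.drs_data.
STANDARDS = {
    "cordex": {
        "parent_pattern": (
            "{product}/{CORDEX_domain}/{institute_id}/{driving_model_id}/"
            "{driving_experiment_name}/{driving_model_ensemble_member}/"
            "{model_id}/{rcm_version_id}/{frequency}/{variable_name}"
        ),
        "stem_pattern": (
            "{variable_name}_{CORDEX_domain}_{driving_model_id}_"
            "{driving_experiment_name}_{driving_model_ensemble_member}_"
            "{model_id}_{rcm_version_id}_{frequency}_{start_time}-{end_time}"
        ),
    }
}


def build_drs_path(facets, standard):
    """Fill the directory and stem patterns of a standard (hypothetical helper)."""
    if standard not in STANDARDS:
        raise ValueError(f"Unknown DRS standard: {standard!r}")
    patterns = STANDARDS[standard]
    parent = patterns["parent_pattern"].format(**facets)
    stem = patterns["stem_pattern"].format(**facets)
    return str(PurePosixPath(parent) / f"{stem}.nc")


facets = {
    "product": "output",
    "CORDEX_domain": "EUR-11",
    "institute_id": "SMHI",
    "driving_model_id": "MPI-M-MPI-ESM-LR",
    "driving_experiment_name": "historical",
    "driving_model_ensemble_member": "r1i1p1",
    "model_id": "SMHI-RCA4",
    "rcm_version_id": "v1a",
    "frequency": "mon",
    "variable_name": "tas",
    "start_time": "19700116",
    "end_time": "20051216",
}
path = build_drs_path(facets, "cordex")
```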
- pyku.drs.drs_parent(ds, varname=None, standard=None, version=None)
Generate a directory according to the metadata in the given dataset and the standard from the ‘drs.yaml’ configuration file from pyku. Only one climate variable is supported.
- Parameters:
ds (xarray.Dataset) – The input dataset.
standard (str) – The DRS standard. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.list_drs_standards().
version (str, optional) – Optional version string. If provided, an additional directory is created between the CMOR path and the CMOR filename. For the CMIP5 CORDEX standard, this practice is not CMOR-compliant and is discouraged. Note: In ‘obs4mips’, the version is part of the standard and should be read from the global attributes.
- Returns:
The generated filename parent.
- Return type:
str
- Raises:
ValueError – If the file contains no climate variable.
ValueError – If the file contains more than one climate variable.
KeyError – If the standard does not exist.
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.drs_parent(standard='cordex')
   ...:
Out[1]: 'output/EUR-11/SMHI/MPI-M-MPI-ESM-LR/historical/r1i1p1/SMHI-RCA4/v1a/mon/tas'
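One reading of the `version` parameter is that the version string simply becomes the last directory of the parent path. The sketch below shows that reading with `pathlib`; `parent_with_version` is a hypothetical helper, and the version value is borrowed from the path examples on this page.

```python
from pathlib import PurePosixPath


def parent_with_version(cmor_parent, version=None):
    """Append an optional version directory to a CMOR parent path (sketch)."""
    path = PurePosixPath(cmor_parent)
    if version is not None:
        path = path / version
    return str(path)


cmor_parent = (
    "output/EUR-11/SMHI/MPI-M-MPI-ESM-LR/historical/r1i1p1/SMHI-RCA4/v1a/mon/tas"
)
plain = parent_with_version(cmor_parent)
versioned = parent_with_version(cmor_parent, version="v20230630")
```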
- pyku.drs.drs_stem(ds, varname=None, standard=None)
Generate a file stem (basename) according to the metadata in the given dataset and the standard from the ‘drs.yaml’ configuration file from pyku. Only one climate variable is supported.
- Parameters:
ds (xarray.Dataset) – The input dataset.
standard (str) – The DRS standard to apply. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.list_drs_standards().
- Returns:
File base name
- Return type:
str
- Raises:
ValueError – If the file contains no climate variable.
ValueError – If the file contains more than one climate variable.
ValueError – If the standard does not exist.
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.drs_stem(standard='cordex')
   ...:
Out[1]: 'tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_SMHI-RCA4_v1a_mon_19700116-20051216.nc'
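The outputs of the three examples are consistent with each other: the `drs_filename` result is the `drs_parent` result joined with the `drs_stem` result. The check below uses only the strings shown in those examples and the standard library. Note that the stem shown here keeps its `.nc` suffix, so it corresponds to `pathlib`'s `name` rather than its `stem`.

```python
from pathlib import PurePosixPath

# Output of the drs_filename example.
full = PurePosixPath(
    "output/EUR-11/SMHI/MPI-M-MPI-ESM-LR/historical/r1i1p1/SMHI-RCA4/v1a/mon/tas/"
    "tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_SMHI-RCA4_v1a_mon_19700116-20051216.nc"
)

parent = str(full.parent)  # matches the drs_parent example output
name = full.name           # matches the drs_stem example output (with '.nc')
```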
- pyku.drs.get_cmor_varname(da)
Infer CMOR variable name
- Parameters:
da (xarray.DataArray) – The input data array.
- Returns:
CMOR-compliant variable name inferred from the data.
- Return type:
str
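How pyku infers the name is not described here. One plausible approach, sketched below under that caveat, is a lookup on the variable's CF attributes. The table `CMOR_LOOKUP` and the function `infer_cmor_varname` are hypothetical; the (`air_temperature`, `time: mean`) pair for `tas` is taken from the `to_cmor_units` example output on this page, and the other entries are illustrative.

```python
# Hypothetical lookup table keyed on (standard_name, cell_methods).
CMOR_LOOKUP = {
    ("air_temperature", "time: mean"): "tas",
    ("air_temperature", "time: maximum"): "tasmax",
    ("air_temperature", "time: minimum"): "tasmin",
    ("precipitation_flux", "time: mean"): "pr",
}


def infer_cmor_varname(attrs):
    """Infer a CMOR variable name from a dict of variable attributes (sketch)."""
    key = (attrs.get("standard_name"), attrs.get("cell_methods"))
    try:
        return CMOR_LOOKUP[key]
    except KeyError:
        raise ValueError(f"No CMOR variable name known for {key}") from None


# Attributes as shown for 'tas' in the to_cmor_units example.
tas_attrs = {
    "cell_methods": "time: mean",
    "grid_mapping": "rotated_pole",
    "long_name": "Near-Surface Air Temperature",
    "standard_name": "air_temperature",
    "units": "K",
}
inferred = infer_cmor_varname(tas_attrs)
```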
- pyku.drs.get_facets_from_file_parent(filename, standard, has_version=False)
Read facets from file path.
- Parameters:
filename (string) – The filename with or without path.
standard (str) – One of the standards defined in pyku (for example cordex, obs4mips). All standards can be listed with pyku.drs.list_drs_standards().
has_version (bool) – Whether the directory ends with a version that is not CMOR-compliant. Defaults to False. For example, a file directory of the form /path/to/1hr/tas/v20230630/ is a non-compliant CMOR path that includes a version number.
Note
The standards are defined in the dictionary pyku.drs.drs_data and can be listed with:
In [1]: import pyku.drs as drs
   ...: list(drs.list_drs_standards())
   ...:
Out[1]:
['cmip6',
 'cmip5',
 'cordex',
 'cordex_adjust',
 'cordex_interp',
 'cordex_adjust_interp',
 'cordex_cmip6',
 'hyras',
 'sk_raw',
 'sk_cmor',
 'reanalysis',
 'cosmo-rea6',
 'eobs',
 'obs4mips',
 'seamless']
For example, the directory pattern for the cordex standard can be obtained with:
In [2]: print(drs.drs_data.get('standards')\
   ...:     .get('cordex')\
   ...:     .get('parent_pattern'))
   ...:
{product}/{CORDEX_domain}/{institute_id}/{driving_model_id}/{driving_experiment_name}/{driving_model_ensemble_member}/{model_id}/{rcm_version_id}/{frequency}/{variable_name}
Example
In [3]: import pyku.drs as drs
   ...:
   ...: filename = '/path/to/DATA/CMOR/OUT/DWD-CPS/\
   ...: output/GER-0275/CLMcom-DWD/ECMWF-ERA5/evaluation/r1i1p1/\
   ...: CLMcom-DWD-CCLM5-0-16/x0n1-v1/1hr/tas/v20230630/\
   ...: tas_GER-0275_ECMWF-ERA5_evaluation_r1i1p1_\
   ...: CLMcom-DWD-CCLM5-0-16_x0n1-v1_1hr_\
   ...: 202201010000-202212312300.nc'
   ...:
   ...: drs.get_facets_from_file_parent(
   ...:     filename,
   ...:     standard='cordex',
   ...:     has_version=True
   ...: )
   ...:
Out[3]:
{'product': 'output',
 'CORDEX_domain': 'GER-0275',
 'institute_id': 'CLMcom-DWD',
 'driving_model_id': 'ECMWF-ERA5',
 'driving_experiment_name': 'evaluation',
 'driving_model_ensemble_member': 'r1i1p1',
 'model_id': 'CLMcom-DWD-CCLM5-0-16',
 'rcm_version_id': 'x0n1-v1',
 'frequency': '1hr',
 'variable_name': 'tas',
 'version': 'v20230630'}
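The mapping from directories to facets can be reproduced with the standard library alone. This is a minimal sketch, not pyku's implementation: `facets_from_parent` is a hypothetical helper that assumes the pattern consists solely of `/`-separated placeholders and aligns the directories with the facet names from the right, so that any leading base directory is ignored.

```python
from pathlib import PurePosixPath
from string import Formatter

# Directory pattern printed from pyku.drs.drs_data for the 'cordex' standard.
PARENT_PATTERN = (
    "{product}/{CORDEX_domain}/{institute_id}/{driving_model_id}/"
    "{driving_experiment_name}/{driving_model_ensemble_member}/"
    "{model_id}/{rcm_version_id}/{frequency}/{variable_name}"
)


def facets_from_parent(filename, pattern=PARENT_PATTERN, has_version=False):
    """Read facets from the directories of a file path (sketch)."""
    # Facet names in the order they appear in the pattern.
    keys = [field for _, field, _, _ in Formatter().parse(pattern) if field]
    if has_version:
        keys.append("version")
    directories = PurePosixPath(filename).parent.parts
    if len(directories) < len(keys):
        raise ValueError("The path has fewer directories than the pattern has facets.")
    # Align from the right: the last directory maps to the last facet.
    return dict(zip(keys, directories[-len(keys):]))


filename = (
    "/path/to/DATA/CMOR/OUT/DWD-CPS/"
    "output/GER-0275/CLMcom-DWD/ECMWF-ERA5/evaluation/r1i1p1/"
    "CLMcom-DWD-CCLM5-0-16/x0n1-v1/1hr/tas/v20230630/"
    "tas_GER-0275_ECMWF-ERA5_evaluation_r1i1p1_"
    "CLMcom-DWD-CCLM5-0-16_x0n1-v1_1hr_"
    "202201010000-202212312300.nc"
)
parent_facets = facets_from_parent(filename, has_version=True)
```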
- pyku.drs.get_facets_from_file_stem(filename, standard)
Read facets from file stem
- Parameters:
filename (string) – The filename with or without path.
standard (str) – The DRS standard. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.list_drs_standards().
Note
The standards are defined in the dictionary pyku.drs.drs_data and can be listed with:
In [1]: import pyku.drs as drs
   ...: list(drs.drs_data.get('standards').keys())
   ...:
Out[1]:
['cmip6',
 'cmip5',
 'cordex',
 'cordex_adjust',
 'cordex_interp',
 'cordex_adjust_interp',
 'cordex_cmip6',
 'hyras',
 'sk_raw',
 'sk_cmor',
 'reanalysis',
 'cosmo-rea6',
 'eobs',
 'obs4mips',
 'seamless']
For example, the file stem pattern for the cordex standard can be obtained with:
In [2]: print(drs.drs_data.get('standards')\
   ...:     .get('cordex')\
   ...:     .get('stem_pattern'))
   ...:
{variable_name}_{CORDEX_domain}_{driving_model_id}_{driving_experiment_name}_{driving_model_ensemble_member}_{model_id}_{rcm_version_id}_{frequency}_{start_time}-{end_time}
Example
In [3]: import pyku.drs as drs
   ...:
   ...: filename = '/path/to/DATA/CLM/CMOR/OUT/DWD-CPS/\
   ...: output/GER-0275/CLMcom-DWD/ECMWF-ERA5/evaluation/r1i1p1/\
   ...: CLMcom-DWD-CCLM5-0-16/x0n1-v1/1hr/tas/v20230630/\
   ...: tas_GER-0275_ECMWF-ERA5_evaluation_r1i1p1_\
   ...: CLMcom-DWD-CCLM5-0-16_x0n1-v1_1hr_\
   ...: 202201010000-202212312300.nc'
   ...:
   ...: drs.get_facets_from_file_stem(filename, standard='cordex')
   ...:
Out[3]:
{'variable_name': 'tas',
 'CORDEX_domain': 'GER-0275',
 'driving_model_id': 'ECMWF-ERA5',
 'driving_experiment_name': 'evaluation',
 'driving_model_ensemble_member': 'r1i1p1',
 'model_id': 'CLMcom-DWD-CCLM5-0-16',
 'rcm_version_id': 'x0n1-v1',
 'frequency': '1hr',
 'start_time': '202201010000',
 'end_time': '202212312300'}
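A stem pattern of this form can be turned into a regular expression with one named group per facet. The sketch below is one way to do that with the standard library; `pattern_to_regex` and `facets_from_stem` are hypothetical names, and the approach assumes that facet values never contain an underscore, which holds for the example above.

```python
import re
from pathlib import PurePosixPath
from string import Formatter

# Stem pattern printed from pyku.drs.drs_data for the 'cordex' standard.
STEM_PATTERN = (
    "{variable_name}_{CORDEX_domain}_{driving_model_id}_"
    "{driving_experiment_name}_{driving_model_ensemble_member}_"
    "{model_id}_{rcm_version_id}_{frequency}_{start_time}-{end_time}"
)


def pattern_to_regex(pattern):
    """Translate a '{facet}' format pattern into a regex with named groups."""
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(pattern):
        parts.append(re.escape(literal))
        if field is not None:
            # Assumption: a facet value is any run of non-underscore characters.
            parts.append(f"(?P<{field}>[^_]+)")
    return re.compile("".join(parts))


def facets_from_stem(filename, pattern=STEM_PATTERN):
    """Read facets from a file name, with or without directories (sketch)."""
    stem = PurePosixPath(filename).stem  # drop directories and the '.nc' suffix
    match = pattern_to_regex(pattern).fullmatch(stem)
    if match is None:
        raise ValueError(f"File stem does not match the pattern: {stem!r}")
    return match.groupdict()


filename = (
    "/path/to/DATA/CLM/CMOR/OUT/DWD-CPS/"
    "output/GER-0275/CLMcom-DWD/ECMWF-ERA5/evaluation/r1i1p1/"
    "CLMcom-DWD-CCLM5-0-16/x0n1-v1/1hr/tas/v20230630/"
    "tas_GER-0275_ECMWF-ERA5_evaluation_r1i1p1_"
    "CLMcom-DWD-CCLM5-0-16_x0n1-v1_1hr_"
    "202201010000-202212312300.nc"
)
stem_facets = facets_from_stem(filename)
```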
- pyku.drs.has_cmor_time_labels(ds, var=None)
Check whether the time labels conform to the CMOR convention.
- Parameters:
ds (xarray.Dataset) – The input dataset.
var (str) – The input variable.
- Returns:
Whether the time labels are CMOR-compliant.
- Return type:
bool
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.has_cmor_time_labels(var='tas')
   ...:
Out[1]: True
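What exactly pyku checks is not spelled out here. A common CMOR expectation for time-mean data is that each label sits at the midpoint of its averaging interval, which is consistent with the 19700116 and 20051216 dates in the monthly file names on this page. The sketch below tests that property for monthly means only; both function names are hypothetical, and the rule is an assumption rather than pyku's actual check.

```python
from datetime import datetime


def monthly_midpoint(year, month):
    """Return the midpoint of a calendar month (Gregorian calendar)."""
    start = datetime(year, month, 1)
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return start + (end - start) / 2


def has_mid_month_labels(timestamps):
    """Check that every monthly label sits at the middle of its month (sketch)."""
    return all(ts == monthly_midpoint(ts.year, ts.month) for ts in timestamps)


# Labels at mid-month pass, labels at the start of the month do not.
centered = has_mid_month_labels([datetime(1970, 1, 16, 12), datetime(2005, 12, 16, 12)])
start_of_month = has_mid_month_labels([datetime(1970, 1, 1), datetime(1970, 2, 1)])
```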
- pyku.drs.list_drs_standards()
List available DRS standards included in pyku
- Returns:
List of DRS standards.
- Return type:
List(str)
- pyku.drs.to_cmor_attrs(ds)
Set dataset attributes to CMOR standard.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Dataset with CMOR-compliant attributes.
- Return type:
xarray.Dataset
- pyku.drs.to_cmor_units(ds)
Convert CMOR variables to CMOR-conform units.
- Parameters:
ds (xarray.Dataset) – The input data.
- Returns:
The data with CMOR-compliant units.
- Return type:
xarray.Dataset
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('cordex_data')
   ...: ds.pyku.to_cmor_units()['tas'].attrs
   ...:
Out[1]:
{'cell_methods': 'time: mean',
 'grid_mapping': 'rotated_pole',
 'long_name': 'Near-Surface Air Temperature',
 'standard_name': 'air_temperature',
 'units': 'K'}
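For a temperature variable such as `tas`, the target unit shown above is kelvin. A minimal sketch of such a conversion on plain Python lists follows. The function `to_kelvin` and its set of accepted unit spellings are assumptions for illustration; pyku's actual conversion logic is not documented here.

```python
# Hypothetical set of unit spellings treated as degrees Celsius.
CELSIUS_UNITS = {"degC", "Celsius", "degree_Celsius"}


def to_kelvin(values, units):
    """Convert temperature values to kelvin and return (values, units)."""
    if units == "K":
        return list(values), "K"
    if units in CELSIUS_UNITS:
        return [value + 273.15 for value in values], "K"
    raise ValueError(f"Unsupported temperature units: {units!r}")


converted, new_units = to_kelvin([0.0, 15.0], "degC")
unchanged, same_units = to_kelvin([280.0], "K")
```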
- pyku.drs.to_cmor_varnames(ds)
Convert variables to CMOR-conform variables.
- Parameters:
ds (xarray.Dataset) – The input data.
- Returns:
Data with CMOR-compliant variable names.
- Return type:
xarray.Dataset
- pyku.drs.to_drs_netcdfs(ds, base_dir=None, standard='cordex', var=None, version=None, dry_run=False, encoding='auto', overwrite=False, complevel=None, **kwargs)
Write CMOR-conform NetCDF files.
Arguments:
ds (xarray.Dataset): Dataset with CMOR-compliant metadata.
base_dir (str): Output base directory. The file is written to ‘base_dir/cmor_path/cmor_filename.nc’.
standard (str): The DRS standard to apply. Supported standards include {‘cordex’, ‘cordex_adjust’, ‘reanalysis’, ‘obs4mips’, ‘cmip6’}, as specified in pyku/drs.yaml. The complete list of available standards can be obtained using pyku.list_drs_standards().
var (str): Variable to be cmorized. Only one variable per file is supported at the moment.
version (str): Optional. The version string creates an additional directory between the CMOR path and the CMOR filename: /cmor/path/version/cmor_filename.nc. This practice is not CMOR-compliant and is discouraged, although it is widespread at DWD.
dry_run (bool): Optional. Whether to perform a dry run without writing the data.
encoding (dict): Deprecated. The encoding is now set from the YAML file drs.yaml. Optional encoding parameters when writing the NetCDF files. For details, see xarray.Dataset.to_netcdf().
overwrite (bool): Optional. Whether existing files should be overwritten. Defaults to False.
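The interplay of `base_dir`, `dry_run` and `overwrite` can be sketched with `pathlib` alone. This is an illustration under stated assumptions, not pyku's writer: `write_drs_file` is a hypothetical helper, the relative path is a shortened placeholder, plain bytes stand in for NetCDF data, and raising `FileExistsError` when `overwrite` is False is this sketch's choice (the behaviour of pyku in that case is not stated here).

```python
import tempfile
from pathlib import Path


def write_drs_file(base_dir, relative_path, payload, dry_run=False, overwrite=False):
    """Write payload to base_dir/relative_path and return the target path (sketch)."""
    target = Path(base_dir) / relative_path
    if target.exists() and not overwrite:
        raise FileExistsError(f"Refusing to overwrite existing file: {target}")
    if dry_run:
        # Report where the file would go without touching the file system.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


with tempfile.TemporaryDirectory() as base_dir:
    # Placeholder for 'cmor_path/version/cmor_filename.nc'.
    relative_path = "cmor/path/v20230630/cmor_filename.nc"

    planned = write_drs_file(base_dir, relative_path, b"", dry_run=True)
    planned_exists = planned.exists()

    written = write_drs_file(base_dir, relative_path, b"placeholder")
    written_exists = written.exists()

    try:
        write_drs_file(base_dir, relative_path, b"second attempt")
        refused = False
    except FileExistsError:
        refused = True

    replaced = write_drs_file(base_dir, relative_path, b"new", overwrite=True)
    replaced_content = replaced.read_bytes()
```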