pyku.check
Functions for checking data
See also
pyku configuration files:
./pyku/etc/drs.yaml
./pyku/etc/metadata.yaml
- pyku.check.check(ds, standard=None, completeness_period=None, all_nan_slices=False)
Perform the following checks:
all-NaN slices (only if all_nan_slices is True),
valid bounds,
georeferencing,
units,
CMOR variable names,
frequency,
the role of variables,
availability of all timestamps within the completeness period.
- Parameters:
ds (xarray.Dataset) – The input dataset.
standard (str) – Optional, defaults to None. One of cordex, obs4mips, cordex_adjust, cordex_adjust_interp, or any standard implemented in the pyku configuration file ./pyku/etc/drs.yaml. If None, compliance of the metadata with a standard is not checked.
completeness_period (freqstr) – Optional frequency string (e.g. ‘1MS’, ‘1YS’). Defaults to None. If given, the function checks whether all timestamps are available within each such period, given the data frequency. Possible values can be found at: https://pandas.pydata.org/docs/user_guide/timeseries.html#offset-aliases
all_nan_slices (bool) – Optional, defaults to False. If True, check whether slices containing only NaNs exist in the dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check(standard='obs4mips')
   ...:
Out[1]:
key ... description
0 tas above 500.0 ... NaN
1 tas below 50.0 ... NaN
2 y_projection_coordinate_exist ... Checking if y projection coordinate available
3 x_projection_coordinate_exist ... Checking if x projection coordinate available
4 lat_geographic_coordinate_exist ... Checking if lat geographic coordinate available
5 lon_geographic_coordinate_exist ... Checking if lon geographic coordinate available
6 y_projection_coordinate_unit_correct ... Checking if y projection coordinates units
7 x_projection_coordinate_unit_correct ... Checking if x projection coordinates units
8 y_projection_coordinate_standard_name_correct ... Checking y projection coordinates standard_name
9 x_projection_coordinate_standard_name_correct ... Checking x projection coordinates standard_name
10 lat_geographic_coordinate_unit_correct ... Checking lat geographic coordinate units
11 lon_geographic_coordinate_unit_correct ... Checking if lat geographic coordinate units
12 lat_geographic_coordinate_standard_name_correct ... Checking lat geographic coordinate standard_name
13 lon_geographic_coordinate_standard_name_correct ... Checking lon geographic coordinate standard_name
14 cf_area_def_readable ... Check if CF projection metadata are readable
15 area_extent_is_readable ... Check if the area extent can be determined fro...
16 longitudes_within_180W_and_180E ... Check that the longitudes are within 180 degre...
17 tas_units_can_be_read ... Check if units can be read automatically
18 tas ... NaN
19 is_cmor_standard_name ... If possible, check if standard name is CMOR co...
20 is_cmor_long_name ... Check if long_name is CMOR conform
21 is_cmor_units ... Check if units is CMOR conform
22 frequency_can_be_inferred_from_data ... Tried to infer frequency from the time labels
23 frequency_can_be_determined ... Check if frequency can be determined from the ...
24 geodata_vars ... NaN
25 geographic_latlon ... NaN
26 projection_yx ... NaN
27 time_dependent_vars ... NaN
28 time_bounds_var ... NaN
29 spatial_bounds_vars ... NaN
30 spatial_vertices_vars ... NaN
31 crs_var ... NaN
32 unidentified_vars ... NaN
33 has_time_dimension ... Check if data have a time dimension
34 time_is_numpy_datetime64_or_cftime ... Check the data type of the time stamps
35 time_stamps_are_midnight_or_noon ... Check if all timestamps are midnight or noon
36 variable_id ... NaN
37 frequency ... NaN
38 source_id ... NaN
39 variant_label ... NaN
40 grid_label ... NaN
41 activity_id ... NaN
42 institution_id ... NaN
43 source_id ... NaN
44 frequency ... NaN
45 variable_id ... NaN
46 grid_label ... NaN
47 version ... NaN
[48 rows x 4 columns]
- pyku.check.check_allnan_slices(ds)
Check for all-NaN slices along the time dimension.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The checks and issues.
- Return type:
Examples
In [1]: import pyku
   ...:
   ...: ds = pyku.resources.get_test_data('hyras')
   ...:
   ...: ds.pyku.check_allnan_slices()
   ...:
Out[1]:
key value issue
0 tas allnan 730 time labels for tas None
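To illustrate what an all-NaN slice is, the following minimal sketch scans a plain nested list indexed as data[time][y][x] and reports the time steps whose slice contains only NaN values. It uses only the Python standard library; the helper name find_allnan_time_slices is hypothetical, and this is not necessarily how pyku implements the check.

```python
import math


def find_allnan_time_slices(data):
    """Return the indices of time steps whose 2D slice holds only NaN.

    `data` is a nested list indexed as data[time][y][x] (an assumption
    made for this sketch; pyku itself works on xarray datasets).
    """
    allnan_indices = []
    for t, time_slice in enumerate(data):
        values = [v for row in time_slice for v in row]
        if values and all(math.isnan(v) for v in values):
            allnan_indices.append(t)
    return allnan_indices


nan = float("nan")
data = [
    [[1.0, 2.0], [3.0, nan]],  # partly NaN: not reported
    [[nan, nan], [nan, nan]],  # only NaN: reported
    [[5.0, 6.0], [7.0, 8.0]],  # no NaN: not reported
]
allnan = find_allnan_time_slices(data)
print(allnan)
```

A slice that is only partly NaN is deliberately not reported, since missing values over parts of a grid (for example outside a mask) are common in gridded data.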
- pyku.check.check_cmor_varnames(ds)
Check if variable names are CMOR-conform
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_cmor_varnames()
   ...:
Out[1]:
key value issue
0 tas tas None
- pyku.check.check_datetime_completeness(ds, frequency)
Check data completeness for a given period. Note that the parameter is named frequency although it actually describes a period.
- Parameters:
ds (xarray.Dataset) – The input dataset.
frequency (freqstr) – Frequency string (e.g. 1D, 3H, 1MS, 1YS, or Q). The complete list is available at: https://pandas.pydata.org/docs/user_guide/timeseries.html#offset-aliases
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_datetime_completeness(frequency='1MS')
   ...:
Out[1]:
key ... issue
0 Time dimension ... None
1 Type of datetime in first dataset ... None
2 Datetimes ... None
3 Datetimes ... None
4 Datetimes ... None
5 Datetimes ... None
6 Datetimes ... None
7 Datetimes ... None
8 Datetimes ... None
9 Datetimes ... None
10 Datetimes ... None
11 Datetimes ... None
12 Datetimes ... None
13 Datetimes ... None
14 Datetimes ... None
15 Datetimes ... None
16 Datetimes ... None
17 Datetimes ... None
18 Datetimes ... None
19 Datetimes ... None
20 Datetimes ... None
21 Datetimes ... None
22 Datetimes ... None
23 Datetimes ... None
24 Datetimes ... None
25 Datetimes ... None
[26 rows x 3 columns]
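The idea of completeness within a period can be sketched with the standard library alone. The example below assumes daily data and a monthly period, and reports the months in which at least one day is missing. The helper name incomplete_months is hypothetical; pyku's own check works on xarray datasets and pandas frequency strings and may differ in detail.

```python
import calendar
from collections import defaultdict
from datetime import date, timedelta


def incomplete_months(days):
    """Return {(year, month): number_of_missing_days} for incomplete months.

    Only months that occur at least once in `days` are examined
    (a simplification made for this sketch).
    """
    seen = defaultdict(set)
    for d in days:
        seen[(d.year, d.month)].add(d.day)

    missing = {}
    for (year, month), found in sorted(seen.items()):
        expected = calendar.monthrange(year, month)[1]  # days in this month
        if len(found) < expected:
            missing[(year, month)] = expected - len(found)
    return missing


# Daily time labels for January and February 1981, with one day removed.
start = date(1981, 1, 1)
days = [start + timedelta(days=i) for i in range(59)]
days.remove(date(1981, 2, 10))

result = incomplete_months(days)
print(result)
```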
- pyku.check.check_datetimes(ds)
Check datetimes.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_datetimes()
   ...:
Out[1]:
key ... description
0 has_time_dimension ... Check if data have a time dimension
1 time_is_numpy_datetime64_or_cftime ... Check the data type of the time stamps
2 time_stamps_are_midnight_or_noon ... Check if all timestamps are midnight or noon
[3 rows x 4 columns]
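The time_stamps_are_midnight_or_noon entry can be pictured with a small standard-library sketch that lists the timestamps falling on neither midnight nor noon. The helper name not_midnight_or_noon is hypothetical and the sketch works on plain datetime objects, whereas pyku inspects the time coordinate of the dataset.

```python
from datetime import datetime, time

# The two times of day accepted in this sketch.
ALLOWED_TIMES = {time(0, 0), time(12, 0)}


def not_midnight_or_noon(timestamps):
    """Return the timestamps whose time of day is neither 00:00 nor 12:00."""
    return [ts for ts in timestamps if ts.time() not in ALLOWED_TIMES]


timestamps = [
    datetime(1981, 1, 1, 0, 0),
    datetime(1981, 1, 1, 12, 0),
    datetime(1981, 1, 2, 6, 30),
]
offenders = not_midnight_or_noon(timestamps)
print(offenders)
```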
- pyku.check.check_drs(ds, standard=None)
Check metadata for Data Reference Syntax (DRS)
- Parameters:
ds (xarray.Dataset) – The input dataset.
standard (str) – Standard, can be one of ‘cordex’, ‘cordex_adjust’, ‘obs4mips’, or ‘cordex_adjust_interp’.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_drs(standard='cordex')
   ...:
Out[1]:
key value issue
0 CORDEX_domain undefined None
1 driving_model_id None missing value
2 driving_experiment_name None missing value
3 driving_model_ensemble_member None missing value
4 model_id None missing value
5 rcm_version_id None missing value
6 frequency day None
7 product observations None
8 institute_id None missing value
9 experiment_id None missing value
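The shape of this result can be mimicked with a short sketch that looks up a list of required global attributes and flags the absent ones. The function name find_missing_attributes and the three-entry required list are hypothetical; the attribute names and the 'missing value' wording are taken from the example output above, while the actual list of attributes per standard is defined in ./pyku/etc/drs.yaml.

```python
def find_missing_attributes(attrs, required):
    """Build one row per required attribute and flag the absent ones."""
    rows = []
    for key in required:
        value = attrs.get(key)
        issue = None if value is not None else "missing value"
        rows.append({"key": key, "value": value, "issue": issue})
    return rows


# Hypothetical subset of attribute names, chosen for illustration only.
required = ["frequency", "product", "institute_id"]
attrs = {"frequency": "day", "product": "observations"}

rows = find_missing_attributes(attrs, required)
missing = [row["key"] for row in rows if row["issue"] is not None]
print(missing)
```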
- pyku.check.check_files(list_of_files, standard=None, completeness_period=None, progress=False)
Warning
Do not use this function, as it may be removed in the near future.
Check a list of files.
- Parameters:
list_of_files (list) – List of files to be checked.
standard (str) – Standard (e.g. ‘cordex’), defaults to None. If None, the standard metadata are not checked.
completeness_period (freqstr) – The files are checked for completeness within the defined period (e.g. ‘1MS’).
- Returns:
Issues
- Return type:
- pyku.check.check_files_multi(list_of_files, standard='cordex', completeness_period=None)
Check a list of files (multiprocessed version). This function should not be used, as the multiprocessing should run on each file instead of loading many files at once and running them in parallel.
Todo
Check if functional outside of dask distributed
Write docstring
- Parameters:
list_of_files (list) – List of files to be checked.
- Returns:
Issues
- Return type:
- pyku.check.check_frequency(ds)
Check frequency from time labels and time bounds. If the interval between consecutive time steps is not homogeneous, an issue is raised.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The checks and issues.
- Return type:
Examples
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_frequency()
   ...:
Out[1]:
key ... description
0 frequency_can_be_inferred_from_data ... Tried to infer frequency from the time labels
1 frequency_can_be_determined ... Check if frequency can be determined from the ...
[2 rows x 4 columns]
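What a non-homogeneous interval looks like can be shown with a minimal standard-library sketch that collects the distinct intervals between consecutive timestamps. The helper names are hypothetical, and the sketch ignores calendar-dependent frequencies such as monthly data, where step lengths legitimately vary; pyku's own inference may handle those differently.

```python
from datetime import datetime, timedelta


def distinct_time_steps(timestamps):
    """Return the set of intervals between consecutive timestamps."""
    return {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}


def has_uniform_frequency(timestamps):
    """True if all consecutive intervals are identical."""
    return len(distinct_time_steps(timestamps)) <= 1


start = datetime(1981, 1, 1)
regular = [start + timedelta(days=i) for i in range(5)]
with_gap = regular[:2] + regular[3:]  # drop the third day

regular_ok = has_uniform_frequency(regular)
gap_ok = has_uniform_frequency(with_gap)
print(regular_ok, gap_ok)
```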
- pyku.check.check_georeferencing(ds)
Check georeferencing
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The checks and issues.
- Return type:
Examples
In [1]: import pyku
   ...:
   ...: ds = pyku.resources.get_test_data('hyras')
   ...:
   ...: ds.pyku.check_georeferencing()
   ...:
Out[1]:
key ... description
0 y_projection_coordinate_exist ... Checking if y projection coordinate available
1 x_projection_coordinate_exist ... Checking if x projection coordinate available
2 lat_geographic_coordinate_exist ... Checking if lat geographic coordinate available
3 lon_geographic_coordinate_exist ... Checking if lon geographic coordinate available
4 y_projection_coordinate_unit_correct ... Checking if y projection coordinates units
5 x_projection_coordinate_unit_correct ... Checking if x projection coordinates units
6 y_projection_coordinate_standard_name_correct ... Checking y projection coordinates standard_name
7 x_projection_coordinate_standard_name_correct ... Checking x projection coordinates standard_name
8 lat_geographic_coordinate_unit_correct ... Checking lat geographic coordinate units
9 lon_geographic_coordinate_unit_correct ... Checking if lat geographic coordinate units
10 lat_geographic_coordinate_standard_name_correct ... Checking lat geographic coordinate standard_name
11 lon_geographic_coordinate_standard_name_correct ... Checking lon geographic coordinate standard_name
12 cf_area_def_readable ... Check if CF projection metadata are readable
13 area_extent_is_readable ... Check if the area extent can be determined fro...
14 longitudes_within_180W_and_180E ... Check that the longitudes are within 180 degre...
[15 rows x 4 columns]
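As a rough illustration of the longitudes_within_180W_and_180E entry, the sketch below lists longitudes outside the range from -180 to 180 degrees and shows one common way to wrap them back into that range. Both helper names are hypothetical; the sketch says nothing about how pyku performs or reports this check.

```python
def longitudes_outside_range(lons, west=-180.0, east=180.0):
    """Return the longitudes (in degrees) that lie outside [west, east]."""
    return [lon for lon in lons if not (west <= lon <= east)]


def wrap_longitude(lon):
    """Map a longitude in degrees to the interval [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


# 190 and 359 follow the 0..360 convention and fall outside the range.
lons = [-179.5, 0.0, 10.0, 190.0, 359.0]
outside = longitudes_outside_range(lons)
wrapped = [wrap_longitude(lon) for lon in outside]
print(outside, wrapped)
```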
- pyku.check.check_metadata(ds, standard=None, completeness_period=None)
Perform the following checks:
georeferencing,
units,
CMOR variable names,
CMOR variables metadata,
frequency,
the role of variables,
CMOR standard,
completeness of data over a given period.
The difference with pyku.check.check() is that resource-intensive checks, such as checking for all-NaN slices or checking time bounds, are left out.
- Parameters:
ds (xarray.Dataset) – The input dataset.
standard (str) – Optional standard. One of cordex, obs4mips, cordex_adjust, cordex_adjust_interp, or any standard implemented in the pyku configuration file ./pyku/etc/drs.yaml. If None, compliance of the metadata with a standard is not checked.
completeness_period (freqstr) – Frequency string (e.g. ‘1MS’, ‘1YS’). If given, the function checks whether all timestamps are available within each such period, given the data frequency. Possible values can be found at: https://pandas.pydata.org/docs/user_guide/timeseries.html#offset-aliases
- Returns:
The checks and issues.
- Return type:
Example
In [1]: %%time
   ...: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_metadata(standard='obs4mips')
   ...:
CPU times: user 126 ms, sys: 15.9 ms, total: 142 ms
Wall time: 140 ms
Out[1]:
key ... description
0 y_projection_coordinate_exist ... Checking if y projection coordinate available
1 x_projection_coordinate_exist ... Checking if x projection coordinate available
2 lat_geographic_coordinate_exist ... Checking if lat geographic coordinate available
3 lon_geographic_coordinate_exist ... Checking if lon geographic coordinate available
4 y_projection_coordinate_unit_correct ... Checking if y projection coordinates units
5 x_projection_coordinate_unit_correct ... Checking if x projection coordinates units
6 y_projection_coordinate_standard_name_correct ... Checking y projection coordinates standard_name
7 x_projection_coordinate_standard_name_correct ... Checking x projection coordinates standard_name
8 lat_geographic_coordinate_unit_correct ... Checking lat geographic coordinate units
9 lon_geographic_coordinate_unit_correct ... Checking if lat geographic coordinate units
10 lat_geographic_coordinate_standard_name_correct ... Checking lat geographic coordinate standard_name
11 lon_geographic_coordinate_standard_name_correct ... Checking lon geographic coordinate standard_name
12 cf_area_def_readable ... Check if CF projection metadata are readable
13 area_extent_is_readable ... Check if the area extent can be determined fro...
14 longitudes_within_180W_and_180E ... Check that the longitudes are within 180 degre...
15 tas_units_can_be_read ... Check if units can be read automatically
16 tas ... NaN
17 is_cmor_standard_name ... If possible, check if standard name is CMOR co...
18 is_cmor_long_name ... Check if long_name is CMOR conform
19 is_cmor_units ... Check if units is CMOR conform
20 frequency_can_be_inferred_from_data ... Tried to infer frequency from the time labels
21 frequency_can_be_determined ... Check if frequency can be determined from the ...
22 geodata_vars ... NaN
23 geographic_latlon ... NaN
24 projection_yx ... NaN
25 time_dependent_vars ... NaN
26 time_bounds_var ... NaN
27 spatial_bounds_vars ... NaN
28 spatial_vertices_vars ... NaN
29 crs_var ... NaN
30 unidentified_vars ... NaN
31 has_time_dimension ... Check if data have a time dimension
32 time_is_numpy_datetime64_or_cftime ... Check the data type of the time stamps
33 time_stamps_are_midnight_or_noon ... Check if all timestamps are midnight or noon
34 variable_id ... NaN
35 frequency ... NaN
36 source_id ... NaN
37 variant_label ... NaN
38 grid_label ... NaN
39 activity_id ... NaN
40 institution_id ... NaN
41 source_id ... NaN
42 frequency ... NaN
43 variable_id ... NaN
44 grid_label ... NaN
45 version ... NaN
[46 rows x 4 columns]
- pyku.check.check_units(ds)
Check units
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_units()
   ...:
Out[1]:
key value issue description
0 tas_units_can_be_read True None Check if units can be read automatically
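- pyku.check.check_valid_bounds(ds, bounds=None)
Check bounds
- Parameters:
ds (xarray.Dataset) – The input dataset.
bounds (dict) – Nested dictionary mapping a variable name to its ‘units’ and ‘valid_bounds’ (lower and upper limit), as in the second example below. Defaults to None.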
- Returns:
The checks and issues.
- Return type:
Examples
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_valid_bounds()
   ...:
Out[1]:
key value issue
0 tas above 500.0 0 values above threshold None
1 tas below 50.0 0 values below threshold None

In [2]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_valid_bounds(
   ...:     bounds = {
   ...:         'tas': {
   ...:             'units': 'celsius',
   ...:             'valid_bounds': [1, 20]
   ...:         }
   ...:     }
   ...: )
   ...:
Out[2]:
key ... issue
0 tas above 20.0 ... Shape (730, 178, 133) First 50 indices: [[128 ...
1 tas below 1.0 ... Shape (730, 178, 133) First 50 indices: {where...
[2 rows x 3 columns]
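The counting behind the 'values above threshold' and 'values below threshold' entries can be sketched as follows for a flat list of values. The function name count_out_of_bounds is hypothetical, unit conversion is left out, and NaN values are counted on neither side because comparisons with NaN are false; whether pyku treats NaN the same way is an assumption.

```python
def count_out_of_bounds(values, lower, upper):
    """Count the values above `upper` and the values below `lower`."""
    above = sum(1 for v in values if v > upper)
    below = sum(1 for v in values if v < lower)
    return above, below


# Temperatures in degrees Celsius, checked against valid bounds [1, 20].
temperatures_celsius = [-3.5, 0.9, 1.0, 12.4, 20.0, 25.1, float("nan")]
above, below = count_out_of_bounds(temperatures_celsius, lower=1, upper=20)
print(above, below)
```

Values equal to a bound (1.0 and 20.0 here) count as valid in this sketch.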
- pyku.check.check_variables_cmor_metadata(ds)
Check variable CMOR metadata (‘standard_name’, ‘long_name’ and ‘units’)
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('air_temperature')
   ...: ds.pyku.check_variables_cmor_metadata()
   ...:
Out[1]:
key ... description
0 is_cmor_standard_name ... If possible, check if standard name is CMOR co...
1 is_cmor_long_name ... Check if long_name is CMOR conform
2 is_cmor_units ... Check if units is CMOR conform
[3 rows x 4 columns]
- pyku.check.check_variables_role(ds)
Look for variables whose role is not identified. The identified roles for variables are: coordinate reference system, spatial bounds, spatial vertices, geographic longitude, geographic latitude, projection coordinate x, projection coordinate y, and georeferenced data.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_variables_role()
   ...:
Out[1]:
key value issue
0 geodata_vars [tas] None
1 geographic_latlon (lat, lon) None
2 projection_yx (y, x) None
3 time_dependent_vars [number_of_stations, tas, time_bnds] None
4 time_bounds_var time_bnds None
5 spatial_bounds_vars [] None
6 spatial_vertices_vars [] None
7 crs_var crs None
8 unidentified_vars [number_of_stations] None
- pyku.check.compare_attrs(ds1, ds2, var=None)
Compare global or variable attributes.
- Parameters:
ds1 (xarray.Dataset) – The first input dataset.
ds2 (xarray.Dataset) – The second input dataset.
var (str) – Variable name. Defaults to None. If var is None, the global attributes are compared; otherwise the attributes of that variable are compared.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...:
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...:
   ...: ds1.pyku.compare_attrs(ds2)
   ...:
Out[1]:
differences ... dataset 2
0 attrs['driving_experiment'] ... None
1 attrs['driving_experiment_name'] ... None
2 attrs['driving_model_ensemble_member'] ... None
3 attrs['driving_model_id'] ... None
4 attrs['experiment'] ... None
5 attrs['experiment_id'] ... None
6 attrs['institute_id'] ... None
7 attrs['model_id'] ... None
8 attrs['rcm_version_id'] ... None
9 attrs['rossby_comment'] ... None
10 attrs['rossby_grib_path'] ... None
11 attrs['rossby_run_id'] ... None
12 attrs['tracking_id'] ... None
13 attrs['source'] ... surface observations
14 attrs['title'] ... gridded_temperature_dataset_(HYRAS TAS)
15 attrs['realization'] ... v6-1
16 attrs['input_data_status'] ... checked
17 attrs['realm'] ... atmos
18 attrs['level_type'] ... surface
19 attrs['horizontal_resolution'] ... 1_km
20 attrs['author'] ... Climate Monitoring (KU21)
21 attrs['variable_id'] ... tas
22 attrs['ConventionsURL'] ... http://cfconventions.org/Data/cf-conventions/c...
23 attrs['license'] ... The HYRAS data, produced by DWD, is licensed u...
24 attrs['filename'] ... tas_hyras_1_1981_v6-1_de.nc
25 attrs['comment'] ... Please be aware that the parameters are stored...
26 attrs['unique_dataset_id'] ... DWD_HYRAS_DE_tas_v6-1_1981_3a0bd428-c11d-47f6-...
[27 rows x 3 columns]
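The kind of comparison shown above can be approximated on two plain attribute dictionaries: every key whose value differs, or that exists on one side only, yields one row. The function name diff_attrs and the sample attribute values are hypothetical; the sketch is only meant to clarify why a None appears for an attribute missing from one of the datasets.

```python
def diff_attrs(attrs1, attrs2):
    """Return (key, value_in_first, value_in_second) for differing keys.

    A key that is absent from one dictionary is reported with None
    on that side.
    """
    rows = []
    for key in sorted(set(attrs1) | set(attrs2)):
        value1, value2 = attrs1.get(key), attrs2.get(key)
        if value1 != value2:
            rows.append((key, value1, value2))
    return rows


# Hypothetical global attributes of two datasets.
attrs1 = {"frequency": "day", "model_id": "hypothetical-model"}
attrs2 = {"frequency": "day", "realm": "atmos"}

differences = diff_attrs(attrs1, attrs2)
print(differences)
```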
- pyku.check.compare_coordinates(ds1, ds2)
Check if coordinates are the same in both datasets.
- Parameters:
ds1 (xarray.Dataset) – The first dataset.
ds2 (xarray.Dataset) – The second dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...:
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...:
   ...: ds1.pyku.compare_coordinates(ds2)
   ...:
Out[1]:
key ... issue
0 coordinate_names_are_the_same ... Different keys {'x', 'rlon', 'y', 'height', 'r...
[1 rows x 3 columns]
- pyku.check.compare_datasets(ds1, ds2)
Check the compatibility of two climate datasets:
Compare geographic alignment
Compare dataset datetimes
Compare dataset dimensions
Compare dataset coordinates
- Parameters:
ds1 (xarray.Dataset) – The first dataset.
ds2 (xarray.Dataset) – The second dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...:
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...:
   ...: ds1.pyku.compare_datasets(ds2)
   ...:
Out[1]:
key ... description
0 have_same_number_of_pixels_in_y_and_x_directions ... Check if number of pixels is the same in the y...
1 first_dataset_has_time_dimension ... NaN
2 second_dataset_has_time_dimension ... NaN
3 first_dataset_datetimes_are_numpy_datetime64 ... NaN
4 second_dataset_datetimes_are_numpy_datetime64 ... NaN
5 same_datetimes_in_both_datasets ... NaN
6 same_rounded_datetimes_in_both_datasets ... NaN
7 dimensions_names_equal ... NaN
8 coordinate_names_are_the_same ... NaN
[9 rows x 4 columns]
- pyku.check.compare_datetimes(ds1, ds2)
Check if datetimes are the same in both datasets
- Parameters:
ds1 (xarray.Dataset) – The first dataset.
ds2 (xarray.Dataset) – The second dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...:
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...:
   ...: ds1.pyku.compare_datetimes(ds2)
   ...:
Out[1]:
key ... issue
0 first_dataset_has_time_dimension ... None
1 second_dataset_has_time_dimension ... None
2 first_dataset_datetimes_are_numpy_datetime64 ... None
3 second_dataset_datetimes_are_numpy_datetime64 ... None
4 same_datetimes_in_both_datasets ... The first 2 timesteps in the first dataset are...
5 same_rounded_datetimes_in_both_datasets ... The first 2 timesteps in the first dataset are...
[6 rows x 3 columns]
- pyku.check.compare_dimensions(ds1, ds2)
Check if dimensions are the same in both datasets
- Parameters:
ds1 (xarray.Dataset) – The first dataset.
ds2 (xarray.Dataset) – The second dataset.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...:
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...:
   ...: ds1.pyku.compare_dimensions(ds2)
   ...:
Out[1]:
key value issue
0 dimensions_names_equal True None
- pyku.check.compare_geographic_alignment(ds1, ds2, tolerance=None)
Check the alignment of the georeferencing of two datasets
- Parameters:
ds1 (xarray.Dataset) – The first dataset.
ds2 (xarray.Dataset) – The second dataset.
tolerance (float) – Tolerance with respect to alignment, defaults to 0.001. If the difference between any values of the geographic coordinates or projection coordinates does not fall within the tolerance, the function reports the difference.
- Returns:
The checks and issues.
- Return type:
Example
In [1]: import pyku
   ...:
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...:
   ...: ds1.pyku.compare_geographic_alignment(ds2)
   ...:
Out[1]:
key ... description
0 have_same_number_of_pixels_in_y_and_x_directions ... Check if number of pixels is the same in the y...
[1 rows x 4 columns]
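The role of the tolerance parameter can be illustrated with a minimal sketch that compares two one-dimensional coordinate lists element by element. The helper names are hypothetical, the default of 0.001 is taken from the parameter description above, and the sketch assumes an absolute difference; whether pyku uses an absolute or relative tolerance is not stated here.

```python
def max_abs_difference(coords1, coords2):
    """Return the largest absolute difference between paired coordinates."""
    if len(coords1) != len(coords2):
        raise ValueError("coordinate lists differ in length")
    return max((abs(a - b) for a, b in zip(coords1, coords2)), default=0.0)


def is_aligned(coords1, coords2, tolerance=0.001):
    """True if every pair of coordinates agrees within `tolerance`."""
    return max_abs_difference(coords1, coords2) <= tolerance


x_reference = [0.0, 1.0, 2.0]
x_close = [0.0005, 1.0, 2.0]  # largest deviation 0.0005: within tolerance
x_shifted = [0.0, 1.0, 2.01]  # largest deviation about 0.01: outside tolerance

close_ok = is_aligned(x_reference, x_close)
shifted_ok = is_aligned(x_reference, x_shifted)
print(close_ok, shifted_ok)
```

Grids with a different number of points cannot be compared pairwise, which is why the sketch raises an error in that case; the have_same_number_of_pixels_in_y_and_x_directions entry above covers the corresponding situation.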