pyku.check

Functions for checking data

See also

pyku configuration files:

  • ./pyku/etc/drs.yaml

  • ./pyku/etc/metadata.yaml

pyku.check.check(ds, standard=None, completeness_period=None, all_nan_slices=False)

Perform the following checks:

  • all-NaN slices,

  • valid bounds,

  • georeferencing,

  • units,

  • CMOR variable names,

  • frequency,

  • the role of variables,

  • whether all timestamps are available within the completeness period.

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • standard (str) – Optional, defaults to None. cordex, obs4mips, cordex_adjust, cordex_adjust_interp or any standard implemented in the pyku configuration file ./pyku/etc/drs.yaml. If None, compliance of the metadata with a standard is not checked.

  • completeness_period (freqstr) – Optional frequency string (e.g. ‘1MS’, ‘1YS’). Defaults to None. If given, the check verifies that, at the frequency of the data, all timestamps are available within each period. Possible values are listed at: https://pandas.pydata.org/docs/user_guide/timeseries.html#offset-aliases

  • all_nan_slices (bool) – Optional, defaults to False. If True, check whether the dataset contains slices with only NaNs.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check(standard='obs4mips')
   ...: 
Out[1]: 
                                                key  ...                                        description
0                                   tas above 500.0  ...                                                NaN
1                                    tas below 50.0  ...                                                NaN
2                     y_projection_coordinate_exist  ...      Checking if y projection coordinate available
3                     x_projection_coordinate_exist  ...      Checking if x projection coordinate available
4                   lat_geographic_coordinate_exist  ...    Checking if lat geographic coordinate available
5                   lon_geographic_coordinate_exist  ...    Checking if lon geographic coordinate available
6              y_projection_coordinate_unit_correct  ...         Checking if y projection coordinates units
7              x_projection_coordinate_unit_correct  ...         Checking if x projection coordinates units
8     y_projection_coordinate_standard_name_correct  ...    Checking y projection coordinates standard_name
9     x_projection_coordinate_standard_name_correct  ...    Checking x projection coordinates standard_name
10           lat_geographic_coordinate_unit_correct  ...           Checking lat geographic coordinate units
11           lon_geographic_coordinate_unit_correct  ...        Checking if lat geographic coordinate units
12  lat_geographic_coordinate_standard_name_correct  ...   Checking lat geographic coordinate standard_name
13  lon_geographic_coordinate_standard_name_correct  ...   Checking lon geographic coordinate standard_name
14                             cf_area_def_readable  ...       Check if CF projection metadata are readable
15                          area_extent_is_readable  ...  Check if the area extent can be determined fro...
16                  longitudes_within_180W_and_180E  ...  Check that the longitudes are within 180 degre...
17                            tas_units_can_be_read  ...           Check if units can be read automatically
18                                              tas  ...                                                NaN
19                            is_cmor_standard_name  ...  If possible, check if standard name is CMOR co...
20                                is_cmor_long_name  ...                 Check if long_name is CMOR conform
21                                    is_cmor_units  ...                     Check if units is CMOR conform
22              frequency_can_be_inferred_from_data  ...      Tried to infer frequency from the time labels
23                      frequency_can_be_determined  ...  Check if frequency can be determined from the ...
24                                     geodata_vars  ...                                                NaN
25                                geographic_latlon  ...                                                NaN
26                                    projection_yx  ...                                                NaN
27                              time_dependent_vars  ...                                                NaN
28                                  time_bounds_var  ...                                                NaN
29                              spatial_bounds_vars  ...                                                NaN
30                            spatial_vertices_vars  ...                                                NaN
31                                          crs_var  ...                                                NaN
32                                unidentified_vars  ...                                                NaN
33                               has_time_dimension  ...                Check if data have a time dimension
34               time_is_numpy_datetime64_or_cftime  ...             Check the data type of the time stamps
35                 time_stamps_are_midnight_or_noon  ...       Check if all timestamps are midnight or noon
36                                      variable_id  ...                                                NaN
37                                        frequency  ...                                                NaN
38                                        source_id  ...                                                NaN
39                                    variant_label  ...                                                NaN
40                                       grid_label  ...                                                NaN
41                                      activity_id  ...                                                NaN
42                                   institution_id  ...                                                NaN
43                                        source_id  ...                                                NaN
44                                        frequency  ...                                                NaN
45                                      variable_id  ...                                                NaN
46                                       grid_label  ...                                                NaN
47                                          version  ...                                                NaN

[48 rows x 4 columns]
pyku.check.check_allnan_slices(ds)

Check for all-NaN slices along the time dimension.

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Examples

In [1]: import pyku
   ...: 
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: 
   ...: ds.pyku.check_allnan_slices()
   ...: 
Out[1]: 
          key                    value issue
0  tas allnan  730 time labels for tas  None
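
The idea behind this check can be sketched with the standard library alone. The snippet below is an illustration under stated assumptions, not pyku's implementation: the helper name allnan_time_steps and the nested-list layout (time, y, x) are hypothetical.

```python
import math

def allnan_time_steps(data):
    """Return indices of time steps whose 2D slice contains only NaN values."""
    return [
        t for t, grid in enumerate(data)
        if all(math.isnan(v) for row in grid for v in row)
    ]

nan = float("nan")
# Three time steps on a 2x2 grid; only the second step is entirely NaN
data = [
    [[1.0, 2.0], [3.0, nan]],
    [[nan, nan], [nan, nan]],
    [[5.0, 6.0], [7.0, 8.0]],
]
print(allnan_time_steps(data))
```

A slice with a single missing value (the first time step here) is not reported; only slices without any valid value are.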
pyku.check.check_cmor_varnames(ds)

Check if variable names are CMOR-conform

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_cmor_varnames()
   ...: 
Out[1]: 
   key value issue
0  tas   tas  None
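pyku.check.check_datetime_completeness(ds, frequency)

Check data completeness for a given period. Note that the parameter is named frequency although it actually expects a period (e.g. ‘1MS’ for monthly completeness).

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • frequency (freqstr) – The period over which completeness is checked (e.g. ‘1MS’).
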
Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_datetime_completeness(frequency='1MS')
   ...: 
Out[1]: 
                                  key  ... issue
0                      Time dimension  ...  None
1   Type of datetime in first dataset  ...  None
2                           Datetimes  ...  None
3                           Datetimes  ...  None
4                           Datetimes  ...  None
5                           Datetimes  ...  None
6                           Datetimes  ...  None
7                           Datetimes  ...  None
8                           Datetimes  ...  None
9                           Datetimes  ...  None
10                          Datetimes  ...  None
11                          Datetimes  ...  None
12                          Datetimes  ...  None
13                          Datetimes  ...  None
14                          Datetimes  ...  None
15                          Datetimes  ...  None
16                          Datetimes  ...  None
17                          Datetimes  ...  None
18                          Datetimes  ...  None
19                          Datetimes  ...  None
20                          Datetimes  ...  None
21                          Datetimes  ...  None
22                          Datetimes  ...  None
23                          Datetimes  ...  None
24                          Datetimes  ...  None
25                          Datetimes  ...  None

[26 rows x 3 columns]
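
To make the notion of completeness within a period concrete, here is a minimal standard-library sketch for one specific case: daily data checked against a monthly period (the equivalent of ‘1MS’). The function name missing_days_per_month is hypothetical, and the logic is an assumption about what such a check involves rather than a description of pyku's internals.

```python
import calendar
from datetime import date, timedelta

def missing_days_per_month(days):
    """Map (year, month) to the dates missing from `days` within that month."""
    present = set(days)
    missing = {}
    for year, month in sorted({(d.year, d.month) for d in present}):
        n_days = calendar.monthrange(year, month)[1]
        expected = [date(year, month, day) for day in range(1, n_days + 1)]
        absent = [d for d in expected if d not in present]
        if absent:
            missing[(year, month)] = absent
    return missing

# Daily dates for January and February 1981, with 10 January removed
start = date(1981, 1, 1)
days = [start + timedelta(days=i) for i in range(59)]
days.remove(date(1981, 1, 10))
missing = missing_days_per_month(days)
print(missing)
```

Only months that contain at least one timestamp are examined, so a month that is missing entirely would go unnoticed by this sketch.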
pyku.check.check_datetimes(ds)

Check datetimes.

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_datetimes()
   ...: 
Out[1]: 
                                  key  ...                                   description
0                  has_time_dimension  ...           Check if data have a time dimension
1  time_is_numpy_datetime64_or_cftime  ...        Check the data type of the time stamps
2    time_stamps_are_midnight_or_noon  ...  Check if all timestamps are midnight or noon

[3 rows x 4 columns]
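
The last row of the output above refers to a test on whether all timestamps fall on midnight or noon. A minimal sketch of such a test with the standard library could look as follows; the function name all_midnight_or_noon is hypothetical and the exact criteria used by pyku are not shown in this page.

```python
from datetime import datetime

def all_midnight_or_noon(times):
    """Return True if every timestamp is exactly 00:00:00 or 12:00:00."""
    return all(
        t.hour in (0, 12) and t.minute == 0 and t.second == 0 and t.microsecond == 0
        for t in times
    )

on_grid = [datetime(1981, 1, 1, 0), datetime(1981, 1, 1, 12)]
off_grid = [datetime(1981, 1, 1, 0), datetime(1981, 1, 1, 6)]
print(all_midnight_or_noon(on_grid), all_midnight_or_noon(off_grid))
```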
pyku.check.check_drs(ds, standard=None)

Check metadata for Data Reference Syntax (DRS)

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • standard (str) – Optional, defaults to None. One of ‘cordex’, ‘cordex_adjust’, ‘obs4mips’, or ‘cordex_adjust_interp’.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_drs(standard='cordex')
   ...: 
Out[1]: 
                             key         value          issue
0                  CORDEX_domain     undefined           None
1               driving_model_id          None  missing value
2        driving_experiment_name          None  missing value
3  driving_model_ensemble_member          None  missing value
4                       model_id          None  missing value
5                 rcm_version_id          None  missing value
6                      frequency           day           None
7                        product  observations           None
8                   institute_id          None  missing value
9                  experiment_id          None  missing value
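
Since the checks return a pandas.DataFrame, the rows that report a problem can be isolated with ordinary pandas filtering. The sketch below uses a small hand-built stand-in for the returned table (three rows copied from the example above) rather than calling pyku, and assumes that rows without a problem carry None or NaN in the issue column.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by a check function
report = pd.DataFrame({
    "key": ["CORDEX_domain", "driving_model_id", "frequency"],
    "value": ["undefined", None, "day"],
    "issue": [None, "missing value", None],
})

# Keep only the rows where an issue was reported
problems = report[report["issue"].notna()]
print(problems)
```

The same filter also makes it possible to display a long report, such as the 48-row table returned by pyku.check.check(), in condensed form.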
pyku.check.check_files(list_of_files, standard=None, completeness_period=None, progress=False)

Warning

Do not use this function; it may be removed in the near future.

Check list of files.

Parameters:
  • list_of_files (list) – List of files to be checked

  • standard (str) – Standard (e.g. ‘cordex’), defaults to None. If None, the standard metadata are not checked.

  • completeness_period (freqstr) – The files will be checked for completeness within the defined period (e.g. ‘1MS’).

Returns:

Issues

Return type:

pandas.DataFrame

pyku.check.check_files_multi(list_of_files, standard='cordex', completeness_period=None)

Check list of files (multiprocessed version). This function should not be used: the multiprocessing should run on each file individually, instead of loading many files at once and processing them in parallel.

Todo

  • Check if functional outside of dask distributed

  • Write docstring

Parameters:

list_of_files (list) – List of files to be checked

Returns:

Issues

Return type:

pandas.DataFrame

pyku.check.check_frequency(ds)

Check frequency from time labels and time bounds. If the interval between consecutive time steps is not homogeneous, an issue is raised.

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Examples

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_frequency()
   ...: 
Out[1]: 
                                   key  ...                                        description
0  frequency_can_be_inferred_from_data  ...      Tried to infer frequency from the time labels
1          frequency_can_be_determined  ...  Check if frequency can be determined from the ...

[2 rows x 4 columns]
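
The homogeneity criterion mentioned in the description can be illustrated with a few lines of standard-library Python. This is a sketch under assumptions, not pyku's code: the function name has_homogeneous_spacing is hypothetical, and timestamps are plain datetime objects.

```python
from datetime import datetime, timedelta

def has_homogeneous_spacing(times):
    """Return True if every pair of consecutive timestamps has the same spacing."""
    deltas = {later - earlier for earlier, later in zip(times, times[1:])}
    return len(deltas) <= 1

regular = [datetime(1981, 1, 1) + timedelta(days=i) for i in range(5)]
# Drop one day to create a gap
gappy = regular[:2] + regular[3:]
print(has_homogeneous_spacing(regular), has_homogeneous_spacing(gappy))
```

A fixed-timedelta comparison like this suits sub-monthly data; monthly or yearly series have steps of varying length and need calendar-aware handling.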
pyku.check.check_georeferencing(ds)

Check georeferencing

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Examples

In [1]: import pyku
   ...: 
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: 
   ...: ds.pyku.check_georeferencing()
   ...: 
Out[1]: 
                                                key  ...                                        description
0                     y_projection_coordinate_exist  ...      Checking if y projection coordinate available
1                     x_projection_coordinate_exist  ...      Checking if x projection coordinate available
2                   lat_geographic_coordinate_exist  ...    Checking if lat geographic coordinate available
3                   lon_geographic_coordinate_exist  ...    Checking if lon geographic coordinate available
4              y_projection_coordinate_unit_correct  ...         Checking if y projection coordinates units
5              x_projection_coordinate_unit_correct  ...         Checking if x projection coordinates units
6     y_projection_coordinate_standard_name_correct  ...    Checking y projection coordinates standard_name
7     x_projection_coordinate_standard_name_correct  ...    Checking x projection coordinates standard_name
8            lat_geographic_coordinate_unit_correct  ...           Checking lat geographic coordinate units
9            lon_geographic_coordinate_unit_correct  ...        Checking if lat geographic coordinate units
10  lat_geographic_coordinate_standard_name_correct  ...   Checking lat geographic coordinate standard_name
11  lon_geographic_coordinate_standard_name_correct  ...   Checking lon geographic coordinate standard_name
12                             cf_area_def_readable  ...       Check if CF projection metadata are readable
13                          area_extent_is_readable  ...  Check if the area extent can be determined fro...
14                  longitudes_within_180W_and_180E  ...  Check that the longitudes are within 180 degre...

[15 rows x 4 columns]
pyku.check.check_metadata(ds, standard=None, completeness_period=None)

Perform the following checks:

  • georeferencing,

  • units,

  • CMOR variable names,

  • CMOR variable metadata,

  • frequency,

  • the role of variables

  • CMOR standard

  • Completeness of data over a given period

Unlike pyku.check.check(), this function leaves out the resource-intensive checks, such as checking for all-NaN slices or checking valid bounds.

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • standard (str) – Optional, defaults to None. One of cordex, obs4mips, cordex_adjust, cordex_adjust_interp, or any standard implemented in the pyku configuration file ./pyku/etc/drs.yaml. If None, compliance of the metadata with a standard is not checked.

  • completeness_period (freqstr) – Optional frequency string (e.g. ‘1MS’, ‘1YS’). Defaults to None. If given, the check verifies that, at the frequency of the data, all timestamps are available within each period. Possible values are listed at: https://pandas.pydata.org/docs/user_guide/timeseries.html#offset-aliases

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: %%time
   ...: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_metadata(standard='obs4mips')
   ...: 
CPU times: user 126 ms, sys: 15.9 ms, total: 142 ms
Wall time: 140 ms
Out[1]: 
                                                key  ...                                        description
0                     y_projection_coordinate_exist  ...      Checking if y projection coordinate available
1                     x_projection_coordinate_exist  ...      Checking if x projection coordinate available
2                   lat_geographic_coordinate_exist  ...    Checking if lat geographic coordinate available
3                   lon_geographic_coordinate_exist  ...    Checking if lon geographic coordinate available
4              y_projection_coordinate_unit_correct  ...         Checking if y projection coordinates units
5              x_projection_coordinate_unit_correct  ...         Checking if x projection coordinates units
6     y_projection_coordinate_standard_name_correct  ...    Checking y projection coordinates standard_name
7     x_projection_coordinate_standard_name_correct  ...    Checking x projection coordinates standard_name
8            lat_geographic_coordinate_unit_correct  ...           Checking lat geographic coordinate units
9            lon_geographic_coordinate_unit_correct  ...        Checking if lat geographic coordinate units
10  lat_geographic_coordinate_standard_name_correct  ...   Checking lat geographic coordinate standard_name
11  lon_geographic_coordinate_standard_name_correct  ...   Checking lon geographic coordinate standard_name
12                             cf_area_def_readable  ...       Check if CF projection metadata are readable
13                          area_extent_is_readable  ...  Check if the area extent can be determined fro...
14                  longitudes_within_180W_and_180E  ...  Check that the longitudes are within 180 degre...
15                            tas_units_can_be_read  ...           Check if units can be read automatically
16                                              tas  ...                                                NaN
17                            is_cmor_standard_name  ...  If possible, check if standard name is CMOR co...
18                                is_cmor_long_name  ...                 Check if long_name is CMOR conform
19                                    is_cmor_units  ...                     Check if units is CMOR conform
20              frequency_can_be_inferred_from_data  ...      Tried to infer frequency from the time labels
21                      frequency_can_be_determined  ...  Check if frequency can be determined from the ...
22                                     geodata_vars  ...                                                NaN
23                                geographic_latlon  ...                                                NaN
24                                    projection_yx  ...                                                NaN
25                              time_dependent_vars  ...                                                NaN
26                                  time_bounds_var  ...                                                NaN
27                              spatial_bounds_vars  ...                                                NaN
28                            spatial_vertices_vars  ...                                                NaN
29                                          crs_var  ...                                                NaN
30                                unidentified_vars  ...                                                NaN
31                               has_time_dimension  ...                Check if data have a time dimension
32               time_is_numpy_datetime64_or_cftime  ...             Check the data type of the time stamps
33                 time_stamps_are_midnight_or_noon  ...       Check if all timestamps are midnight or noon
34                                      variable_id  ...                                                NaN
35                                        frequency  ...                                                NaN
36                                        source_id  ...                                                NaN
37                                    variant_label  ...                                                NaN
38                                       grid_label  ...                                                NaN
39                                      activity_id  ...                                                NaN
40                                   institution_id  ...                                                NaN
41                                        source_id  ...                                                NaN
42                                        frequency  ...                                                NaN
43                                      variable_id  ...                                                NaN
44                                       grid_label  ...                                                NaN
45                                          version  ...                                                NaN

[46 rows x 4 columns]
pyku.check.check_units(ds)

Check units

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_units()
   ...: 
Out[1]: 
                     key  value issue                               description
0  tas_units_can_be_read   True  None  Check if units can be read automatically
pyku.check.check_valid_bounds(ds, bounds=None)

Check that the data values lie within valid bounds.

Parameters:
  • ds (xarray.Dataset) – The input dataset.

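  • bounds (dict) – Optional, defaults to None. Nested dictionary mapping each variable name to a dictionary with the keys ‘units’ and ‘valid_bounds’, the latter given as [lower, upper] (see the second example below).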

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Examples

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_valid_bounds()
   ...: 
Out[1]: 
               key                     value issue
0  tas above 500.0  0 values above threshold  None
1   tas below 50.0  0 values below threshold  None
In [2]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_valid_bounds(
   ...:     bounds={
   ...:         'tas': {
   ...:             'units': 'celsius',
   ...:             'valid_bounds': [1, 20]
   ...:         }
   ...:     }
   ...: )
   ...: 
Out[2]: 
              key  ...                                              issue
0  tas above 20.0  ...  Shape (730, 178, 133) First 50 indices: [[128 ...
1   tas below 1.0  ...  Shape (730, 178, 133) First 50 indices: {where...

[2 rows x 3 columns]
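
The nested bounds dictionary from the second example can be read as follows. The snippet is a simplified standard-library sketch, not pyku's implementation: the helper count_out_of_bounds is hypothetical, the values are invented for illustration, and no unit conversion is performed.

```python
def count_out_of_bounds(values, lower, upper):
    """Count the values lying strictly below `lower` or strictly above `upper`."""
    below = sum(1 for v in values if v < lower)
    above = sum(1 for v in values if v > upper)
    return {"below": below, "above": above}

# Same nested layout as the `bounds` argument in the example above
bounds = {"tas": {"units": "celsius", "valid_bounds": [1, 20]}}
# Invented temperatures in degrees Celsius
tas = [-3.5, 0.0, 4.2, 12.8, 19.9, 25.1]

lower, upper = bounds["tas"]["valid_bounds"]
result = count_out_of_bounds(tas, lower, upper)
print(result)
```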
pyku.check.check_variables_cmor_metadata(ds)

Check variable CMOR metadata (‘standard_name’, ‘long_name’ and ‘units’)

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('air_temperature')
   ...: ds.pyku.check_variables_cmor_metadata()
   ...: 
Out[1]: 
                     key  ...                                        description
0  is_cmor_standard_name  ...  If possible, check if standard name is CMOR co...
1      is_cmor_long_name  ...                 Check if long_name is CMOR conform
2          is_cmor_units  ...                     Check if units is CMOR conform

[3 rows x 4 columns]
pyku.check.check_variables_role(ds)

Look for variables whose role is not identified. The identified roles are: coordinate reference system, spatial bounds, spatial vertices, geographic longitude, geographic latitude, projection coordinate x, projection coordinate y, and georeferenced data.

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_variables_role()
   ...: 
Out[1]: 
                     key                                 value issue
0           geodata_vars                                 [tas]  None
1      geographic_latlon                            (lat, lon)  None
2          projection_yx                                (y, x)  None
3    time_dependent_vars  [number_of_stations, tas, time_bnds]  None
4        time_bounds_var                             time_bnds  None
5    spatial_bounds_vars                                    []  None
6  spatial_vertices_vars                                    []  None
7                crs_var                                   crs  None
8      unidentified_vars                  [number_of_stations]  None
pyku.check.compare_attrs(ds1, ds2, var=None)

Compare global or variable attributes.

Parameters:
  • ds1 (xarray.Dataset) – The first input dataset.

  • ds2 (xarray.Dataset) – The second input dataset.

  • var (str) – Variable name. Defaults to None. If None, the global attributes are compared; otherwise the attributes of the given variable are compared.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: 
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...: 
   ...: ds1.pyku.compare_attrs(ds2)
   ...: 
Out[1]: 
                               differences  ...                                          dataset 2
0              attrs['driving_experiment']  ...                                               None
1         attrs['driving_experiment_name']  ...                                               None
2   attrs['driving_model_ensemble_member']  ...                                               None
3                attrs['driving_model_id']  ...                                               None
4                      attrs['experiment']  ...                                               None
5                   attrs['experiment_id']  ...                                               None
6                    attrs['institute_id']  ...                                               None
7                        attrs['model_id']  ...                                               None
8                  attrs['rcm_version_id']  ...                                               None
9                  attrs['rossby_comment']  ...                                               None
10               attrs['rossby_grib_path']  ...                                               None
11                  attrs['rossby_run_id']  ...                                               None
12                    attrs['tracking_id']  ...                                               None
13                         attrs['source']  ...                               surface observations
14                          attrs['title']  ...            gridded_temperature_dataset_(HYRAS TAS)
15                    attrs['realization']  ...                                               v6-1
16              attrs['input_data_status']  ...                                            checked
17                          attrs['realm']  ...                                              atmos
18                     attrs['level_type']  ...                                            surface
19          attrs['horizontal_resolution']  ...                                               1_km
20                         attrs['author']  ...                          Climate Monitoring (KU21)
21                    attrs['variable_id']  ...                                                tas
22                 attrs['ConventionsURL']  ...  http://cfconventions.org/Data/cf-conventions/c...
23                        attrs['license']  ...  The HYRAS data, produced by DWD, is licensed u...
24                       attrs['filename']  ...                        tas_hyras_1_1981_v6-1_de.nc
25                        attrs['comment']  ...  Please be aware that the parameters are stored...
26              attrs['unique_dataset_id']  ...  DWD_HYRAS_DE_tas_v6-1_1981_3a0bd428-c11d-47f6-...

[27 rows x 3 columns]
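
Conceptually, an attribute comparison of this kind amounts to a dictionary diff. The sketch below works on plain dictionaries standing in for the attrs of two datasets; the function name diff_attrs and the attribute value hypothetical-rcm are invented for illustration, and the code does not reproduce pyku's output format.

```python
def diff_attrs(attrs1, attrs2):
    """Return {name: (value1, value2)} for attributes that differ or are missing."""
    names = sorted(set(attrs1) | set(attrs2))
    return {
        name: (attrs1.get(name), attrs2.get(name))
        for name in names
        if attrs1.get(name) != attrs2.get(name)
    }

attrs1 = {"frequency": "day", "model_id": "hypothetical-rcm"}
attrs2 = {"frequency": "day", "source": "surface observations"}
differences = diff_attrs(attrs1, attrs2)
print(differences)
```

Attributes present in only one dataset appear with None on the other side, which mirrors the None entries in the dataset 2 column of the output above.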
pyku.check.compare_coordinates(ds1, ds2)

Check if coordinates are the same in both datasets.

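Parameters:
  • ds1 (xarray.Dataset) – The first input dataset.

  • ds2 (xarray.Dataset) – The second input dataset.
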
Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: 
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...: 
   ...: ds1.pyku.compare_coordinates(ds2)
   ...: 
Out[1]: 
                             key  ...                                              issue
0  coordinate_names_are_the_same  ...  Different keys {'x', 'rlon', 'y', 'height', 'r...

[1 rows x 3 columns]
pyku.check.compare_datasets(ds1, ds2)

Check the compatibility of two climate datasets:

  • Compare geographic alignment

  • Compare datasets datetimes

  • Compare datasets dimensions

  • Compare datasets coordinates

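Parameters:
  • ds1 (xarray.Dataset) – The first input dataset.

  • ds2 (xarray.Dataset) – The second input dataset.
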
Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: 
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...: 
   ...: ds1.pyku.compare_datasets(ds2)
   ...: 
Out[1]: 
                                                key  ...                                        description
0  have_same_number_of_pixels_in_y_and_x_directions  ...  Check if number of pixels is the same in the y...
1                  first_dataset_has_time_dimension  ...                                                NaN
2                 second_dataset_has_time_dimension  ...                                                NaN
3      first_dataset_datetimes_are_numpy_datetime64  ...                                                NaN
4     second_dataset_datetimes_are_numpy_datetime64  ...                                                NaN
5                   same_datetimes_in_both_datasets  ...                                                NaN
6           same_rounded_datetimes_in_both_datasets  ...                                                NaN
7                            dimensions_names_equal  ...                                                NaN
8                     coordinate_names_are_the_same  ...                                                NaN

[9 rows x 4 columns]
pyku.check.compare_datetimes(ds1, ds2)

Check if datetimes are the same in both datasets

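Parameters:
  • ds1 (xarray.Dataset) – The first input dataset.

  • ds2 (xarray.Dataset) – The second input dataset.
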
Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: 
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...: 
   ...: ds1.pyku.compare_datetimes(ds2)
   ...: 
Out[1]: 
                                             key  ...                                              issue
0               first_dataset_has_time_dimension  ...                                               None
1              second_dataset_has_time_dimension  ...                                               None
2   first_dataset_datetimes_are_numpy_datetime64  ...                                               None
3  second_dataset_datetimes_are_numpy_datetime64  ...                                               None
4                same_datetimes_in_both_datasets  ...  The first 2 timesteps in the first dataset are...
5        same_rounded_datetimes_in_both_datasets  ...  The first 2 timesteps in the first dataset are...

[6 rows x 3 columns]
pyku.check.compare_dimensions(ds1, ds2)

Check if dimensions are the same in both datasets

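Parameters:
  • ds1 (xarray.Dataset) – The first input dataset.

  • ds2 (xarray.Dataset) – The second input dataset.
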
Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: 
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...: 
   ...: ds1.pyku.compare_dimensions(ds2)
   ...: 
Out[1]: 
                      key  value issue
0  dimensions_names_equal   True  None
pyku.check.compare_geographic_alignment(ds1, ds2, tolerance=None)

Check the alignment of georeferencing of two datasets

Parameters:
  • ds1 (xarray.Dataset) – The first dataset.

  • ds2 (xarray.Dataset) – The second dataset.

  • tolerance (float) – Optional, defaults to None, which corresponds to a tolerance of 0.001. Tolerance with respect to alignment. If the difference between any geographic or projection coordinate values of the two datasets exceeds the tolerance, the function reports the difference.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [1]: import pyku
   ...: 
   ...: ds1 = pyku.resources.get_test_data('model_data')
   ...: ds2 = pyku.resources.get_test_data('hyras')
   ...: 
   ...: ds1.pyku.compare_geographic_alignment(ds2)
   ...: 
Out[1]: 
                                                key  ...                                        description
0  have_same_number_of_pixels_in_y_and_x_directions  ...  Check if number of pixels is the same in the y...

[1 rows x 4 columns]
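
The role of the tolerance parameter can be illustrated with a small standard-library sketch. The function is_aligned below is hypothetical and operates on plain lists of coordinate values; it only shows the element-wise comparison within a tolerance and makes no claim about how pyku compares grids internally.

```python
def is_aligned(coords1, coords2, tolerance=0.001):
    """Return True if both coordinate lists match element-wise within `tolerance`."""
    if len(coords1) != len(coords2):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(coords1, coords2))

x_ref = [0.0, 0.5, 1.0]
x_close = [0.0, 0.5004, 1.0]   # off by 0.0004, within the tolerance
x_far = [0.0, 0.51, 1.0]       # off by 0.01, outside the tolerance
print(is_aligned(x_ref, x_close), is_aligned(x_ref, x_far))
```

Coordinate lists of different lengths are treated as not aligned, in the spirit of the have_same_number_of_pixels_in_y_and_x_directions row in the output above.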