pyku.timekit#

Functions for dealing with time

pyku.timekit.add_missing_time_labels(ds, frequency=None)[source]#

Add missing time labels to the dataset, filling corresponding data slices with NaN.

This function ensures that all expected time labels, based on the specified frequency, are present in the dataset. Missing time labels are inserted along with data slices filled entirely with NaN values.

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • frequency (str, mandatory) – The frequency string defining the expected time intervals (e.g., “1H” for hourly, “1D” for daily).

Returns:

The dataset with missing time labels added, where corresponding slices are filled with NaN values.

Return type:

xarray.Dataset

pyku.timekit.date_range(*args, **kwargs)[source]#

This function is deprecated as its functionality has been integrated into the xarray.date_range() function, which follows the same conventions.

pyku.timekit.filter_incomplete_datetimes(ds, frequency=None, freq_resampled=None)[source]#

Filter dataset for incomplete datetimes.

Use case 1

When analyzing hourly data to calculate daily averages, it is crucial to address instances where hourly data is incomplete. For instance, if the dataset begins at 2010-04-07T01:00 instead of 2010-04-07T00:00, the entire day of 2010-04-07 should be excluded from the analysis.

Use case 2

In the analysis of hourly data for computing daily averages, it is crucial to account for instances of incomplete hourly data. For example, if the dataset starts at 2010-04-07T01:00 rather than 2010-04-07T00:00, it’s necessary to exclude the entire day of 2010-04-07 from the analysis.

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • frequency (str) – Resampling frequency string (e.g. 1H, 1D).

  • freq_resampled (str) – Deprecated. Do not use.

Returns:

Dataset filtered for partial data.

Return type:

xarray.Dataset

References

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

pyku.timekit.resample_datetimes(ds, how=None, frequency=None, complete=False)[source]#

Resample along the time dimension.

During resampling, the time labels are set to the lower time bound if applicable. This behavior differs from the CMOR convention, which sets the time labels to the middle of the time bounds.

If needed, the time labels can be reset using the pyku.timekit.set_time_labels_from_time_bounds() function by passing the parameter how='middle'.

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • how (str) – The resampling method (mean, max, min, sum).

  • frequency (str) – The resampling frequency, e.g. 1H, 1D, 2MS. For seasonal outputs, use QS-DEC. The hydrological year divided into to two periods of 6 months periods (winter half year Nov-Apr and summer half year Mai-Oct) can be obtained with 2QS-MAY

  • complete (bool) – If True, filter groups of data with incomplete datetimes at the resampling frequency.

Returns:

The dataset resampled along the time dimension.

Return type:

xarray.Dataset

References

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

  • For yearly resampling, use YS (Year Start).

  • For monthly resampling, use MS (Month Start).

  • For seasonal resampling, use QS-DEC.

  • For half year outputs, use 2QS-MAY (Nov-Apr/Mai-Oct)

  • The use of M for month end is deprecated in the pandas library.

Example

In [1]: %%time
   ...: import pyku
   ...: 
   ...: ds = pyku.resources.get_test_data('low-res-hourly-tas-data')
   ...: 
   ...: print('First 10 timesteps before:')
   ...: print(ds.time.values[0:10])
   ...: 
   ...: ds = ds.pyku.resample_datetimes(frequency='1D', how='mean')
   ...: 
   ...: print('\nFirst 10 timesteps after:')
   ...: print(ds.time.values[0:10])
   ...: 
First 10 timesteps before:
['2020-01-01T00:50:00.000000000' '2020-01-01T01:50:00.000000000'
 '2020-01-01T02:50:00.000000000' '2020-01-01T03:50:00.000000000'
 '2020-01-01T04:50:00.000000000' '2020-01-01T05:50:00.000000000'
 '2020-01-01T06:50:00.000000000' '2020-01-01T07:50:00.000000000'
 '2020-01-01T08:50:00.000000000' '2020-01-01T09:50:00.000000000']

First 10 timesteps after:
['2020-01-01T00:00:00.000000000' '2020-01-02T00:00:00.000000000'
 '2020-01-03T00:00:00.000000000' '2020-01-04T00:00:00.000000000'
 '2020-01-05T00:00:00.000000000' '2020-01-06T00:00:00.000000000'
 '2020-01-07T00:00:00.000000000' '2020-01-08T00:00:00.000000000'
 '2020-01-09T00:00:00.000000000' '2020-01-10T00:00:00.000000000']
CPU times: user 11.2 s, sys: 4.52 s, total: 15.8 s
Wall time: 12 s
pyku.timekit.select_common_datetimes(ds1, ds2)[source]#

Select the datetimes common to both datasets.

Aruments:

ds1 (xarray.Dataset): The first dataset. ds2 (xarray.Dataset): The second dataset.

Returns:

Tuple of xarray.Dataset with common datetimes. If either dat_mod or dat_obs is None, the function just returns the inputs.

Return type:

xarray.Dataset

pyku.timekit.set_time_bounds_from_time_labels(ds, offset=None)[source]#

Infer time bounds from time labels.

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • offset (str) – Optional, default to None. Specifies an offset to adjust the time bounds, given as a frequency string. This is useful when the timestamp is, for example, 1986-01-01T12:00:00, and the desired time bounds are [1986-01-01T00:00:00, 1986-01-02T00:00:00]. For available offset aliases, refer to the Pandas documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html#offset-aliases

Returns:

Dataset including time bounds.

Return type:

xarray.Dataset

Example

In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hostrada')
   ...: ds.pyku.set_time_bounds_from_time_labels()
   ...: 
Out[1]: 
<xarray.Dataset> Size: 4GB
Dimensions:    (time: 744, Y: 938, X: 720, bnds: 2)
Coordinates:
  * time       (time) datetime64[ns] 6kB 1995-01-01 ... 1995-01-31T23:00:00
  * Y          (Y) float32 4kB 2.242e+06 2.244e+06 ... 3.178e+06 3.18e+06
  * X          (X) float32 3kB 3.67e+06 3.672e+06 ... 4.388e+06 4.39e+06
    lat        (Y, X) float64 5MB ...
    lon        (Y, X) float64 5MB ...
Dimensions without coordinates: bnds
Data variables:
    crs        int64 8B ...
    tas        (time, Y, X) float64 4GB ...
    time_bnds  (time, bnds) datetime64[ns] 12kB 1995-01-01 ... 1995-02-01
Attributes: (12/32)
    Conventions:            CF-1.8
    title:                  HOSTRADA
    source:                 HOSTRADA v1.0: Hochaufgeloester Stuendlicher Rast...
    institution:            Deutscher Wetterdienst, Offenbach 63067, Germany
    project_id:             HOSTRADA
    realization:            v1.0
    ...                     ...
    variant_label:          BE
    version:                v20231215
    ConventionsURL:         http://cfconventions.org/Data/cf-conventions/cf-c...
    variant_info:           Best Estimate
    source_name:            HOSTRADA
    source_description:     Hochaufgeloester Stuendlicher Rasterdatensatz fue...
pyku.timekit.set_time_labels_from_time_bounds(ds, how='lower')[source]#

Set time labels to lower bound, middle of bounds, or upper bound.

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • how (str) – ‘lower’, ‘middle’, ‘upper’

Returns:

Dataset with time labels set to the lower bound of the time bounds.

Return type:

xarray.Dataset

Note

The CMOR standard is to use the middle of the bounds

Example

In [1]: import xarray, pyku
   ...: ds = pyku.resources.get_test_data('hyras-tas-monthly')
   ...: ds.pyku.set_time_labels_from_time_bounds(how='lower')
   ...: 
Out[1]: 
<xarray.Dataset> Size: 124MB
Dimensions:             (time: 24, y: 890, x: 665, bnds: 2)
Coordinates:
  * time                (time) datetime64[ns] 192B 1961-01-01 ... 1962-12-01
  * y                   (y) float64 7kB 2.674e+06 2.676e+06 ... 3.564e+06
  * x                   (x) float64 5kB 4.022e+06 4.022e+06 ... 4.686e+06
    lat                 (y, x) float64 5MB dask.array<chunksize=(890, 665), meta=np.ndarray>
    lon                 (y, x) float64 5MB dask.array<chunksize=(890, 665), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
    crs                 (time) int32 96B dask.array<chunksize=(24,), meta=np.ndarray>
    number_of_stations  (time) float64 192B dask.array<chunksize=(24,), meta=np.ndarray>
    tas                 (time, y, x) float64 114MB dask.array<chunksize=(24, 890, 665), meta=np.ndarray>
    time_bnds           (time, bnds) datetime64[ns] 384B dask.array<chunksize=(24, 2), meta=np.ndarray>
    x_bnds              (time, x, bnds) float64 255kB dask.array<chunksize=(24, 665, 2), meta=np.ndarray>
    y_bnds              (time, y, bnds) float64 342kB dask.array<chunksize=(24, 890, 2), meta=np.ndarray>
Attributes: (12/22)
    source:                 surface observations
    institution:            Deutscher Wetterdienst (DWD)
    Conventions:            CF-1.11
    title:                  gridded_temperature_dataset_(HYRAS TAS)
    realization:            v6-1
    project_id:             HYRAS
    ...                     ...
    ConventionsURL:         http://cfconventions.org/Data/cf-conventions/cf-c...
    license:                The HYRAS data, produced by DWD, is licensed unde...
    comment:                Please be aware that the parameters are stored as...
    filename:               tas_hyras_1_1961_v6-1_de_monmean.nc
    creation_date:          2025-07-04T06:55:30Z
    unique_dataset_id:      DWD_HYRAS_DE_tas_v6-1_1961_mon_159ccae4-6f61-462a...
pyku.timekit.split_by_datetimes(ds)[source]#

Subset a dataset by datetime according to the CMOR convention.

This function partitions large datasets into smaller, manageable segments following the Climate Model Output Rewriter (CMOR) convention. It generates subsets of the dataset organized by specified time intervals, such as yearly, 5-year, 10-year, or 100-year periods. By using a generator, this function optimizes memory usage by loading only the required data into memory as needed.

If the frequency cannot be inferred from the data or the dataset is large, an hourly frequency is assumed, and files are split into one-year segments. For smaller datasets, the original dataset is returned without modifications.

Parameters:

dataset (xarray.Dataset) – The input dataset containing time-based data to be split.

Yields:

xarray.Dataset – Subsets of the input dataset, grouped by appropriate time intervals based on the temporal resolution of the data.

pyku.timekit.to_calendar(ds, calendar=None, add_missing=False)[source]#

Process the calendar

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • calendar (str) – The output calendar {noleap, 360_day}.

  • add_missing (bool) – If set to True, NaN slices are created for any missing datetimes during the conversion process. Defaults to False. This option is useful when it is important to maintain a consistent time frequency across the dataset.

Returns:

Dataset with time labels set in the gregorian calender

Return type:

xarray.Dataset

Example

In [1]: import xarray, pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.to_calendar(calendar='noleap')
   ...: 
Out[1]: 
<xarray.Dataset> Size: 139MB
Dimensions:             (time: 730, y: 178, x: 133, bnds: 2)
Coordinates:
  * time                (time) object 6kB 1981-01-01 00:00:00 ... 1982-12-31 ...
  * y                   (y) float64 1kB 3.562e+06 3.556e+06 ... 2.676e+06
  * x                   (x) float64 1kB 4.024e+06 4.028e+06 ... 4.684e+06
    lat                 (y, x) float64 189kB dask.array<chunksize=(178, 133), meta=np.ndarray>
    lon                 (y, x) float64 189kB dask.array<chunksize=(178, 133), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
    crs                 int32 4B ...
    number_of_stations  (time) float64 6kB dask.array<chunksize=(730,), meta=np.ndarray>
    tas                 (time, y, x) float64 138MB dask.array<chunksize=(730, 178, 133), meta=np.ndarray>
    time_bnds           (time, bnds) object 12kB 1981-01-01 00:00:00 ... 1983...
Attributes: (12/23)
    source:                 surface observations
    institution:            Deutscher Wetterdienst (DWD)
    Conventions:            CF-1.11
    title:                  gridded_temperature_dataset_(HYRAS TAS)
    realization:            v6-1
    project_id:             HYRAS
    ...                     ...
    ConventionsURL:         http://cfconventions.org/Data/cf-conventions/cf-c...
    license:                The HYRAS data, produced by DWD, is licensed unde...
    filename:               tas_hyras_1_1981_v6-1_de.nc
    comment:                Please be aware that the parameters are stored as...
    unique_dataset_id:      DWD_HYRAS_DE_tas_v6-1_1981_3a0bd428-c11d-47f6-9fb...
    CORDEX_domain:          undefined
pyku.timekit.to_gregorian_calendar(ds, add_missing=False)[source]#

Process the calendar

Parameters:
  • ds (xarray.Dataset) – The input dataset.

  • add_missing (bool) – If set to True, NaN slices are created for any missing datetimes during the conversion process. Defaults to False. This option is useful when it is important to maintain a consistent time frequency across the dataset.

Returns:

ataset with time labels set in the gregorian calender.

Return type:

xarray.Dataset

Example

In [1]: import xarray, pyku
   ...: ds = pyku.resources.get_test_data('cftime_data')
   ...: ds.pyku.to_gregorian_calendar()
   ...: 
Out[1]: 
<xarray.Dataset> Size: 23MB
Dimensions:    (time: 9855, y: 22, x: 26, bnds: 2)
Coordinates:
  * time       (time) datetime64[ns] 79kB 1979-01-01T12:00:00 ... 2005-12-31T...
  * y          (y) float64 176B 3.564e+06 3.514e+06 ... 2.564e+06 2.514e+06
  * x          (x) float64 208B 3.831e+06 3.88e+06 ... 5.022e+06 5.071e+06
    height     float64 8B ...
    lat        (y, x) float64 5kB ...
    lon        (y, x) float64 5kB ...
Dimensions without coordinates: bnds
Data variables:
    crs        int32 4B ...
    tas        (time, y, x) float32 23MB ...
    time_bnds  (time, bnds) datetime64[ns] 158kB 1979-01-01 ... 2006-01-01
Attributes: (12/32)
    institution:            CCCma (Canadian Centre for Climate Modelling and ...
    institute_id:           CCCma
    experiment_id:          historical
    source:                 CanESM2 2010 atmosphere: CanAM4 (AGCM15i, T63L35)...
    model_id:               CanESM2
    forcing:                GHG,Oz,SA,BC,OC,LU,Sl,Vl (GHG includes CO2,CH4,N2...
    ...                     ...
    title:                  CanESM2 model output prepared for CMIP5 historical
    parent_experiment:      pre-industrial control
    modeling_realm:         atmos
    realization:            1
    cmor_version:           2.5.4
    CORDEX_domain:          undefined