pyku.compute#

Processing library

pyku.compute.calc(ds)[source]#

Add ‘huss’, ‘hurs’ and ‘tdew’ to dataset if possible

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

Dataset with all possible extended variables included.

Return type:

xarray.Dataset

Examples

In [1]: import pyku
   ...: 
   ...: # Open test dataset and select a few time steps
   ...: # ---------------------------------------------
   ...: 
   ...: ds = pyku.resources.get_test_data('tas_hurs')
   ...: ds = ds.isel(time=[0,1,2])
   ...: 
   ...: # Calculate tdew and show data variables
   ...: # --------------------------------------
   ...: 
   ...: ds = ds.pyku.calc()
   ...: ds.data_vars
   ...: 
Out[1]: 
Data variables:
    crs        int32 4B ...
    hurs       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    tas        (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    time_bnds  (time, bnds) datetime64[ns] 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
    tdew       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
pyku.compute.calc_degreeday(ds, period=None, data_frequency=None, complete=False)[source]#

Add degree day to dataset, calculated from ‘tas’

Parameters:

ds (xarray.Dataset) – The input dataset containing ‘tas’.

Returns:

The dataset with degreeday included.

Return type:

xarray.Dataset
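The exact degree-day definition used by pyku is not documented here; as a rough illustration only (not pyku's implementation), heating degree days can be accumulated from a daily ‘tas’ series like this, with the base temperature an assumed value:

```python
import numpy as np

# Daily mean temperatures in degrees Celsius for one week (synthetic data)
tas = np.array([2.0, 5.0, 10.0, 18.0, 20.0, 15.0, 8.0])

# Heating degree days relative to an assumed base temperature:
# each day contributes its shortfall below the base, warm days contribute 0
base = 17.0
hdd = np.maximum(base - tas, 0.0).sum()
```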

pyku.compute.calc_globalwarminglevels(ds, GWL_levels=None, ref_period=None, navg=30, GWL_temp_offset=0.0, cellarea=None)[source]#

Calculate Global Warming Level central, start and end years from ‘tas’

Parameters:
  • ds (xarray.Dataset) – Input data containing ‘tas’ spanning reference and future time period

  • GWL_levels (list) – List of Global Warming Levels e.g.: [1.5, 2.0, 4.0]

  • ref_period (list) – Start and end year of reference (pre-industrial) period

  • navg (int) – (Optional) Length of the GWL period (n-year averaging). Defaults to 30 years.

  • GWL_temp_offset (float) – (Optional) Temperature offset for the GWL calculation, accounting for observed warming up to the given reference period

  • cellarea (xarray.DataArray) – (Optional) Array containing areas of grid cells for setting up the corresponding weights

Returns:

DataFrame of Global Warming Levels and their corresponding start, central and end years

Return type:

pandas.DataFrame

References

mathause/Chapter-11

Examples

In [1]: %%time
   ...: import pyku
   ...: ds = pyku.resources.get_test_data(
   ...:     'CCCma_CanESM2_Amon_world'
   ...: )
   ...: ds.pyku.calc_globalwarminglevels(
   ...:    GWL_levels = [1.5, 2, 3, 4],
   ...:    ref_period = [1850,1900]
   ...: )
   ...: 
CPU times: user 556 ms, sys: 125 ms, total: 681 ms
Wall time: 369 ms
Out[1]: 
    1.5   2.0   3.0   4.0
0  1996  2007  2025  2039
1  2011  2022  2040  2054
2  2025  2036  2054  2068
In [2]: %%time
   ...: ds.pyku.calc_globalwarminglevels(
   ...:    GWL_levels=[1.5, 2, 3, 4],
   ...:    ref_period=[1850,1900],
   ...:    navg=20
   ...: )
   ...: 
CPU times: user 144 ms, sys: 10.7 ms, total: 154 ms
Wall time: 154 ms
Out[2]: 
    1.5   2.0   3.0   4.0
0  2002  2013  2030  2044
1  2012  2023  2040  2054
2  2021  2032  2049  2063
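For intuition, the crossing-year method behind these results (an n-year running mean of the global-mean anomaly relative to the pre-industrial reference, with the central year being the first year the smoothed anomaly reaches the level) can be sketched with synthetic NumPy data. This is an illustration only, not pyku's implementation:

```python
import numpy as np

# Synthetic annual global-mean temperature: linear warming of 0.02 K/yr
years = np.arange(1850, 2101)
tas = 13.5 + 0.02 * (years - 1850)

# Anomaly relative to the 1850-1900 pre-industrial reference mean
ref = tas[(years >= 1850) & (years <= 1900)].mean()
anom = tas - ref

# navg-year running mean of the anomaly, with approximate centre years
navg = 30
smoothed = np.convolve(anom, np.ones(navg) / navg, mode="valid")
centre_years = years[navg // 2 : navg // 2 + smoothed.size]

# Central year of a GWL: first year the smoothed anomaly reaches the level
level = 1.5
central = centre_years[np.argmax(smoothed >= level)]
```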
pyku.compute.calc_hurs(ds)[source]#

Calculate ‘hurs’ from ‘ps’, ‘tas’ and ‘huss’

Parameters:

ds (xarray.Dataset) – The input data containing ‘ps’, ‘tas’ and ‘huss’.

Returns:

The dataset including hurs.

Return type:

xarray.Dataset

Examples

In [1]: import pyku
   ...: 
   ...: # Open test dataset and select a few time steps
   ...: # ---------------------------------------------
   ...: 
   ...: ds = pyku.resources.get_test_data('tas_ps_huss')
   ...: ds = ds.isel(time=[0,1,2])
   ...: 
   ...: # Calculate hurs and show data variables
   ...: # --------------------------------------
   ...: 
   ...: ds = ds.pyku.calc_hurs()
   ...: ds.data_vars
   ...: 
Out[1]: 
Data variables:
    crs        int32 4B ...
    huss       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    ps         (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    tas        (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    time_bnds  (time, bnds) datetime64[ns] 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
    hurs       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
pyku.compute.calc_huss(ds)[source]#

Calculate ‘huss’ from ‘ps’ and ‘tdew’

Parameters:

ds (xarray.Dataset) – The input data containing ‘ps’ and ‘tdew’.

Returns:

The dataset including huss.

Return type:

xarray.Dataset
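pyku's exact formula is not shown here; a common approach (a sketch, not necessarily pyku's implementation) uses the Magnus approximation for the vapour pressure at the dew point and converts it to specific humidity using the surface pressure. The Magnus coefficients below are the standard textbook values:

```python
import numpy as np

# Dew point in degrees Celsius and surface pressure in Pa (synthetic values)
tdew = 20.0
ps = 101325.0

# Magnus approximation for the vapour pressure at the dew point (Pa)
e = 611.2 * np.exp(17.62 * tdew / (243.12 + tdew))

# Convert vapour pressure to specific humidity (kg/kg)
huss = 0.622 * e / (ps - 0.378 * e)
```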

pyku.compute.calc_ssim(mean_ref, mean_model, variance_ref, variance_model, covariance, c1=1e-08, c2=1e-08)[source]#

Calculate Structural Similarity Index Measure

Parameters:
  • mean_ref (float) – Mean of the reference values

  • mean_model (float) – Mean of the sample values

  • variance_ref (float) – Variance of the reference values

  • variance_model (float) – Variance of the sample values

  • covariance (float) – Covariance of the reference and sample values

  • c1 (float) – Constant 1

  • c2 (float) – Constant 2

Returns:

Structural Similarity Index Measure

Return type:

float

References

Wang et al. 2004 (DOI: 10.1109/tip.2003.819861), with c1 and c2 defaulting to 1e-8 as suggested by Baker et al. 2022 (arXiv:2202.02616) to give equal weight to the luminance and contrast components of SSIM, and a modification of Dalelane et al. (modref)
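Given the summary statistics above, the SSIM formula of Wang et al. 2004 reduces to a one-liner; the sketch below is a plain-Python illustration of that formula, not pyku's implementation:

```python
def ssim(mean_ref, mean_model, variance_ref, variance_model, covariance,
         c1=1e-8, c2=1e-8):
    # SSIM from summary statistics (Wang et al. 2004)
    numerator = (2 * mean_ref * mean_model + c1) * (2 * covariance + c2)
    denominator = ((mean_ref**2 + mean_model**2 + c1)
                   * (variance_ref + variance_model + c2))
    return numerator / denominator

# Identical statistics (covariance equal to the shared variance) give SSIM = 1
ssim(10.0, 10.0, 4.0, 4.0, 4.0)
```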

pyku.compute.calc_tdew(ds)[source]#

Add dew point temperature to dataset, calculated from ‘tas’ and ‘hurs’

Parameters:

ds (xarray.Dataset) – The input data containing ‘tas’ and ‘hurs’.

Returns:

The data with tdew included.

Return type:

xarray.Dataset

Examples

In [1]: import pyku
   ...: 
   ...: # Open test dataset and select a few time steps
   ...: # ---------------------------------------------
   ...: 
   ...: ds = pyku.resources.get_test_data('tas_hurs')
   ...: ds = ds.isel(time=[0,1,2])
   ...: 
   ...: # Calculate tdew and show data variables
   ...: # --------------------------------------
   ...: 
   ...: ds = ds.pyku.calc_tdew()
   ...: ds.data_vars
   ...: 
Out[1]: 
Data variables:
    crs        int32 4B ...
    hurs       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    tas        (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    time_bnds  (time, bnds) datetime64[ns] 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
    tdew       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
pyku.compute.calc_windspeed(ds)[source]#

Given windspeed components, calculate the windspeed

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The dataset with windspeed included.

Return type:

xarray.Dataset
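The components are combined as a vector magnitude; a minimal NumPy sketch follows, where the names ‘uas’ and ‘vas’ are assumed CMIP-style component names not confirmed by this docstring:

```python
import numpy as np

# Hypothetical eastward and northward wind components (m/s)
uas = np.array([3.0, 0.0, -6.0])
vas = np.array([4.0, 5.0, 8.0])

# Wind speed is the magnitude of the horizontal wind vector
sfcwind = np.hypot(uas, vas)
```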

pyku.compute.inpainting(ds_in, roi=3, method='INPAINT_TELEA')[source]#

Warning

This function is unmaintained. Please contact a maintainer if you would like to keep this function.

Inpainting data correction

Parameters:
  • ds_in (xarray.Dataset) – The input data

  • method (str) – One of {“INPAINT_TELEA”, “INPAINT_NS”}. Defaults to “INPAINT_TELEA”.

  • roi (int) – Inpainting radius

Returns:

The inpainted dataset.

Return type:

xarray.Dataset


Examples

import pyku, numpy

# Select some model data and break it
# -----------------------------------

broken = pyku.resources.get_test_data('model_data')\
.isel(time=[0, 1, 2, 3])
broken['tas'] = broken['tas'].where(
    (broken['tas']<263) | (broken['tas']>263.5),
    numpy.nan
)

# Repair the broken data
# ----------------------

repaired = broken.pyku.inpainting(roi=3)

# Set title and plot
# ------------------

broken = broken.assign_attrs({'label': 'Broken'})
repaired = repaired.assign_attrs({'label': 'Repaired'})

pyku.analyse.two_maps(
    broken.isel(time=0),
    repaired.isel(time=0),
    var = 'tas',
    crs='EUR-11'
)
pyku.compute.persistent_processing(func, files=None, tmpdir=None, identifier='pyku', persist=False, engine=None, unify_chunks=True, chunks={'time': -1}, extension='nc')[source]#

Apply a function to a list of files, save the results in a temporary directory, and return the names of the processed files.

This function enables efficient data processing by applying a user-defined function to each file in the input list. The processed files are stored in a temporary directory, and their paths are returned. It optimizes memory usage by chunking large datasets and processing them in smaller, manageable segments, especially useful for computationally expensive tasks or multiprocessing.

If the frequency cannot be inferred and the dataset size is large, an hourly frequency is assumed, and files are split into one-year segments. For smaller datasets, the original dataset is returned without modifications.

Key Benefits:

  • Optimized for large datasets and heavy computations.

  • Supports multiprocessing with chunked data.

  • Simplifies debugging by breaking down large files into smaller chunks.

Parameters:
  • func (function) – A function that accepts and returns an xarray.Dataset.

  • files (list) – A list of input data files to process.

  • tmpdir (str) – Path to the temporary directory for processed files.

  • unify_chunks (bool) – If True, synchronizes chunking across all dataset variables to prevent computation overhead and alignment errors. Note: This may increase initial memory usage and task graph complexity if the data are not chunked properly.

  • chunks (dict) – Specifies dimension-to-size mapping for data partitioning. Defaults to unchunked (single chunk) along the time dimension.

  • identifier (str, optional) – A string identifier to include in processed file names (default: ‘pyku’).

  • extension (str, optional) – Format for the output files, either ‘nc’ or ‘zarr’ (default: ‘nc’)

Returns:

A list of paths to the processed files.

Return type:

list

Examples

In [1]: import xarray as xr
   ...: import pyku
   ...: import tempfile
   ...: import pyku.resources
   ...: import pyku.compute
   ...: 
   ...: # Define list of files
   ...: # --------------------
   ...: 
   ...: files = pyku.resources.get_test_data("monthly_hyras_files")
   ...: 
   ...: # Create a temporary directory
   ...: # ----------------------------
   ...: 
   ...: # Alternatively, define your own directory where the data
   ...: # should be located. Since this Jupyter notebook runs
   ...: # automatically, we merely use the cleanup function
   ...: # of the tempfile library here.
   ...: 
   ...: temp_dir = tempfile.TemporaryDirectory()
   ...: 
   ...: # Show the temporary directory name
   ...: # ---------------------------------
   ...: 
   ...: print("Temporary directory:", temp_dir.name)
   ...: 
   ...: # Get Polygon for Germany
   ...: # -----------------------
   ...: 
   ...: germany_polygon = pyku.resources.get_geodataframe('germany')
   ...: 
   ...: # Define a preprocessing function
   ...: # -------------------------------
   ...: 
   ...: def preprocessing(ds):
   ...:     ds = ds.pyku.project('HYR-LAEA-5')
   ...:     ds = ds.pyku.apply_mask(germany_polygon)
   ...:     return ds
   ...: 
   ...: # Semi-permanently preprocess the files
   ...: # -------------------------------------
   ...: 
   ...: preprocessed_files = pyku.compute.persistent_processing(
   ...:     func=preprocessing,
   ...:     files=files,
   ...:     tmpdir=temp_dir.name,
   ...:     identifier='my-pre-processed-data',
   ...: )
   ...: 
   ...: # Print list of preprocessed files
   ...: # --------------------------------
   ...: 
   ...: print(preprocessed_files)
   ...: 
   ...: # Cleanup the temporary directory
   ...: # -------------------------------
   ...: 
   ...: temp_dir.cleanup()
   ...: 
Temporary directory: /tmp/tmpkprm4lr5
['/tmp/tmpkprm4lr5/tas_hyras_1_1961_v6-1_de_monmean-my-pre-processed-data-e8057cab55e1b77c225ba0b9e1cc0ede-00000000.nc', '/tmp/tmpkprm4lr5/tas_hyras_1_1962_v6-1_de_monmean-my-pre-processed-data-9487fdee784c5c1fb2e9b3b7777bd70a-00000000.nc']