pyku.compute#
Processing library
- pyku.compute.calc(ds)[source]#
Add ‘huss’, ‘hurs’ and ‘tdew’ to dataset if possible
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Dataset with all possible extended variables included.
- Return type:
xarray.Dataset
Examples
In [1]: import pyku
   ...:
   ...: # Open test dataset and select a few time steps
   ...: # ---------------------------------------------
   ...:
   ...: ds = pyku.resources.get_test_data('tas_hurs')
   ...: ds = ds.isel(time=[0,1,2])
   ...:
   ...: # Calculate tdew and show data variables
   ...: # --------------------------------------
   ...:
   ...: ds = ds.pyku.calc()
   ...: ds.data_vars
   ...:
Out[1]:
Data variables:
    crs        int32 4B ...
    hurs       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    tas        (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    time_bnds  (time, bnds) datetime64[ns] 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
    tdew       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
- pyku.compute.calc_degreeday(ds, period=None, data_frequency=None, complete=False)[source]#
Add degree day to dataset, calculated from ‘tas’
- Parameters:
ds (xarray.Dataset) – The input data containing ‘tas’.
period (str) – (Optional) Period, e.g. ‘7D’ or ‘1W’. For a full list, see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases If ‘7D’ is given, the degree-days are calculated over 7 days. If ‘1W’ is given, the degree-days are calculated for each calendar week.
data_frequency (str) – (Optional) Data frequency, e.g. ‘7D’ or ‘1W’. For a full list, see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases If not given explicitly, the data frequency is extracted from the data itself by looking at the time bounds, or calculated from the time labels.
complete (bool) – (Optional) Use only complete data over the input period. For example, if period is 7 days and the data contains 16 days, only the first 14 days are taken into account and the last 2 days are discarded.
- Returns:
The dataset with degreeday included.
- Return type:
xarray.Dataset
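The aggregation behind the period argument can be sketched with plain pandas. The base temperature and the exceedance formulation below are illustrative assumptions, not pyku's internal implementation:

```python
import pandas as pd

def degree_days(tas, base=290.15, period="7D"):
    """Sum of daily exceedances of `base` (K) per `period` (a pandas offset alias).

    `tas` is a pandas Series of daily mean temperature indexed by time.
    Base temperature and formulation are assumptions for illustration only.
    """
    exceed = (tas - base).clip(lower=0.0)   # days below the base contribute nothing
    return exceed.resample(period).sum()    # aggregate per period, e.g. '7D' or '1W'

# 14 days at 1 K above the base -> two 7-day bins of 7.0 degree-days each
times = pd.date_range("2000-01-01", periods=14, freq="D")
dd = degree_days(pd.Series(291.15, index=times))
```

With period='1W' the same data would instead be binned by calendar week, which is why the two aliases can produce different bin edges for identical input.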
- pyku.compute.calc_globalwarminglevels(ds, GWL_levels=None, ref_period=None, navg=30, GWL_temp_offset=0.0, cellarea=None)[source]#
Calculate Global Warming Level central, start and end years, calculated from ‘tas’
- Parameters:
ds (xarray.Dataset) – Input data containing ‘tas’ spanning the reference and future time periods
GWL_levels (list) – List of Global Warming Levels, e.g. [1.5, 2.0, 4.0]
ref_period (list) – Start and end year of the reference (pre-industrial) period
navg (int) – (Optional) Length of the GWL period (n-year averaging). Defaults to 30 years.
GWL_temp_offset (float) – (Optional) Temperature offset for the GWL calculation, accounting for observational warming until the given reference period
cellarea (xarray.DataArray) – (Optional) Array containing the areas of grid cells, used to set up the corresponding weights
- Returns:
Dataframe of global warming levels and their corresponding start, central and end years
- Return type:
pandas.DataFrame
Examples
In [1]: %%time
   ...: import pyku
   ...: ds = pyku.resources.get_test_data(
   ...:     'CCCma_CanESM2_Amon_world'
   ...: )
   ...: ds.pyku.calc_globalwarminglevels(
   ...:     GWL_levels = [1.5, 2, 3, 4],
   ...:     ref_period = [1850,1900]
   ...: )
   ...:
CPU times: user 556 ms, sys: 125 ms, total: 681 ms
Wall time: 369 ms
Out[1]:
    1.5   2.0   3.0   4.0
0  1996  2007  2025  2039
1  2011  2022  2040  2054
2  2025  2036  2054  2068
In [2]: %%time
   ...: ds.pyku.calc_globalwarminglevels(
   ...:     GWL_levels=[1.5, 2, 3, 4],
   ...:     ref_period=[1850,1900],
   ...:     navg=20
   ...: )
   ...:
CPU times: user 144 ms, sys: 10.7 ms, total: 154 ms
Wall time: 154 ms
Out[2]:
    1.5   2.0   3.0   4.0
0  2002  2013  2030  2044
1  2012  2023  2040  2054
2  2021  2032  2049  2063
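The underlying logic can be sketched as: compute the anomaly of an n-year running mean of global-mean temperature against the reference-period mean, then take the first year at which each level is reached as the central year. This is a simplified reconstruction under stated assumptions (annual-mean input, symmetric window, no area weighting), not pyku's exact algorithm:

```python
import pandas as pd

def gwl_years(gsat, ref_period=(1850, 1900), levels=(1.5, 2.0), navg=30):
    """First years at which the n-year-mean warming reaches each level.

    `gsat` is a pandas Series of annual global-mean temperature indexed by
    integer year. A simplified illustration, not pyku's implementation.
    """
    ref = gsat.loc[ref_period[0]:ref_period[1]].mean()      # pre-industrial baseline
    anom = (gsat - ref).rolling(navg, center=True).mean()   # n-year running-mean anomaly
    rows = {}
    for lvl in levels:
        hit = anom[anom >= lvl]
        if hit.empty:
            rows[lvl] = (None, None, None)                  # level never reached
        else:
            central = hit.index[0]
            rows[lvl] = (central - navg // 2, central, central + navg // 2 - 1)
    return pd.DataFrame(rows, index=["start", "central", "end"])
```

As in the examples above, shortening navg moves the central years earlier or later depending on the shape of the warming curve, since a shorter window tracks the annual series more closely.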
- pyku.compute.calc_hurs(ds)[source]#
Calculate ‘hurs’ from ‘ps’, ‘tas’ and ‘huss’
- Parameters:
ds (xarray.Dataset) – The input data containing ‘ps’, ‘tas’ and ‘huss’.
- Returns:
The dataset including hurs.
- Return type:
xarray.Dataset
Examples
In [1]: import pyku
   ...:
   ...: # Open test dataset and select a few time steps
   ...: # ---------------------------------------------
   ...:
   ...: ds = pyku.resources.get_test_data('tas_ps_huss')
   ...: ds = ds.isel(time=[0,1,2])
   ...:
   ...: # Calculate hurs and show data variables
   ...: # --------------------------------------
   ...:
   ...: ds = ds.pyku.calc_hurs()
   ...: ds.data_vars
   ...:
Out[1]:
Data variables:
    crs        int32 4B ...
    huss       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    ps         (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    tas        (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    time_bnds  (time, bnds) datetime64[ns] 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
    hurs       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
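As a rough sketch of the physical relationship, relative humidity can be derived from specific humidity and pressure via the vapour pressure, with saturation given by a Magnus-type formula. The coefficients and formulation below are common textbook choices assumed for illustration; pyku's exact formulation may differ:

```python
import numpy as np

def hurs_from_huss(ps, tas, huss):
    """Relative humidity (%) from pressure (Pa), temperature (K) and
    specific humidity (kg/kg). Magnus coefficients are assumed values."""
    e = huss * ps / (0.622 + 0.378 * huss)                 # vapour pressure (Pa)
    t_c = tas - 273.15
    e_sat = 611.2 * np.exp(17.62 * t_c / (243.12 + t_c))   # saturation vapour pressure (Pa)
    return 100.0 * e / e_sat
```

Because the operations are plain NumPy ufuncs, the same function applies elementwise to xarray/dask arrays like the ‘ps’, ‘tas’ and ‘huss’ variables shown above.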
- pyku.compute.calc_huss(ds)[source]#
Calculate ‘huss’ from ‘ps’ and ‘tdew’
- Parameters:
ds (xarray.Dataset) – The input data containing ‘ps’ and ‘tdew’.
- Returns:
The dataset including huss.
- Return type:
xarray.Dataset
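A minimal sketch of this conversion: take the vapour pressure at the dew point from a Magnus-type formula, then form the specific humidity from vapour pressure and surface pressure. Coefficients and formulation are assumptions for illustration, not pyku's internal code:

```python
import numpy as np

def huss_from_tdew(ps, tdew):
    """Specific humidity (kg/kg) from pressure (Pa) and dew point (K).
    Magnus coefficients are assumed textbook values."""
    t_c = tdew - 273.15
    e = 611.2 * np.exp(17.62 * t_c / (243.12 + t_c))  # vapour pressure at dew point (Pa)
    return 0.622 * e / (ps - 0.378 * e)               # mass mixing of vapour in moist air
```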
- pyku.compute.calc_ssim(mean_ref, mean_model, variance_ref, variance_model, covariance, c1=1e-08, c2=1e-08)[source]#
Calculate Structural Similarity Index Measure
- Parameters:
mean_ref (float) – mean for reference values
mean_model (float) – mean for sample values
variance_ref (float) – variance for reference values
variance_model (float) – variance for sample values
covariance (float) – covariance for reference and sample values
c1 (float) – constant 1
c2 (float) – constant 2
- Returns:
Structural Similarity Index Measure
- Return type:
SSIM (float)
- References:
Wang et al. 2004 (DOI: 10.1109/tip.2003.819861), with the modification that c1 and c2 default to 1e-8, as suggested by Baker et al. 2022 (arXiv:2202.02616), to give equal weight to the luminance and contrast components of SSIM, and with the modification of Dalelane et al. (modref)
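The measure is compact enough to sketch directly from the standard Wang et al. definition, using the parameter names documented above; whether pyku applies further modifications (e.g. the Dalelane et al. variant) is not reflected here:

```python
def ssim(mean_ref, mean_model, variance_ref, variance_model,
         covariance, c1=1e-8, c2=1e-8):
    """Structural Similarity Index Measure (standard Wang et al. form).

    The small constants c1 and c2 stabilise the ratio when means or
    variances are near zero; 1e-8 follows the defaults noted above.
    """
    num = (2 * mean_ref * mean_model + c1) * (2 * covariance + c2)
    den = ((mean_ref**2 + mean_model**2 + c1)
           * (variance_ref + variance_model + c2))
    return num / den
```

For identical reference and model statistics (equal means, equal variances, covariance equal to the variance) the measure is 1; it decreases as means, variances or covariance diverge.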
- pyku.compute.calc_tdew(ds)[source]#
Add dew point temperature to dataset, calculated from ‘tas’ and ‘hurs’
- Parameters:
ds (xarray.Dataset) – The input data containing ‘tas’ and ‘hurs’.
- Returns:
The data with tdew included.
- Return type:
xarray.Dataset
Examples
In [1]: import pyku
   ...:
   ...: # Open test dataset and select a few time steps
   ...: # ---------------------------------------------
   ...:
   ...: ds = pyku.resources.get_test_data('tas_hurs')
   ...: ds = ds.isel(time=[0,1,2])
   ...:
   ...: # Calculate tdew and show data variables
   ...: # --------------------------------------
   ...:
   ...: ds = ds.pyku.calc_tdew()
   ...: ds.data_vars
   ...:
Out[1]:
Data variables:
    crs        int32 4B ...
    hurs       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    tas        (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
    time_bnds  (time, bnds) datetime64[ns] 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
    tdew       (time, y, x) float32 45kB dask.array<chunksize=(3, 71, 53), meta=np.ndarray>
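One common way to obtain the dew point from temperature and relative humidity is to invert a Magnus-type formula. The sketch below uses assumed textbook coefficients and is not necessarily pyku's formulation:

```python
import numpy as np

def tdew_from_hurs(tas, hurs):
    """Dew point (K) from temperature (K) and relative humidity (%).
    Magnus inversion with assumed textbook coefficients."""
    t_c = tas - 273.15
    gamma = np.log(hurs / 100.0) + 17.62 * t_c / (243.12 + t_c)
    return 243.12 * gamma / (17.62 - gamma) + 273.15
```

At 100 % relative humidity the dew point equals the air temperature; below saturation it is strictly lower, which is a quick sanity check for any implementation.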
- pyku.compute.calc_windspeed(ds)[source]#
Given windspeed components, calculate the windspeed
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The dataset with windspeed included.
- Return type:
xarray.Dataset
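Assuming the components are the usual eastward and northward winds (here hypothetically named uas and vas), the wind speed is just the Euclidean norm of the horizontal wind vector; a minimal sketch:

```python
import numpy as np

def windspeed(uas, vas):
    # wind speed as the magnitude of the horizontal wind vector;
    # np.hypot is numerically safer than sqrt(u**2 + v**2)
    return np.hypot(uas, vas)
```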
- pyku.compute.inpainting(ds_in, roi=3, method='INPAINT_TELEA')[source]#
Warning
This function is unmaintained. Please contact a maintainer if you would like to keep this function.
Inpainting data correction
- Parameters:
ds_in (xarray.Dataset) – Input data
roi (int) – Inpainting radius
method (str) – Defaults to “INPAINT_TELEA”. One of {“INPAINT_TELEA”, “INPAINT_NS”}.
- Returns:
The inpainted dataset.
- Return type:
xarray.Dataset
Example
import pyku, numpy

# Select some model data and break it
# -----------------------------------
broken = pyku.resources.get_test_data('model_data')\
    .isel(time=[0, 1, 2, 3])
broken['tas'] = broken['tas'].where(
    (broken['tas']<263) | (broken['tas']>263.5),
    numpy.nan
)

# Repair the broken data
# ----------------------
repaired = broken.pyku.inpainting(roi=3)

# Set title and plot
# ------------------
broken = broken.assign_attrs({'label': 'Broken'})
repaired = repaired.assign_attrs({'label': 'Repaired'})
pyku.analyse.two_maps(
    broken.isel(time=0),
    repaired.isel(time=0),
    var = 'tas',
    crs='EUR-11'
)
- pyku.compute.persistent_processing(func, files=None, tmpdir=None, identifier='pyku', persist=False, engine=None, unify_chunks=True, chunks={'time': -1}, extension='nc')[source]#
Apply a function to a list of files, save the results in a temporary directory, and return the names of the processed files.
This function enables efficient data processing by applying a user-defined function to each file in the input list. The processed files are stored in a temporary directory, and their paths are returned. It optimizes memory usage by chunking large datasets and processing them in smaller, manageable segments, especially useful for computationally expensive tasks or multiprocessing.
If the frequency cannot be inferred and the dataset size is large, an hourly frequency is assumed, and files are split into one-year segments. For smaller datasets, the original dataset is returned without modifications.
Key benefits:
* Optimized for large datasets and heavy computations.
* Supports multiprocessing with chunked data.
* Simplifies debugging by breaking down large files into smaller chunks.
- Parameters:
func (function) – A function that accepts and returns an xarray.Dataset.
files (list) – A list of input data files to process.
tmpdir (str) – Path to the temporary directory for processed files.
unify_chunks (bool) – If True, synchronizes chunking across all dataset variables to prevent computation overhead and alignment errors. Note: This may increase initial memory usage and task graph complexity if the data are not chunked properly.
chunks (dict) – Specifies dimension-to-size mapping for data partitioning. Defaults to unchunked (single chunk) along the time dimension.
identifier (str, optional) – A string identifier to include in processed file names (default: ‘pyku’).
extension (str, optional) – Format for the output files, either nc or zarr
- Returns:
A list of paths to the processed files.
- Return type:
list
Examples
In [1]: import xarray as xr
   ...: import pyku
   ...: import tempfile
   ...: import pyku.resources
   ...: import pyku.compute
   ...:
   ...: # Define list of files
   ...: # --------------------
   ...:
   ...: files = pyku.resources.get_test_data("monthly_hyras_files")
   ...:
   ...: # Create a temporary directory
   ...: # ----------------------------
   ...:
   ...: # Alternatively, define your own directory where data
   ...: # should be located. Here, since this jupyter notebook
   ...: # runs automatically, I merely intend to use the cleanup
   ...: # function of the tempfile library.
   ...:
   ...: temp_dir = tempfile.TemporaryDirectory()
   ...:
   ...: # Show the temporary directory name
   ...: # ---------------------------------
   ...:
   ...: print("Temporary directory:", temp_dir.name)
   ...:
   ...: # Get Polygon for Germany
   ...: # -----------------------
   ...:
   ...: germany_polygon = pyku.resources.get_geodataframe('germany')
   ...:
   ...: # Define a preprocessing function
   ...: # -------------------------------
   ...:
   ...: def preprocessing(ds):
   ...:     ds = ds.pyku.project('HYR-LAEA-5')
   ...:     ds = ds.pyku.apply_mask(germany_polygon)
   ...:     return ds
   ...:
   ...: # Semi-permanently preprocess the files
   ...: # -------------------------------------
   ...:
   ...: preprocessed_files = pyku.compute.persistent_processing(
   ...:     func=preprocessing,
   ...:     files=files,
   ...:     tmpdir=temp_dir.name,
   ...:     identifier='my-pre-processed-data',
   ...: )
   ...:
   ...: # Print list of preprocessed files
   ...: # --------------------------------
   ...:
   ...: print(preprocessed_files)
   ...:
   ...: # Cleanup the temporary directory
   ...: # -------------------------------
   ...:
   ...: temp_dir.cleanup()
   ...:
Temporary directory: /tmp/tmpkprm4lr5
['/tmp/tmpkprm4lr5/tas_hyras_1_1961_v6-1_de_monmean-my-pre-processed-data-e8057cab55e1b77c225ba0b9e1cc0ede-00000000.nc',
 '/tmp/tmpkprm4lr5/tas_hyras_1_1962_v6-1_de_monmean-my-pre-processed-data-9487fdee784c5c1fb2e9b3b7777bd70a-00000000.nc']