pyku.pdftransfer

Univariate and Multivariate Probability Distribution Transfer and Quantile Mapping methods.

Low-level classes for NumPy arrays of shape (nfeatures x nsamples):

  • MBCn

  • NDPDF

  • UQM

  • QDM

High-level classes for use with xarray.Dataset. These classes are directly compatible with NetCDF data loaded with xarray. Climate data typically have dimensions (nfeatures x ntimes x nlats x nlons):

  • MBCnCorrector

  • NDPDFCorrector

  • UQMCorrector

  • QDMCorrector
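The relationship between the two levels can be pictured with plain NumPy arrays. The sketch below does not call pyku and uses hypothetical variables `tas` and `pr` on a small synthetic grid. Stacking the variables gives the (nfeatures x ntimes x nlats x nlons) layout, and taking a single pixel yields the (nfeatures x nsamples) array the low-level classes expect, with the time steps acting as samples.

```python
import numpy as np

# Hypothetical stand-ins for two variables on a (ntimes x nlats x nlons) grid
ntimes, nlats, nlons = 365, 4, 5
rng = np.random.default_rng(0)
tas = rng.normal(285.0, 5.0, size=(ntimes, nlats, nlons))  # temperature-like values
pr = rng.gamma(2.0, 1.5, size=(ntimes, nlats, nlons))      # precipitation-like values

# Stack the variables along a leading feature axis: (nfeatures x ntimes x nlats x nlons)
cube = np.stack([tas, pr])

# A single pixel is an (nfeatures x nsamples) array, the time steps being the samples
pixel = cube[:, :, 2, 3]

# Flattening the grid gives one (nfeatures x nsamples) slice per pixel along the last axis
per_pixel = cube.reshape(cube.shape[0], ntimes, nlats * nlons)
```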

class pyku.pdftransfer.MBCn(*, nbins=None, niterations=None, kind=None)

Bases: object

MBCn bias correction for NumPy arrays of shape (nfeatures x nsamples).

The method alternates random rotations and quantile mapping to perform the multivariate bias correction.

Cannon, Alex J.: Multivariate quantile mapping bias correction: an N-dimensional probability density function transform for climate model simulations of multiple variables, Climate Dynamics, vol. 50, no. 1, pp. 31-49, doi:10.1007/s00382-017-3580-6
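The random-rotation step relies on orthogonal matrices. These mix the features while remaining exactly invertible, so quantile mapping can be applied in the rotated space and the result rotated back. The minimal NumPy sketch below assumes a QR-based construction, a common way to draw a uniformly distributed orthogonal matrix. pyku's own construction is not documented here, and `random_rotation` is a hypothetical helper.

```python
import numpy as np

def random_rotation(nfeatures, rng):
    """Draw a random (nfeatures x nfeatures) orthogonal matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((nfeatures, nfeatures)))
    # Flip column signs by the sign of R's diagonal so the draw is uniformly distributed
    # (the result may include a reflection, i.e. its determinant can be -1)
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(42)
data = rng.standard_normal((3, 1000))  # (nfeatures x nsamples)
rot = random_rotation(data.shape[0], rng)
rotated = rot @ data         # each rotated feature is a linear mix of the original ones
restored = rot.T @ rotated   # the inverse of an orthogonal matrix is its transpose
```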

fit_predict(*, np_cal=None, np_obs=None, np_mod=None)

Fit and predict.

Parameters:
  • np_cal (numpy.ndarray) – The biased dataset for calibration with shape (nfeatures x nsamples)

  • np_obs (numpy.ndarray) – The reference dataset for calibration with shape (nfeatures x nsamples)

  • np_mod (numpy.ndarray) – The biased model dataset with shape (nfeatures x nsamples)

class pyku.pdftransfer.MBCnCorrector(*, ds_mod=None, ds_obs=None, ds_cal=None, nbins=None, niterations=None, kind=None, implementation='SH')

Bases: PDFCorrector

MBCn corrector for xarray.Dataset.

block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)

Group the data by groupby_type and fit-predict each pixel. This method is optimized for multiprocessing with dask.

Parameters:
  • ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default, since the model data are already set when the corrector is initialized. However, passing ‘ds_mod’ with ‘qdm’ or ‘mbcn’ overrides the default model data. This is useful, for example, when performing a correction with a 30-year rolling window, because it allows selecting the subset of the projection data to be corrected. To make sure the code works as intended, it is recommended to pass this parameter even if the model data were already given to the corrector.

  • groupby_type (string) – Data grouping. The correction is applied independently to each group. The value of ‘groupby_type’ is expected to be ‘time.month’, but it can also be set to ‘time.season’.

fit_predict()

Compute the map

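generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)

Fit and predict.

This method resets an existing corrector with new data and then runs fit_predict.

Parameters:
  • ds_obs (xarray.Dataset) – The reference (observation) dataset for calibration.

  • ds_cal (xarray.Dataset) – The biased dataset for calibration.

  • ds_mod (xarray.Dataset) – The biased model dataset to be corrected.

groupedby_fit_predict(ds_mod=None, groupby_type=None)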

Apply fit_predict independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – The grouping method. The expected value is ‘time.month’.

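regional_fit_predict(ds_mod=None, regions=None, area_def=None)

Regionalized fit_predict.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.

regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)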

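Regionalized fit_predict, applied independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – Type of grouping. The expected value is ‘time.month’.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.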

rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)

High-level function for performing bias correction with the QDM and MBCn methods.

Apply the bias correction within a 30-year rolling window and keep only the corrected middle decade: the first and last ten years of the window are discarded when returning the corrected data. Then move the window forward by 10 years. The correction is performed separately for each group defined by groupby_type, which should generally be a monthly grouping.

Parameters:
  • groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can also be used.

  • client (dask.distributed.Client) – The dask client. Defaults to None. When given, the function uses the client for parallel processing, which is recommended for large datasets to speed up computations.

  • block_size (tuple[int]) – Deprecated. Block size, defaults to (5, 5). The data should have the shape (ntimes x ny x nx), and calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to the workers.

Returns:

The corrected dataset.

Return type:

xarray.Dataset

class pyku.pdftransfer.NDPDF(*, niterations=None, nbins=None)

Bases: object

N-Dimensional Probability Distribution Transfer (NDPDF) for NumPy arrays of shape (nfeatures x nsamples).

F. Pitié, A. C. Kokaram and R. Dahyot: N-dimensional probability density function transfer and its application to color transfer, Tenth IEEE International Conference on Computer Vision (ICCV'05), Volume 1, 2005, doi:10.1109/iccv.2005.166
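The idea of the reference above, repeatedly rotating the data and matching the one-dimensional marginals along each rotated axis, can be sketched in a few lines of NumPy. This is a simplified illustration, not pyku's implementation. It draws a fresh random rotation at each iteration and uses plain empirical quantile mapping per rotated axis. The function names and the `niterations` default are hypothetical.

```python
import numpy as np

def random_rotation(nfeatures, rng):
    """Random orthogonal matrix drawn via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((nfeatures, nfeatures)))
    return q * np.sign(np.diag(r))

def quantile_map(x, y):
    """Map 1-D samples x onto the empirical distribution of y, rank by rank."""
    ranks = np.argsort(np.argsort(x))
    # Position of each rank on y's sorted values (exact indices when x.size == y.size)
    pos = ranks * (y.size - 1) / max(x.size - 1, 1)
    return np.interp(pos, np.arange(y.size), np.sort(y))

def ndpdf_transfer(src, tgt, niterations=20, seed=0):
    """Iteratively match src (nfeatures x nsamples) to the distribution of tgt."""
    rng = np.random.default_rng(seed)
    out = np.array(src, dtype=float)
    for _ in range(niterations):
        rot = random_rotation(out.shape[0], rng)
        out_r, tgt_r = rot @ out, rot @ tgt
        # One univariate quantile mapping per rotated axis
        for i in range(out_r.shape[0]):
            out_r[i] = quantile_map(out_r[i], tgt_r[i])
        out = rot.T @ out_r  # rotate back to the original feature space
    return out

rng = np.random.default_rng(1)
src = rng.standard_normal((2, 2000))
mixing = np.array([[2.0, 0.0], [1.5, 0.5]])
tgt = mixing @ rng.standard_normal((2, 2000)) + np.array([[10.0], [5.0]])
corrected = ndpdf_transfer(src, tgt, niterations=30)
```

Because every iteration matches the marginals along the rotated axes, the mean vector and the total variance of the output already agree with the target after one pass. Repeating the rotations progressively brings the joint distribution, including the correlation between features, closer to the target.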

fit(*, np_cal=None, np_obs=None)

Fit

Parameters:
  • np_cal (numpy.ndarray) – The biased model calibration data with shape (nfeatures x nsamples)

  • np_obs (numpy.ndarray) – The observation calibration data with shape (nfeatures x nsamples)

predict(*, np_mod=None)

Predict

class pyku.pdftransfer.NDPDFCorrector(*, ds_cal=None, ds_obs=None, niterations=None, nbins=None)

Bases: PDFCorrector

N-Dimensional Probability Distribution Transfer (NDPDF) corrector for xarray.Dataset.

block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)

Group the data by groupby_type and fit-predict each pixel. This method is optimized for multiprocessing with dask.

Parameters:
  • ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default, since the model data are already set when the corrector is initialized. However, passing ‘ds_mod’ with ‘qdm’ or ‘mbcn’ overrides the default model data. This is useful, for example, when performing a correction with a 30-year rolling window, because it allows selecting the subset of the projection data to be corrected. To make sure the code works as intended, it is recommended to pass this parameter even if the model data were already given to the corrector.

  • groupby_type (string) – Data grouping. The correction is applied independently to each group. The value of ‘groupby_type’ is expected to be ‘time.month’, but it can also be set to ‘time.season’.

fit()

Compute the map.

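generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)

Fit and predict.

This method resets an existing corrector with new data and then runs fit_predict.

Parameters:
  • ds_obs (xarray.Dataset) – The reference (observation) dataset for calibration.

  • ds_cal (xarray.Dataset) – The biased dataset for calibration.

  • ds_mod (xarray.Dataset) – The biased model dataset to be corrected.

groupedby_fit_predict(ds_mod=None, groupby_type=None)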

Apply fit_predict independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – The grouping method. The expected value is ‘time.month’.

predict(ds_mod=None)

Predict

Parameters:

ds_mod (xarray.Dataset) – Biased dataset

Returns:

Corrected dataset

Return type:

xarray.Dataset

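regional_fit_predict(ds_mod=None, regions=None, area_def=None)

Regionalized fit_predict.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.

regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)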

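Regionalized fit_predict, applied independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – Type of grouping. The expected value is ‘time.month’.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.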

rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)

High-level function for performing bias correction with the QDM and MBCn methods.

Apply the bias correction within a 30-year rolling window and keep only the corrected middle decade: the first and last ten years of the window are discarded when returning the corrected data. Then move the window forward by 10 years. The correction is performed separately for each group defined by groupby_type, which should generally be a monthly grouping.

Parameters:
  • groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can also be used.

  • client (dask.distributed.Client) – The dask client. Defaults to None. When given, the function uses the client for parallel processing, which is recommended for large datasets to speed up computations.

  • block_size (tuple[int]) – Deprecated. Block size, defaults to (5, 5). The data should have the shape (ntimes x ny x nx), and calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to the workers.

Returns:

The corrected dataset.

Return type:

xarray.Dataset

save(pickle_file)

Save the map to a file.

class pyku.pdftransfer.PDFCorrector

Bases: object

Parent corrector class for all PDF correctors.

This class gathers the high-level and optimized methods shared by all correctors.

block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)

Group the data by groupby_type and fit-predict each pixel. This method is optimized for multiprocessing with dask.

Parameters:
  • ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default, since the model data are already set when the corrector is initialized. However, passing ‘ds_mod’ with ‘qdm’ or ‘mbcn’ overrides the default model data. This is useful, for example, when performing a correction with a 30-year rolling window, because it allows selecting the subset of the projection data to be corrected. To make sure the code works as intended, it is recommended to pass this parameter even if the model data were already given to the corrector.

  • groupby_type (string) – Data grouping. The correction is applied independently to each group. The value of ‘groupby_type’ is expected to be ‘time.month’, but it can also be set to ‘time.season’.
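Applying the correction independently to each group amounts to splitting the time axis by calendar month and correcting each subset separately. With xarray this grouping is typically written as `ds.groupby("time.month")`. The NumPy sketch below makes the mechanics explicit. The helper `groupwise_apply` and the demeaning "correction" are hypothetical placeholders for a real fit-predict step.

```python
import numpy as np

def groupwise_apply(values, dates, correct):
    """Apply `correct` separately to the samples of each calendar month.

    values: (nfeatures x ntimes) array, dates: (ntimes,) datetime64 array.
    """
    # datetime64[M] counts months since 1970-01, so modulo 12 yields 0 (Jan) to 11 (Dec)
    months = dates.astype("datetime64[M]").astype(int) % 12 + 1
    out = np.empty(values.shape, dtype=float)
    for month in np.unique(months):
        mask = months == month
        out[..., mask] = correct(values[..., mask])
    return out

dates = np.arange("2000-01-01", "2002-01-01", dtype="datetime64[D]")
rng = np.random.default_rng(0)
values = rng.normal(size=(2, dates.size))

# Placeholder "correction": remove the mean of each feature within the group
corrected = groupwise_apply(values, dates, lambda x: x - x.mean(axis=-1, keepdims=True))
```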

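generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)

Fit and predict.

This method resets an existing corrector with new data and then runs fit_predict.

Parameters:
  • ds_obs (xarray.Dataset) – The reference (observation) dataset for calibration.

  • ds_cal (xarray.Dataset) – The biased dataset for calibration.

  • ds_mod (xarray.Dataset) – The biased model dataset to be corrected.

groupedby_fit_predict(ds_mod=None, groupby_type=None)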

Apply fit_predict independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – The grouping method. The expected value is ‘time.month’.

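regional_fit_predict(ds_mod=None, regions=None, area_def=None)

Regionalized fit_predict.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.

regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)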

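Regionalized fit_predict, applied independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – Type of grouping. The expected value is ‘time.month’.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.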

rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)

High-level function for performing bias correction with the QDM and MBCn methods.

Apply the bias correction within a 30-year rolling window and keep only the corrected middle decade: the first and last ten years of the window are discarded when returning the corrected data. Then move the window forward by 10 years. The correction is performed separately for each group defined by groupby_type, which should generally be a monthly grouping.

Parameters:
  • groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can also be used.

  • client (dask.distributed.Client) – The dask client. Defaults to None. When given, the function uses the client for parallel processing, which is recommended for large datasets to speed up computations.

  • block_size (tuple[int]) – Deprecated. Block size, defaults to (5, 5). The data should have the shape (ntimes x ny x nx), and calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to the workers.

Returns:

The corrected dataset.

Return type:

xarray.Dataset
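The rolling-window arithmetic can be sketched in plain Python. The sketch only computes which years are corrected together and which are kept. It assumes inclusive calendar years and a period spanning whole decades; `rolling_decadal_windows` is a hypothetical helper, and how pyku handles the first and last decade of a dataset is not covered here.

```python
def rolling_decadal_windows(first_year, last_year, window=30, step=10):
    """List (window_start, window_end, keep_start, keep_end) tuples of inclusive years."""
    margin = (window - step) // 2  # years discarded on each side of the kept block
    windows = []
    start = first_year
    while start + window - 1 <= last_year:
        keep_start = start + margin
        windows.append((start, start + window - 1, keep_start, keep_start + step - 1))
        start += step
    return windows

windows = rolling_decadal_windows(1971, 2100)
```

With these defaults the kept decades tile 1981-2090 without gaps or overlaps. The first and last decades of the period never sit in the middle of a window, so they need separate treatment.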

class pyku.pdftransfer.QDM(*, nbins=None, kind=None)

Bases: object

Quantile Delta Mapping (QDM) for NumPy arrays of shape (nfeatures x nsamples).
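For reference, the quantile delta mapping of Cannon et al. (2015), the paper linked in the QDMCorrector section, can be written as follows. Here o,h denotes the observations (np_obs), m,h the biased calibration data (np_cal), and m,p the model data to be corrected (np_mod). F is a cumulative distribution function and F^{-1} its inverse. How pyku discretizes these functions (for example through nbins) is not documented here.

```latex
\begin{aligned}
\tau_{m,p}(t) &= F_{m,p}\big(x_{m,p}(t)\big) \\
\hat{x}_{m,p}(t) &= F_{o,h}^{-1}\big(\tau_{m,p}(t)\big) + \Big[x_{m,p}(t) - F_{m,h}^{-1}\big(\tau_{m,p}(t)\big)\Big] && \text{(additive)} \\
\hat{x}_{m,p}(t) &= F_{o,h}^{-1}\big(\tau_{m,p}(t)\big) \cdot \frac{x_{m,p}(t)}{F_{m,h}^{-1}\big(\tau_{m,p}(t)\big)} && \text{(multiplicative)}
\end{aligned}
```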

fit_predict(*, np_obs=None, np_cal=None, np_mod=None)

Fit and predict.

Parameters:
  • np_obs (numpy.ndarray) – The reference dataset for calibration with shape (nfeatures x nsamples).

  • np_cal (numpy.ndarray) – The biased dataset for calibration with shape (nfeatures x nsamples)

  • np_mod (numpy.ndarray) – The biased dataset to be corrected with shape (nfeatures x nsamples)

Returns:

(nfeatures x nsamples) corrected dataset

Return type:

numpy.ndarray

predict_additive(*, np_mod=None, np_cal=None)

Parameters:
  • np_mod (numpy.ndarray) – The biased model data with shape (nfeatures x nsamples).

  • np_cal (numpy.ndarray) – The calibration model data with shape (nfeatures x nsamples).

Returns:

corrected model data with shape (nfeatures x nsamples)

Return type:

numpy.ndarray

predict_multiplicative(*, np_mod=None, np_cal=None)

Parameters:
  • np_mod (numpy.ndarray) – The biased model data with shape (nfeatures x nsamples)

  • np_cal (numpy.ndarray) – The calibration data with shape (nfeatures x nsamples)

Returns:

The corrected array with shape (nfeatures x nsamples)

Return type:

numpy.ndarray
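The additive and multiplicative variants can be sketched with empirical quantiles. This is an illustration under stated assumptions, not pyku's code. The quantile level of each model value is estimated from its rank, `np.quantile` stands in for the inverse distribution functions, and the function `qdm` with its `kind` argument is hypothetical.

```python
import numpy as np

def qdm(obs, cal, mod, kind="additive"):
    """Quantile delta mapping of (nfeatures x nsamples) arrays, one feature at a time."""
    obs, cal, mod = (np.atleast_2d(a).astype(float) for a in (obs, cal, mod))
    out = np.empty_like(mod)
    for i in range(mod.shape[0]):
        # Non-exceedance probability of each model value within its own distribution
        tau = (np.argsort(np.argsort(mod[i])) + 0.5) / mod.shape[1]
        obs_q = np.quantile(obs[i], tau)  # observed value at the same quantile
        cal_q = np.quantile(cal[i], tau)  # calibration-period model value at that quantile
        if kind == "additive":
            out[i] = obs_q + (mod[i] - cal_q)  # preserve the absolute change
        elif kind == "multiplicative":
            out[i] = obs_q * (mod[i] / cal_q)  # preserve the relative change
        else:
            raise ValueError(f"unknown kind: {kind!r}")
    return out

rng = np.random.default_rng(0)
obs = rng.normal(size=(2, 500))
# Model biased by +2 in the calibration period and shifted by a further +1 in the projection
shifted = qdm(obs, obs + 2.0, obs + 3.0, kind="additive")

wet = rng.gamma(2.0, 1.0, size=(2, 500))
# Model doubled in the calibration period and tripled in the projection
scaled = qdm(wet, 2.0 * wet, 3.0 * wet, kind="multiplicative")
```

In both cases the bias is removed while the simulated change (+1 in absolute terms, a factor of 1.5 in relative terms) is preserved, which is the property QDM is designed for.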

class pyku.pdftransfer.QDMCorrector(*, ds_mod=None, ds_obs=None, ds_cal=None, nbins=None, kind=None, implementation='SH')

Bases: PDFCorrector

Quantile Delta Mapping (QDM) corrector for xarray.Dataset.

https://journals.ametsoc.org/view/journals/clim/28/17/jcli-d-14-00754.1.xml

block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)

Group the data by groupby_type and fit-predict each pixel. This method is optimized for multiprocessing with dask.

Parameters:
  • ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default, since the model data are already set when the corrector is initialized. However, passing ‘ds_mod’ with ‘qdm’ or ‘mbcn’ overrides the default model data. This is useful, for example, when performing a correction with a 30-year rolling window, because it allows selecting the subset of the projection data to be corrected. To make sure the code works as intended, it is recommended to pass this parameter even if the model data were already given to the corrector.

  • groupby_type (string) – Data grouping. The correction is applied independently to each group. The value of ‘groupby_type’ is expected to be ‘time.month’, but it can also be set to ‘time.season’.

fit_predict()

Fit and predict.

Returns:

The corrected dataset.

Return type:

xarray.Dataset

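generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)

Fit and predict.

This method resets an existing corrector with new data and then runs fit_predict.

Parameters:
  • ds_obs (xarray.Dataset) – The reference (observation) dataset for calibration.

  • ds_cal (xarray.Dataset) – The biased dataset for calibration.

  • ds_mod (xarray.Dataset) – The biased model dataset to be corrected.

groupedby_fit_predict(ds_mod=None, groupby_type=None)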

Apply fit_predict independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – The grouping method. The expected value is ‘time.month’.

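regional_fit_predict(ds_mod=None, regions=None, area_def=None)

Regionalized fit_predict.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.

regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)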

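Regionalized fit_predict, applied independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – Type of grouping. The expected value is ‘time.month’.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.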

rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)

High-level function for performing bias correction with the QDM and MBCn methods.

Apply the bias correction within a 30-year rolling window and keep only the corrected middle decade: the first and last ten years of the window are discarded when returning the corrected data. Then move the window forward by 10 years. The correction is performed separately for each group defined by groupby_type, which should generally be a monthly grouping.

Parameters:
  • groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can also be used.

  • client (dask.distributed.Client) – The dask client. Defaults to None. When given, the function uses the client for parallel processing, which is recommended for large datasets to speed up computations.

  • block_size (tuple[int]) – Deprecated. Block size, defaults to (5, 5). The data should have the shape (ntimes x ny x nx), and calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to the workers.

Returns:

The corrected dataset.

Return type:

xarray.Dataset

class pyku.pdftransfer.UQM(*, nbins=None)

Bases: object

Univariate Quantile Mapping (UQM) for NumPy arrays of shape (nfeatures x nsamples).

fit(*, np_cal=None, np_obs=None)

Fit

Parameters:
  • np_cal (numpy.ndarray) – The biased calibration dataset with shape (nfeatures x nsamples).

  • np_obs (numpy.ndarray) – The reference (observation) dataset with shape (nfeatures x nsamples).

predict(*, np_mod=None)

Predict

Parameters:

np_mod (numpy.ndarray) – The biased data with shape (nfeatures x nsamples).

Returns:

(nfeatures x nsamples) corrected data

Return type:

numpy.ndarray
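A minimal empirical quantile mapping in the spirit of the fit/predict interface above is sketched below. The class name `EmpiricalQuantileMapping`, the use of nbins + 1 evenly spaced quantile levels and the linear interpolation between them are assumptions for illustration. Values outside the calibration range are clamped by `np.interp` to the extreme observed quantiles.

```python
import numpy as np

class EmpiricalQuantileMapping:
    """Hypothetical univariate quantile mapping for (nfeatures x nsamples) arrays."""

    def __init__(self, nbins=100):
        self.nbins = nbins

    def fit(self, *, np_cal, np_obs):
        levels = np.linspace(0.0, 1.0, self.nbins + 1)
        # Quantiles per feature, transposed to (nfeatures x nbins + 1)
        self.cal_q = np.quantile(np_cal, levels, axis=1).T
        self.obs_q = np.quantile(np_obs, levels, axis=1).T
        return self

    def predict(self, *, np_mod):
        out = np.empty(np_mod.shape, dtype=float)
        for i in range(np_mod.shape[0]):
            # Locate each value on the calibration quantiles, return the observed quantile
            out[i] = np.interp(np_mod[i], self.cal_q[i], self.obs_q[i])
        return out

rng = np.random.default_rng(0)
obs = rng.normal(size=(2, 1000))
uqm = EmpiricalQuantileMapping(nbins=50).fit(np_cal=obs + 5.0, np_obs=obs)
corrected = uqm.predict(np_mod=obs + 5.0)
```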

class pyku.pdftransfer.UQMCorrector(*, ds_obs=None, ds_cal=None, nbins=None)

Bases: PDFCorrector

Univariate Quantile Mapping (UQM) corrector for xarray.Dataset.

block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)

Group the data by groupby_type and fit-predict each pixel. This method is optimized for multiprocessing with dask.

Parameters:
  • ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default, since the model data are already set when the corrector is initialized. However, passing ‘ds_mod’ with ‘qdm’ or ‘mbcn’ overrides the default model data. This is useful, for example, when performing a correction with a 30-year rolling window, because it allows selecting the subset of the projection data to be corrected. To make sure the code works as intended, it is recommended to pass this parameter even if the model data were already given to the corrector.

  • groupby_type (string) – Data grouping. The correction is applied independently to each group. The value of ‘groupby_type’ is expected to be ‘time.month’, but it can also be set to ‘time.season’.

fit()

Compute the map

fit_predict(ds_mod=None)

Fit and predict

Parameters:

ds_mod (xarray.Dataset) – Biased dataset to be corrected

Returns:

Corrected dataset

Return type:

xarray.Dataset

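generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)

Fit and predict.

This method resets an existing corrector with new data and then runs fit_predict.

Parameters:
  • ds_obs (xarray.Dataset) – The reference (observation) dataset for calibration.

  • ds_cal (xarray.Dataset) – The biased dataset for calibration.

  • ds_mod (xarray.Dataset) – The biased model dataset to be corrected.

groupedby_fit_predict(ds_mod=None, groupby_type=None)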

Apply fit_predict independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – The grouping method. The expected value is ‘time.month’.

predict(ds_mod=None)

Predict

Parameters:

ds_mod (xarray.Dataset) – Biased dataset to be corrected

Returns:

Bias corrected dataset

Return type:

xarray.Dataset

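regional_fit_predict(ds_mod=None, regions=None, area_def=None)

Regionalized fit_predict.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.

regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)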

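Regionalized fit_predict, applied independently to each group defined by groupby_type.

Parameters:
  • ds_mod (xarray.Dataset) – The biased model dataset.

  • groupby_type (str) – Type of grouping. The expected value is ‘time.month’.

  • regions (geopandas.GeoDataFrame) – The region geometries.

  • area_def (pyresample.AreaDefinition) – The projection (area definition) of the data grid.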

rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)

High-level function for performing bias correction with the QDM and MBCn methods.

Apply the bias correction within a 30-year rolling window and keep only the corrected middle decade: the first and last ten years of the window are discarded when returning the corrected data. Then move the window forward by 10 years. The correction is performed separately for each group defined by groupby_type, which should generally be a monthly grouping.

Parameters:
  • groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can also be used.

  • client (dask.distributed.Client) – The dask client. Defaults to None. When given, the function uses the client for parallel processing, which is recommended for large datasets to speed up computations.

  • block_size (tuple[int]) – Deprecated. Block size, defaults to (5, 5). The data should have the shape (ntimes x ny x nx), and calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to the workers.

Returns:

The corrected dataset.

Return type:

xarray.Dataset

save(pickle_file)

Save the map to a file.

Parameters:

pickle_file (str) – Name of the pickle file.
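Persisting a fitted map with Python's pickle module could look as follows. This assumes the map is an ordinary picklable Python object. The helpers `save_map` and `load_map` and the dictionary of quantile arrays are hypothetical, and no matching load method is documented on this page. Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

def save_map(mapping, pickle_file):
    """Write a fitted mapping to disk with pickle."""
    with open(pickle_file, "wb") as f:
        pickle.dump(mapping, f)

def load_map(pickle_file):
    """Read a mapping previously written by save_map."""
    with open(pickle_file, "rb") as f:
        return pickle.load(f)

# Hypothetical fitted map: calibration and observation quantiles for one feature
mapping = {"cal_q": np.linspace(0.0, 1.0, 11), "obs_q": np.linspace(2.0, 3.0, 11)}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "uqm_map.pkl"
    save_map(mapping, path)
    restored = load_map(path)
```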