pyku.pdftransfer
Univariate and Multivariate Probability Distribution Transfer and Quantile Mapping methods.
Low-level classes for numpy arrays with dimensions (nfeatures x nsamples):
MBCn
NDPDF
UQM
QDM
High-level classes for use with xarray.Dataset. These classes are directly compatible with NetCDF data loaded with xarray. Climate data typically have dimensions (nfeatures x ntimes x nlats x nlons):
MBCnCorrector
NDPDFCorrector
UQMCorrector
QDMCorrector
- class pyku.pdftransfer.MBCn(*, nbins=None, niterations=None, kind=None)
Bases: object
MBCn bias correction for numpy arrays of shape (nfeatures x nsamples).
The method alternates random rotations and quantile mapping to perform the multivariate bias correction.
Cannon, Alex J.: Multivariate quantile mapping bias correction: an N-dimensional probability density function transform for climate model simulations of multiple variables, Climate Dynamics, vol. 50, no. 1, pp. 31-49, doi:10.1007/s00382-017-3580-6
- fit_predict(*, np_cal=None, np_obs=None, np_mod=None)
Fit and predict.
- Parameters:
np_cal (numpy.ndarray) – The biased dataset for calibration with shape (nfeatures x nsamples).
np_obs (numpy.ndarray) – The reference dataset for calibration with shape (nfeatures x nsamples).
np_mod (numpy.ndarray) – The biased model dataset with shape (nfeatures x nsamples).
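The alternation of random rotations and quantile mapping described for MBCn can be illustrated with a minimal numpy sketch. It is not pyku's implementation and it is simpler than the algorithm in Cannon's paper: it only pulls a calibration sample toward the observations, using plain empirical quantile mapping in each rotated frame. The names random_rotation, quantile_map and rotate_and_map are hypothetical and chosen for this illustration only.

```python
import numpy as np


def random_rotation(nfeatures, rng):
    """Draw a random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((nfeatures, nfeatures)))
    # Flip column signs so the result does not depend on the QR sign convention.
    return q * np.sign(np.diag(r))


def quantile_map(source, target):
    """Map each row of `source` onto the empirical distribution of the same row of `target`."""
    out = np.empty_like(source)
    nsamples = source.shape[1]
    for i in range(source.shape[0]):
        # Midpoint rank of every sample within its own row, in (0, 1).
        ranks = np.argsort(np.argsort(source[i]))
        probs = (ranks + 0.5) / nsamples
        out[i] = np.quantile(target[i], probs)
    return out


def rotate_and_map(np_cal, np_obs, niterations=20, seed=0):
    """Alternate random rotations and univariate quantile mapping (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(np_cal, dtype=float).copy()
    obs = np.asarray(np_obs, dtype=float)
    for _ in range(niterations):
        rot = random_rotation(x.shape[0], rng)
        # Rotate both samples, correct the marginals, rotate back.
        x_rot = quantile_map(rot @ x, rot @ obs)
        x = rot.T @ x_rot
    return x
```

Both inputs use the (nfeatures x nsamples) layout of the low-level classes. Because every rotation mixes the features, matching the marginals in many rotated frames also brings the dependence between features closer to that of the observations.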
- class pyku.pdftransfer.MBCnCorrector(*, ds_mod=None, ds_obs=None, ds_cal=None, nbins=None, niterations=None, kind=None, implementation='SH')
Bases: PDFCorrector
MBCn corrector for xarray.Dataset.
- block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)
Group the data by groupby_type and fit-predict each pixel. This function is optimized for multiprocessing with dask.
- Parameters:
ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default because the model data are already set when the corrector is initialized. If ‘ds_mod’ is nevertheless passed with ‘qdm’ or ‘mbcn’, it overrides the default model data. This is useful, for example, when correcting with a 30-year rolling window, where only a subset of the projection data should be considered. To make sure the code works as intended, it is recommended to pass this parameter even if the corrector was already initialized with it.
groupby_type (str) – Data grouping. The correction is applied independently to each group. The value is expected to be ‘time.month’, but can also be set to ‘time.season’.
- generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)
Fit and predict.
This function resets an existing corrector with new data and then runs fit_predict.
- Parameters:
ds_obs (xarray.Dataset) – Observation training dataset.
ds_cal (xarray.Dataset) – Model calibration dataset.
ds_mod (xarray.Dataset) – Biased model dataset.
- groupedby_fit_predict(ds_mod=None, groupby_type=None)
Fit and predict separately for each group.
- Parameters:
ds_mod (xarray.Dataset) – The biased model dataset.
groupby_type (str) – The grouping method. The expected value is ‘time.month’.
- regional_fit_predict(ds_mod=None, regions=None, area_def=None)
Regionalized fit_predict.
- Parameters:
regions (geopandas.GeoDataFrame) – Region file.
area_def (pyresample.AreaDefinition) – Area definition.
- regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)
Regionalized fit_predict, applied separately to each group.
- Parameters:
groupby_type (str) – Type of grouping. The value is expected to be ‘time.month’.
regions (geopandas.GeoDataFrame) – Regions file.
area_def (pyresample.AreaDefinition) – Projection.
- rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)
High-level function for performing bias correction with the QDM and MBCn methods.
Uses a 30-year rolling window: the bias correction is performed for the middle decade, and the first and last ten years of the window are removed when the corrected data are returned. The window is then moved by 10 years. The correction is performed per groupby_type, which should generally be a monthly grouping.
- Parameters:
groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can be used instead.
client (dask.distributed.client) – The dask client. Defaults to None. When given, the function uses the client for multiprocessing, which is recommended for large datasets to speed up computations.
block_size (tuple[int]) – Deprecated option. Block size, defaults to (5, 5). The data should have the form (ntimes x ny x nx). Calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to workers.
- Returns:
The corrected dataset.
- Return type:
xarray.Dataset
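The rolling-window scheme described for rolling_decadal_block_fit_predict can be made concrete with a short sketch that only enumerates the year ranges involved. The helper rolling_decadal_windows is hypothetical, and how pyku treats the first and last decade of a time series is not stated here, so the sketch simply leaves them out.

```python
def rolling_decadal_windows(first_year, last_year, window=30, step=10):
    """Yield (window_start, window_end, keep_start, keep_end), all years inclusive.

    The correction would be computed on [window_start, window_end], and only
    the middle part [keep_start, keep_end] would be kept from each window.
    """
    margin = (window - step) // 2  # years dropped at each end of the window
    start = first_year
    while start + window - 1 <= last_year:
        end = start + window - 1
        yield (start, end, start + margin, end - margin)
        start += step


if __name__ == "__main__":
    for window in rolling_decadal_windows(2001, 2060):
        print(window)
```

With the default 30-year window and 10-year step, consecutive kept decades follow each other without gaps or overlap.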
- class pyku.pdftransfer.NDPDF(*, niterations=None, nbins=None)
Bases: object
N-Dimensional Probability Distribution Transfer (NDPDF) for numpy arrays of shape (nvariables x nsamples).
F. Pitié, A. C. Kokaram and R. Dahyot: N-dimensional probability density function transfer and its application to color transfer, Tenth IEEE International Conference on Computer Vision (ICCV'05), Volume 1, 2005, doi:10.1109/iccv.2005.166
- fit(*, np_cal=None, np_obs=None)
Fit.
- Parameters:
np_cal (numpy.ndarray) – The biased model calibration data with shape (nfeatures x nsamples).
np_obs (numpy.ndarray) – The observation calibration data with shape (nfeatures x nsamples).
- class pyku.pdftransfer.NDPDFCorrector(*, ds_cal=None, ds_obs=None, niterations=None, nbins=None)
Bases: PDFCorrector
N-Dimensional Probability Distribution Transfer (NDPDF) for xarray.Dataset.
- block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)
Group the data by groupby_type and fit-predict each pixel. This function is optimized for multiprocessing with dask.
- Parameters:
ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default because the model data are already set when the corrector is initialized. If ‘ds_mod’ is nevertheless passed with ‘qdm’ or ‘mbcn’, it overrides the default model data. This is useful, for example, when correcting with a 30-year rolling window, where only a subset of the projection data should be considered. To make sure the code works as intended, it is recommended to pass this parameter even if the corrector was already initialized with it.
groupby_type (str) – Data grouping. The correction is applied independently to each group. The value is expected to be ‘time.month’, but can also be set to ‘time.season’.
- generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)
Fit and predict.
This function resets an existing corrector with new data and then runs fit_predict.
- Parameters:
ds_obs (xarray.Dataset) – Observation training dataset.
ds_cal (xarray.Dataset) – Model calibration dataset.
ds_mod (xarray.Dataset) – Biased model dataset.
- groupedby_fit_predict(ds_mod=None, groupby_type=None)
Fit and predict separately for each group.
- Parameters:
ds_mod (xarray.Dataset) – The biased model dataset.
groupby_type (str) – The grouping method. The expected value is ‘time.month’.
- predict(ds_mod=None)
Predict.
- Parameters:
ds_mod (xarray.Dataset) – Biased dataset.
- Returns:
Corrected dataset.
- Return type:
xarray.Dataset
- regional_fit_predict(ds_mod=None, regions=None, area_def=None)
Regionalized fit_predict.
- Parameters:
regions (geopandas.GeoDataFrame) – Region file.
area_def (pyresample.AreaDefinition) – Area definition.
- regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)
Regionalized fit_predict, applied separately to each group.
- Parameters:
groupby_type (str) – Type of grouping. The value is expected to be ‘time.month’.
regions (geopandas.GeoDataFrame) – Regions file.
area_def (pyresample.AreaDefinition) – Projection.
- rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)
High-level function for performing bias correction with the QDM and MBCn methods.
Uses a 30-year rolling window: the bias correction is performed for the middle decade, and the first and last ten years of the window are removed when the corrected data are returned. The window is then moved by 10 years. The correction is performed per groupby_type, which should generally be a monthly grouping.
- Parameters:
groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can be used instead.
client (dask.distributed.client) – The dask client. Defaults to None. When given, the function uses the client for multiprocessing, which is recommended for large datasets to speed up computations.
block_size (tuple[int]) – Deprecated option. Block size, defaults to (5, 5). The data should have the form (ntimes x ny x nx). Calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to workers.
- Returns:
The corrected dataset.
- Return type:
xarray.Dataset
- class pyku.pdftransfer.PDFCorrector
Bases: object
Parent corrector class for all PDF correctors.
The purpose of this class is to gather high-level and optimized functions.
- block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)
Group the data by groupby_type and fit-predict each pixel. This function is optimized for multiprocessing with dask.
- Parameters:
ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default because the model data are already set when the corrector is initialized. If ‘ds_mod’ is nevertheless passed with ‘qdm’ or ‘mbcn’, it overrides the default model data. This is useful, for example, when correcting with a 30-year rolling window, where only a subset of the projection data should be considered. To make sure the code works as intended, it is recommended to pass this parameter even if the corrector was already initialized with it.
groupby_type (str) – Data grouping. The correction is applied independently to each group. The value is expected to be ‘time.month’, but can also be set to ‘time.season’.
- generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)
Fit and predict.
This function resets an existing corrector with new data and then runs fit_predict.
- Parameters:
ds_obs (xarray.Dataset) – Observation training dataset.
ds_cal (xarray.Dataset) – Model calibration dataset.
ds_mod (xarray.Dataset) – Biased model dataset.
- groupedby_fit_predict(ds_mod=None, groupby_type=None)
Fit and predict separately for each group.
- Parameters:
ds_mod (xarray.Dataset) – The biased model dataset.
groupby_type (str) – The grouping method. The expected value is ‘time.month’.
- regional_fit_predict(ds_mod=None, regions=None, area_def=None)
Regionalized fit_predict.
- Parameters:
regions (geopandas.GeoDataFrame) – Region file.
area_def (pyresample.AreaDefinition) – Area definition.
- regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)
Regionalized fit_predict, applied separately to each group.
- Parameters:
groupby_type (str) – Type of grouping. The value is expected to be ‘time.month’.
regions (geopandas.GeoDataFrame) – Regions file.
area_def (pyresample.AreaDefinition) – Projection.
- rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)
High-level function for performing bias correction with the QDM and MBCn methods.
Uses a 30-year rolling window: the bias correction is performed for the middle decade, and the first and last ten years of the window are removed when the corrected data are returned. The window is then moved by 10 years. The correction is performed per groupby_type, which should generally be a monthly grouping.
- Parameters:
groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can be used instead.
client (dask.distributed.client) – The dask client. Defaults to None. When given, the function uses the client for multiprocessing, which is recommended for large datasets to speed up computations.
block_size (tuple[int]) – Deprecated option. Block size, defaults to (5, 5). The data should have the form (ntimes x ny x nx). Calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to workers.
- Returns:
The corrected dataset.
- Return type:
xarray.Dataset
- class pyku.pdftransfer.QDM(*, nbins=None, kind=None)
Bases: object
Quantile Delta Mapping (QDM) for numpy arrays of shape (nfeatures x nsamples).
- fit_predict(*, np_obs=None, np_cal=None, np_mod=None)
Fit and predict.
- Parameters:
np_obs (numpy.ndarray) – The reference dataset for calibration with shape (nfeatures x nsamples).
np_cal (numpy.ndarray) – The biased dataset for calibration with shape (nfeatures x nsamples).
np_mod (numpy.ndarray) – The biased dataset to be corrected with shape (nfeatures x nsamples).
- Returns:
The corrected dataset with shape (nfeatures x nsamples).
- Return type:
numpy.ndarray
- predict_additive(*, np_mod=None, np_cal=None)
- Parameters:
np_mod (numpy.ndarray) – The biased model data with shape (nfeatures x nsamples).
np_cal (numpy.ndarray) – The calibration model data with shape (nfeatures x nsamples).
- Returns:
The corrected model data with shape (nfeatures x nsamples).
- Return type:
numpy.ndarray
- predict_multiplicative(*, np_mod=None, np_cal=None)
- Parameters:
np_mod (numpy.ndarray) – The biased model data with shape (nfeatures x nsamples).
np_cal (numpy.ndarray) – The calibration data with shape (nfeatures x nsamples).
- Returns:
The corrected array with shape (nfeatures x nsamples).
- Return type:
numpy.ndarray
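The difference between the additive and multiplicative variants can be illustrated with a small numpy sketch of quantile delta mapping. This is an assumption-laden illustration rather than pyku's code: the functions empirical_probs, qdm_1d and qdm are hypothetical, quantiles are taken directly from the samples instead of from nbins bins, and the observations are passed in explicitly.

```python
import numpy as np


def empirical_probs(values):
    """Non-exceedance probability of each sample within its own series (midpoint ranks)."""
    ranks = np.argsort(np.argsort(values))
    return (ranks + 0.5) / values.size


def qdm_1d(obs, cal, mod, kind="additive"):
    """Quantile delta mapping for one feature (1-D arrays)."""
    tau = empirical_probs(mod)      # quantile level of each model value
    cal_q = np.quantile(cal, tau)   # calibration value at the same level
    obs_q = np.quantile(obs, tau)   # observed value at the same level
    if kind == "additive":
        # Keep the absolute change between calibration and model data.
        return obs_q + (mod - cal_q)
    if kind == "multiplicative":
        # Keep the relative change; assumes strictly positive data.
        return obs_q * (mod / cal_q)
    raise ValueError("kind must be 'additive' or 'multiplicative'")


def qdm(np_obs, np_cal, np_mod, kind="additive"):
    """Apply qdm_1d to every row of (nfeatures x nsamples) arrays."""
    rows = [qdm_1d(o, c, m, kind) for o, c, m in zip(np_obs, np_cal, np_mod)]
    return np.stack(rows)
```

The additive form is commonly used for variables such as temperature, while the multiplicative form suits variables bounded at zero such as precipitation, where a relative change is the more natural quantity to preserve.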
- class pyku.pdftransfer.QDMCorrector(*, ds_mod=None, ds_obs=None, ds_cal=None, nbins=None, kind=None, implementation='SH')
Bases: PDFCorrector
Quantile Delta Mapping (QDM) corrector for xarray.Dataset.
https://journals.ametsoc.org/view/journals/clim/28/17/jcli-d-14-00754.1.xml
- block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)
Group the data by groupby_type and fit-predict each pixel. This function is optimized for multiprocessing with dask.
- Parameters:
ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default because the model data are already set when the corrector is initialized. If ‘ds_mod’ is nevertheless passed with ‘qdm’ or ‘mbcn’, it overrides the default model data. This is useful, for example, when correcting with a 30-year rolling window, where only a subset of the projection data should be considered. To make sure the code works as intended, it is recommended to pass this parameter even if the corrector was already initialized with it.
groupby_type (str) – Data grouping. The correction is applied independently to each group. The value is expected to be ‘time.month’, but can also be set to ‘time.season’.
- generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)
Fit and predict.
This function resets an existing corrector with new data and then runs fit_predict.
- Parameters:
ds_obs (xarray.Dataset) – Observation training dataset.
ds_cal (xarray.Dataset) – Model calibration dataset.
ds_mod (xarray.Dataset) – Biased model dataset.
- groupedby_fit_predict(ds_mod=None, groupby_type=None)
Fit and predict separately for each group.
- Parameters:
ds_mod (xarray.Dataset) – The biased model dataset.
groupby_type (str) – The grouping method. The expected value is ‘time.month’.
- regional_fit_predict(ds_mod=None, regions=None, area_def=None)
Regionalized fit_predict.
- Parameters:
regions (geopandas.GeoDataFrame) – Region file.
area_def (pyresample.AreaDefinition) – Area definition.
- regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)
Regionalized fit_predict, applied separately to each group.
- Parameters:
groupby_type (str) – Type of grouping. The value is expected to be ‘time.month’.
regions (geopandas.GeoDataFrame) – Regions file.
area_def (pyresample.AreaDefinition) – Projection.
- rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)
High-level function for performing bias correction with the QDM and MBCn methods.
Uses a 30-year rolling window: the bias correction is performed for the middle decade, and the first and last ten years of the window are removed when the corrected data are returned. The window is then moved by 10 years. The correction is performed per groupby_type, which should generally be a monthly grouping.
- Parameters:
groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can be used instead.
client (dask.distributed.client) – The dask client. Defaults to None. When given, the function uses the client for multiprocessing, which is recommended for large datasets to speed up computations.
block_size (tuple[int]) – Deprecated option. Block size, defaults to (5, 5). The data should have the form (ntimes x ny x nx). Calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to workers.
- Returns:
The corrected dataset.
- Return type:
xarray.Dataset
- class pyku.pdftransfer.UQM(*, nbins=None)
Bases: object
Univariate Quantile Mapping (UQM) for numpy arrays of shape (nvariables x nsamples).
- fit(*, np_cal=None, np_obs=None)
Fit.
- Parameters:
np_cal (numpy.ndarray) – The biased calibration dataset with shape (nfeatures x nsamples).
np_obs (numpy.ndarray) – The reference observation dataset with shape (nfeatures x nsamples).
- predict(*, np_mod=None)
Predict.
- Parameters:
np_mod (numpy.ndarray) – The biased data as a numpy array of shape (nfeatures x nsamples).
- Returns:
The corrected data with shape (nfeatures x nsamples).
- Return type:
numpy.ndarray
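The fit and predict split of univariate quantile mapping can be sketched as follows. The class EmpiricalQuantileMapper is hypothetical and is not pyku's UQM; it assumes that nbins sets the number of evenly spaced quantile levels, which is only one plausible reading of that parameter.

```python
import numpy as np


class EmpiricalQuantileMapper:
    """Illustrative univariate quantile mapping on (nfeatures x nsamples) arrays."""

    def __init__(self, nbins=100):
        self.nbins = nbins

    def fit(self, np_cal, np_obs):
        """Store matching quantiles of the calibration and observation data."""
        probs = np.linspace(0.0, 1.0, self.nbins + 1)
        # np.quantile returns (nprobs, nfeatures); transpose to (nfeatures, nprobs).
        self.cal_q = np.quantile(np_cal, probs, axis=1).T
        self.obs_q = np.quantile(np_obs, probs, axis=1).T
        return self

    def predict(self, np_mod):
        """Map each model value through the calibration-to-observation transfer function."""
        # np.interp clamps values outside the calibration range to the extreme quantiles.
        rows = [
            np.interp(mod, cal_q, obs_q)
            for mod, cal_q, obs_q in zip(np_mod, self.cal_q, self.obs_q)
        ]
        return np.stack(rows)
```

A call such as EmpiricalQuantileMapper(nbins=100).fit(np_cal, np_obs).predict(np_mod) treats every feature independently, which is what distinguishes univariate quantile mapping from the multivariate methods in this module.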
- class pyku.pdftransfer.UQMCorrector(*, ds_obs=None, ds_cal=None, nbins=None)
Bases: PDFCorrector
Univariate Quantile Mapping (UQM) corrector for xarray.Dataset.
- block_fit_predict(ds_mod=None, groupby_type='time.month', block_size=None)
Group the data by groupby_type and fit-predict each pixel. This function is optimized for multiprocessing with dask.
- Parameters:
ds_mod (xarray.Dataset) – The model data to be corrected. This parameter is mandatory for the ‘uqm’, ‘ndpdf’ and ‘lmk’ methods. For ‘qdm’ and ‘mbcn’, it is ignored by default because the model data are already set when the corrector is initialized. If ‘ds_mod’ is nevertheless passed with ‘qdm’ or ‘mbcn’, it overrides the default model data. This is useful, for example, when correcting with a 30-year rolling window, where only a subset of the projection data should be considered. To make sure the code works as intended, it is recommended to pass this parameter even if the corrector was already initialized with it.
groupby_type (str) – Data grouping. The correction is applied independently to each group. The value is expected to be ‘time.month’, but can also be set to ‘time.season’.
- fit_predict(ds_mod=None)
Fit and predict.
- Parameters:
ds_mod (xarray.Dataset) – Biased dataset to be corrected.
- Returns:
Corrected dataset.
- Return type:
xarray.Dataset
- generic_fit_predict(ds_obs=None, ds_cal=None, ds_mod=None)
Fit and predict.
This function resets an existing corrector with new data and then runs fit_predict.
- Parameters:
ds_obs (xarray.Dataset) – Observation training dataset.
ds_cal (xarray.Dataset) – Model calibration dataset.
ds_mod (xarray.Dataset) – Biased model dataset.
- groupedby_fit_predict(ds_mod=None, groupby_type=None)
Fit and predict separately for each group.
- Parameters:
ds_mod (xarray.Dataset) – The biased model dataset.
groupby_type (str) – The grouping method. The expected value is ‘time.month’.
- predict(ds_mod=None)
Predict.
- Parameters:
ds_mod (xarray.Dataset) – Biased dataset to be corrected.
- Returns:
Bias-corrected dataset.
- Return type:
xarray.Dataset
- regional_fit_predict(ds_mod=None, regions=None, area_def=None)
Regionalized fit_predict.
- Parameters:
regions (geopandas.GeoDataFrame) – Region file.
area_def (pyresample.AreaDefinition) – Area definition.
- regional_groupedby_fit_predict(ds_mod, groupby_type=None, regions=None, area_def=None, output_varnames=None)
Regionalized fit_predict, applied separately to each group.
- Parameters:
groupby_type (str) – Type of grouping. The value is expected to be ‘time.month’.
regions (geopandas.GeoDataFrame) – Regions file.
area_def (pyresample.AreaDefinition) – Projection.
- rolling_decadal_block_fit_predict(groupby_type='time.month', block_size=None, client=None)
High-level function for performing bias correction with the QDM and MBCn methods.
Uses a 30-year rolling window: the bias correction is performed for the middle decade, and the first and last ten years of the window are removed when the corrected data are returned. The window is then moved by 10 years. The correction is performed per groupby_type, which should generally be a monthly grouping.
- Parameters:
groupby_type (str) – How to group the data. Defaults to ‘time.month’; ‘time.season’ can be used instead.
client (dask.distributed.client) – The dask client. Defaults to None. When given, the function uses the client for multiprocessing, which is recommended for large datasets to speed up computations.
block_size (tuple[int]) – Deprecated option. Block size, defaults to (5, 5). The data should have the form (ntimes x ny x nx). Calculations are performed for each pixel. For multiprocessing, a block is a chunk of data along (ny x nx) that can be sent to workers.
- Returns:
The corrected dataset.
- Return type:
xarray.Dataset