Named ensembles
pyku defines named ensembles that can be used to automate the processing of large sets of filenames, extracting CMOR facets from both the file path and filename. In this example, we demonstrate how to retrieve a machine-friendly list of models from the DWD CMIP5 reference and core ensembles and format this data in a human-readable way. To obtain the full list of available ensembles:
[1]:
import pyku.find as find
import pyku
pyku.list_ensembles()
[1]:
['dwd_cmip5_reference', 'dwd_cmip5_core', 'dwd_cmip6_reference']
Each named ensemble can then be retrieved as a pandas DataFrame that lists the facets of every model it includes:
[2]:
find.get_ensemble_definition('dwd_cmip5_reference')
[2]:
| | driving_experiment_name | driving_model_ensemble_member | driving_model_id | model_id | rcm_version_id |
|---|---|---|---|---|---|
| 0 | historical | r1i1p1 | CCCma-CanESM2 | CLMcom-CCLM4-8-17 | v1 |
| 1 | rcp85 | r1i1p1 | CCCma-CanESM2 | CLMcom-CCLM4-8-17 | v1 |
| 2 | historical | r1i1p1 | CCCma-CanESM2 | GERICS-REMO2015 | v1 |
| 3 | rcp85 | r1i1p1 | CCCma-CanESM2 | GERICS-REMO2015 | v1 |
| 4 | historical | r12i1p1 | ICHEC-EC-EARTH | GERICS-REMO2015 | v1 |
| ... | ... | ... | ... | ... | ... |
| 59 | rcp26 | r1i1p1 | MPI-M-MPI-ESM-LR | SMHI-RCA4 | v1a |
| 60 | rcp45 | r1i1p1 | MPI-M-MPI-ESM-LR | SMHI-RCA4 | v1a |
| 61 | rcp85 | r1i1p1 | MPI-M-MPI-ESM-LR | SMHI-RCA4 | v1a |
| 62 | historical | r1i1p1 | MPI-M-MPI-ESM-LR | UHOH-WRF361H | v1 |
| 63 | rcp85 | r1i1p1 | MPI-M-MPI-ESM-LR | UHOH-WRF361H | v1 |
64 rows × 5 columns
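
Because the definition is an ordinary DataFrame, the usual pandas selection tools apply to it. The sketch below is only an illustration: to stay runnable without pyku, it rebuilds a handful of rows from the listing above by hand instead of calling `find.get_ensemble_definition`, and the variable names are assumptions.

```python
import pandas as pd

# Stand-in for find.get_ensemble_definition('dwd_cmip5_reference'):
# a few rows copied from the listing above (not the full 64-row ensemble).
ensemble = pd.DataFrame(
    [
        ("historical", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
        ("rcp85", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
        ("historical", "r1i1p1", "CCCma-CanESM2", "GERICS-REMO2015", "v1"),
        ("rcp85", "r1i1p1", "CCCma-CanESM2", "GERICS-REMO2015", "v1"),
        ("rcp26", "r1i1p1", "MPI-M-MPI-ESM-LR", "SMHI-RCA4", "v1a"),
    ],
    columns=[
        "driving_experiment_name",
        "driving_model_ensemble_member",
        "driving_model_id",
        "model_id",
        "rcm_version_id",
    ],
)

# Regional models driven by CanESM2 under RCP 8.5
selection = ensemble[
    (ensemble["driving_experiment_name"] == "rcp85")
    & (ensemble["driving_model_id"] == "CCCma-CanESM2")
]
rcm_names = sorted(selection["model_id"].unique())

# Number of entries per driving experiment in this excerpt
counts = ensemble["driving_experiment_name"].value_counts()

print(rcm_names)
print(counts)
```

With the real ensemble definition in place of the hand-built excerpt, the same two expressions should work unchanged, since they rely only on the column names shown in the listing.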
DWD CMIP5 Reference Ensemble
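In this section, we reformat the ensemble facets for readability: the reference ensemble is collapsed to one row per model combination, with a boolean column for each driving experiment.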
[3]:
ensemble = find.get_ensemble_definition('dwd_cmip5_reference')
# Work on a copy of the ensemble definition
# -----------------------------------------
df = ensemble.copy()
# Add one boolean column per driving experiment
# ---------------------------------------------
df.loc[:, 'historical'] = df['driving_experiment_name'] == 'historical'
df.loc[:, 'rcp26'] = df['driving_experiment_name'] == 'rcp26'
df.loc[:, 'rcp45'] = df['driving_experiment_name'] == 'rcp45'
df.loc[:, 'rcp85'] = df['driving_experiment_name'] == 'rcp85'
# Group by the model facets; 'max' ORs the boolean scenario columns
# -----------------------------------------------------------------
df = df.groupby(['driving_model_id', 'driving_model_ensemble_member', 'model_id', 'rcm_version_id'], as_index=False).agg({'historical': 'max', 'rcp26': 'max', 'rcp45': 'max', 'rcp85': 'max'})
# Sort and reset index
# --------------------
df = df.sort_values(by=['model_id', 'driving_model_id']).reset_index(drop=True)
# Show
# ----
df
[3]:
| | driving_model_id | driving_model_ensemble_member | model_id | rcm_version_id | historical | rcp26 | rcp45 | rcp85 |
|---|---|---|---|---|---|---|---|---|
| 0 | MPI-M-MPI-ESM-LR | r1i1p1 | CLMcom-BTU-CCLM4-8-17 | v1 | False | True | False | False |
| 1 | CCCma-CanESM2 | r1i1p1 | CLMcom-CCLM4-8-17 | v1 | True | False | False | True |
| 2 | ICHEC-EC-EARTH | r12i1p1 | CLMcom-CCLM4-8-17 | v1 | True | True | True | True |
| 3 | MIROC-MIROC5 | r1i1p1 | CLMcom-CCLM4-8-17 | v1 | True | True | False | True |
| 4 | MOHC-HadGEM2-ES | r1i1p1 | CLMcom-CCLM4-8-17 | v1 | True | False | True | True |
| 5 | MPI-M-MPI-ESM-LR | r1i1p1 | CLMcom-CCLM4-8-17 | v1 | True | False | True | True |
| 6 | CCCma-CanESM2 | r1i1p1 | GERICS-REMO2015 | v1 | True | False | False | True |
| 7 | ICHEC-EC-EARTH | r12i1p1 | GERICS-REMO2015 | v1 | True | False | False | True |
| 8 | MIROC-MIROC5 | r1i1p1 | GERICS-REMO2015 | v1 | True | False | False | True |
| 9 | MOHC-HadGEM2-ES | r1i1p1 | GERICS-REMO2015 | v1 | True | False | False | True |
| 10 | ICHEC-EC-EARTH | r12i1p1 | KNMI-RACMO22E | v1 | True | True | True | True |
| 11 | ICHEC-EC-EARTH | r1i1p1 | KNMI-RACMO22E | v1 | True | False | True | True |
| 12 | MOHC-HadGEM2-ES | r1i1p1 | KNMI-RACMO22E | v2 | True | True | True | True |
| 13 | MPI-M-MPI-ESM-LR | r1i1p1 | MPI-CSC-REMO2009 | v1 | True | True | True | True |
| 14 | MPI-M-MPI-ESM-LR | r2i1p1 | MPI-CSC-REMO2009 | v1 | True | True | True | True |
| 15 | ICHEC-EC-EARTH | r12i1p1 | SMHI-RCA4 | v1 | True | True | True | True |
| 16 | IPSL-IPSL-CM5A-MR | r1i1p1 | SMHI-RCA4 | v1 | True | False | True | True |
| 17 | MOHC-HadGEM2-ES | r1i1p1 | SMHI-RCA4 | v1 | True | True | True | True |
| 18 | MPI-M-MPI-ESM-LR | r1i1p1 | SMHI-RCA4 | v1a | True | True | True | True |
| 19 | ICHEC-EC-EARTH | r12i1p1 | UHOH-WRF361H | v1 | True | False | False | True |
| 20 | MOHC-HadGEM2-ES | r1i1p1 | UHOH-WRF361H | v1 | True | False | False | True |
| 21 | MPI-M-MPI-ESM-LR | r1i1p1 | UHOH-WRF361H | v1 | True | False | False | True |
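
The four boolean columns can also be produced in one step with `pandas.crosstab`, which counts rows per model combination and experiment. This is an alternative sketch, not the approach used in the notebook. The sample rows are copied from the tables above (the historical SMHI-RCA4 row is inferred from the summary table), and the rows come out ordered by `driving_model_id` rather than `model_id`.

```python
import pandas as pd

facets = [
    "driving_model_id",
    "driving_model_ensemble_member",
    "model_id",
    "rcm_version_id",
]

# Hand-built excerpt standing in for the reference ensemble definition
ensemble = pd.DataFrame(
    [
        ("historical", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
        ("rcp85", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
        ("historical", "r1i1p1", "MPI-M-MPI-ESM-LR", "SMHI-RCA4", "v1a"),
        ("rcp26", "r1i1p1", "MPI-M-MPI-ESM-LR", "SMHI-RCA4", "v1a"),
        ("rcp45", "r1i1p1", "MPI-M-MPI-ESM-LR", "SMHI-RCA4", "v1a"),
        ("rcp85", "r1i1p1", "MPI-M-MPI-ESM-LR", "SMHI-RCA4", "v1a"),
        ("historical", "r1i1p1", "MPI-M-MPI-ESM-LR", "UHOH-WRF361H", "v1"),
        ("rcp85", "r1i1p1", "MPI-M-MPI-ESM-LR", "UHOH-WRF361H", "v1"),
    ],
    columns=[
        "driving_experiment_name",
        "driving_model_ensemble_member",
        "driving_model_id",
        "model_id",
        "rcm_version_id",
    ],
)

# Count rows per (model combination, experiment); a non-zero count
# means the experiment is available for that combination.
summary = pd.crosstab(
    index=[ensemble[name] for name in facets],
    columns=ensemble["driving_experiment_name"],
).astype(bool)

# Flatten the result into a plain table
summary.columns.name = None
summary = summary.reset_index()

print(summary)
```

One difference to keep in mind: `crosstab` only creates columns for experiments that actually occur in the data, whereas the explicit comparisons above always yield all four columns.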
DWD CMIP5 Core Ensemble
In this section, we list the members of the core ensemble separately for each scenario, leaving out the historical runs.
[4]:
from IPython.display import Markdown, display
ensemble = find.get_ensemble_definition('dwd_cmip5_core')
# Drop historical
# ---------------
df = ensemble.query("driving_experiment_name != 'historical'").copy()
# Select scenarios
# ----------------
rcp26 = df.query("driving_experiment_name == 'rcp26'")
rcp26 = rcp26.sort_values(by=['driving_model_id', 'model_id']).reset_index(drop=True)
rcp45 = df.query("driving_experiment_name == 'rcp45'")
rcp45 = rcp45.sort_values(by=['driving_model_id', 'model_id']).reset_index(drop=True)
rcp85 = df.query("driving_experiment_name == 'rcp85'")
rcp85 = rcp85.sort_values(by=['driving_model_id', 'model_id']).reset_index(drop=True)
display(Markdown("### RCP 2.6"))
display(rcp26)
display(Markdown("### RCP 4.5"))
display(rcp45)
display(Markdown("### RCP 8.5"))
display(rcp85)
RCP 2.6
| | driving_experiment_name | driving_model_ensemble_member | driving_model_id | model_id | rcm_version_id |
|---|---|---|---|---|---|
| 0 | rcp26 | r12i1p1 | ICHEC-EC-EARTH | CLMcom-CCLM4-8-17 | v1 |
| 1 | rcp26 | r12i1p1 | ICHEC-EC-EARTH | KNMI-RACMO22E | v1 |
| 2 | rcp26 | r1i1p1 | MIROC-MIROC5 | CLMcom-CCLM4-8-17 | v1 |
| 3 | rcp26 | r1i1p1 | MOHC-HadGEM2-ES | KNMI-RACMO22E | v2 |
| 4 | rcp26 | r2i1p1 | MPI-M-MPI-ESM-LR | MPI-CSC-REMO2009 | v1 |
RCP 4.5
| | driving_experiment_name | driving_model_ensemble_member | driving_model_id | model_id | rcm_version_id |
|---|---|---|---|---|---|
| 0 | rcp45 | r12i1p1 | ICHEC-EC-EARTH | KNMI-RACMO22E | v1 |
| 1 | rcp45 | r1i1p1 | ICHEC-EC-EARTH | KNMI-RACMO22E | v1 |
| 2 | rcp45 | r12i1p1 | ICHEC-EC-EARTH | SMHI-RCA4 | v1 |
| 3 | rcp45 | r1i1p1 | MOHC-HadGEM2-ES | CLMcom-CCLM4-8-17 | v1 |
| 4 | rcp45 | r1i1p1 | MPI-M-MPI-ESM-LR | MPI-CSC-REMO2009 | v1 |
| 5 | rcp45 | r2i1p1 | MPI-M-MPI-ESM-LR | MPI-CSC-REMO2009 | v1 |
RCP 8.5
| | driving_experiment_name | driving_model_ensemble_member | driving_model_id | model_id | rcm_version_id |
|---|---|---|---|---|---|
| 0 | rcp85 | r1i1p1 | CCCma-CanESM2 | CLMcom-CCLM4-8-17 | v1 |
| 1 | rcp85 | r1i1p1 | ICHEC-EC-EARTH | KNMI-RACMO22E | v1 |
| 2 | rcp85 | r1i1p1 | MIROC-MIROC5 | GERICS-REMO2015 | v1 |
| 3 | rcp85 | r1i1p1 | MOHC-HadGEM2-ES | CLMcom-CCLM4-8-17 | v1 |
| 4 | rcp85 | r1i1p1 | MPI-M-MPI-ESM-LR | MPI-CSC-REMO2009 | v1 |
| 5 | rcp85 | r1i1p1 | MPI-M-MPI-ESM-LR | UHOH-WRF361H | v1 |
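
The three near-identical query/sort blocks in cell [4] could be written more compactly with a single `groupby`. The following is a minimal sketch under the assumption that one table per scenario is wanted. It uses a hand-built excerpt of the core ensemble (rows taken from the tables in this section) and plain `print` instead of `IPython.display`, so it also runs outside a notebook.

```python
import pandas as pd

# Hand-built excerpt standing in for
# find.get_ensemble_definition('dwd_cmip5_core')
core = pd.DataFrame(
    [
        ("rcp85", "r1i1p1", "MOHC-HadGEM2-ES", "CLMcom-CCLM4-8-17", "v1"),
        ("rcp26", "r1i1p1", "MIROC-MIROC5", "CLMcom-CCLM4-8-17", "v1"),
        ("historical", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
        ("rcp85", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
        ("rcp26", "r12i1p1", "ICHEC-EC-EARTH", "CLMcom-CCLM4-8-17", "v1"),
        ("rcp45", "r1i1p1", "MOHC-HadGEM2-ES", "CLMcom-CCLM4-8-17", "v1"),
    ],
    columns=[
        "driving_experiment_name",
        "driving_model_ensemble_member",
        "driving_model_id",
        "model_id",
        "rcm_version_id",
    ],
)

# Drop the historical runs, then build one sorted table per scenario
scenarios = core[core["driving_experiment_name"] != "historical"]
tables = {
    name: group.sort_values(by=["driving_model_id", "model_id"]).reset_index(drop=True)
    for name, group in scenarios.groupby("driving_experiment_name")
}

for name, table in tables.items():
    print(name)
    print(table.to_string())
```

In a notebook, the `print` calls can be swapped for `display(Markdown(...))` and `display(table)` to reproduce the headed output shown above.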
To summarize the core ensemble in the same manner as the reference ensemble, you can use the following standard pandas code:
[5]:
ensemble = find.get_ensemble_definition('dwd_cmip5_core')
# Work on a copy of the ensemble definition
# -----------------------------------------
df = ensemble.copy()
# Add one boolean column per driving experiment
# ---------------------------------------------
df.loc[:, 'historical'] = df['driving_experiment_name'] == 'historical'
df.loc[:, 'rcp26'] = df['driving_experiment_name'] == 'rcp26'
df.loc[:, 'rcp45'] = df['driving_experiment_name'] == 'rcp45'
df.loc[:, 'rcp85'] = df['driving_experiment_name'] == 'rcp85'
# Group by the model facets; 'max' ORs the boolean scenario columns
# -----------------------------------------------------------------
df = df.groupby(['driving_model_id', 'driving_model_ensemble_member', 'model_id', 'rcm_version_id'], as_index=False).agg({'historical': 'max', 'rcp26': 'max', 'rcp45': 'max', 'rcp85': 'max'})
# Sort and reset index
# --------------------
df = df.sort_values(by=['model_id', 'driving_model_id']).reset_index(drop=True)
# Show
# ----
df
[5]:
| | driving_model_id | driving_model_ensemble_member | model_id | rcm_version_id | historical | rcp26 | rcp45 | rcp85 |
|---|---|---|---|---|---|---|---|---|
| 0 | CCCma-CanESM2 | r1i1p1 | CLMcom-CCLM4-8-17 | v1 | True | False | False | True |
| 1 | ICHEC-EC-EARTH | r12i1p1 | CLMcom-CCLM4-8-17 | v1 | True | True | False | False |
| 2 | MIROC-MIROC5 | r1i1p1 | CLMcom-CCLM4-8-17 | v1 | True | True | False | False |
| 3 | MOHC-HadGEM2-ES | r1i1p1 | CLMcom-CCLM4-8-17 | v1 | True | False | True | True |
| 4 | MIROC-MIROC5 | r1i1p1 | GERICS-REMO2015 | v1 | True | False | False | True |
| 5 | ICHEC-EC-EARTH | r12i1p1 | KNMI-RACMO22E | v1 | True | True | True | False |
| 6 | ICHEC-EC-EARTH | r1i1p1 | KNMI-RACMO22E | v1 | True | False | True | True |
| 7 | MOHC-HadGEM2-ES | r1i1p1 | KNMI-RACMO22E | v2 | True | True | False | False |
| 8 | MPI-M-MPI-ESM-LR | r1i1p1 | MPI-CSC-REMO2009 | v1 | True | False | True | True |
| 9 | MPI-M-MPI-ESM-LR | r2i1p1 | MPI-CSC-REMO2009 | v1 | True | True | True | False |
| 10 | ICHEC-EC-EARTH | r12i1p1 | SMHI-RCA4 | v1 | True | False | True | False |
| 11 | MPI-M-MPI-ESM-LR | r1i1p1 | UHOH-WRF361H | v1 | True | False | False | True |
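
Cells [3] and [5] apply identical steps to two different ensembles, so the logic could be factored into a small function. The sketch below is one possible way to do that. `summarize_experiments` is a hypothetical helper name, not part of pyku, and the sample rows are rebuilt by hand from the core ensemble tables above so that the block runs without pyku installed.

```python
import pandas as pd

FACETS = [
    "driving_model_id",
    "driving_model_ensemble_member",
    "model_id",
    "rcm_version_id",
]
EXPERIMENTS = ["historical", "rcp26", "rcp45", "rcp85"]


def summarize_experiments(ensemble):
    """Hypothetical helper: one row per model combination, one flag per experiment."""
    df = ensemble.copy()
    # One boolean column per driving experiment
    for name in EXPERIMENTS:
        df[name] = df["driving_experiment_name"] == name
    # 'max' over booleans acts as a logical OR within each group
    df = df.groupby(FACETS, as_index=False).agg({name: "max" for name in EXPERIMENTS})
    return df.sort_values(by=["model_id", "driving_model_id"]).reset_index(drop=True)


# Hand-built excerpt of the core ensemble (three model combinations)
sample = pd.DataFrame(
    [
        ("historical", "r1i1p1", "MIROC-MIROC5", "GERICS-REMO2015", "v1"),
        ("rcp85", "r1i1p1", "MIROC-MIROC5", "GERICS-REMO2015", "v1"),
        ("historical", "r1i1p1", "MOHC-HadGEM2-ES", "CLMcom-CCLM4-8-17", "v1"),
        ("rcp45", "r1i1p1", "MOHC-HadGEM2-ES", "CLMcom-CCLM4-8-17", "v1"),
        ("rcp85", "r1i1p1", "MOHC-HadGEM2-ES", "CLMcom-CCLM4-8-17", "v1"),
        ("historical", "r1i1p1", "MIROC-MIROC5", "CLMcom-CCLM4-8-17", "v1"),
        ("rcp26", "r1i1p1", "MIROC-MIROC5", "CLMcom-CCLM4-8-17", "v1"),
    ],
    columns=[
        "driving_experiment_name",
        "driving_model_ensemble_member",
        "driving_model_id",
        "model_id",
        "rcm_version_id",
    ],
)

summary = summarize_experiments(sample)
print(summary)
```

With pyku available, the same function could presumably be called as `summarize_experiments(find.get_ensemble_definition('dwd_cmip5_core'))` or with any other ensemble name returned by `pyku.list_ensembles()`.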