Named ensembles#

pyku defines named ensembles that help automate the processing of large sets of files by extracting CMOR facets from both the file path and the filename. In this example, we retrieve a machine-friendly list of models from the DWD CMIP5 reference and core ensembles and format it in a human-readable way. To obtain the full list of available ensembles:

[1]:
import pyku.find as find
import pyku
pyku.list_ensembles()
[1]:
['dwd_cmip5_reference', 'dwd_cmip5_core', 'dwd_cmip6_reference']

Each named ensemble can then be accessed as a pandas DataFrame that lists the facets of every model it includes:

[2]:
find.get_ensemble_definition('dwd_cmip5_reference')
[2]:
driving_experiment_name driving_model_ensemble_member driving_model_id model_id rcm_version_id
0 historical r1i1p1 CCCma-CanESM2 CLMcom-CCLM4-8-17 v1
1 rcp85 r1i1p1 CCCma-CanESM2 CLMcom-CCLM4-8-17 v1
2 historical r1i1p1 CCCma-CanESM2 GERICS-REMO2015 v1
3 rcp85 r1i1p1 CCCma-CanESM2 GERICS-REMO2015 v1
4 historical r12i1p1 ICHEC-EC-EARTH GERICS-REMO2015 v1
... ... ... ... ... ...
59 rcp26 r1i1p1 MPI-M-MPI-ESM-LR SMHI-RCA4 v1a
60 rcp45 r1i1p1 MPI-M-MPI-ESM-LR SMHI-RCA4 v1a
61 rcp85 r1i1p1 MPI-M-MPI-ESM-LR SMHI-RCA4 v1a
62 historical r1i1p1 MPI-M-MPI-ESM-LR UHOH-WRF361H v1
63 rcp85 r1i1p1 MPI-M-MPI-ESM-LR UHOH-WRF361H v1

64 rows × 5 columns
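
The five columns above are facets that, in CORDEX-style archives, are also encoded in each filename. As a rough sketch of what extracting facets from a filename can look like, the snippet below splits a filename on underscores using only the standard library. The facet order, the `parse_facets` helper and the example filename are assumptions made for illustration; this is not pyku's own implementation, which also reads facets from the file path.

```python
from pathlib import PurePosixPath

# Assumed facet order of a CORDEX-style filename (illustrative only;
# pyku's own parsing may differ and also inspects the directory path).
FILENAME_FACETS = [
    "variable",
    "domain",
    "driving_model_id",
    "driving_experiment_name",
    "driving_model_ensemble_member",
    "model_id",
    "rcm_version_id",
    "frequency",
    "time_range",
]


def parse_facets(path):
    """Split a CORDEX-style filename into a dict of facets (hypothetical helper)."""
    stem = PurePosixPath(path).stem  # drop directories and the '.nc' suffix
    parts = stem.split("_")
    if len(parts) != len(FILENAME_FACETS):
        raise ValueError(f"unexpected filename layout: {path}")
    return dict(zip(FILENAME_FACETS, parts))


# Hypothetical filename assembled from facet values shown in the table above.
example = (
    "tas_EUR-11_MPI-M-MPI-ESM-LR_rcp85_r1i1p1_"
    "SMHI-RCA4_v1a_mon_200601-201012.nc"
)
facets = parse_facets(example)
print(facets["driving_model_id"], facets["model_id"], facets["rcm_version_id"])
```

Filenames that do not have exactly nine underscore-separated parts (for example, files without a time range) raise a `ValueError` in this sketch rather than being guessed at.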

DWD CMIP5 Reference Ensemble#

In this section, we reformat the ensemble facets for readability.

[3]:
ensemble = find.get_ensemble_definition('dwd_cmip5_reference')

# Work on a copy
# --------------

df = ensemble.copy()

# Add one boolean column per driving experiment
# ---------------------------------------------

df.loc[:, 'historical'] = df['driving_experiment_name'] == 'historical'
df.loc[:, 'rcp26'] = df['driving_experiment_name'] == 'rcp26'
df.loc[:, 'rcp45'] = df['driving_experiment_name'] == 'rcp45'
df.loc[:, 'rcp85'] = df['driving_experiment_name'] == 'rcp85'

# Collapse to one row per model chain; 'max' is True if any row is True
# ---------------------------------------------------------------------

df = df.groupby(
    ['driving_model_id', 'driving_model_ensemble_member', 'model_id', 'rcm_version_id'],
    as_index=False,
).agg({'historical': 'max', 'rcp26': 'max', 'rcp45': 'max', 'rcp85': 'max'})

# Sort and reset index
# --------------------

df = df.sort_values(by=['model_id', 'driving_model_id']).reset_index(drop=True)

# Show
# ----

df
[3]:
driving_model_id driving_model_ensemble_member model_id rcm_version_id historical rcp26 rcp45 rcp85
0 MPI-M-MPI-ESM-LR r1i1p1 CLMcom-BTU-CCLM4-8-17 v1 False True False False
1 CCCma-CanESM2 r1i1p1 CLMcom-CCLM4-8-17 v1 True False False True
2 ICHEC-EC-EARTH r12i1p1 CLMcom-CCLM4-8-17 v1 True True True True
3 MIROC-MIROC5 r1i1p1 CLMcom-CCLM4-8-17 v1 True True False True
4 MOHC-HadGEM2-ES r1i1p1 CLMcom-CCLM4-8-17 v1 True False True True
5 MPI-M-MPI-ESM-LR r1i1p1 CLMcom-CCLM4-8-17 v1 True False True True
6 CCCma-CanESM2 r1i1p1 GERICS-REMO2015 v1 True False False True
7 ICHEC-EC-EARTH r12i1p1 GERICS-REMO2015 v1 True False False True
8 MIROC-MIROC5 r1i1p1 GERICS-REMO2015 v1 True False False True
9 MOHC-HadGEM2-ES r1i1p1 GERICS-REMO2015 v1 True False False True
10 ICHEC-EC-EARTH r12i1p1 KNMI-RACMO22E v1 True True True True
11 ICHEC-EC-EARTH r1i1p1 KNMI-RACMO22E v1 True False True True
12 MOHC-HadGEM2-ES r1i1p1 KNMI-RACMO22E v2 True True True True
13 MPI-M-MPI-ESM-LR r1i1p1 MPI-CSC-REMO2009 v1 True True True True
14 MPI-M-MPI-ESM-LR r2i1p1 MPI-CSC-REMO2009 v1 True True True True
15 ICHEC-EC-EARTH r12i1p1 SMHI-RCA4 v1 True True True True
16 IPSL-IPSL-CM5A-MR r1i1p1 SMHI-RCA4 v1 True False True True
17 MOHC-HadGEM2-ES r1i1p1 SMHI-RCA4 v1 True True True True
18 MPI-M-MPI-ESM-LR r1i1p1 SMHI-RCA4 v1a True True True True
19 ICHEC-EC-EARTH r12i1p1 UHOH-WRF361H v1 True False False True
20 MOHC-HadGEM2-ES r1i1p1 UHOH-WRF361H v1 True False False True
21 MPI-M-MPI-ESM-LR r1i1p1 UHOH-WRF361H v1 True False False True
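
The same kind of scenario matrix can also be produced with `pandas.crosstab`, which may be handy if the summary is needed for several ensembles. The sketch below is one possible variant, not part of pyku: the `scenario_matrix` helper is hypothetical, and the input is a small hand-written excerpt of the reference ensemble so that the example runs without pyku installed.

```python
import pandas as pd

FACETS = [
    "driving_model_id",
    "driving_model_ensemble_member",
    "model_id",
    "rcm_version_id",
]


def scenario_matrix(ensemble):
    """Return one row per model chain with a boolean column per experiment.

    Hypothetical helper: unlike the cell above, it only creates columns for
    experiments that actually occur in the input.
    """
    counts = pd.crosstab(
        index=[ensemble[name] for name in FACETS],
        columns=ensemble["driving_experiment_name"],
    )
    table = counts.gt(0).reset_index()
    table.columns.name = None  # drop the 'driving_experiment_name' label
    return table.sort_values(by=["model_id", "driving_model_id"]).reset_index(drop=True)


# Hand-written excerpt with the same columns as the ensemble definition.
columns = ["driving_experiment_name", "driving_model_ensemble_member",
           "driving_model_id", "model_id", "rcm_version_id"]
rows = [
    ("historical", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
    ("rcp85", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
    ("historical", "r1i1p1", "CCCma-CanESM2", "GERICS-REMO2015", "v1"),
    ("rcp85", "r1i1p1", "CCCma-CanESM2", "GERICS-REMO2015", "v1"),
    ("historical", "r1i1p1", "MPI-M-MPI-ESM-LR", "SMHI-RCA4", "v1a"),
    ("rcp26", "r1i1p1", "MPI-M-MPI-ESM-LR", "SMHI-RCA4", "v1a"),
    ("rcp45", "r1i1p1", "MPI-M-MPI-ESM-LR", "SMHI-RCA4", "v1a"),
    ("rcp85", "r1i1p1", "MPI-M-MPI-ESM-LR", "SMHI-RCA4", "v1a"),
]
sample = pd.DataFrame(rows, columns=columns)

matrix = scenario_matrix(sample)
print(matrix.to_string(index=False))
```

With the real data, the call would presumably be `scenario_matrix(find.get_ensemble_definition('dwd_cmip5_reference'))`.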

DWD CMIP5 Core Ensemble#

In this section, we list the members of the core ensemble separately for each scenario.

[4]:
from IPython.display import Markdown, display

ensemble = find.get_ensemble_definition('dwd_cmip5_core')

# Drop historical
# ---------------

df = ensemble.query("driving_experiment_name != 'historical'").copy()

# Select scenarios
# ----------------

rcp26 = df.query("driving_experiment_name == 'rcp26'")
rcp26 = rcp26.sort_values(by=['driving_model_id', 'model_id']).reset_index(drop=True)

rcp45 = df.query("driving_experiment_name == 'rcp45'")
rcp45 = rcp45.sort_values(by=['driving_model_id', 'model_id']).reset_index(drop=True)

rcp85 = df.query("driving_experiment_name == 'rcp85'")
rcp85 = rcp85.sort_values(by=['driving_model_id', 'model_id']).reset_index(drop=True)

display(Markdown("### RCP 2.6"))
display(rcp26)

display(Markdown("### RCP 4.5"))
display(rcp45)

display(Markdown("### RCP 8.5"))
display(rcp85)

RCP 2.6#

driving_experiment_name driving_model_ensemble_member driving_model_id model_id rcm_version_id
0 rcp26 r12i1p1 ICHEC-EC-EARTH CLMcom-CCLM4-8-17 v1
1 rcp26 r12i1p1 ICHEC-EC-EARTH KNMI-RACMO22E v1
2 rcp26 r1i1p1 MIROC-MIROC5 CLMcom-CCLM4-8-17 v1
3 rcp26 r1i1p1 MOHC-HadGEM2-ES KNMI-RACMO22E v2
4 rcp26 r2i1p1 MPI-M-MPI-ESM-LR MPI-CSC-REMO2009 v1

RCP 4.5#

driving_experiment_name driving_model_ensemble_member driving_model_id model_id rcm_version_id
0 rcp45 r12i1p1 ICHEC-EC-EARTH KNMI-RACMO22E v1
1 rcp45 r1i1p1 ICHEC-EC-EARTH KNMI-RACMO22E v1
2 rcp45 r12i1p1 ICHEC-EC-EARTH SMHI-RCA4 v1
3 rcp45 r1i1p1 MOHC-HadGEM2-ES CLMcom-CCLM4-8-17 v1
4 rcp45 r1i1p1 MPI-M-MPI-ESM-LR MPI-CSC-REMO2009 v1
5 rcp45 r2i1p1 MPI-M-MPI-ESM-LR MPI-CSC-REMO2009 v1

RCP 8.5#

driving_experiment_name driving_model_ensemble_member driving_model_id model_id rcm_version_id
0 rcp85 r1i1p1 CCCma-CanESM2 CLMcom-CCLM4-8-17 v1
1 rcp85 r1i1p1 ICHEC-EC-EARTH KNMI-RACMO22E v1
2 rcp85 r1i1p1 MIROC-MIROC5 GERICS-REMO2015 v1
3 rcp85 r1i1p1 MOHC-HadGEM2-ES CLMcom-CCLM4-8-17 v1
4 rcp85 r1i1p1 MPI-M-MPI-ESM-LR MPI-CSC-REMO2009 v1
5 rcp85 r1i1p1 MPI-M-MPI-ESM-LR UHOH-WRF361H v1
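
The three near-identical query-and-sort blocks in the cell above could also be folded into a single loop over the scenarios. The following is a minimal sketch of that idea, assuming a DataFrame with the same columns as the ensemble definition; the input rows are a hand-written excerpt of the core ensemble, and plain `print` stands in for `display(Markdown(...))` so that the snippet runs outside a notebook.

```python
import pandas as pd

# Hand-written excerpt with the same columns as the ensemble definition.
columns = ["driving_experiment_name", "driving_model_ensemble_member",
           "driving_model_id", "model_id", "rcm_version_id"]
rows = [
    ("rcp85", "r1i1p1", "MIROC-MIROC5", "GERICS-REMO2015", "v1"),
    ("rcp26", "r1i1p1", "MIROC-MIROC5", "CLMcom-CCLM4-8-17", "v1"),
    ("historical", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
    ("rcp85", "r1i1p1", "CCCma-CanESM2", "CLMcom-CCLM4-8-17", "v1"),
    ("rcp26", "r12i1p1", "ICHEC-EC-EARTH", "CLMcom-CCLM4-8-17", "v1"),
]
ensemble = pd.DataFrame(rows, columns=columns)

# Drop historical, then build one sorted table per remaining scenario.
projections = ensemble.query("driving_experiment_name != 'historical'")
scenarios = {}
for name, group in projections.groupby("driving_experiment_name"):
    scenarios[name] = (
        group.sort_values(by=["driving_model_id", "model_id"])
        .reset_index(drop=True)
    )

for name, table in scenarios.items():
    print(f"--- {name}: {len(table)} simulation(s) ---")
    print(table.to_string(index=False))
```

Grouping on the experiment name means that a scenario added to the ensemble later is picked up without further changes to the code.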

To summarize the core ensemble in the same manner as the reference ensemble, you can use the following standard pandas code:

[5]:
ensemble = find.get_ensemble_definition('dwd_cmip5_core')

# Work on a copy
# --------------

df = ensemble.copy()

# Add one boolean column per driving experiment
# ---------------------------------------------

df.loc[:, 'historical'] = df['driving_experiment_name'] == 'historical'
df.loc[:, 'rcp26'] = df['driving_experiment_name'] == 'rcp26'
df.loc[:, 'rcp45'] = df['driving_experiment_name'] == 'rcp45'
df.loc[:, 'rcp85'] = df['driving_experiment_name'] == 'rcp85'

# Collapse to one row per model chain; 'max' is True if any row is True
# ---------------------------------------------------------------------

df = df.groupby(
    ['driving_model_id', 'driving_model_ensemble_member', 'model_id', 'rcm_version_id'],
    as_index=False,
).agg({'historical': 'max', 'rcp26': 'max', 'rcp45': 'max', 'rcp85': 'max'})

# Sort and reset index
# --------------------

df = df.sort_values(by=['model_id', 'driving_model_id']).reset_index(drop=True)

# Show
# ----

df
[5]:
driving_model_id driving_model_ensemble_member model_id rcm_version_id historical rcp26 rcp45 rcp85
0 CCCma-CanESM2 r1i1p1 CLMcom-CCLM4-8-17 v1 True False False True
1 ICHEC-EC-EARTH r12i1p1 CLMcom-CCLM4-8-17 v1 True True False False
2 MIROC-MIROC5 r1i1p1 CLMcom-CCLM4-8-17 v1 True True False False
3 MOHC-HadGEM2-ES r1i1p1 CLMcom-CCLM4-8-17 v1 True False True True
4 MIROC-MIROC5 r1i1p1 GERICS-REMO2015 v1 True False False True
5 ICHEC-EC-EARTH r12i1p1 KNMI-RACMO22E v1 True True True False
6 ICHEC-EC-EARTH r1i1p1 KNMI-RACMO22E v1 True False True True
7 MOHC-HadGEM2-ES r1i1p1 KNMI-RACMO22E v2 True True False False
8 MPI-M-MPI-ESM-LR r1i1p1 MPI-CSC-REMO2009 v1 True False True True
9 MPI-M-MPI-ESM-LR r2i1p1 MPI-CSC-REMO2009 v1 True True True False
10 ICHEC-EC-EARTH r12i1p1 SMHI-RCA4 v1 True False True False
11 MPI-M-MPI-ESM-LR r1i1p1 UHOH-WRF361H v1 True False False True
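
For a report or a README, the `True`/`False` columns can be made easier to scan by replacing them with a mark. The snippet below is only a sketch of one way to do this with standard pandas calls; the two input rows are copied from the summary above, and the short column labels are arbitrary choices made for illustration.

```python
import pandas as pd

# Two rows copied from the core ensemble summary above.
summary = pd.DataFrame({
    "driving_model_id": ["CCCma-CanESM2", "ICHEC-EC-EARTH"],
    "driving_model_ensemble_member": ["r1i1p1", "r12i1p1"],
    "model_id": ["CLMcom-CCLM4-8-17", "CLMcom-CCLM4-8-17"],
    "rcm_version_id": ["v1", "v1"],
    "historical": [True, True],
    "rcp26": [False, True],
    "rcp45": [False, False],
    "rcp85": [True, False],
})

scenario_columns = ["historical", "rcp26", "rcp45", "rcp85"]

# Replace booleans with 'x' (included) or an empty string (not included).
readable = summary.copy()
for column in scenario_columns:
    readable[column] = summary[column].map(lambda flag: "x" if flag else "")

# Shorter, arbitrary column labels for display purposes only.
readable = readable.rename(columns={
    "driving_model_id": "GCM",
    "driving_model_ensemble_member": "member",
    "model_id": "RCM",
    "rcm_version_id": "version",
})

print(readable.to_string(index=False))
```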