pyku.meta
The pyku.meta module provides functions for working with metadata in
xarray.Dataset, particularly in the context of climate and geospatial
data. These functions assist in managing coordinate variables, spatial
information, and temporal metadata, while ensuring compatibility with common
conventions and formats.
Metadata retrieval
Functions such as pyku.meta.get_geographic_latlon_varnames(),
pyku.meta.get_crs_varname(), pyku.meta.get_geodata_varnames(), and
pyku.meta.get_spatial_varnames() retrieve the names of specific standard
climate variables from an xarray.Dataset.
Spatial metadata
Determine if datasets are georeferenced (pyku.meta.is_georeferenced()) or
have projection coordinates (pyku.meta.has_projection_coordinates()).
Temporal metadata
pyku.meta.get_frequency() detects the temporal frequency of a dataset,
with support for time bounds checks and multiple output formats
(freqstr, DateOffset, Timedelta). pyku.meta.get_time_bounds() and
pyku.meta.has_time_bounds() provide tools to inspect and validate
time bounds.
Example Usage
Below are examples of typical usage:
import pyku
# Retrieve a test dataset
# -----------------------
ds = pyku.resources.get_test_data('hyras')
# Find variable names of georeferenced data in dataset
# ----------------------------------------------------
ds.pyku.get_geodata_varnames()
# Get dataset frequency
# ---------------------
ds.pyku.get_frequency(dtype='freqstr')
# Check if the dataset is georeferenced
# -------------------------------------
ds.pyku.is_georeferenced()
For more detailed information on each function, refer to their respective docstrings.
- pyku.meta.filter_incomplete_datetimes(*args, **kwargs)
This function has moved to pyku.timekit.filter_incomplete_datetimes().
- pyku.meta.find_match(searched_words, words, excluded_words=None)
Finds the best match for a set of searched words among the available words, such as coordinate names.
- Parameters:
searched_words (list) – List of potential names to match, e.g., ['lat', 'lats', 'latitude'].
words (list) – List of available coordinate names, e.g., ['time', 'lat_3', 'lon_3', 'x', 'y'].
excluded_words (list) – Optional. List of names to exclude from matching, e.g., ['rlat', 'lat_bnds'].
- Returns:
The best matching coordinate name.
- Return type:
str
Example
Suppose we are looking for latitude, which could be represented by names such as ['lat', 'lats', 'latitude'], and want to identify the best match among available coordinates like ['time', 'lat_3', 'lon_3', 'x', 'y'].
To refine the search, certain words can be excluded so that they are never returned as matches. For instance, when searching for geographic latitude, terms like rlat or lat_bnds should not be considered valid matches.
In [1]: import pyku.meta as meta
   ...: meta.find_match(
   ...:     searched_words=['lat', 'lats', 'latitude'],
   ...:     words=['time', 'lat_3', 'lon_3', 'y_3', 'x_3'],
   ...:     excluded_words=['rlat', 'clats']
   ...: )
   ...: 
Out[1]: 'lat_3'
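The matching rule itself is not documented here. As a rough illustration of the idea only, the sketch below scores each candidate against the searched words with difflib from the standard library and skips candidates containing an excluded term. The helper name sketch_find_match, the substring-based exclusion, and the cutoff value are assumptions made for this sketch; this is not pyku's implementation.

```python
from difflib import SequenceMatcher


def sketch_find_match(searched_words, words, excluded_words=None, cutoff=0.6):
    """Return the word most similar to any searched word, or None.

    Hypothetical helper for illustration; not pyku's implementation.
    """
    excluded_words = excluded_words or []
    best_word, best_score = None, cutoff
    for word in words:
        # Skip candidates that contain an excluded term, e.g. 'rlat'.
        if any(excluded in word for excluded in excluded_words):
            continue
        # Score the candidate against its closest searched word.
        score = max(
            SequenceMatcher(None, searched, word).ratio()
            for searched in searched_words
        )
        if score > best_score:
            best_word, best_score = word, score
    return best_word


match = sketch_find_match(
    searched_words=['lat', 'lats', 'latitude'],
    words=['time', 'lat_3', 'lon_3', 'y_3', 'x_3'],
    excluded_words=['rlat', 'clats'],
)
print(match)  # lat_3
```

A similarity cutoff keeps unrelated names such as 'time' from being returned when no reasonable candidate exists; in that case the sketch returns None.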
- pyku.meta.get_crs_varname(ds)
Get the name of the crs variable.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Name of the crs variable.
- Return type:
str
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_crs_varname()
   ...: 
Out[1]: 'crs'
- pyku.meta.get_dataset_size(ds)
Get the dataset size in GB.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Dataset size
- Return type:
str
- pyku.meta.get_frequency(ds, dtype='Timedelta')
This function differs from the standard xarray function xarray.infer_freq() by additionally checking time bounds.
- Parameters:
ds (xarray.Dataset) – The input dataset.
dtype (str) – Specifies the desired data type for frequency representation. Choose one of the following:
'freqstr': Represents the frequency as a string. This is the recommended choice.
'DateOffset': Represents the frequency using pandas' DateOffset.
'Timedelta': Represents the frequency using pandas' Timedelta.
- Returns:
The inferred frequency of the dataset.
- Return type:
str, pandas.tseries.offsets.DateOffset, or pandas.Timedelta
Examples
In [1]: import pyku
   ...: 
   ...: # Get the dataset
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: 

In [2]: # Get the frequency string
   ...: ds.pyku.get_frequency(dtype='freqstr')
   ...: 
Out[2]: 'D'

In [3]: # Get the frequency as DateOffset
   ...: ds.pyku.get_frequency(dtype='DateOffset')
   ...: 
Out[3]: <Day>

In [4]: # Get the frequency as DateOffset

To create an offset that can be compared, use to_offset, which converts a frequency string into an offset object. This ensures that the frequency of your data can be compared unambiguously.
In [5]: import pyku
   ...: from pandas.tseries.frequencies import to_offset
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: myoffset = ds.pyku.get_frequency(dtype='DateOffset')
   ...: to_offset('1D') == myoffset
   ...: 
Out[5]: True
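How the time bounds check is performed is not described here. One plausible reading, sketched below with the standard library only, is that a frequency is accepted when the spacing between time labels is constant and, if bounds are present, every bounds interval has that same width. The helper infer_step and this acceptance rule are assumptions for illustration, not pyku's actual logic.

```python
from datetime import datetime, timedelta


def infer_step(times, bounds=None):
    """Return the constant step between time labels, or None.

    Hypothetical helper: if bounds are given, every (lower, upper)
    pair must also span exactly one step.
    """
    steps = {later - earlier for earlier, later in zip(times, times[1:])}
    if len(steps) != 1:
        return None  # irregular or too short time axis
    step = steps.pop()
    if bounds is not None:
        # Reject the step when any bounds interval has a different width.
        if any(upper - lower != step for lower, upper in bounds):
            return None
    return step


start = datetime(1981, 1, 1)
times = [start + timedelta(days=i) for i in range(5)]
daily_bounds = [(t, t + timedelta(days=1)) for t in times]
half_day_bounds = [(t, t + timedelta(hours=12)) for t in times]

print(infer_step(times, daily_bounds))     # 1 day, 0:00:00
print(infer_step(times, half_day_bounds))  # None
```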
- pyku.meta.get_geodata_varnames(ds)
Get variable names of georeferenced data from dataset.
The minimal requirement for a variable to be deemed georeferenced is to have either geographic or projection coordinates.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Names of the georeferenced variables.
- Return type:
list
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras-tas-monthly')
   ...: ds.pyku.get_geodata_varnames()
   ...: 
Out[1]: ['tas']
- pyku.meta.get_geodataset(ds, var)
Get a dataset for the given georeferenced variable(s). This function is useful because it returns the variable together with all associated climate data variables.
- Parameters:
ds (xarray.Dataset) – The input dataset.
var (str, List(str)) – The variable name(s).
- Returns:
The geodata variable(s) with all associated climate data variables.
- Return type:
xarray.Dataset
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_geodataset(var='tas')
   ...: 
Out[1]: 
<xarray.Dataset> Size: 139MB
Dimensions:    (time: 730, y: 178, x: 133, bnds: 2)
Coordinates:
  * time       (time) datetime64[ns] 6kB 1981-01-01 1981-01-02 ... 1982-12-31
  * y          (y) float64 1kB 3.562e+06 3.556e+06 ... 2.682e+06 2.676e+06
  * x          (x) float64 1kB 4.024e+06 4.028e+06 ... 4.678e+06 4.684e+06
    lat        (y, x) float64 189kB dask.array<chunksize=(178, 133), meta=np.ndarray>
    lon        (y, x) float64 189kB dask.array<chunksize=(178, 133), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables:
    tas        (time, y, x) float64 138MB dask.array<chunksize=(730, 178, 133), meta=np.ndarray>
    crs        int32 4B ...
    time_bnds  (time, bnds) datetime64[ns] 12kB dask.array<chunksize=(730, 2), meta=np.ndarray>
Attributes: (12/23)
    source:             surface observations
    institution:        Deutscher Wetterdienst (DWD)
    Conventions:        CF-1.11
    title:              gridded_temperature_dataset_(HYRAS TAS)
    realization:        v6-1
    project_id:         HYRAS
    ...                 ...
    ConventionsURL:     http://cfconventions.org/Data/cf-conventions/cf-c...
    license:            The HYRAS data, produced by DWD, is licensed unde...
    filename:           tas_hyras_1_1981_v6-1_de.nc
    comment:            Please be aware that the parameters are stored as...
    unique_dataset_id:  DWD_HYRAS_DE_tas_v6-1_1981_3a0bd428-c11d-47f6-9fb...
    CORDEX_domain:      undefined
- pyku.meta.get_geographic_latlon_varnames(ds)
Identify the variables holding geographic latitudes and longitudes within the dataset.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Names of the variables holding geographic latitudes and longitudes.
- Return type:
tuple[str]
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras-tas-monthly')
   ...: ds.pyku.get_geographic_latlon_varnames()
   ...: 
Out[1]: ('lat', 'lon')
- pyku.meta.get_latlon_bounds_varnames(ds)
Get the names of the geographic lat/lon bounds variables.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Names of the geographic lat/lon bounds variables.
- Return type:
list
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_latlon_bounds_varnames()
   ...: 
Out[1]: (None, None)
- pyku.meta.get_projection_yx_varnames(ds)
Get the names of the projection coordinates.
- Parameters:
ds (xarray.Dataset) – Input dataset.
- Returns:
(y, x) Names of the projection coordinates in the dataset.
- Return type:
tuple[str]
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras-tas-monthly')
   ...: ds.pyku.get_projection_yx_varnames()
   ...: 
Out[1]: ('y', 'x')
- pyku.meta.get_pyku_metadata()
Get pyku metadata
- Returns:
Dictionary of pyku metadata
- Return type:
dict
- pyku.meta.get_spatial_bounds_varnames(ds)
Get the names of the spatial bounds variables.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Names of the spatial bounds variables.
- Return type:
list
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_spatial_bounds_varnames()
   ...: 
Out[1]: []
- pyku.meta.get_spatial_varnames(ds)
Get the names of the spatial variables:
spatial_vertices_varnames
spatial_bounds_varnames
geographic_latlon_varnames
projection_yx_varnames
crs_varname
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Names of the spatial variables.
- Return type:
list
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_spatial_varnames()
   ...: 
Out[1]: ['lat', 'lon', 'y', 'x', 'crs']
- pyku.meta.get_spatial_vertices_varnames(ds)
Get the names of the spatial vertices variables.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The names of the spatial vertices variables.
- Return type:
list
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_spatial_vertices_varnames()
   ...: 
Out[1]: []
- pyku.meta.get_time_bounds(ds, which=None)
Get time bounds from dataset
- Parameters:
ds (xarray.Dataset) – The input dataset.
which (str) – Either None, lower, or upper. Default is None.
- Returns:
Array of time bounds.
- Return type:
numpy.ndarray
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_time_bounds()[0:5]
   ...: 
Out[1]: 
array([['1981-01-01T00:00:00.000000000', '1981-01-02T00:00:00.000000000'],
       ['1981-01-02T00:00:00.000000000', '1981-01-03T00:00:00.000000000'],
       ['1981-01-03T00:00:00.000000000', '1981-01-04T00:00:00.000000000'],
       ['1981-01-04T00:00:00.000000000', '1981-01-05T00:00:00.000000000'],
       ['1981-01-05T00:00:00.000000000', '1981-01-06T00:00:00.000000000']],
      dtype='datetime64[ns]')
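The effect of the which parameter is not spelled out above. Since each row of the returned array is a (lower, upper) pair, a reasonable guess is that 'lower' and 'upper' select the first and second element of every pair. The sketch below illustrates that reading on plain Python lists; select_bounds is a hypothetical helper, not part of pyku.

```python
from datetime import datetime, timedelta


def select_bounds(bounds, which=None):
    """Select both, the lower, or the upper time bounds.

    Hypothetical helper illustrating an assumed meaning of `which`.
    """
    if which is None:
        return list(bounds)
    if which == 'lower':
        return [lower for lower, _ in bounds]
    if which == 'upper':
        return [upper for _, upper in bounds]
    raise ValueError("which must be None, 'lower', or 'upper'")


start = datetime(1981, 1, 1)
bounds = [
    (start + timedelta(days=i), start + timedelta(days=i + 1))
    for i in range(3)
]

print(select_bounds(bounds, which='lower')[0])  # 1981-01-01 00:00:00
print(select_bounds(bounds, which='upper')[0])  # 1981-01-02 00:00:00
```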
- pyku.meta.get_time_bounds_varname(ds)
Get the name of the time bounds variable.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Name of the time bounds.
- Return type:
str
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_time_bounds_varname()
   ...: 
Out[1]: 'time_bnds'
- pyku.meta.get_time_dependent_varnames(ds)
Get the names of the time-dependent variables.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
List of variables depending on time
- Return type:
list(str)
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_time_dependent_varnames()
   ...: 
Out[1]: ['number_of_stations', 'tas', 'time_bnds']
- pyku.meta.get_time_intervals(ds)
Get time intervals between consecutive datapoints.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Dataset with time intervals
- Return type:
xarray.Dataset
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_time_intervals().interval.values[0:5]
   ...: 
Out[1]: array([86400., 86400., 86400., 86400., 86400.])
- pyku.meta.get_unidentified_varnames(ds)
Get the names of unidentified variables.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
The names of unidentified variables.
- Return type:
List[str]
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.get_unidentified_varnames()
   ...: 
Out[1]: ['number_of_stations']
- pyku.meta.has_geographic_coordinates(dat)
Determine if the data has geographic coordinates.
- Parameters:
dat (xarray.Dataset) – The input data.
- Returns:
True if data has geographic coordinates.
- Return type:
bool
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.has_geographic_coordinates()
   ...: 
Out[1]: True
- pyku.meta.has_ordered_dimensions_and_coordinates(ds)
Checks whether the dimensions and coordinates of the dataset are ordered according to Pyku’s recommendations:
‘time’ appears first (if it exists)
‘lat’ and ‘lon’ are positioned last (if they exist)
All other coordinates retain their relative order
While Pyku can handle any order of dimensions and coordinates, following this recommended structure ensures a more standardized data layout, reducing the likelihood of encountering edge cases.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Whether the ordering of the dataset's dimensions and coordinates corresponds to pyku's recommendations.
- Return type:
bool
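The ordering rule can be illustrated on a plain list of names. The sketch below is an assumption-laden illustration, not pyku's code: the helpers recommended_order and is_ordered are hypothetical, and 'lat' and 'lon' are assumed to keep their relative order at the end, as the output of the reorder_dimensions_and_coordinates() example (time, lon, lat) suggests.

```python
def recommended_order(names):
    """Order names so that 'time' is first and 'lat'/'lon' are last.

    Hypothetical helper; all other names keep their relative order.
    """
    first = [name for name in names if name == 'time']
    last = [name for name in names if name in ('lat', 'lon')]
    middle = [name for name in names if name not in ('time', 'lat', 'lon')]
    return first + middle + last


def is_ordered(names):
    """Check whether names already follow the recommended order."""
    return list(names) == recommended_order(names)


shuffled = ['lon', 'lat', 'time']
print(recommended_order(shuffled))  # ['time', 'lon', 'lat']
print(is_ordered(shuffled))         # False
```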
- pyku.meta.has_projection_coordinates(dat)
Determine if the data has y/x projection coordinates.
- Parameters:
dat (xarray.Dataset) – The input data.
- Returns:
True if data has projection coordinates.
- Return type:
bool
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.has_projection_coordinates()
   ...: 
Out[1]: True
- pyku.meta.has_time_bounds(ds)
Check if the dataset has time bounds.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
True if dataset has time bounds, False otherwise.
- Return type:
bool
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.has_time_bounds()
   ...: 
Out[1]: True
- pyku.meta.has_unstructured_geographic_coordinates(ds)
Determine if the lat/lon geographic coordinates are unstructured.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
True if the lat/lon geographic coordinates are unstructured.
- Return type:
bool
- pyku.meta.is_georeferenced(ds)
Determine if the dataset is georeferenced.
A dataset is considered georeferenced if projection information is available in any supported format (CF, EPSG, WKT, or PROJ string) and either geographic or projected coordinates are present to compute the lower-left and upper-right corners.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
True if the dataset is georeferenced, False otherwise.
- Return type:
bool
Example
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.is_georeferenced()
   ...: 
Out[1]: True
- pyku.meta.reorder_dimensions_and_coordinates(ds)
Reorders dataset dimensions and coordinates to ensure:
‘time’ comes first (if it exists)
‘lat’ and ‘lon’ come last (if they exist)
All other coordinates maintain their relative order
While Pyku can handle any order of dimensions and coordinates, following this recommended structure ensures a more standardized data layout, reducing the likelihood of encountering edge cases.
- Parameters:
ds (xarray.Dataset) – The input dataset.
- Returns:
Dataset with reordered dimensions and coordinates.
- Return type:
xarray.Dataset
Examples
In [1]: import pyku
   ...: ds = pyku.resources.get_test_data('fake_cmip6_data')
   ...: 
   ...: # Shuffle dimensions and coordinates
   ...: # ----------------------------------
   ...: 
   ...: ds = ds.transpose('lon', 'lat', 'time')
   ...: ds = ds.assign_coords({
   ...:     'lon': ds.lon, 'lat': ds.lat, 'time': ds.time
   ...: })
   ...: 
   ...: # Apply pyku default dimensions and coordinates ordering
   ...: # ------------------------------------------------------
   ...: 
   ...: ds.pyku.reorder_dimensions_and_coordinates()
   ...: 
Out[1]: 
<xarray.Dataset> Size: 189MB
Dimensions:  (time: 365, lon: 360, lat: 180)
Coordinates:
  * time     (time) datetime64[ns] 3kB 2023-01-01 2023-01-02 ... 2023-12-31
  * lon      (lon) float64 3kB -180.0 -179.0 -178.0 -177.0 ... 178.0 179.0 180.0
  * lat      (lat) float64 1kB -90.0 -88.99 -87.99 -86.98 ... 87.99 88.99 90.0
Data variables:
    tas      (time, lon, lat) float64 189MB 28.06 25.39 20.84 ... 6.376 7.319
Attributes: (12/51)
    name:              /ccc/work/cont003/gencmip6/checagar/IGCM_OUT/IPSL...
    Conventions:       CF-1.7 CMIP-6.2
    creation_date:     2020-10-18T15:18:15Z
    tracking_id:       hdl:21.14100/4f03accf-6a30-44d9-a20e-8ac4fde7055f
    description:       CMIP6 historical
    title:             IPSL-CM6A-LR-INCA model output prepared for CMIP6...
    ...                ...
    variable_id:       zg
    variant_label:     r1i1p1f1
    EXPID:             historical
    CMIP6_CV_version:  cv=6.2.15.1
    dr2xml_md5sum:     b6f602401512e82e2d7cadc2c6f36c2a
    model_version:     6.1.11
- pyku.meta.select_common_datetimes(*args, **kwargs)
This function has moved to pyku.timekit.select_common_datetimes().
- pyku.meta.set_time_bounds(*args, **kwargs)
This function has been renamed and moved to pyku.timekit.set_time_bounds_from_time_labels().
- pyku.meta.set_time_labels_from_time_bounds(*args, **kwargs)
This function has moved to pyku.timekit.set_time_labels_from_time_bounds().
- pyku.meta.to_gregorian_calendar(*args, **kwargs)
This function has moved to pyku.timekit.to_gregorian_calendar().
- pyku.meta.to_netcdf(ds, output_file)
Deprecated. Use pyku.magic.to_netcdf() instead.