Developers#

This page describes the workflow to contribute to pyku.

Design philosophy#

pyku functions should take an xarray.Dataset as input and return an xarray.Dataset. xarray.DataArray should not be used in the interface, since essential climate metadata (e.g. the crs variable) would be lost.

Alternatively, use geopandas.GeoDataFrame for inputs and outputs where polygon or point data are needed.

Write independent functions for simplicity, and only use objects where necessary.

Developer installation#

First, create a new virtual environment and install pyku:

# Create a new virtual environment
# --------------------------------

python3 -m venv yourvenv

# Activate
# --------

source yourvenv/bin/activate

Create a fork and a feature branch, then clone the repository and cd into the pyku directory:

git clone https://github.com/deutscherwetterdienst/pyku
cd pyku

Tip

If you installed pyku in development mode on one branch (e.g. master), but pyproject.toml was modified on the feature branch you want to work on, you will need to run pip install --editable . again. To keep things tidy, consider running pip uninstall pyku first.

Now pyku can be installed in development mode. All changes you make are then immediately available, so you can debug:

pip install -e .

Reload in jupyter#

If you are working in jupyter or in ipython, you will likely need to reload the part of pyku that you are working on with importlib:

import importlib

# First import
# ------------

import pyku.meta as meta

# Reload library to test changes on-the-fly
# -----------------------------------------

importlib.reload(meta)
meta.get_frequency(ds, dtype='freqstr')
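importlib.reload() re-executes the module's source and returns the module object, so edits saved on disk become visible without restarting the kernel. The mechanism can be demonstrated with any module; in this self-contained sketch, the standard library's json module stands in for a pyku submodule:

```python
import importlib
import json  # stand-in for a pyku submodule such as pyku.meta

# reload() re-executes the module's source file and returns the
# (same) module object, picking up any changes saved on disk.
json = importlib.reload(json)
print(json.dumps({'reloaded': True}))
```

Note that objects imported with `from module import name` are not updated by reloading the module; re-run the `from` import after the reload.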

Doctests#

Documentation and testing reinforce each other: docstrings explain what the code does, and doctests turn the example snippets embedded in them into executable tests. This way, the documentation both explains the codebase and serves as a test suite, improving the overall quality and reliability of the software.

As a concrete example, the docstring of pyku.check.check_units() contains code that is executed within the documentation.

"""
Check units

Arguments:
    ds (:class:`xarray.Dataset`): The input dataset.

Returns:
    :class:`pandas.DataFrame`: Dataframe containing checks and issues.

Example:
    .. ipython::

       In [0]: import xarray, pyku
          ...: ds = pyku.resources.get_test_data('hyras')
          ...: ds.pyku.check_units()
"""

During documentation build, this code is automatically executed, and any errors encountered will cause the pipeline to fail, signaling issues. This also serves to automatically generate the function documentation from the docstring:

pyku.check.check_units(ds)

Check units

Parameters:

ds (xarray.Dataset) – The input dataset.

Returns:

The checks and issues.

Return type:

pandas.DataFrame

Example

In [2]: import pyku
   ...: ds = pyku.resources.get_test_data('hyras')
   ...: ds.pyku.check_units()
   ...: 
Out[2]: 
                     key  value issue                               description
0  tas_units_can_be_read   True  None  Check if units can be read automatically
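
pyku's docstrings are executed through the Sphinx IPython directive, but the same principle can be tried out directly with Python's built-in doctest module. A minimal, self-contained sketch (format_units is an invented helper for illustration, not part of pyku):

```python
import doctest

def format_units(value, unit):
    """Attach a unit string to a numeric value (hypothetical helper).

    Example:
        >>> format_units(2.5, 'K')
        '2.5 K'
    """
    return f"{value} {unit}"

# Collect and run all doctests found in this module's docstrings;
# failed examples are reported like regular test failures.
results = doctest.testmod()
print(results)
```

As with the documentation build, a mismatch between the docstring example and the actual output makes the run fail, keeping the documented examples honest.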

Unit testing#

Unit testing is set up and configured. You can run unit tests outside the pipeline with:

python3 -m unittest discover -v -s ./testing -p "*_test.py"
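A test module matching the *_test.py pattern might look like the following minimal sketch (UnitsTest and its assertion are invented for illustration, not actual pyku tests):

```python
import unittest

class UnitsTest(unittest.TestCase):  # hypothetical test case
    def test_format(self):
        # Placeholder assertion; real tests would call pyku functions.
        self.assertEqual(f"{2.5} K", "2.5 K")

# Load and run the test case programmatically, mirroring what
# `python3 -m unittest discover` does for each matching file.
suite = unittest.TestLoader().loadTestsFromTestCase(UnitsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Placing such a file under ./testing with a name ending in _test.py makes the discover command above pick it up automatically.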

Logging#

The logging level for debugging can be set like so:

import pyku.resources as resources
import pyku.geo as geo
import logging

logging.getLogger('pyku').setLevel(logging.DEBUG)
logging.basicConfig(level=logging.WARNING)

ds = resources.get_test_data('air_temperature')
geo.sort_georeferencing(ds)
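
Setting the level once on the 'pyku' logger works because child loggers such as 'pyku.geo' propagate their records up the logger hierarchy. A self-contained sketch of that mechanism using only the standard library (the log message here is invented):

```python
import io
import logging

# Attach a handler and DEBUG level to the package-level 'pyku' logger.
stream = io.StringIO()
pkg_logger = logging.getLogger('pyku')
pkg_logger.addHandler(logging.StreamHandler(stream))
pkg_logger.setLevel(logging.DEBUG)

# A child logger inherits the effective level from 'pyku' and
# propagates its records to the handler attached there.
logging.getLogger('pyku.geo').debug('sorting georeferencing')
print(stream.getvalue())
```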

Building the documentation#

The documentation is built with sphinx. Go to the doc directory, where you have the following options. All of them are useful depending on context.

Building the documentation is resource intensive due to the amount of testing that runs. To work on only part of the documentation, you can set the input_patterns variable in doc/conf.py, which consists of the list of files that will be built. This lets you build only the part of the documentation you are working on.
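
For example, a doc/conf.py fragment restricting the build might look like this (the file names are hypothetical):

```python
# doc/conf.py (illustrative fragment; file names are invented)
# Restrict the build to the pages you are currently working on:
input_patterns = [
    'index.rst',
    'developers.rst',
]
```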

make html

The make html option is very simple and outputs clear errors:

# Build documentation
# -------------------

make html

# Serve the documentation
# -----------------------

python -m http.server --directory _build/html/ --bind=$HOSTNAME

You can then access the documentation from your web browser. For example, if your machine is oflws261, the documentation will be available at http://oflws261.dwd.de:8000

You can also build a pdf with make latexpdf, or push directly to confluence with make confluence if you have configured it and installed the dependencies.

sphinx-build

The sphinx-build method is useful because you can set up multiprocessing with the -j option and specify the output directory. However, the error output is hidden due to multiprocessing:

# Build documentation
# -------------------

sphinx-build -j 8 ./ _build/html/

# Serve the documentation
# -----------------------

python -m http.server --directory _build/html/ --bind=$HOSTNAME

sphinx-autobuild

sphinx-autobuild also lets you set up multiprocessing with the -j option and specify the output directory; in addition, the build is updated in real time as you make changes. However, the error output is again mostly hidden due to multiprocessing.

# Build documentation
# -------------------

sphinx-autobuild -j 8 ./ _build/html/ --host $HOSTNAME