Configuration#
CLIX uses Dask as its computational backend. Dask enables parallel execution, which is crucial for handling large climate datasets efficiently.
Dask Temporary Directory#
For small computations, the default Dask settings are usually sufficient. However, to prevent potential issues during runtime when processing larger datasets, it is highly recommended to configure Dask’s temporary directory to a disk location with ample storage capacity.
To do this, set the following environment variable before running CLIX:
export DASK_TEMPORARY_DIRECTORY=/path/to/bigger/tmpdir
This ensures that Dask's intermediate files have enough disk space, preventing "disk full" errors and performance bottlenecks.
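A minimal shell sketch of this setup; the path /scratch/dask-tmp is illustrative, so choose any directory on a volume with sufficient free space:

```shell
# Pick a directory on a large volume (path is an example; adjust as needed)
DASK_TMP=/scratch/dask-tmp
mkdir -p "$DASK_TMP"

# Optionally confirm the volume has enough free space
df -h "$DASK_TMP"

# Point Dask at it for this shell session, then run CLIX as usual
export DASK_TEMPORARY_DIRECTORY="$DASK_TMP"
```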
Local Cluster Settings and Feedback#
The configuration of the local Dask cluster, including internal chunking strategies, is managed automatically by CLIX. By default, CLIX executes all computations sequentially, without starting a local Dask cluster.
Parallel execution can be enabled explicitly by setting the number of Dask
workers to a value greater than zero using the --workers option. This allows
CLIX to distribute computations across multiple CPU cores, which can
significantly improve performance for large datasets, provided sufficient
memory is available.
To manually control the resources allocated to the local Dask cluster, the following command-line options are available:
--memory_limit <GB>: Sets the memory limit per Dask worker in gigabytes. For example, --memory_limit 16 allocates 16 GB of RAM to each worker. This option helps prevent individual workers from exceeding available memory. The default value is 12 GB per worker.

--workers <N>: Sets the number of Dask workers used for parallel computation. A value of 0 (the default) disables parallel execution. Setting this value to 1 or higher enables a local Dask cluster and allows computations to run in parallel on multi-core systems.
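The two options can be combined on one invocation; the worker count and memory limit below are illustrative values, to be tuned to the machine's CPU cores and available RAM:

```shell
# Start a local Dask cluster with 4 workers and 16 GB of memory per worker
clixrun --workers 4 --memory_limit 16 ...
```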
If you encounter performance issues such as memory exhaustion or excessive I/O operations—even after adjusting these parameters—please report them to the CLIX development team. Such feedback is valuable for improving the internal cluster management and default configurations.
Using a remote cluster#
An experimental option allows computations to run on a remote or distributed cluster. To try it, pass the scheduler address via the --dask_cluster option; CLIX will then execute all computations on the linked cluster.
clixrun --dask_cluster address ...
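Dask scheduler addresses typically take the form tcp://&lt;host&gt;:&lt;port&gt;, where 8786 is Dask's default scheduler port; the hostname below is illustrative:

```shell
# Connect to a running Dask scheduler (hostname is an example)
clixrun --dask_cluster tcp://scheduler.example.org:8786 ...
```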