emsarray.utils

Utility functions for working with datasets. These are low-level functions that apply fixes to datasets, or provide functionality missing in xarray. Most users will not have to call these functions directly.

See also

emsarray.operations

timed_func(fn)

Log the execution time of the decorated function. Logs “Calling <func.__qualname__>” before the wrapped function is called, and “Completed <func.__qualname__> in <time>s” after. The name of the logger is taken from func.__module__.

Example

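from functools import cached_property

from emsarray.conventions import Convention
from emsarray.utils import timed_func

class Grass(Convention):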
    @cached_property
    @timed_func
    def polygons(self):
        return ...

When called, this will log something like:

DEBUG Calling Grass.polygons
DEBUG Completed Grass.polygons in 3.14s
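
A decorator with this behaviour can be sketched with the standard library alone. This is a minimal illustration, not emsarray's implementation; the name timed is hypothetical, and the log message formats simply mirror the ones described above.

```python
import functools
import logging
import time


def timed(fn):
    """Hypothetical decorator that logs how long each call to fn takes."""
    # Use a logger named after the module that defines the wrapped function
    logger = logging.getLogger(fn.__module__)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logger.debug("Calling %s", fn.__qualname__)
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        logger.debug("Completed %s in %.2fs", fn.__qualname__, elapsed)
        return result

    return wrapper
```

Because the Grass example stacks the decorator under functools.cached_property, the timing is only logged the first time polygons is accessed on each instance; later accesses return the cached value without calling the wrapped function.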

to_netcdf_with_fixes(dataset, path, time_variable=None, **kwargs)

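Save an xarray.Dataset to a netCDF4 file, applying various fixes to make it compatible with CSIRO software.

Specifically, this:

  • prevents superfluous _FillValue attributes being added, using disable_default_fill_value(), and

  • reformats time units that EMS cannot handle, using fix_time_units_for_ems().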

Parameters:
  • dataset – The xarray.Dataset to save

  • path – Where to save the dataset

  • time_variable – The name of the time variable which needs fixing. Optional; if not provided, the time variable will not be fixed for EMS.

  • kwargs – Any extra kwargs are passed to xarray.Dataset.to_netcdf()

format_time_units_for_ems(units, calendar='proleptic_gregorian')

Reformat a given time unit string to an EMS-compatible string. xarray will always format time unit strings using ISO8601 strings with a T separator and no space before the timezone. EMS is unable to parse this, and needs spaces between the date, time, and timezone components.
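
The reformatting can be sketched with the standard library, assuming the proleptic Gregorian calendar (the calendar Python's datetime implements) and a reference date that datetime.fromisoformat() can parse. The function name is hypothetical; emsarray's own implementation may handle other calendars and inputs differently.

```python
from datetime import datetime


def format_time_units_sketch(units: str) -> str:
    """Rewrite '<period> since <ISO 8601 date>' with spaces between the components."""
    period, separator, date_string = units.partition(" since ")
    if not separator:
        raise ValueError(f"Not a CF time unit string: {units!r}")

    # Parse the reference date, e.g. "1990-01-01T00:00:00+10:00"
    reference = datetime.fromisoformat(date_string.strip())
    formatted = reference.strftime("%Y-%m-%d %H:%M:%S")

    # Append the UTC offset as " +HH:MM" if the reference date had one
    offset = reference.utcoffset()
    if offset is not None:
        total_minutes = int(offset.total_seconds() // 60)
        sign = "+" if total_minutes >= 0 else "-"
        hours, minutes = divmod(abs(total_minutes), 60)
        formatted += f" {sign}{hours:02d}:{minutes:02d}"

    return f"{period} since {formatted}"
```

For example, format_time_units_sketch("days since 1990-01-01T00:00:00+10:00") returns "days since 1990-01-01 00:00:00 +10:00".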

Parameters:
  • units – A CF ‘units’ description of a time variable.

  • calendar – A CF ‘calendar’ attribute. Defaults to “proleptic_gregorian”.

Returns:

str – A new CF ‘units’ string, representing the same time, but formatted for EMS.

Example

>>> format_time_units_for_ems("days since 1990-01-01T00:00:00+10:00")
"days since 1990-01-01 00:00:00 +10:00"

fix_time_units_for_ems(dataset_path, variable_name)

Updates time units in a file so they are compatible with EMS. EMS only supports parsing a subset of valid time unit strings.

When saving xarray.Dataset objects, any time-based variables will be saved with units like "days since 1990-01-01T00:00:00+10:00", a full ISO 8601 date string. EMS is old and grumpy, and only accepts time units with a format like "days since 1990-01-01 00:00:00 +10".

This function will do an in-place update of the time variable units in a dataset on disk. It will not recalculate any values, merely update the attribute.

Parameters:
  • dataset_path – The path to the dataset on disk.

  • variable_name – The name of the time variable in the dataset to fix.

disable_default_fill_value(dataset_or_array)

Update all variables on this dataset or data array and disable the automatic _FillValue xarray sets. An automatic fill value can spoil missing_value, violate CF conventions for coordinates, and generally change a dataset that was loaded from disk in unintentional ways.

Parameters:

dataset_or_array – The xarray.Dataset or xarray.DataArray to update
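
xarray does not write a _FillValue attribute for a variable whose encoding sets "_FillValue" to None. The sketch below applies that idea to plain per-variable encoding dicts. The function name is hypothetical, and leaving explicitly set fill values untouched is an assumption rather than documented emsarray behaviour.

```python
from typing import Any, Hashable, Mapping, MutableMapping


def disable_default_fill_values(
    encodings: Mapping[Hashable, MutableMapping[str, Any]],
) -> None:
    """Set '_FillValue' to None in every encoding that does not already set one."""
    for encoding in encodings.values():
        # None tells xarray not to write a _FillValue attribute at all;
        # setdefault leaves any explicit fill value in place
        encoding.setdefault("_FillValue", None)
```

With xarray, the mapping could be built as {name: variable.encoding for name, variable in dataset.variables.items()}; each variable's encoding is a mutable dict, so updating it in place updates the dataset.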

dataset_like(sample_dataset, new_dataset)

Take an example dataset, and another dataset with identical variable names and coordinates, and rearrange the new dataset to have identical ordering to the sample. Useful for making a multi-file dataset resemble a sample dataset, for example after masking and saving each variable one-by-one to a file.

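Parameters:
  • sample_dataset – The dataset to take attributes and variable ordering from

  • new_dataset – The dataset to take data from. It must have the same variable names and coordinates as sample_dataset

Returns: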

xarray.Dataset – A new dataset with attributes and orderings taken from sample_dataset and data taken from new_dataset.

extract_vars(dataset, variables, keep_bounds=True, errors='raise')

Extract a set of variables from a dataset, dropping all others.

This is approximately the opposite of xarray.Dataset.drop_vars().

Parameters:
  • dataset – The dataset to extract the variables from

  • variables – A list of variable names

  • keep_bounds – If True (the default), additionally keep any bounds variables for the included variables and all coordinates.

  • errors ({"raise", "ignore"}, optional) – If ‘raise’ (default), raises a ValueError if any of the variables passed are not in the dataset. If ‘ignore’, any given names that are in the dataset are kept and no error is raised.

Returns:

xarray.Dataset – A new dataset with only the named variables included.
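
The selection of names can be sketched over a plain mapping of variable names to attributes. This assumes bounds variables are linked through the CF bounds attribute; coordinates themselves are not added to the result, as the sketch assumes they are kept separately. The function name is hypothetical.

```python
from typing import Any, Hashable, Iterable, List, Mapping


def names_to_extract(
    attrs: Mapping[Hashable, Mapping[str, Any]],
    variables: Iterable[Hashable],
    coordinates: Iterable[Hashable] = (),
    keep_bounds: bool = True,
    errors: str = "raise",
) -> List[Hashable]:
    """Return the variable names to keep, plus any CF bounds variables."""
    if errors not in ("raise", "ignore"):
        raise ValueError(f"errors must be 'raise' or 'ignore', not {errors!r}")

    requested = list(variables)
    missing = [name for name in requested if name not in attrs]
    if missing and errors == "raise":
        raise ValueError(f"Variables not in dataset: {missing}")

    keep = [name for name in requested if name in attrs]
    if keep_bounds:
        # Follow the CF 'bounds' attribute on kept variables and on coordinates
        for name in [*keep, *coordinates]:
            bounds = attrs.get(name, {}).get("bounds")
            if bounds is not None and bounds in attrs and bounds not in keep:
                keep.append(bounds)
    return keep
```

The resulting names could then be used to select variables from a dataset, or to work out the complement to pass to xarray.Dataset.drop_vars().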

pairwise(iterable)

Iterate over successive overlapping pairs of values from an iterable.

Example

>>> for a, b in pairwise("ABCD"):
...     print(a, b)
A B
B C
C D
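
On Python 3.10 and later the standard library provides itertools.pairwise() with the same behaviour. For older versions, the classic recipe based on itertools.tee() is sketched below; it is not necessarily how emsarray implements pairwise().

```python
from itertools import tee
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def pairwise_sketch(iterable: Iterable[T]) -> Iterator[Tuple[T, T]]:
    """Yield (s0, s1), (s1, s2), (s2, s3), ... from an iterable."""
    # Two independent iterators over the same values
    first, second = tee(iterable)
    # Advance the second iterator by one; the default avoids StopIteration on empty input
    next(second, None)
    return zip(first, second)
```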

dimensions_from_coords(dataset, coordinate_names)

Get the names of the dimensions for a set of coordinates.

Parameters:
  • dataset – The dataset to get the dimensions from

  • coordinate_names – The names of some coordinate variables.

Returns:

list of Hashable – The name of the relevant dimension for each coordinate variable.
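
A minimal sketch, assuming every named coordinate is one-dimensional so that it maps to exactly one dimension. Instead of a dataset it takes a hypothetical mapping of coordinate names to dimension tuples, which with xarray could be built as {name: dataset[name].dims for name in coordinate_names}.

```python
from typing import Hashable, Iterable, List, Mapping, Sequence


def dimensions_from_coords_sketch(
    coordinate_dims: Mapping[Hashable, Sequence[Hashable]],
    coordinate_names: Iterable[Hashable],
) -> List[Hashable]:
    """Return the single dimension of each named coordinate, in order."""
    dimensions = []
    for name in coordinate_names:
        dims = coordinate_dims[name]
        # Assumption: only one-dimensional coordinates are supported
        if len(dims) != 1:
            raise ValueError(
                f"Coordinate {name!r} has dimensions {tuple(dims)}, expected exactly one"
            )
        dimensions.append(dims[0])
    return dimensions
```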

check_data_array_dimensions_match(dataset, data_array)

Check that the dimensions of an xarray.DataArray match the dimensions of an xarray.Dataset. This is useful when using the metadata of a particular dataset to display a data array, without requiring the data array to be taken directly from the dataset.

If the dimensions do not match, a ValueError is raised, indicating the mismatched dimension.

Parameters:
  • dataset – The dataset used as a reference

  • data_array – The data array to check the dimensions of

Raises:

ValueError – Raised if the dimensions do not match
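
One plausible form of the check, sketched over plain dimension-to-size mappings such as those returned by .sizes: every dimension of the data array must exist in the dataset with the same length. This reading of "match" is an assumption, and the function name is hypothetical.

```python
from typing import Hashable, Mapping


def check_dimensions_match_sketch(
    dataset_sizes: Mapping[Hashable, int],
    array_sizes: Mapping[Hashable, int],
) -> None:
    """Raise ValueError if an array dimension is missing from, or sized differently to, the dataset."""
    for dimension, size in array_sizes.items():
        if dimension not in dataset_sizes:
            raise ValueError(f"Data array has unknown dimension {dimension!r}")
        expected = dataset_sizes[dimension]
        if size != expected:
            raise ValueError(
                f"Dimension {dimension!r} has size {size} in the data array "
                f"but {expected} in the dataset"
            )
```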

move_dimensions_to_end(data_array, dimensions)

Transpose the dimensions of a xarray.DataArray such that the given dimensions appear as the last dimensions, in the order given.

Other dimensions appear first, in the same order as in the original data array.

Parameters:
  • data_array (xarray.DataArray) – The data array to transpose

  • dimensions (list of Hashable) – The dimensions to move to the end

Examples

>>> data_array.dims
('a', 'b', 'c', 'd')
>>> transposed = move_dimensions_to_end(data_array, ['c', 'b'])
>>> transposed.dims
('a', 'd', 'c', 'b')
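
The new dimension order can be computed from the dimension names alone and then passed to xarray.DataArray.transpose(). A minimal sketch with a hypothetical helper name:

```python
from typing import Hashable, Sequence, Tuple


def dimensions_moved_to_end(
    dims: Sequence[Hashable],
    to_end: Sequence[Hashable],
) -> Tuple[Hashable, ...]:
    """Return dims reordered so that to_end comes last, in the given order."""
    missing = [dim for dim in to_end if dim not in dims]
    if missing:
        raise ValueError(f"Dimensions not present: {missing}")
    # Remaining dimensions keep their original relative order
    leading = tuple(dim for dim in dims if dim not in to_end)
    return leading + tuple(to_end)
```

With xarray this could be used as data_array.transpose(*dimensions_moved_to_end(data_array.dims, ['c', 'b'])).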

ravel_dimensions(data_array, dimensions, linear_dimension=None)

Flatten the given dimensions of a DataArray. Other dimensions are kept as-is. This is useful for turning a DataArray with dimensions (‘t’, ‘z’, ‘y’, ‘x’) into (‘t’, ‘z’, ‘index’).

Parameters:
  • data_array (xarray.DataArray) – The data array to linearize

  • dimensions (list of Hashable) – The dimensions to linearize, listed in the order they should be flattened. They can appear in any order and at any position in the input data array.

  • linear_dimension (Hashable, optional) – The name of the new dimension of flattened data. If not given, defaults to ‘index’, or to ‘index_0’, ‘index_1’, etc.

Returns:

xarray.DataArray – A new data array with the specified dimensions flattened. Only data, coordinates, and dimensions are set; attributes and encodings are not copied over.

Examples

>>> import numpy
>>> import xarray
>>> data_array = xarray.DataArray(
...     data=numpy.random.random((3, 5, 7)),
...     dims=['x', 'y', 'z'],
... )
>>> flattened = ravel_dimensions(data_array, ['y', 'x'])
>>> flattened.dims
('z', 'index')
>>> flattened.shape
(7, 15)
>>> expected = numpy.transpose(data_array.isel(z=0).values).ravel()
>>> all(flattened.isel(z=0).values == expected)
True
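
The flattening can be sketched on a bare NumPy array: transpose so the kept dimensions come first and the flattened dimensions come last in the requested order, then reshape the trailing axes into one. This sketch ignores coordinates and only tracks dimension names alongside the array; the function name is hypothetical.

```python
from typing import Hashable, Sequence, Tuple

import numpy as np


def ravel_dimensions_sketch(
    values: np.ndarray,
    dims: Sequence[Hashable],
    to_ravel: Sequence[Hashable],
    linear_dimension: Hashable = "index",
) -> Tuple[np.ndarray, Tuple[Hashable, ...]]:
    """Flatten the to_ravel dimensions of values into one trailing dimension."""
    dims = list(dims)
    kept = [dim for dim in dims if dim not in to_ravel]
    # Kept dimensions first, then the raveled ones in the requested order
    order = kept + list(to_ravel)
    transposed = np.transpose(values, [dims.index(dim) for dim in order])
    # Collapse the trailing axes into one; in C order the last
    # dimension listed in to_ravel varies fastest
    flat = transposed.reshape(transposed.shape[:len(kept)] + (-1,))
    return flat, tuple(kept) + (linear_dimension,)
```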

wind_dimension(data_array, dimensions, sizes, *, linear_dimension='index')

Replace a dimension in a data array by reshaping it into one or more other dimensions.

Parameters:
  • data_array (xarray.DataArray) – The data array to reshape

  • dimensions (sequence of Hashable) – The names of the new dimensions after reshaping.

  • sizes (sequence of int) – The sizes of the new dimensions. The product of these sizes should match the size of the dimension being reshaped.

  • linear_dimension (Hashable) – The name of the dimension to reshape. Defaults to ‘index’, the default name for linear dimensions returned by ravel_dimensions().

Returns:

xarray.DataArray – The original data array, with the linear dimension reshaped into the new dimensions.

Examples

>>> import numpy
>>> import xarray
>>> data_array = xarray.DataArray(
...     data=numpy.arange(11 * 7 * 5 * 3).reshape(11, -1, 3),
...     dims=('time', 'index', 'colour'),
... )
>>> data_array.sizes
Frozen({'time': 11, 'index': 35, 'colour': 3})
>>> wound_array = wind_dimension(data_array, ['y', 'x'], [7, 5])
>>> wound_array.sizes
Frozen({'time': 11, 'y': 7, 'x': 5, 'colour': 3})

See also

ravel_dimensions

The inverse operation
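
Winding amounts to a reshape that splits one axis into several while leaving the other axes in place. The sketch below does this on a bare NumPy array, tracking dimension names alongside it; because NumPy reshapes in C order, the last of the new dimensions varies fastest along the original linear dimension. The function name is hypothetical.

```python
import math
from typing import Hashable, Sequence, Tuple

import numpy as np


def wind_dimension_sketch(
    values: np.ndarray,
    dims: Sequence[Hashable],
    new_dims: Sequence[Hashable],
    sizes: Sequence[int],
    linear_dimension: Hashable = "index",
) -> Tuple[np.ndarray, Tuple[Hashable, ...]]:
    """Split the linear dimension of values into new_dims with the given sizes."""
    dims = tuple(dims)
    axis = dims.index(linear_dimension)
    if math.prod(sizes) != values.shape[axis]:
        raise ValueError(
            f"Sizes {tuple(sizes)} do not multiply to {values.shape[axis]}"
        )
    # Replace the one axis with the new axes, keeping all others in place
    new_shape = values.shape[:axis] + tuple(sizes) + values.shape[axis + 1:]
    new_names = dims[:axis] + tuple(new_dims) + dims[axis + 1:]
    return values.reshape(new_shape), new_names
```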

datetime_from_np_time(np_time)

Convert a numpy datetime64 to a python datetime. Useful when formatting dates for human consumption, as a numpy datetime64 has no equivalent of datetime.datetime.strftime().

This does present the possibility of losing precision, as a numpy datetime64 has variable accuracy up to an attosecond, while Python datetimes have fixed microsecond accuracy. A conversion that truncates data is not reported as an error. If you’re using numpy datetime64 with attosecond accuracy, the Python datetime formatting methods are insufficient for your needs anyway.
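
A minimal NumPy sketch of the conversion: cast to microsecond precision, which drops anything finer, then call .item(). The cast matters because .item() on a nanosecond-precision datetime64 returns an int rather than a datetime. This sketch returns a naive datetime and is not necessarily how emsarray performs the conversion; the function name is hypothetical.

```python
import datetime

import numpy as np


def datetime_from_np_time_sketch(np_time: np.datetime64) -> datetime.datetime:
    """Convert a numpy datetime64 to a naive datetime.datetime."""
    # Casting to microseconds discards sub-microsecond precision and makes
    # .item() return a datetime.datetime instead of an int
    return np_time.astype("datetime64[us]").item()
```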

exception RequiresExtraException(extra)

Raised when the optional dependencies for some functionality have not been installed, and a function requiring them is called.

See also

requires_extra()
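
The general pattern of raising an error only when a function that needs an optional dependency is actually called can be sketched as follows. MissingExtraError and needs_modules are hypothetical names, and this is not necessarily how requires_extra() works.

```python
import functools
import importlib


class MissingExtraError(Exception):
    """Hypothetical stand-in for an exception like RequiresExtraException."""

    def __init__(self, extra: str) -> None:
        super().__init__(
            f"This functionality requires the optional {extra!r} dependencies"
        )
        self.extra = extra


def needs_modules(extra: str, *module_names: str):
    """Hypothetical decorator: check optional imports each time the function is called."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for module_name in module_names:
                try:
                    importlib.import_module(module_name)
                except ImportError as exc:
                    # Chain the original ImportError so the root cause stays visible
                    raise MissingExtraError(extra) from exc
            return fn(*args, **kwargs)

        return wrapper

    return decorator
```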

make_polygons_with_holes(points, *, out=None)

Make a numpy.ndarray of shapely.Polygon from an array of (n, m, 2) points. n is the number of polygons, m is the number of vertices per polygon. If any point in a polygon is numpy.nan, that Polygon is skipped and will be None in the returned array.

Parameters:
  • points (numpy.ndarray) – A (n, m, 2) array. Each row represents the m points of a polygon.

  • out (numpy.ndarray, optional) – Optional. An array to fill with polygons.

Returns:

numpy.ndarray – The polygons in an array of size n.
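
The NaN-skipping behaviour can be sketched with NumPy alone. The make_polygon parameter is a hypothetical hook: passing shapely.geometry.Polygon would build real polygons, while the default returns a plain list of vertices so the sketch runs without Shapely. The sketch does not support the out parameter.

```python
from typing import Any, Callable, List, Optional, Tuple

import numpy as np


def _vertex_list(ring: np.ndarray) -> List[Tuple[float, float]]:
    """Default stand-in for a polygon constructor: a list of (x, y) tuples."""
    return [tuple(point) for point in ring.tolist()]


def polygons_or_none(
    points: np.ndarray,
    make_polygon: Optional[Callable[[np.ndarray], Any]] = None,
) -> np.ndarray:
    """Build one polygon per row of an (n, m, 2) array, or None where any vertex is NaN."""
    points = np.asarray(points, dtype=float)
    if points.ndim != 3 or points.shape[2] != 2:
        raise ValueError(f"Expected an (n, m, 2) array, got shape {points.shape}")
    if make_polygon is None:
        make_polygon = _vertex_list

    out = np.full(points.shape[0], None, dtype=object)
    # A polygon is built only if none of its coordinates are NaN
    valid = ~np.isnan(points).any(axis=(1, 2))
    for index in np.flatnonzero(valid):
        out[index] = make_polygon(points[index])
    return out
```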