emsarray.utils#
Utility functions for working with datasets. These are low-level functions that apply fixes to datasets, or provide functionality missing in xarray. Most users will not have to call these functions directly.
See also
emsarray.operations
- emsarray.utils.timed_func(fn)#
Log the execution time of the decorated function. Logs “Calling <func.__qualname__>” before the wrapped function is called, and “Completed <func.__qualname__> in <time>s” after. The name of the logger is taken from func.__module__.
Example
class Grass(Convention):
    @cached_property
    @timed_func
    def polygons(self):
        return ...
When called, this will log something like:
DEBUG Calling Grass.polygons
DEBUG Completed Grass.polygons in 3.14s
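A minimal sketch of how a timing decorator of this kind can be written with only the standard library. This is not emsarray's implementation: the names timed_func_sketch and slow_sum are hypothetical, and the message format is an assumption based on the description above.

```python
import functools
import logging
import time


def timed_func_sketch(fn):
    """Hypothetical stand-in for timed_func; not emsarray's actual code."""
    # The logger is named after the module that defines the wrapped function.
    logger = logging.getLogger(fn.__module__)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logger.debug("Calling %s", fn.__qualname__)
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        logger.debug("Completed %s in %.2fs", fn.__qualname__, elapsed)
        return result

    return wrapper


@timed_func_sketch
def slow_sum(values):
    # Hypothetical function used only to demonstrate the decorator.
    return sum(values)
```

Because the messages are logged at DEBUG level, they only appear if the relevant logger is configured to emit DEBUG records.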
- emsarray.utils.to_netcdf_with_fixes(dataset, path, time_variable=None, **kwargs)#
Saves an xarray.Dataset to a netCDF4 file, applying various fixes to make it compatible with CSIRO software.
Specifically, this:
- prevents superfluous _FillValue attributes being added, using utils.disable_default_fill_value(), and
- reformats time units after saving to make them compatible with EMS, using utils.fix_time_units_for_ems().
- Parameters:
dataset – The xarray.Dataset to save
path – Where to save the dataset
time_variable – The name of the time variable which needs fixing. Optional; if not provided, the time variable will not be fixed for EMS.
kwargs – Any extra kwargs are passed to xarray.Dataset.to_netcdf()
- emsarray.utils.format_time_units_for_ems(units, calendar='proleptic_gregorian')#
Reformat a given time unit string to an EMS-compatible string.
xarray will always format time unit strings using ISO 8601 strings with a T separator and no space before the timezone. EMS is unable to parse this, and needs spaces between the date, time, and timezone components.
- Parameters:
units – A CF ‘units’ description of a time variable.
calendar – A CF ‘calendar’ attribute. Defaults to “proleptic_gregorian”.
- Returns:
str – A new CF ‘units’ string, representing the same time, but formatted for EMS.
Example
>>> format_time_units_for_ems("days since 1990-01-01T00:00:00+10:00")
"days since 1990-01-01 00:00:00 +10:00"
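The reformatting described above can be illustrated with the standard library alone. The following is a sketch under assumptions, not the library's implementation: format_units_sketch is a hypothetical name, it ignores the calendar argument entirely, and it only handles reference timestamps that datetime.fromisoformat() can parse.

```python
from datetime import datetime


def format_units_sketch(units):
    """Hypothetical helper; emsarray's real function may work differently."""
    # CF time units look like "<period> since <reference timestamp>".
    period, _, reference = units.partition(" since ")
    stamp = datetime.fromisoformat(reference)
    text = f"{period} since {stamp:%Y-%m-%d %H:%M:%S}"
    offset = stamp.utcoffset()
    if offset is not None:
        # Render the UTC offset as a separate "+HH:MM" component.
        minutes = int(offset.total_seconds()) // 60
        sign = "-" if minutes < 0 else "+"
        hours, minutes = divmod(abs(minutes), 60)
        text += f" {sign}{hours:02d}:{minutes:02d}"
    return text


formatted = format_units_sketch("days since 1990-01-01T00:00:00+10:00")
```

Here formatted has the date, time, and timezone separated by spaces, matching the shape of the output in the example above.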
- emsarray.utils.fix_time_units_for_ems(dataset_path, variable_name)#
Updates time units in a file so they are compatible with EMS. EMS only supports parsing a subset of valid time unit strings.
When saving xarray.Dataset objects, any time-based variables will be saved with units like "days since 1990-01-01T00:00:00+10:00", which uses a full ISO 8601 date string. EMS is old and grumpy, and only accepts time units with a format like "days since 1990-01-01 00:00:00 +10".
This function will do an in-place update of the time variable units in a dataset on disk. It will not recalculate any values, merely update the attribute.
- Parameters:
dataset_path – The path to the dataset on disk.
variable_name – The name of the time variable in the dataset to fix.
- emsarray.utils.disable_default_fill_value(dataset_or_array)#
Update all variables on this dataset or data array and disable the automatic _FillValue xarray sets. An automatic fill value can spoil missing_value, violate CF conventions for coordinates, and generally change a dataset that was loaded from disk in unintentional ways.
- Parameters:
dataset_or_array – The xarray.Dataset or xarray.DataArray to update
- emsarray.utils.dataset_like(sample_dataset, new_dataset)#
Take an example dataset, and another dataset with identical variable names and coordinates, and rearrange the new dataset to have identical ordering to the sample. Useful for making a multi-file dataset resemble a sample dataset, for example after masking and saving each variable one-by-one to a file.
- Parameters:
sample_dataset – The xarray.Dataset to copy the order and attributes from
new_dataset – The xarray.Dataset to copy the data from.
- Returns:
xarray.Dataset – A new dataset with attributes and orderings taken from sample_dataset and data taken from new_dataset.
- emsarray.utils.extract_vars(dataset, variables, keep_bounds=True, errors='raise')#
Extract a set of variables from a dataset, dropping all others.
This is approximately the opposite of xarray.Dataset.drop_vars().
- Parameters:
dataset – The dataset to extract the variables from
variables – A list of variable names
keep_bounds – If true (the default), additionally keep any bounds variables for the included variables and all coordinates.
errors ({"raise", "ignore"}, optional) – If ‘raise’ (default), raises a ValueError if any of the variables passed are not in the dataset. If ‘ignore’, any given names that are in the dataset are kept and no error is raised.
- Returns:
xarray.Dataset – A new dataset with only the named variables included.
- emsarray.utils.pairwise(iterable)#
Iterate over the values of an iterable in consecutive, overlapping pairs.
Example
>>> for a, b in pairwise("ABCD"):
...     print(a, b)
A B
B C
C D
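One common way to produce overlapping pairs like this is the itertools.tee recipe, sketched below. The name pairwise_sketch is hypothetical and this is not necessarily how emsarray implements it. On Python 3.10 and later, the standard library also provides itertools.pairwise, which behaves the same way.

```python
import itertools


def pairwise_sketch(iterable):
    """Hypothetical equivalent of pairwise(), built on itertools.tee."""
    first, second = itertools.tee(iterable)
    # Advance the second iterator by one so the two are offset by a single item.
    next(second, None)
    return zip(first, second)


pairs = list(pairwise_sketch("ABCD"))
```

An input with fewer than two items yields no pairs at all.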
- emsarray.utils.dimensions_from_coords(dataset, coordinates)#
Get the names of the dimensions for a set of coordinates.
- Parameters:
dataset – The dataset to get the dimensions from
coordinate_names – The names of some coordinate variables.
- Returns:
list of Hashable – The name of the relevant dimension for each coordinate variable.
- emsarray.utils.check_data_array_dimensions_match(dataset, data_array, *, dimensions=None)#
Check that the dimensions of an xarray.DataArray match the dimensions of an xarray.Dataset. This is useful when using the metadata of a particular dataset to display a data array, without requiring the data array to be taken directly from the dataset.
If the dimensions do not match, a ValueError is raised indicating the mismatched dimension.
- Parameters:
dataset (xarray.Dataset) – The dataset used as a reference
data_array (xarray.DataArray) – The data array to check the dimensions of
dimensions (list of Hashable, optional) – The dimension names to check for equal sizes. Optional, defaults to checking all dimensions on the data array.
- Raises:
ValueError – Raised if the dimensions do not match
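The check described above amounts to comparing dimension sizes by name. The sketch below works on plain mappings of dimension name to size (with xarray, dataset.sizes and data_array.sizes provide such mappings). The function name is hypothetical, and treating a dimension that is missing from either side as a mismatch is an assumption rather than documented behaviour.

```python
def check_sizes_match_sketch(dataset_sizes, array_sizes, dimensions=None):
    """Hypothetical size check over plain mappings of dimension name to size."""
    if dimensions is None:
        # Default to checking every dimension of the data array.
        dimensions = list(array_sizes)
    for dimension in dimensions:
        expected = dataset_sizes.get(dimension)
        actual = array_sizes.get(dimension)
        # Assumption: a dimension absent from either mapping counts as a mismatch.
        if expected is None or actual is None or expected != actual:
            raise ValueError(
                f"Dimension {dimension!r} mismatch: "
                f"dataset has {expected}, data array has {actual}"
            )


# Passes silently: the data array only needs to agree on its own dimensions.
check_sizes_match_sketch({"time": 4, "y": 3, "x": 2}, {"y": 3, "x": 2})
```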
- emsarray.utils.move_dimensions_to_end(data_array, dimensions)#
Transpose the dimensions of an xarray.DataArray such that the given dimensions appear as the last dimensions, in the order given.
Other dimensions appear as the first dimensions, in the same order they are present in the original data array.
- Parameters:
data_array (xarray.DataArray) – The data array to transpose
dimensions (list of Hashable) – The dimensions to move to the end
Examples
>>> data_array.dims
('a', 'b', 'c', 'd')
>>> transposed = move_dimensions_to_end(data_array, ['c', 'b'])
>>> transposed.dims
('a', 'd', 'c', 'b')
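The reordering rule can be sketched on plain tuples of dimension names. The helper name order_with_dims_last is hypothetical and this is not emsarray's code. With xarray, the resulting order could then be passed to DataArray.transpose().

```python
def order_with_dims_last(dims, dimensions):
    """Hypothetical helper returning the transposed dimension order."""
    # Dimensions that are not being moved keep their original relative order.
    remaining = [dim for dim in dims if dim not in dimensions]
    # The requested dimensions are appended in the order they were given.
    return tuple(remaining) + tuple(dimensions)


new_order = order_with_dims_last(("a", "b", "c", "d"), ["c", "b"])
```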
- emsarray.utils.find_unused_dimension(dataset_or_data_array, prefix='index')#
Find an unused dimension name in an xarray.Dataset or xarray.DataArray. Useful when transforming datasets in a way that creates a new dimension.
- Parameters:
dataset_or_data_array (xarray.Dataset or xarray.DataArray) – A dataset or data array
prefix (str, optional) – The name of the new dimension. If this dimension already exists, prefix_0 is checked, then prefix_1, prefix_2, etc.
- Returns:
str – A dimension name that does not exist in the dataset or data array passed in.
- emsarray.utils.find_unused_name(dataset, candidate)#
Find an unused variable name in an xarray.Dataset. Useful when adding a derived variable to a dataset, such as bounds variables. This first tests a candidate name to see if it exists, then appends numeric suffixes (“_0”, “_1”, “_2”, …) until a valid name is found.
- Parameters:
dataset (xarray.Dataset) – The dataset to find an unused name in
candidate (Hashable) – A candidate variable name.
- Returns:
Hashable – A variable name that does not clash with any other names in the dataset.
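Both find_unused_dimension() and find_unused_name() describe the same search: try the bare name first, then numbered suffixes. A minimal sketch of that search over any collection of existing names follows. The function find_unused_sketch is hypothetical and assumes string names.

```python
import itertools


def find_unused_sketch(existing, candidate):
    """Hypothetical helper; `existing` is any collection of names already in use."""
    if candidate not in existing:
        return candidate
    # Try candidate_0, candidate_1, ... until one is free.
    for suffix in itertools.count():
        name = f"{candidate}_{suffix}"
        if name not in existing:
            return name


free_name = find_unused_sketch({"index", "index_0"}, "index")
```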
- emsarray.utils.ravel_dimensions(data_array, dimensions, linear_dimension=None)#
Flatten the given dimensions of a DataArray. Other dimensions are kept as-is. This is useful for turning a DataArray with dimensions (‘t’, ‘z’, ‘y’, ‘x’) into (‘t’, ‘z’, ‘index’).
- Parameters:
data_array (xarray.DataArray) – The data array to linearize
dimensions (list of Hashable) – The dimensions to linearize, in the desired order. These dimensions can be in any order and any position in the input data array.
linear_dimension (Hashable, optional) – The name of the new dimension of flattened data. Defaults to index, or index_0, index_1, etc if not given.
- Returns:
xarray.DataArray – A new data array with the specified dimensions flattened. Only data, coordinates, and dimensions are set; attributes and encodings are not copied over.
Examples
>>> data_array = xarray.DataArray(
...     data=numpy.random.random((3, 5, 7)),
...     dims=['x', 'y', 'z'],
... )
>>> flattened = ravel_dimensions(data_array, ['y', 'x'])
>>> flattened.dims
('z', 'index')
>>> flattened.shape
(7, 15)
>>> expected = numpy.transpose(data_array.isel(z=0).values).ravel()
>>> all(flattened.isel(z=0).values == expected)
True
- emsarray.utils.wind_dimension(data_array, dimensions, sizes, *, linear_dimension='index')#
Replace a dimension in a data array by reshaping it into one or more other dimensions.
- Parameters:
data_array (xarray.DataArray) – The data array to reshape
dimensions (sequence of Hashable) – The names of the new dimensions after reshaping.
sizes (sequence of int) – The sizes of the new dimensions. The product of these sizes should match the size of the dimension being reshaped.
linear_dimension (Hashable) – The name of the dimension to reshape. Defaults to ‘index’, the default name for linear dimensions returned by ravel_dimensions().
- Returns:
xarray.DataArray – The original data array, with the linear dimension reshaped into the new dimensions.
Examples
>>> data_array = xarray.DataArray(
...     data=numpy.arange(11 * 7 * 5 * 3).reshape(11, -1, 3),
...     dims=('time', 'index', 'colour'),
... )
>>> data_array.sizes
Frozen({'time': 11, 'index': 35, 'colour': 3})
>>> wound_array = wind_dimension(data_array, ['y', 'x'], [7, 5])
>>> wound_array.sizes
Frozen({'time': 11, 'y': 7, 'x': 5, 'colour': 3})
See also
ravel_dimensions() – The inverse operation
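To see why ravel_dimensions() and wind_dimension() are inverses, it can help to look at the index arithmetic for a single element. The sketch below assumes row-major (C order) flattening, which is NumPy's default for reshaping and is consistent with the ravel_dimensions() example above. The helper names ravel_index and wind_index are hypothetical and are not part of emsarray.

```python
def ravel_index(indices, sizes):
    """Combine per-dimension indices into one row-major linear index."""
    linear = 0
    for index, size in zip(indices, sizes):
        linear = linear * size + index
    return linear


def wind_index(linear, sizes):
    """Split a row-major linear index back into per-dimension indices."""
    indices = []
    # Peel off the fastest-varying (last) dimension first.
    for size in reversed(sizes):
        linear, index = divmod(linear, size)
        indices.append(index)
    return tuple(reversed(indices))


sizes = (7, 5)  # sizes of ('y', 'x'), as in the example above
linear = ravel_index((3, 2), sizes)  # y=3, x=2
round_trip = wind_index(linear, sizes)
```

Every (y, x) pair maps to a distinct linear index between 0 and 34, which is why the product of the sizes has to match the size of the linear dimension.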
- emsarray.utils.datetime_from_np_time(np_time, *, tz=datetime.timezone.utc)#
Convert a numpy datetime64 to a Python datetime. Useful when formatting dates for human consumption, as a numpy datetime64 has no equivalent of datetime.datetime.strftime().
This does present the possibility of losing precision, as a numpy datetime64 has variable accuracy up to an attosecond, while Python datetimes have fixed microsecond accuracy. A conversion that truncates data is not reported as an error. If you’re using numpy datetime64 with attosecond accuracy, the Python datetime formatting methods are insufficient for your needs anyway.
- Parameters:
np_time (numpy.datetime64) – The numpy datetime64 to convert to a Python datetime.
tz (datetime.tzinfo) – The timezone that the numpy datetime is in. Defaults to UTC, as xarray will convert all time variables to UTC when opening files. The returned Python datetime will be in this timezone.
- Returns:
datetime.datetime – A timezone aware Python datetime.datetime instance.
- exception emsarray.utils.RequiresExtraException(extra)#
Raised when the optional dependencies for some functionality have not been installed, and a function requiring them is called.
See also
requires_extra()
- emsarray.utils.make_polygons_with_holes(points, *, out=None)#
Make a numpy.ndarray of shapely.Polygon from an array of (n, m, 2) points. n is the number of polygons, m is the number of vertices per polygon. If any point in a polygon is numpy.nan, that Polygon is skipped and will be None in the returned array.
- Parameters:
points (numpy.ndarray) – A (n, m, 2) array. Each row represents the m points of a polygon.
out (numpy.ndarray, optional) – Optional. An array to fill with polygons.
- Returns:
numpy.ndarray – The polygons in an array of size n.
- emsarray.utils.name_to_data_array(dataset, data_array)#
Takes either a data array or the name of a data array in the dataset, and returns the data array. If passed a name, a data array with that name must exist in the dataset. If passed a data array, the data array must have compatible dimension sizes.
Useful for operations using data arrays and datasets where the data array must have a matching shape but does not need to be a variable in the dataset. This allows for transformed data arrays to be plotted, for example.
- Parameters:
dataset (xarray.Dataset) – The dataset to check data arrays against
data_array (Hashable or xarray.DataArray) – A data array or the name of a data array in the dataset.
- Returns:
xarray.DataArray – The data array passed in, extracted from the dataset if necessary.
- emsarray.utils.data_array_to_name(dataset, data_array)#
Takes either a data array or the name of a data array, and returns just the name. If passed a name, a data array with that name must exist in the dataset. If passed a data array, a data array with the same name and with matching dimensions must exist in the dataset.
This is useful for operations that must be performed using the data array name, such as manipulations of the dataset itself.
- Parameters:
dataset (xarray.Dataset) – The dataset to check data arrays against
data_array (Hashable or xarray.DataArray) – A data array or the name of a data array, or a list of data arrays or names of data arrays.
- Returns:
Hashable – The name of the data array passed in.
- emsarray.utils.estimate_bounds_1d(dataset, coordinate, *, bounds_name=None, bounds_dimension='Two')#
Estimate the bounds of a one dimensional coordinate variable. The bound between two adjacent coordinates is the average of the two values, while the bounds at each end are the first and last coordinate values. This is a crude approach.
- Parameters:
dataset (xarray.Dataset) – The dataset containing the coordinate.
coordinate (xarray.DataArray or str) – The coordinate variable to estimate the bounds of.
bounds_name (Hashable, optional) – The name of the bounds variable to create. Optional, defaults to the name of the coordinate with a “_bounds” suffix.
bounds_dimension (Hashable, default "Two") – The name of the second dimension of the bounds variable. This dimension will have size 2. This dimension can be reused by other one-dimensional bounds variables. Defaults to “Two”.
- Returns:
xarray.Dataset – A copy of the original dataset including the new estimated bounds.
- Raises:
ValueError – Raised if the coordinate variable already has a ‘bounds’ attribute.
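The estimation rule can be worked through on a plain list of coordinate values. This sketch only shows the arithmetic described above. The name estimate_bounds_sketch is hypothetical, and it does not create a bounds variable or touch a dataset.

```python
def estimate_bounds_sketch(values):
    """Hypothetical midpoint-based bounds for a 1D list of coordinate values."""
    # Interior bounds are the averages of each pair of adjacent coordinates.
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    # The outermost bounds are simply the first and last coordinate values.
    edges = [values[0], *midpoints, values[-1]]
    # Pair consecutive edges: one (lower, upper) pair per coordinate value.
    return list(zip(edges, edges[1:]))


bounds = estimate_bounds_sketch([0.0, 1.0, 3.0])
```

For the coordinates 0.0, 1.0, and 3.0 this gives the pairs (0.0, 0.5), (0.5, 2.0), and (2.0, 3.0). The first and last cells extend to only one side of their coordinate value, which is one reason the approach is described as crude.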
- emsarray.utils.coordinates_plus_bounds(dataset, names)#
Given a list of coordinate variable names, return a list of all these coordinates plus the names of their bounds variables, if such bounds exist.
- Parameters:
dataset (xarray.Dataset) – The dataset with coordinate variables
names (list of Hashable) – A list of coordinate variables
- Returns:
list of Hashable – All of the coordinates in names, plus any bounds variables named in the attributes of these coordinate variables.
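As an illustration of the lookup described above, the sketch below uses a plain mapping from variable name to attribute dictionary in place of a dataset. The helper name, the ordering of the returned list, and the use of the CF ‘bounds’ attribute as the only source of bounds names are assumptions, not confirmed details of emsarray.

```python
def names_plus_bounds_sketch(attrs_by_name, names):
    """Hypothetical helper; `attrs_by_name` maps variable names to attribute dicts."""
    result = list(names)
    for name in names:
        bounds_name = attrs_by_name[name].get("bounds")
        # Only include bounds variables that actually exist.
        if bounds_name is not None and bounds_name in attrs_by_name:
            result.append(bounds_name)
    return result


variables = {
    "lon": {"bounds": "lon_bounds"},
    "lat": {},
    "lon_bounds": {},
}
with_bounds = names_plus_bounds_sketch(variables, ["lon", "lat"])
```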