emsarray.utils
Utility functions for working with datasets. These are low-level functions that apply fixes to datasets, or provide functionality missing in xarray. Most users will not have to call these functions directly.
- emsarray.utils.to_netcdf_with_fixes(dataset, path, time_variable=None, **kwargs)
Saves a xarray.Dataset to a netCDF4 file, applying various fixes to make it compatible with CSIRO software.
Specifically, this:
prevents superfluous _FillValue attributes being added, using utils.disable_default_fill_value(),
reformats time units that xarray struggles with, using utils.fix_bad_time_units(),
reformats time units after saving to make them compatible with EMS, using utils.fix_time_units_for_ems().
- Parameters
dataset – The xarray.Dataset to save
path – Where to save the dataset
time_variable – The name of the time variable which needs fixing. Optional; if not provided, the time variable will not be fixed for EMS.
kwargs – Any extra kwargs are passed to xarray.Dataset.to_netcdf()
- emsarray.utils.format_time_units_for_ems(units, calendar='proleptic_gregorian')
Reformat a given time unit string to an EMS-compatible string.
xarray will always format time unit strings using ISO8601 strings with a T separator and no space before the timezone. EMS is unable to parse this, and needs spaces between the date, time, and timezone components.
- Parameters
units – A CF ‘units’ description of a time variable.
calendar – A CF ‘calendar’ attribute. Defaults to “proleptic_gregorian”.
- Returns
str – A new CF ‘units’ string, representing the same time, but formatted for EMS.
Example
>>> format_time_units_for_ems("days since 1990-01-01T00:00:00+10:00")
"days since 1990-01-01 00:00:00 +10:00"
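The reformatting described above can be sketched with a regular expression. This is an illustration only, not emsarray's implementation: the helper name `reformat_units_sketch` and the pattern are assumptions, and the `calendar` argument is ignored here.

```python
import re

# Hypothetical pattern for illustration: "<unit> since <date>T<time><timezone>".
_UNITS_RE = re.compile(
    r"^(?P<prefix>\w+ since )"
    r"(?P<date>\d{4}-\d{2}-\d{2})"
    r"[T ](?P<time>\d{2}:\d{2}:\d{2})"
    r"\s*(?P<tz>[+-]\d{2}:\d{2})?$"
)


def reformat_units_sketch(units: str) -> str:
    """Put spaces between the date, time, and timezone of a CF units string."""
    match = _UNITS_RE.match(units)
    if match is None:
        # Leave anything unrecognised untouched rather than guess.
        return units
    parts = [match["prefix"] + match["date"], match["time"]]
    if match["tz"]:
        parts.append(match["tz"])
    return " ".join(parts)


print(reformat_units_sketch("days since 1990-01-01T00:00:00+10:00"))
```

A string that already has spaces between its components passes through unchanged, so applying the sketch twice is harmless.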
- emsarray.utils.fix_time_units_for_ems(dataset_path, variable_name)
Updates time units in a file so they are compatible with EMS. EMS only supports parsing a subset of valid time unit strings.
When saving xarray.Dataset objects, any time-based variables will be saved with a units like "days since 1990-01-01T00:00:00+10:00" - a full ISO 8601 date string. EMS is old and grumpy, and only accepts time units with a format like "days since 1990-01-01 00:00:00 +10".
This function will do an in-place update of the time variable units in a dataset on disk. It will not recalculate any values, merely update the attribute.
- Parameters
dataset_path – The path to the dataset on disk.
variable_name – The name of the time variable in the dataset to fix.
- emsarray.utils.disable_default_fill_value(dataset_or_array)
Update all variables on this dataset or data array and disable the automatic _FillValue xarray sets. An automatic fill value can spoil missing_value, violate CF conventions for coordinates, and generally change a dataset that was loaded from disk in unintentional ways.
- Parameters
dataset_or_array – The xarray.Dataset or xarray.DataArray to update
- emsarray.utils.fix_bad_time_units(dataset_or_array)
Some datasets have a time units string that causes xarray to raise an error when saving the dataset. The unit string is parsed when reading the dataset just fine, only saving is the issue. This function will check for these bad unit strings and change them in place to something xarray can handle.
This issue was fixed in https://github.com/pydata/xarray/pull/6049 and released in version 0.21.0. Once the minimum supported version of xarray is 0.21.0 or higher this entire function can be removed.
- emsarray.utils.dataset_like(sample_dataset, new_dataset)
Take an example dataset, and another dataset with identical variable names and coordinates, and rearrange the new dataset to have identical ordering to the sample. Useful for making a multi-file dataset resemble a sample dataset, for example after masking and saving each variable one-by-one to a file.
- Parameters
sample_dataset – The xarray.Dataset to copy the order and attributes from
new_dataset – The xarray.Dataset to copy the data from.
- Returns
xarray.Dataset – A new dataset with attributes and orderings taken from sample_dataset and data taken from new_dataset.
- emsarray.utils.maybe_mask_and_scale(variable, name=None)
Mask and scale a variable, if it appears to not have been masked and scaled already. Does nothing if the data is already masked or scaled.
- emsarray.utils.mask_and_scale(variable, name=None)
Mask and scale a variable by running the relevant xarray Coders. Does the same thing as if mask_and_scale=True had been passed when opening the dataset.
- emsarray.utils.extract_vars(dataset, variables, keep_bounds=True, errors='raise')
Extract a set of variables from a dataset, dropping all others.
This is approximately the opposite of xarray.Dataset.drop_vars().
- Parameters
dataset – The dataset to extract the variables from
variables – A list of variable names
keep_bounds – If true (the default), additionally keep any bounds variables for the included variables and all coordinates.
errors ({"raise", "ignore"}, optional) – If ‘raise’ (default), raises a ValueError if any of the variables passed are not in the dataset. If ‘ignore’, any given names that are in the dataset are kept and no error is raised.
- Returns
xr.Dataset – A new dataset with only the named variables included.
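The difference between the two `errors` modes can be sketched in plain Python. This is a minimal illustration, not emsarray's code: the helper `select_names_sketch` is hypothetical, a set of names stands in for the dataset, and `keep_bounds` is not modelled.

```python
def select_names_sketch(available, requested, errors="raise"):
    """Return the requested names that exist in `available`, in request order."""
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")
    missing = [name for name in requested if name not in available]
    if missing and errors == "raise":
        # 'raise' mode: any unknown name is an error.
        raise ValueError(f"Variables not found in dataset: {missing}")
    # 'ignore' mode: silently keep only the names that are present.
    return [name for name in requested if name in available]


dataset_variables = {"temp", "salt", "eta"}  # stand-in for a dataset's variable names
print(select_names_sketch(dataset_variables, ["temp", "u"], errors="ignore"))
```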
- emsarray.utils.pairwise(iterable)
Iterate over values in an iterator in pairs.
Example
>>> for a, b in pairwise("ABCD"):
...     print(a, b)
A B
B C
C D
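One common way to build this kind of overlapping-pairs iterator uses `itertools.tee`. The sketch below is an assumption about how it could be done, not necessarily how emsarray does it, and `pairwise_sketch` is a hypothetical name.

```python
from itertools import tee


def pairwise_sketch(iterable):
    """Yield overlapping pairs (s0, s1), (s1, s2), ... from an iterable."""
    first, second = tee(iterable)
    # Advance the second copy by one so the two iterators are offset.
    next(second, None)
    return zip(first, second)


for a, b in pairwise_sketch("ABCD"):
    print(a, b)
```

An input with fewer than two items yields no pairs.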
- emsarray.utils.dimensions_from_coords(dataset, coordinate_names)
Get the names of the dimensions for a set of coordinates.
- emsarray.utils.check_data_array_dimensions_match(dataset, data_array)
Check that the dimensions of a xarray.DataArray match the dimensions of a xarray.Dataset. This is useful when using the metadata of a particular dataset to display a data array, without requiring the data array to be taken directly from the dataset.
If the dimensions do not match, a ValueError is raised, indicating the mismatched dimension.
- Parameters
dataset – The dataset used as a reference
data_array – The data array to check the dimensions of
- Raises
ValueError – Raised if the dimensions do not match
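A check of this kind can be sketched with plain mappings of dimension name to size, such as those exposed by the `sizes` property of xarray objects. What counts as a match is an assumption here: every dimension of the array must exist in the dataset with the same size. The helper name `check_dims_match_sketch` is hypothetical.

```python
def check_dims_match_sketch(dataset_sizes, array_sizes):
    """Raise ValueError if an array dimension is missing from the reference
    dataset, or has a different size there."""
    for name, size in array_sizes.items():
        if name not in dataset_sizes:
            raise ValueError(f"Dimension {name!r} is not present in the dataset")
        if dataset_sizes[name] != size:
            raise ValueError(
                f"Dimension {name!r} has size {size}, "
                f"but the dataset has size {dataset_sizes[name]}"
            )


# The array uses a subset of the dataset's dimensions, so this passes silently.
check_dims_match_sketch({"t": 4, "y": 3, "x": 5}, {"y": 3, "x": 5})
```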
- emsarray.utils.move_dimensions_to_end(data_array, dimensions)
Transpose the dimensions of a xarray.DataArray such that the given dimensions appear as the last dimensions, in the order given.
Other dimensions appear as the first dimensions, in the same order they are present in the original dataset.
- Parameters
data_array (xarray.DataArray) – The data array to transpose
dimensions (list of str) – The dimensions to move to the end
Examples
>>> data_array.dims
('a', 'b', 'c', 'd')
>>> transposed = move_dimensions_to_end(data_array, ['c', 'b'])
>>> transposed.dims
('a', 'd', 'c', 'b')
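The new dimension order can be worked out from the dimension names alone. The sketch below shows only that ordering step, under the assumption that unknown dimension names should be rejected; `reorder_dims_sketch` is a hypothetical helper, not part of emsarray.

```python
def reorder_dims_sketch(dims, to_end):
    """Return `dims` reordered so that `to_end` comes last, in the order given."""
    missing = [name for name in to_end if name not in dims]
    if missing:
        raise ValueError(f"Unknown dimensions: {missing}")
    # Untouched dimensions keep their original relative order.
    leading = [name for name in dims if name not in to_end]
    return tuple(leading) + tuple(to_end)


print(reorder_dims_sketch(("a", "b", "c", "d"), ["c", "b"]))
```

The resulting tuple could then be unpacked into `xarray.DataArray.transpose()` to perform the actual transpose.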
- emsarray.utils.linearise_dimensions(data_array, dimensions, linear_dimension=None)
Flatten the given dimensions of a DataArray. Other dimensions are kept as-is. This is useful for turning a DataArray with dimensions (‘t’, ‘z’, ‘y’, ‘x’) into (‘t’, ‘z’, ‘index’).
- Parameters
data_array (xarray.DataArray) – The data array to linearize
dimensions (list of str) – The dimensions to linearize, in the desired order. These dimensions can be in any order and any position in the input data array.
linear_dimension (str, optional) – The name of the new dimension of flattened data. Defaults to index, or index_0, index_1, etc if not given.
- Returns
xarray.DataArray – A new data array with the specified dimensions flattened. Only data, coordinates, and dimensions are set; attributes and encodings are not copied over.
Examples
>>> data_array = xr.DataArray(
...     data=np.random.random((3, 5, 7)),
...     dims=['x', 'y', 'z'],
... )
>>> flattened = linearise_dimensions(data_array, ['y', 'x'])
>>> flattened.dims
('z', 'index')
>>> flattened.shape
(7, 15)
>>> expected = np.transpose(data_array.isel(z=0).values).ravel()
>>> all(flattened.isel(z=0).values == expected)
True
- emsarray.utils.datetime_from_np_time(np_time)
Convert a numpy datetime64 to a python datetime. Useful when formatting dates for human consumption, as a numpy datetime64 has no equivalent of datetime.datetime.strftime().
This does present the possibility of losing precision, as a numpy datetime64 has variable accuracy up to an attosecond, while Python datetimes have fixed microsecond accuracy. A conversion that truncates data is not reported as an error. If you’re using numpy datetime64 with attosecond accuracy, the Python datetime formatting methods are insufficient for your needs anyway.
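The silent loss of precision can be illustrated without numpy. The sketch below assumes a timestamp stored as integer nanoseconds since the Unix epoch and converts it to a Python datetime; it is not emsarray's implementation, and `datetime_from_nanoseconds_sketch` is a hypothetical name.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def datetime_from_nanoseconds_sketch(nanoseconds):
    """Convert integer nanoseconds since the Unix epoch to a naive datetime."""
    # Floor division discards the sub-microsecond part without any warning.
    microseconds = nanoseconds // 1000
    return EPOCH + timedelta(microseconds=microseconds)


stamp = datetime_from_nanoseconds_sketch(1_000_000_123_456_789)
# The result supports strftime; the trailing 789 nanoseconds are gone.
print(stamp.strftime("%Y-%m-%d %H:%M:%S.%f"))
```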
- exception emsarray.utils.RequiresExtraException(extra)
Raised when the optional dependencies for some functionality have not been installed, and a function requiring them is called.
See also
requires_extra()