Each supported geometry convention represents data differently.
The Convention class abstracts these differences away,
allowing developers to query and manipulate datasets
without worrying about the details.
See Supported dataset conventions
for a list of implemented conventions.
All conventions have the concept of a cell at a geographic location,
vertically stacked layers of cells,
and multiple timesteps of data.
A convention may support additional grids, such as face edges and vertices.
Refer to Grids for more information.
A cell can be addressed using a linear index or a native index.
A linear index is always an int,
while the native index type will depend on the specific convention.
You can convert between a linear and a native index
using ravel_index() and wind_index().
Refer to Indexing for more information.
This may check for variables of the correct dimensions,
the presence of specific attributes,
or the layout of the dataset dimensions.
Specific subclasses must implement this function.
It will be called by the convention autodetector
when guessing the correct convention for a dataset.
If the dataset matches, the return value indicates how specific the match is.
When autodetecting the correct convention implementation,
the convention with the highest specificity will be used.
Many conventions extend the CF grid conventions,
so the CF Grid convention classes will match many datasets.
However, this match is very generic.
A more specific implementation such as SHOC may be supported.
The SHOC convention implementation should return a higher specificity
than the CF grid convention.
Parameters:
dataset (xarray.Dataset) – The dataset instance to inspect.
Returns:
int, optional – If this convention implementation can handle this dataset,
some integer greater than zero is returned.
The higher the number, the more specific the support.
If the dataset does not match this convention, None is returned.
Values on the Specificity enum
are used by emsarray itself to indicate specificity.
New convention implementations are free to use these values,
or use any integer value.
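The selection rule described above can be sketched in plain Python. This is an illustration only, not emsarray's implementation: the `Specificity` values, the two toy convention classes, the `model` attribute they look for, and the `guess_convention` helper are all hypothetical, and a plain dict stands in for an xarray.Dataset.

```python
import enum
from typing import Optional


class Specificity(enum.IntEnum):
    # Hypothetical values; only their relative order matters here.
    LOW = 10
    MEDIUM = 20
    HIGH = 30


class GenericGrid:
    @classmethod
    def check_dataset(cls, dataset: dict) -> Optional[int]:
        # Matches anything with latitude and longitude variables.
        if {"lat", "lon"} <= dataset["variables"]:
            return Specificity.LOW
        return None


class SpecificModel:
    @classmethod
    def check_dataset(cls, dataset: dict) -> Optional[int]:
        # Also requires a marker attribute, so the match is more specific.
        if {"lat", "lon"} <= dataset["variables"] and "model" in dataset["attrs"]:
            return Specificity.HIGH
        return None


def guess_convention(dataset: dict, candidates: list) -> Optional[type]:
    """Return the candidate with the highest specificity, or None."""
    matches = [
        (specificity, candidate)
        for candidate in candidates
        if (specificity := candidate.check_dataset(dataset)) is not None
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0])[1]


candidates = [GenericGrid, SpecificModel]
plain = {"variables": {"lat", "lon", "temp"}, "attrs": {}}
modelled = {"variables": {"lat", "lon", "temp"}, "attrs": {"model": "x"}}
unknown = {"variables": {"temp"}, "attrs": {}}

print(guess_convention(plain, candidates).__name__)
print(guess_convention(modelled, candidates).__name__)
print(guess_convention(unknown, candidates))
```

Both toy classes match the second dataset, but the more specific one wins because it reports the larger value.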
# Open a dataset built using the GRASS convention
dataset = xarray.open_dataset("grass-dataset.nc")

# Construct a Grass instance for the dataset and bind it
convention = Grass(dataset)
convention.bind()

# dataset.ems is now the bound convention
assert dataset.ems is convention
If the dataset already has a bound convention, an error is raised.
To bind a new convention to a dataset, make a copy of the dataset first,
for example with xarray.Dataset.copy(),
and bind the new convention to the copy.
The CF Conventions state that
a depth variable is identifiable by units of pressure; or
the presence of the positive attribute with value of up or down[2].
In practice, many datasets do not follow this convention.
In addition to checking for the positive attribute,
all coordinates are checked for a standard_name: "depth",
coordinate_type: "Z", or axis: "Z".
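The attribute checks listed above might look something like the following. This is a minimal sketch, not the library's code: `looks_like_depth` is a hypothetical helper, it takes a plain mapping of attributes rather than a coordinate variable, and it leaves out the check for units of pressure.

```python
from typing import Mapping


def looks_like_depth(attrs: Mapping[str, str]) -> bool:
    """Guess whether a coordinate's attributes describe a depth axis."""
    # The check suggested by the CF Conventions.
    if attrs.get("positive") in {"up", "down"}:
        return True
    # The more lenient fallbacks described above.
    return (
        attrs.get("standard_name") == "depth"
        or attrs.get("coordinate_type") == "Z"
        or attrs.get("axis") == "Z"
    )


print(looks_like_depth({"positive": "down"}))
print(looks_like_depth({"axis": "Z"}))
print(looks_like_depth({"standard_name": "sea_surface_temperature"}))
```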
grid_kind (GridKind, optional) – Used to indicate what kind of index is being wound,
for conventions with multiple grids.
Optional, if not provided the default grid kind will be used.
Returns:
Index – The convention native index for that same cell
Example
If the dataset used the CF Grid conventions,
across a (latitude, longitude) grid of size (30, 40):
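The arithmetic behind this conversion can be sketched in plain Python. The sketch assumes that the native index is a (y, x) pair and that linear indices follow row-major order; `wind_index_2d` and `ravel_index_2d` are hypothetical helpers, not emsarray functions.

```python
SHAPE = (30, 40)  # (latitude, longitude)


def wind_index_2d(linear_index: int, shape: tuple = SHAPE) -> tuple:
    """Convert a linear index to a (y, x) pair, assuming row-major order."""
    rows, columns = shape
    if not 0 <= linear_index < rows * columns:
        raise IndexError(f"linear index {linear_index} is out of range")
    return divmod(linear_index, columns)


def ravel_index_2d(native_index: tuple, shape: tuple = SHAPE) -> int:
    """Convert a (y, x) pair back to a linear index."""
    y, x = native_index
    return y * shape[1] + x


print(wind_index_2d(3))        # (0, 3)
print(wind_index_2d(40))       # (1, 0)
print(ravel_index_2d((1, 0)))  # 40
```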
ValueError – Raised if the data array passed in
is not indexable using any native index type.
Depth coordinates or time coordinates are examples of data arrays
that will not be indexable and will raise an error.
Example
For a UGRID dataset
with temperature data defined at the cell centres
and current defined as flux through the cell edges:
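One way to picture this lookup is to match the dimensions of a variable against the dimensions of each grid kind. The sketch below is illustrative only: `ToyGridKind`, the dimension names, and this `get_grid_kind` function are hypothetical stand-ins, and a tuple of dimension names stands in for a data array.

```python
import enum
from typing import Dict, Sequence, Tuple


class ToyGridKind(str, enum.Enum):
    face = "face"
    edge = "edge"


# Hypothetical dimension names for a UGRID-like dataset.
GRID_DIMENSIONS: Dict[ToyGridKind, Tuple[str, ...]] = {
    ToyGridKind.face: ("nMesh2_face",),
    ToyGridKind.edge: ("nMesh2_edge",),
}


def get_grid_kind(dims: Sequence[str]) -> ToyGridKind:
    """Find the grid kind whose dimensions all appear in ``dims``."""
    for kind, kind_dims in GRID_DIMENSIONS.items():
        if set(kind_dims) <= set(dims):
            return kind
    raise ValueError(f"No grid kind matches dimensions {tuple(dims)!r}")


# Temperature at the cell centres, flux through the cell edges.
temp_kind = get_grid_kind(("Mesh2_layers", "nMesh2_face"))
flux_kind = get_grid_kind(("Mesh2_layers", "nMesh2_edge"))
print(temp_kind.name, flux_kind.name)
```

A variable with only a time or depth dimension matches no grid kind, so the sketch raises a ValueError for it, mirroring the behaviour described above.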
Flatten the surface dimensions of a DataArray,
returning a flatter data array indexed in the same order as the linear index.
For DataArrays with extra dimensions such as time or depth,
only the surface dimensions are flattened.
Other dimensions are left as is.
For datasets with multiple grids,
with data defined on edges or vertices for example,
this will flatten those data arrays in the correct linear order
to be indexed by the relevant index type.
Parameters:
data_array (xarray.DataArray) – One of the data variables from this dataset.
linear_dimension (Hashable, optional) – The name of the new dimension to flatten the surface dimensions to.
Defaults to ‘index’.
Returns:
xarray.DataArray – A new data array, where all the surface dimensions
have been flattened into one linear array.
The values for each cell are in the same order as the linear index for this dataset.
Any other dimensions, such as depth or time, will be retained.
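For a regular two-dimensional grid, the effect of flattening can be pictured with plain numpy. This is a sketch under assumptions: the variable has hypothetical dimensions (time, depth, y, x), the linear index follows row-major order, and a bare numpy array stands in for the DataArray that the method works with.

```python
import numpy

# Hypothetical temperature variable with dimensions (time, depth, y, x).
temp = numpy.arange(2 * 3 * 4 * 5).reshape((2, 3, 4, 5))

# Flatten only the trailing surface dimensions (y, x) into one axis.
flat = temp.reshape(temp.shape[:-2] + (-1,))

print(flat.shape)  # (2, 3, 20)

# The value at native index (y=1, x=2) sits at linear index 1 * 5 + 2 = 7.
assert flat[0, 0, 7] == temp[0, 0, 1, 2]
```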
Wind a flattened DataArray
so that it has the same shape as data variables in this dataset.
By using grid_size and wind() together
it is possible to construct new data variables for a dataset
of any arbitrary shape.
Parameters:
data_array (xarray.DataArray) – A flattened data array, with one value per cell in linear index order.
grid_kind (GridKind) – The kind of grid this data array represents,
for those conventions with multiple grid kinds.
Optional, defaults to the default grid kind.
axis (int, optional) – The axis number that should be wound.
Optional, defaults to the last axis.
linear_dimension (Hashable, optional) – The name of the dimension that should be wound.
Optional, defaults to the last dimension.
Returns:
xarray.DataArray – A new data array where the linear data have been wound
to match the shape of the convention.
Any other dimensions, such as depth or time, will be retained.
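As a rough picture of what winding does for a regular two-dimensional grid, the sketch below uses plain numpy. It assumes a hypothetical (y, x) surface shape and row-major linear ordering, and uses bare arrays in place of DataArrays; other conventions will wind their data differently.

```python
import numpy

grid_shape = (4, 5)  # hypothetical (y, x) surface shape
grid_size = grid_shape[0] * grid_shape[1]

# One value per cell, in linear index order, with a leading depth axis.
flat = numpy.arange(3 * grid_size).reshape((3, grid_size))

# Wind the last axis back out to the surface shape.
wound = flat.reshape(flat.shape[:-1] + grid_shape)

print(wound.shape)  # (3, 4, 5)

# Linear index 1 * 5 + 2 = 7 lands at native index (y=1, x=2).
assert wound[2, 1, 2] == flat[2, 7]
```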
Examples
The following will construct a data array of the correct shape
for any convention supported by emsarray:
Either the data array itself
or the name of a data array on this Convention.dataset instance can be passed in.
The data array does not have to come from the same dataset,
as long as the dimensions are the same.
This method will only plot a single time step and depth layer.
Callers are responsible for selecting a single slice before calling this method.
vector (tuple of xarray.DataArray or str) – A tuple of the u and v components of a vector.
The components should be a DataArray,
or the name of an existing DataArray in this Dataset.
This method is most useful when working in Jupyter notebooks
which display figures automatically.
This method is a wrapper around plot_on_figure()
that creates and shows a Figure for you.
All arguments are passed on to plot_on_figure();
refer to that function for details.
coordinate (Hashable or xarray.DataArray, optional) – The coordinate to vary across the animation.
Pass in either the name of a coordinate variable
or the coordinate variable itself.
Optional, if not supplied the time coordinate
from get_time_name() is used.
Other appropriate coordinates to animate over include depth.
Make a PolyCollection
from the geometry of this Dataset.
This can be used to make custom matplotlib plots from your data.
If a DataArray is passed in,
the values of that are assigned to the PolyCollection array parameter.
Parameters:
data_array (Hashable or xarray.DataArray, optional) – A data array, or the name of a data variable in this dataset. Optional.
If given, the data array is ravelled
and passed to PolyCollection.set_array().
The data is used to colour the patches.
Refer to the matplotlib documentation for more information on styling.
**kwargs – Any keyword arguments are passed to the
PolyCollection constructor.
Returns:
PolyCollection – A PolyCollection constructed using the geometry of this dataset.
u, v (xarray.DataArray or str, optional) – The DataArrays or the names of DataArrays in this dataset
that make up the u and v components of the vector.
If omitted, a Quiver will be constructed with all components set to 0.
**kwargs – Any keyword arguments are passed on to the Quiver constructor.
The order of the polygons in the list
corresponds to the linear index of this dataset.
Not all valid cell indices have a polygon;
these holes are represented as None in the list.
If you want a list of just polygons, apply the mask:
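The masking step can be illustrated with numpy boolean indexing. In this sketch strings stand in for shapely polygons, and both arrays are built by hand; it assumes the polygons are held in a numpy object array so that a boolean mask can index them.

```python
import numpy

# Stand-ins for shapely polygons; None marks a cell without geometry.
polygons = numpy.array(["cell 0", None, "cell 2", "cell 3"], dtype=object)
mask = numpy.array([polygon is not None for polygon in polygons])

only_polygons = polygons[mask]
print(only_polygons.tolist())  # ['cell 0', 'cell 2', 'cell 3']
```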
A numpy ndarray of face centres, which are (x, y) pairs.
The first dimension will be the same length and in the same order
as Convention.polygons,
while the second dimension will always be of size 2.
A boolean numpy.ndarray indicating which cells have valid polygons.
This can be used to select only items from linear arrays
that have a corresponding polygon.
A shapely.strtree.STRtree spatial index of all cells in this dataset.
This allows for fast spatial lookups, querying which cells lie at
a point, or which cells intersect a geometry.
Querying the STRtree will return the linear indices of any matching cells.
Use polygons to find the geometries associated with each index.
Use wind_index() to transform this back to a native index,
or ravel() to linearise a variable.
Examples
Find the indices of all cells that intersect a line:
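A self-contained sketch of such a query is shown below. It assumes Shapely 2.0 or later, where STRtree.query() accepts a predicate and returns integer indices, and it builds the tree from three hand-made squares instead of the cells of a real dataset.

```python
from shapely.geometry import LineString, box
from shapely.strtree import STRtree

# Stand-in cell polygons: a row of three unit squares, in linear index order.
polygons = [box(x, 0, x + 1, 1) for x in range(3)]
tree = STRtree(polygons)

# A line that crosses the first two squares only.
line = LineString([(0.5, 0.5), (1.5, 0.5)])
hits = tree.query(line, predicate="intersects")

print(sorted(hits.tolist()))  # [0, 1]
```

The returned values are positions in the list the tree was built from, which is why they can be treated as linear indices when the tree is built from the polygons in linear index order.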
A deprecated spatial index of all cells in this dataset.
Like Convention.strtree, it allows for fast spatial lookups,
querying which cells lie at a point, or which cells intersect a geometry.
It existed as a wrapper around a Shapely STRtree,
which changed its interface in Shapely 2.0.
Shapely 1.8.x is no longer supported by emsarray,
so this compatibility wrapper is deprecated.
Use Convention.strtree directly instead.
Querying this spatial index will return a list of
(polygon, SpatialIndexItem) tuples
corresponding to each matching cell.
SpatialIndexItem instances have the cell's linear index, native index, and polygon.
The query results from the STRtree contain all geometries with overlapping bounding boxes.
Query results need to be refined further
by comparing the cell geometry to the query geometry.
Refer to the Shapely 1.8.x STRtree docs for examples.
See also
SpatialIndexItem : The dataclass returned from querying the STRtree.
point (shapely.Point) – The geographic point to query
Returns:
SpatialIndexItem, optional – The SpatialIndexItem for the point queried.
This indicates the polygon that intersected the point
and the index of that polygon in the dataset.
If the point does not intersect the dataset, None is returned.
Notes
In the case where the point intersects multiple cells
the cell with the lowest linear index is returned.
This can happen if the point is exactly one of the cell vertices,
or falls on a cell edge,
or if the geometry of the dataset contains overlapping polygons.
Return a new dataset that contains values only from a single index.
This is much like doing an xarray.Dataset.isel() on an index,
but works with convention native index types.
An index is associated with a grid kind.
The returned dataset will only contain variables that were defined on this grid,
with the single indexed point selected.
For example, if the index of a face is passed in,
the returned dataset will not contain any variables defined on an edge.
Parameters:
index (Index) – The index to select.
The index must be for the default grid kind for this dataset.
Returns:
xarray.Dataset – A new dataset that is subset to the one index.
Notes
The returned dataset will most likely not have sufficient coordinate data
to be used with a particular Convention any more.
The dataset.ems accessor will raise an error if accessed on the new dataset.
Make a new Dataset that can be used to clip this dataset to only the
cells that intersect some geometry.
This dataset can be saved to a file to be reused to cut multiple
datasets with identical shapes, such as a series of files representing
multiple time series of a model.
The mask can be applied to this dataset (or other datasets identical in
shape) using apply_clip_mask().
Parameters:
clip_geometry (shapely.BaseGeometry) – The desired area to cut out. This can be any shapely geometry type,
but will most likely be a polygon.
buffer (int, optional) – If set to a positive integer,
a buffer of that many cells will be added around the clip region.
This is useful if you need to clip to a particular area,
but also would like to do some interpolation on the output cells.
Apply a clip mask to this dataset, and return a new dataset.
Call make_clip_mask() to create a clip mask from a clip geometry.
The clip_mask can be saved to and loaded from disk if the mask needs to
be reused across multiple datasets, such as multiple time series from
one model.
Depending on the implementation, the input dataset may be sliced into
multiple files during cutting, and the returned Dataset
might be a multi-file Dataset built from these
temporary files. The caller must either load the dataset into memory
using load() or compute(),
or save the dataset to disk somewhere outside of the working directory
before the working directory is cleaned up.
work_dir (str or pathlib.Path) – A directory where temporary files can be written to.
Callers must create and manage this temporary directory,
perhaps using tempfile.TemporaryDirectory.
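The lifecycle this implies for callers can be sketched with the standard library alone. `clip_to_files` below is a hypothetical stand-in for an operation that leaves temporary files in the working directory; the point of the sketch is only that results are loaded into memory before the directory is cleaned up.

```python
import pathlib
import tempfile


def clip_to_files(work_dir: pathlib.Path) -> list:
    """Hypothetical stand-in for an operation that writes temporary files."""
    paths = []
    for index in range(3):
        path = work_dir / f"part-{index}.txt"
        path.write_text(f"slice {index}")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as work_dir:
    paths = clip_to_files(pathlib.Path(work_dir))
    # Load everything needed into memory while the files still exist.
    loaded = [path.read_text() for path in paths]

# The directory and its files are gone now, but the loaded data remains.
print(loaded)
print(any(path.exists() for path in paths))
```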
The dimensions associated with a particular grid kind.
This is a mapping between grid kinds
and an ordered list of dimension names.
Each dimension in the dataset must be associated with at most one grid kind.
Each grid kind must be associated with at least one dimension.
The dimensions must be in the order expected in a dataset,
if order is significant.
This property may introspect the dataset
to determine which dimensions are used.
The property should be cached.
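A minimal sketch of a cached, introspecting property is shown below, using functools.cached_property. `ToyConvention` is hypothetical, plain strings stand in for grid kinds, a list of dimension names stands in for the dataset, and the suffix-matching rule is invented purely for illustration.

```python
from functools import cached_property
from typing import Dict, List


class ToyConvention:
    def __init__(self, dataset_dims: List[str]):
        # A list of dimension names stands in for an xarray.Dataset.
        self.dataset_dims = dataset_dims
        self.lookups = 0

    @cached_property
    def grid_dimensions(self) -> Dict[str, List[str]]:
        # Introspect the dataset once; later accesses reuse the result.
        self.lookups += 1
        return {
            kind: [dim for dim in self.dataset_dims if dim.endswith(kind)]
            for kind in ("face", "edge")
        }


convention = ToyConvention(["time", "nMesh2_face", "nMesh2_edge"])
first = convention.grid_dimensions
second = convention.grid_dimensions
print(first)
print(convention.lookups)  # 1
```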
Some type that can enumerate the different grid types
present in a dataset.
This can be an enum.Enum listing each different kind of grid.
Index values will be included in the feature properties
of exported geometry from emsarray.operations.geometry.
If the index type includes the grid kind,
the grid kind needs to be JSON serializable.
The easiest way to achieve this is to make your GridKind type subclass str:
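For instance, a grid kind defined as below serialises to JSON as a plain string. `MyGridKind` and its members are hypothetical names chosen for this sketch.

```python
import enum
import json


class MyGridKind(str, enum.Enum):
    face = "face"
    edge = "edge"
    node = "node"


# An index that includes the grid kind now serialises without extra work.
index = (MyGridKind.face, 12)
print(json.dumps({"index": index}))  # {"index": ["face", 12]}
```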
An index to a specific point on a grid in this convention.
For conventions with multiple grids (e.g. cells, edges, and nodes),
this should be a tuple whose first element is GridKind.
For conventions with a single grid, GridKind is not required.
How specific a match is when autodetecting a convention.
Matches with higher specificity will be prioritised.
General conventions such as CF Grid are low specificity,
as many conventions extend and build on CF Grid conventions.
The SHOC conventions extend the CF grid conventions,
so a SHOC file will be detected as both CF Grid and SHOC.
ShocStandard should return a higher specificity
so that the correct convention implementation is used.