Convention interface

Contents

Convention interface#

Each supported convention implements the Convention interface.

emsarray.open_dataset(path, **kwargs)#

Open a dataset and determine the correct Convention implementation for it. If a valid Convention implementation can not be found, an error is raised.

Parameters:
  • path – The path to the dataset to open

  • kwargs – These are passed straight through to xarray.open_dataset().

Returns:

xarray.Dataset – The opened dataset

Example

import emsarray
dataset = emsarray.open_dataset("./tests/datasets/ugrid_mesh2d.nc")
emsarray.get_dataset_convention(dataset)#

Find the most appropriate Convention subclass for this dataset.

Parameters:

dataset – The dataset to introspect

Returns:

Convention – A Convention subclass appropriate for this dataset, or None if nothing appropriate can be found.

Example

>>> ds = emsarray.tutorial.open_dataset('austen.nc')
>>> get_dataset_convention(ds)
emsarray.conventions.ugrid.UGrid

Convention#

All dataset conventions have the following methods:

class emsarray.conventions.Convention(dataset)#

Each supported geometry convention represents data differently. The Convention class abstracts these differences away, allowing developers to query and manipulate datasets without worrying about the details. See Supported dataset conventions for a list of implemented conventions.

All conventions have the concept of a cell at a geographic location, vertically stacked layers of cells, and multiple timesteps of data. A convention may support additional grids, such as face edges and vertices. Refer to Grids for more information.

A cell can be addressed using a linear index or a native index. A linear index is always an int, while the native index type will depend on the specific convention. You can convert between a linear and a native index using ravel_index() and wind_index(). Refer to Indexing for more information.

dataset#

The xarray.Dataset instance for this Convention

classmethod check_validity(dataset)#

Checks that the dataset is OK to use. Called during __init__, and raises an exception if the dataset has problems.

abstractmethod classmethod check_dataset(dataset)#

Check if a dataset uses this convention.

This may check for variables of the correct dimensions, the presence of specific attributes, or the layout of the dataset dimensions.

Specific subclasses must implement this function. It will be called by the convention autodetector when guessing the correct convention for a dataset.

If the dataset matches, the return value indicates how specific the match is. When autodetecting the correct convention implementation the convention with the highest specicifity will be used. Many conventions extend the CF grid conventions, so the CF Grid convention classes will match many datasets. However this match is very generic. A more specific implementation such as SHOC may be supported. The SHOC convention implementation should return a higher specicifity than the CF grid convention.

Parameters:

dataset (xarray.Dataset) – The dataset instance to inspect.

Returns:

int, optional – If this convention implementation can handle this dataset some integer greater than zero is returned. The higher the number, the more specific the support. If the dataset does not match this convention, None is returned. Values on the Specificity enum are used by emsarray itself to indicated specificity. New convention implementations are free to use these values, or use any integer value.

Example

>>> import xarray
... from emsarray.conventions.shoc import ShocStandard
... from emsarray.conventions.ugrid import UGrid
... dataset = xarray.open_dataset("./tests/datasets/shoc_standard.nc")
>>> ShocStandard.check_dataset(dataset)
True
>>> UGrid.check_dataset(dataset)
False
bind()#

Bind this Convention instance as the default convention for the xarray.Dataset. This convention instance will be assigned to dataset.ems.

You can use a Convention instance without binding it to a Dataset, binding is only necessary if you need to use the dataset.ems accessor.

Note

If you use emsarray.open_dataset() or dataset.ems to autodetect the dataset convention you do not need to call this method. Convention.bind() is only useful if you manually construct a Convention.

Examples

# Open a dataset built using the GRASS convention
dataset = xarray.open_dataset("grass-dataset.nc")

# Construct a Grass instance for the dataset and bind it
convention = Grass(dataset)
convention.bind()

# dataset.ems is now the bound convention
assert dataset.ems is convention

If the dataset already has a bound convention, an error is raised. To bind a new convention to a dataset, make a copy of the dataset first:

new_dataset = dataset.copy()
convention = Grass(new_dataset)
convention.bind()
property time_coordinate#

The time coordinate for this dataset.

Raises:

exceptions.NoSuchCoordinateError – If no time coordinate was found

Notes

xarray will find all time variables and convert them to numpy datetimes when opening a dataset.

The CF Conventions state that a time variable is defined by having a units attribute formatted according to the UDUNITS package [1].

In practice, some datasets do not follow this conventions. emsarray will first look for a time coordinate that conforms to the CF Conventions, but if none are found it will fall back to picking a datetime variable that has at least one coordinate_type: time, standard_name: time, or axis: T attribute set.

References

Type:

xarray.DataArray

property depth_coordinate#

The depth coordinate for this dataset. For datasets with multiple depth coordinates this will be the depth coordinate with the smallest size.

Raises:

exceptions.NoSuchCoordinateError – If no depth coordinate was found

Notes

The CF Conventions state that a depth variable is identifiable by units of pressure; or the presence of the positive attribute with value of up or down [2].

In practice, many datasets do not follow this convention. In addition to checking for the positive attribute, all coordinates are checked for a standard_name: "depth", coordinate_type: "Z", or axiz: "Z".

References

Type:

xarray.DataArray

property depth_coordinates#

All the depth coordinate for this dataset.

Type:

tuple of xarray.DataArray

get_depth_coordinate_for_data_array(data_array)#

Find the depth coordinate for a particular data array. Some conventions will contain multiple depth coordinates, meaning that a default depth_coordinate value can be misleading.

Parameters:

data_array (xarray.DataArray or Hashable) – A data array or the name of a data array in the dataset.

Returns:

xarray.DataArray – The depth coordinate variable for the data array.

Raises:
  • NoSuchCoordinateError – If data array does not have an associated depth coordinate

  • ValueError – If multiple depth coordinates matched the data array.

See also

depth_coordinate

The default or main depth coordinate for this dataset, but not necessarily the correct depth coordinate for all variables.

depth_coordinates

All the depth coordinates in this dataset.

ravel_index(index)#

Convert a convention native index to a linear index.

Each convention has a different native index type, read the specific convention documentation for more information.

Parameters:

index (Index) – The native index to ravel.

Returns:

int – The linear index for that native index.

Example

If the dataset used the CF Grid conventions, across a (latitude, longitude) grid of size (30, 40):

>>> temp = dataset.data_vars['temp']
>>> temp.dims
('t', 'z', 'y', 'x')
>>> temp.shape
(10, 20, 30, 40)
>>> dataset.ems.ravel_index((CFGridKind.face, 3, 4))
124

Cell polygons are indexed in the same order as the linear indexes for cells. To find the polygon for the cell with the native index (3, 4):

>>> grid = dataset.ems.grids[CFGridKind.face]
>>> index = (CFGridKind.face, 3, 4)
>>> linear_index = grid.ravel_index(index)
>>> polygon = grid.geometry[linear_index]

See also

Convention.wind_index()

The inverse operation

wind_index(linear_index, *, grid_kind=None)#

Convert a linear index to a convention native index.

Each convention has a different native index type, read the specific convention documentation for more information.

Parameters:
  • linear_index (int) – The linear index to wind.

  • grid_kind (GridKind, optional) – Used to indicate what kind of index is being wound, for conventions with multiple grids. Optional, if not provided the default grid kind will be used.

Returns:

Index – The convention native index for that same cell

Example

If the dataset used the CF Grid conventions, across a (latitude, longitude) grid of size (30, 40):

>>> temp = dataset.data_vars['temp']
>>> temp.dims
('t', 'z', 'y', 'x')
>>> temp.shape
(10, 20, 30, 40)
>>> dataset.ems.wind_index(124)
(CFGridKind.face, 3, 4)

See also

Convention.ravel_index()

The inverse operation

abstract property grid_kinds#

A set of the grid kinds this dataset includes.

abstract property default_grid_kind#

The default grid kind for this dataset. For most datasets this should be a polygon grid.

abstract property grids#

All of the grids this dataset is defined on, as a dict of GridKind: Grid. Every GridKind in Convention.grid_kinds is present as a key in the dictionary.

property grid_size#

The linear size of each grid kind.

Deprecated since version 1.0.0: Use Grid.size instead:

dataset = xarray.open_dataset(...)
grid = dataset.ems.default_grid
grid.size
abstractmethod get_grid_kind(data_array)#

Determines the relevant grid kind for this data array. If the data array doesn’t match any grid kind for this dataset a ValueError is raised

Parameters:

data_array (xarray.DataArray) – The data array to inspect

Returns:

GridKind

Raises:

ValueError – If the data array passed in doesn’t match any grid kind for this dataset a ValueError is raised. Depth coordinates or time coordinates are examples of data arrays that will not be indexable and will raise an error.

Example

For a UGRID dataset with temperature data defined at the cell centres and current defined as flux through the cell edges:

>>> dataset.data_vars['temp'].dims
('time', 'depth', 'face')
>>> dataset.data_vars['u1'].dims
('time', 'depth', 'edge')
>>> dataset.ems.get_grid_kind(dataset.data_vars['temp'])
UGridKind.face
>>> dataset.ems.get_grid_kind(dataset.data_vars['u1'])
UGridKind.edge
get_grid(data_array)#

Get the Grid for a xarray.DataArray. If the data array doesn’t match any grid kind for this dataset a ValueError is raised

Parameters:

data_array (xarray.DataArray) – A xarray.DataArray from this dataset, or a data array with matching dimensions.

Returns:

Grid – A Grid from Convention.grids

Raises:

ValueError – If the data array passed in doesn’t match any grid kind for this dataset a ValueError is raised. Depth coordinates or time coordinates are examples of data arrays that will not be indexable and will raise an error.

abstractmethod get_grid_kind_for_index(index)#

Determines the relevant GridKind for this Index.

Parameters:

index (Index) – An index

Returns:

grid_kind (GridKind) – The GridKind for the Index

get_grid_for_index(index)#

Determines the relevant Grid for this Index.

Parameters:

index (Index) – An index

Returns:

grid (Grid) – The Grid for the Index

ravel(data_array, *, linear_dimension=None)#

Flatten the surface dimensions of a DataArray, returning a flatter numpy.ndarray indexed in the same order as the linear index.

For DataArrays with extra dimensions such as time or depth, only the surface dimensions are flattened. Other dimensions are left as is.

For datasets with multiple grids, with data defined on edges or vertices for example, this will flatten those data arrays in the correct linear order to be indexed by the relevant index type.

Parameters:
  • data_array (xarray.DataArray) – One of the data variables from this dataset.

  • linear_dimension (Hashable, optional) – The name of the new dimension to flatten the surface dimensions to. Defaults to ‘index’.

Returns:

xarray.DataArray – A new data array, where all the surface dimensions have been flattened in to one linear array. The values for each cell, in the same order as the linear index for this dataset. Any other dimensions, such as depth or time, will be retained.

See also

Grid.ravel

The actual implementation

Convention.wind

The inverse operation.

utils.ravel_dimensions

A function that ravels some given dimensions in a dataset.

wind(data_array, *, grid_kind=None, axis=None, linear_dimension=None)#

Wind a flattened DataArray so that it has the same shape as data variables in this dataset.

See Grid.wind() for full documentation.

Parameters:
  • data_array (xarray.DataArray) – One of the data variables from this dataset.

  • grid_kind (GridKind) – The kind of grid this data array represents, for those conventions with multiple grid kinds. Optional, defaults to the default grid kind.

  • axis (int, optional) – The axis number that should be wound. Optional, defaults to the last axis. Mutually exclusive with the linear_dimension parameter.

  • linear_dimension (Hashable, optional) – The name of the dimension in the data array that should be wound. Optional, defaults to the last dimension. Mutually exclusive with the axis parameter.

Returns:

xarray.DataArray – A new data array where the linear data have been wound to match the shape of the convention. Any other dimensions, such as depth or time, will be retained.

See also

Grid.wind

The actual implementation.

utils.wind_dimension

Reshape a particular dimension in a data array.

Convention.ravel

The inverse operation.

property data_crs#

The coordinate reference system that coordinates in this dataset are defined in. Used by plotting methods when creating Artists. Defaults to cartopy.crs.PlateCarree.

If your dataset uses a different coordinate reference system this property can be set manually:

dataset = emsarray.open_dataset(...)
dataset.ems.data_crs = pyproj.CRS('EPSG:32755')
plot(*args, **kwargs)#

Plot a data array and automatically display it.

This method is most useful when working in Jupyter notebooks which display figures automatically. This method is a wrapper around plot_on_figure() that creates and shows a Figure for you. All arguments are passed on to plot_on_figure(), refer to that function for details.

This method is a shortcut for quickly generating simple plots. It is not intended to be fully featured. See the examples for more comprehensive plotting examples.

See also

plot_on_figure()

plot_on_figure(figure, *variables, scalar=None, vector=None, title=None, **kwargs)#

Plot values for a DataArray on a matplotlib Figure.

The data array can either be passed in directly, or the name of a data array on this Convention.dataset instance. The data array does not have to come from the same dataset, as long as the dimensions are the same.

This method will only plot a single time step and depth layer. Callers are responsible for selecting a single slice before calling this method.

This method is a shortcut for quickly generating simple plots. It is not intended to be fully featured. See the examples for more comprehensive plotting examples.

Parameters:

See also

Convention.plot()

A shortcut that automatically displays the figure

plot.plot_on_figure()

The underlying implementation

animate_on_figure(figure, *variables, scalar=None, vector=None, coordinate=None, title=None, **kwargs)#

Make an animated plot of a data array.

This method is a shortcut for quickly generating simple animations. It is not intended to be fully featured. See the examples for more comprehensive plotting examples, and specifically Animated plot for an animated example.

Parameters:
  • figure (matplotlib.figure.Figure) – The matplotlib.figure.Figure to plot the animation on

  • *variables (xarray.DataArray or tuples of xarray.DataArray) – Any number of dataset variables to plot. Scalar variables should be passed in directly, while vector pairs should be passed in as a tuple. These will be passed to Convention.make_artist().

  • scalar (data array or name) – The DataArray to plot, or the name of an existing DataArray in this Dataset.

    Deprecated since version 1.0.0: Pass in variables as positional arguments instead

  • vector (tuple of data array or name) – A tuple of the u and v components of a vector. The components should be a DataArray, or the name of an existing DataArray in this Dataset.

    Deprecated since version 1.0.0: Pass in variables as positional arguments instead

  • coordinate (Hashable or xarray.DataArray, optional) – The coordinate to vary across the animation. Pass in either the name of a coordinate variable or coordinate variable itself. Optional, if not supplied the time coordinate from Convention.time_coordinate is used. Other appropriate coordinates to animate over include depth.

  • **kwargs – Any extra arguments are passed to plot.animate_on_figure().

Returns:

matplotlib.animation.Animation – The data variable plotted as an animation on the figure. Call Animation.save() to save animation to a file, or display it in a Notebook using Animation.to_jshtml().

abstractmethod make_artist(axes, variable, **kwargs)#

Make a matplotlib artists for the data array, adding it to the Axes and returning the Artist. This method will pick a reasonable way of drawing the variable based on the convention and the grid that the variable is defined on. See the documentation on each convention for specific details.

For most conventions:

  • scalar variables defined on a polygon grid will use make_polygon_scalar_collection()

  • vector pairs defined on a polygon grid will use make_polygon_vector_quiver()

  • scalar variables defined on a node grid will use make_node_scalar_artist()

Parameters:
  • axes (matplotlib.axes.Axes) – The axes to add the artist to.

  • variable (data array or name or tuple of data array or name) – The data array, or tuple of data arrays, to make an artist for. A sensible artist type is picked based on the data arrays passed in and the grids they are defined on.

  • kwargs (Any) – Any extra kwargs are passed on to the artist and can be used to style it. The specific kwargs that are accepted depend on the artist that is used.

Returns:

emsarray.plot.GridArtist – The artist for the data array passed in. The artist will already have been added to the axes.

See also

emsarray.plot

For a list of all supported artists

abstractmethod plot_geometry(axes)#

Plot the geometry of this dataset on the axes. What this means is dependent on the dataset convention used.

Parameters:

axes (matplotlib.axes.Axes)

Returns:

emsarray.plot.GridArtist – The artists that will draw the geometry.

make_poly_collection(data_array=None, **kwargs)#

Make a PolyCollection from the geometry of this Dataset.

Deprecated since version 1.0.0: Use Convention.make_artist() instead

If a DataArray is passed in, the values of that are assigned to the PolyCollection array parameter.

Parameters:
  • data_array (Hashable or xarray.DataArray, optional) – A data array, or the name of a data variable in this dataset. Optional. If given, the data array is ravelled and passed to PolyCollection.set_array(). The data is used to colour the patches. Refer to the matplotlib documentation for more information on styling.

  • **kwargs – Any keyword arguments are passed to the PolyCollection constructor.

Returns:

PolyCollection – A PolyCollection constructed using the geometry of this dataset.

make_quiver(axes, u=None, v=None, **kwargs)#

Make a matplotlib.quiver.Quiver instance to plot vector data.

Deprecated since version 1.0.0: Use Convention.make_artist() instead.

Parameters:
  • axes (matplotlib.axes.Axes) – The axes to make this quiver on.

  • u, v (xarray.DataArray or str, optional) – The DataArrays or the names of DataArrays in this dataset that make up the u and v components of the vector. If omitted, a Quiver will be constructed with all components set to 0.

  • **kwargs – Any keyword arguments are passed on to the Quiver constructor.

Returns:

matplotlib.quiver.Quiver – A quiver instance that can be added to a plot

make_geometry(grid_kind)#

Make geometry for the specified GridKind. This should normally be accessed via Grid.geometry, which stores a cached copy of the geometry for each grid.

make_geometry_centroid(grid_kind)#

Make geometry centroids for the specified GridKind. This should normally be accessed via Grid.centroid, which stores a cached copy of the geometry for each grid.

property geometry#

A shapely.Polygon or shapely.MultiPolygon that represents the geometry of the entire dataset.

This is equivalent to the union of all polygons in the dataset, although specific conventions may have a simpler way of constructing this.

property bounds#

Returns minimum bounding region (minx, miny, maxx, maxy) of the entire dataset.

This is equivalent to the bounds of the dataset geometry, although specific conventions may have a simpler way of constructing this.

selector_for_index(index)#

Convert a convention native index into a selector that can be passed to Dataset.isel.

Parameters:

index (Index) – A convention native index

Returns:

selector (xarray.Dataset) – A selector suitable for passing to xarray.Dataset.isel() that will select values at this index.

Notes

The returned selector is an xarray.Dataset, but the contents of the dataset are dependent on the specific convention and may change between versions of emsarray. The selector should be treated as an opaque value that is only useful when passed to xarray.Dataset.isel().

abstractmethod selector_for_indexes(indexes, *, index_dimension=None)#

Convert a list of convention native indexes into a selector that can be passed to Dataset.isel.

Parameters:

indexes (list of Index) – A list of convention native indexes

Returns:

selector (xarray.Dataset) – A selector suitable for passing to xarray.Dataset.isel() that will select values at this index.

Notes

The returned selector is an xarray.Dataset, but the contents of the dataset are dependent on the specific convention and may change between versions of emsarray. The selector should be treated as an opaque value that is only useful when passed to xarray.Dataset.isel().

select_index(index, drop_geometry=True)#

Return a new dataset that contains values only from a single index. This is much like doing a xarray.Dataset.isel() on an index, but works with convention native index types.

An index is associated with a grid kind. The returned dataset will only contain variables that were defined on this grid, with the single indexed point selected. For example, if the index of a face is passed in, the returned dataset will not contain any variables defined on an edge.

Parameters:
  • index (Index) – The index to select.

  • drop_geometry (bool, default True) – Whether to drop geometry variables from the returned point dataset. If the geometry data is kept the associated geometry data will no longer conform to the dataset convention and may not conform to any sensible convention at all. The format of the geometry data left after selecting points is convention-dependent.

Returns:

xarray.Dataset – A new dataset that is subset to the one index.

Notes

The returned dataset will most likely not have sufficient coordinate data to be used with a particular Convention any more. The dataset.ems accessor will raise an error if accessed on the new dataset.

select_indexes(indexes, *, index_dimension=None, drop_geometry=True)#

Return a new dataset that contains values only at the selected indexes. This is much like doing a xarray.Dataset.isel() on some indexes, but works with convention native index types.

An index is associated with a grid kind. The returned dataset will only contain variables that were defined on this grid, with the indexed points selected. For example, if the index of a face is passed in, the returned dataset will not contain any variables defined on an edge.

Parameters:
  • indexes (list of Index) – The indexes to select. The indexes must all be for the same grid kind.

  • index_dimension (str, optional) – The name of the new dimension added for each index to select. Defaults to the first unused dimension with prefix index.

  • drop_geometry (bool, default True) – Whether to drop geometry variables from the returned point dataset. If the geometry data is kept the associated geometry data will no longer conform to the dataset convention and may not conform to any sensible convention at all. The format of the geometry data left after selecting points is convention-dependent.

Returns:

xarray.Dataset – A new dataset that is subset to the indexes.

Notes

The returned dataset will most likely not have sufficient coordinate data to be used with a particular Convention any more. The dataset.ems accessor will raise an error if accessed on the new dataset.

select_point(point, *, grid_kind=None)#

Return a new dataset that contains values for a single point. This is a shortcut for querying the Grid.strtree to find intersecting geometry, then selecting the cell using select_index().

If the point is not inside the dataset a ValueError is raised.

Parameters:

point (shapely.Point) – The point to select

Returns:

xarray.Dataset – A dataset of values at the point

select_points(points, *, point_dimension=None, missing_points='error')#

Extract values from all variables on the default grid at a sequence of points.

Parameters:
  • points (list of shapely.Point) – The points to extract

  • point_dimension (str, optional) – The name of the new dimension used to index points. Defaults to ‘point’, or ‘point_0’, ‘point_1’, etc if those dimensions already exist.

  • missing_points ({'error', 'drop'}, default 'error') – What to do if a point does not intersect the dataset. ‘raise’ will raise an error, while ‘drop’ will drop those points.

Returns:

xarray.Dataset – A dataset with values extracted from the points. No variables not defined on the default grid and no geometry variables will be present.

abstractmethod get_all_geometry_names()#

Return a list of the names of all geometry variables used by this convention.

drop_geometry()#

Return a new xarray.Dataset with all geometry variables dropped. Useful when significantly transforming the dataset, such as extracting point data.

select_variables(variables)#

Select only a subset of the variables in this dataset, dropping all others.

This will keep all coordinate variables and all geometry variables.

Parameters:

variables (iterable of data array or name) – The data variables to select.

Returns:

xarray.DataArray – A new dataset with the same geometry and coordinates, but only the selected data variables.

abstractmethod make_clip_mask(clip_geometry, *, buffer=0)#

Make a new Dataset that can be used to clip this dataset to only the cells that intersect some geometry.

This dataset can be saved to a file to be reused to cut multiple datasets with identical shapes, such as a series of files representing multiple time series of a model.

The mask can be applied to this dataset (or other datasets identical in shape) using apply_clip_mask().

Parameters:
  • clip_geometry (shapely.BaseGeometry) – The desired area to cut out. This can be any shapely geometry type, but will most likely be a polygon

  • buffer (int, optional) – If set to a positive integer, a buffer of that many cells will be added around the clip region. This is useful if you need to clip to a particular area, but also would like to do some interpolation on the output cells.

Returns:

xarray.Dataset – The mask

abstractmethod apply_clip_mask(clip_mask, work_dir)#

Apply a clip mask to this dataset, and return a new dataset. Call make_clip_mask() to create a clip mask from a clip geometry.

The clip_mask can be saved and loaded to disk if the mask needs to be reused across multiple datasets, such as multiple time series from one model.

Depending on the implementation, the input dataset may be sliced in to multiple files during cutting, and the returned Dataset might be a multi-file Dataset built from these temporary files. The caller must either load the dataset in to memory using load() or compute(), or save the dataset to disk somewhere outside of the working directory before the working directory is cleaned up.

Parameters:
Returns:

xarray.Dataset – A new Dataset clipped using the mask

clip(clip_geomery, work_dir, *, buffer=0)#

Generates a clip mask and applies it in one step.

See the documentation for make_clip_mask() and apply_clip_mask() for more details.

Parameters:
  • clip_geometry (shapely.BaseGeometry) – The desired area to cut out. This can be any shapely geometry type, but will most likely be a polygon

  • work_dir (str or pathlib.Path) – A directory where temporary files can be written to. Callers must create and manage this temporary directory, perhaps using tempfile.TemporaryDirectory.

  • buffer (int, optional) – If set to a positive integer, a buffer of that many cells will be added around the clip region. This is useful if you need to clip to a particular area, but also would like to do some interpolation on the output cells.

Returns:

xarray.Dataset – A new Dataset clipped using the mask

to_netcdf(path, **kwargs)#

Save this dataset to a netCDF file, and also fix up the time units to make the EMS compatible.

ocean_floor()#

An alias for emsarray.operations.depth.ocean_floor()

normalize_depth_variables(*, positive_down=None, deep_to_shallow=None)#

An alias for emsarray.operations.depth.normalize_depth_variables()

hash_geometry(hash)#

Updates the provided hash with all of the relevant geometry data for this dataset.

Parameters:

hash (hashlib-style hash instance) – The hash instance to update with geometry data. This must follow the interface defined in hashlib.

make_triangulation()#

Triangulates the polygons in the dataset. Subclasses may have improved implementations.

This requires the dataset to have a grid with polygonal geometry. If there is an additional grid corresponding to the vertices of the polygons then subclasses should use this information to inform the triangulation.

Returns:

tuple of vertices, triangles, and cell_indexes – A tuple of three numpy arrays is returned, containing vertices, triangles, and cell indexes respectively.

vertices is a numpy array of shape (V, 2) where V is the number of unique vertices in the dataset. The vertex coordinates are in (x, y) or (lon, lat) order. If the dataset has a vertex grid associated with the polygons then this array replicates that data.

triangles is a numpy array of shape (T, 3) where T is the number of triangles in the dataset. Each triangle is a set of three vertex indexes.

cell_indexes is a numpy list of length T. Each entry indicates which polygon from the dataset a triangle is a part of.

See also

triangulate()

class emsarray.conventions.Grid(convention: 'Convention[GridKind, Index]', grid_kind: GridKind, geometry_type: type[shapely.geometry.base.BaseGeometry])#
convention#

The Convention this grid is associated with

grid_kind#

The GridKind this grid represents

geometry_type#

A Shapely geometry class such as shapely.Polygon or shapely.Point.

abstract property size#

The linear size of DataArrays on this grid, not accounting for other dimensions such as time or depth.

property geometry#

The geometry of this grid as a numpy.ndarray of Shapely geometries. Different grids might have different geometry types. Some elements may be None, depending on the dataset convention and the particulars of the dataset.

property strtree#

A shapely.strtree.STRtree filled with the geometry of this grid. The indexes returned when querying this STRtree correspond to the linear index of this grid. The indexes correspond with the geometry in Grid.geometry.

Example

This example will find the linear index, native index, selector, and geometry at a point.

import emsarray
import shapely

dataset = emsarray.tutorial.open_dataset('gbr4')
point = shapely.Point(151.869, -23.386)
grid = dataset.ems.grids['face']
intersecting_indexes = grid.strtree.query(point, predicate="intersects")

if len(intersecting_indexes) == 0:
    print("No intersecting geometry")

else:
    linear_index = intersecting_indexes[0]
    native_index = grid.wind_index(linear_index)
    selector = dataset.ems.selector_for_index(native_index)
    geometry = grid.geometry[linear_index]
property mask#

A boolean numpy.ndarray of the same length as Grid.geometry, True where the geometry is defined, False if the geometry is undefined or invalid.

property centroid#

The centres of the geometry of this grid as a numpy.ndarray of Shapely points. Defaults to the shapely.centroid() of Grid.geometry, but some conventions might have more specific ways of finding the centres. If geometry is empty the corresponding index in this array will be None.

See also

Grid.centroid

The same data represented as a two-dimensional array of coordinates.

Grid.mask

A boolean array indicating where the grid has geometry

property centroid_coordinates#

The centres of the geometry of this grid as a 2-dimensional numpy.ndarray with shape (Grid.size, 2). If geometry is empty the corresponding row in this array will be [numpy.nan, numpy.nan].

See also

Grid.centroid

The same data represented as an array of shapely.Point.

Grid.mask

A boolean array indicating where the grid has geometry

abstractmethod ravel_index(index)#

Convert a native Index to a linear index for this grid.

abstractmethod wind_index(linear_index)#

Convert a linear index to a native Index for this grid.

abstractmethod ravel(data_array, *, linear_dimension=None)#

Flatten the surface dimensions of a DataArray, returning a flatter numpy.ndarray indexed in the same order as the linear index.

For DataArrays with extra dimensions such as time or depth, only the surface dimensions are flattened. Other dimensions are left as is.

Parameters:
  • data_array (xarray.DataArray) – One of the data variables from this dataset.

  • linear_dimension (Hashable, optional) – The name of the new dimension to flatten the surface dimensions to. Defaults to ‘index’.

Returns:

xarray.DataArray – A new data array where all the surface dimensions have been flattened in to one linear array. The values for each cell are in the same order as the linear index for this grid. Any other dimensions, such as depth or time, will be retained.

See also

Convention.ravel

An alias for this that will find the correct grid for a data array first.

utils.ravel_dimensions

A function that ravels some given dimensions in a dataset.

abstractmethod wind(data_array, *, axis=None, linear_dimension=None)#

Wind a flattened DataArray so that it has the same shape as this grid.

By using size and wind() together it is possible to construct new data variables for a dataset of any arbitrary shape.

Parameters:
  • data_array (xarray.DataArray) – One of the data variables from this dataset.

  • axis (int, optional) – The axis number that should be wound. Optional, defaults to the last axis. Mutually exclusive with the linear_dimension parameter.

  • linear_dimension (Hashable, optional) – The name of the dimension in the data array that should be wound. Optional, defaults to the last dimension. Mutually exclusive with the axis parameter.

Returns:

xarray.DataArray – A new data array where the linear data have been wound to match the shape of this grid. Any other dimensions, such as depth or time, will be retained.

Examples

The following will construct a data array of the correct shape for any convention supported by emsarray:

import emsarray
import numpy
import xarray

dataset = emsarray.tutorial.open_dataset('kgari')
grid = dataset.ems.default_grid
flat_array = xarray.DataArray(
    data=numpy.arange(grid.size),
    dims=['index'],
)
data_array = grid.wind(flat_array)

This will construct a boolean array indicating which cells of a dataset intersect a target geometry:

import emsarray
import numpy
import shapely
import xarray

dataset = emsarray.tutorial.open_dataset('gbr4')
target = shapely.Polygon([
    [152.8088379, -22.7863108],
    [153.9184570, -22.2280904],
    [153.4680176, -20.9614396],
    [151.8255615, -20.4012720],
    [151.4135742, -21.8309067],
    [152.0068359, -22.4313402],
    [152.8088379, -22.7863108],
])

grid = dataset.ems.default_grid
hits = grid.strtree.query(target, predicate='intersects')
intersecting_cells = xarray.DataArray(
    data=numpy.zeros(grid.size, dtype=bool),
    dims=['index'],
)
intersecting_cells.values[hits] = True
intersecting_cells = grid.wind(intersecting_cells)

dataset.ems.plot(intersecting_cells)

See also

Grid.ravel

The inverse operation.

Convention.wind

A shortcut to this method on the Convention

utils.wind_dimension

Reshape a particular dimension in a data array.

Concepts#

type emsarray.conventions.GridKind#

All datasets define variables on one or more grid. GridKind enumerates all the available grids for a specific convention, while Grid provides methods for introspecting a particular grid.

The GridKind for a dataset is usually an enum.Enum listing each different kind of grid.

Notes

Index values will be included in the feature properties of exported geometry from emsarray.operations.geometry. As the native index for a convention usually includes the grid kind, the grid kind needs to be JSON serializable. The easiest way to achieve this is to make your GridKind type subclass str:

class MyGridKind(str, enum.Enum):
    face = 'face'
    edge = 'edge'
    node = 'node'

For cases where the convention only supports a single grid, a singleton enum can be used.

More esoteric cases involving datasets with a potentially unbounded numbers of grids can use a type that supports this instead.

type emsarray.conventions.Index#

An index to a specific point on a grid in this convention. For conventions with multiple grids (e.g. cells, edges, and nodes), this should be a tuple whos first element is GridKind. For conventions with a single grid, GridKind is not required.

Dimension conventions#

Most dataset conventions have grids that are uniquely identifiable by their dimensions. For these conventions the DimensionConvention subclass provides many default implementations. These details are most relevant for developers implementing new Convention subclasses.

class emsarray.conventions.DimensionConvention(dataset)#

A Convention subclass where different grid kinds are always defined on unique sets of dimension. This covers most conventions.

This subclass adds the abstract properties grid_dimensions and geometry_types that subclasses must define.

Default implementations are provided for all abstract Grid methods using DimensionGrid.

grid_dimensions#

The dimensions associated with a particular grid kind.

This is a mapping between grid kinds and an ordered list of dimension names. Each dimension in the dataset must be associated with at most one grid kind. Each grid kind must be associated with at least one dimension. The dimensions must be in the order expected in a dataset, if order is significant.

This property may introspect the dataset to determine which dimensions are used. The property should be cached.

class emsarray.conventions.DimensionGrid(convention, grid_kind, geometry_type, *, dimensions)#

A Grid subclass that complements DimensionConvention. Has one extra required argument dimensions, and provides an implementation of all abstract Grid methods.

dimensions#

The dimensions that data arrays on this grid must have.

class emsarray.conventions.Specificity(*values)#

How specific a match is when autodetecting a convention. Matches with higher specificity will be prioritised.

General conventions such as CF Grid are low specificity, as many conventions extend and build on CF Grid conventions.

The SHOC conventions extend the CF grid conventions, so a SHOC file will be detected as both CF Grid and SHOC. ShocStandard should return a higher specificity so that the correct convention implementation is used.

LOW = 10#
MEDIUM = 20#
HIGH = 30#