Format interface
Format interface#
Each supported format implements the Format
interface.
- emsarray.formats.get_file_format(dataset)#
Find the most appropriate Format subclass for this dataset.
- class emsarray.formats.Format(dataset)#
Each supported file format represents data differently. The
Format
class abstracts these differences away, allowing developers to query and manipulate datasets without worrying about the details. See Supported dataset formats for a list of implemented formats.All formats have the concept of a cell at a geographic location, vertically stacked layers of cells, and multiple timesteps of data. A format may support additional grids, such as face edges and vertices. Refer to Grids for more information.
A cell can be addressed using a linear index or a native index. A linear index is always an
int
, while the native index type will depend on the specific format. You can convert between a linear and a native index usingravel_index()
andunravel_index()
. Refer to Indexing for more information.The depths of each layer can be found using
get_depths()
. The timesteps in a dataset can be found usingget_times()
.- dataset#
The
xarray.Dataset
instance for thisFormat
helper.
- classmethod open_dataset(path, **kwargs)#
Open the dataset at
path
, setting any flags necessary for this format.- Parameters
path – The path to the dataset to open
kwargs – These are passed straight through to
xarray.open_dataset()
.
- Returns
xarray.Dataset
– The opened dataset
See also
- classmethod check_validity(dataset)#
Checks that the dataset is OK to use. Called during __init__, and raises exceptions if the dataset has problems.
- abstract classmethod check_dataset(dataset)#
Check if a dataset is of this format.
This may check for variables of the correct dimensions, the presence of specific attributes, or the layout of the dataset dimensions.
Specific subclasses must implement this function. It will be called by the format autodetector when guessing the correct helper for a dataset.
If the dataset matches, the return value indicates how specific the match is. When autodetecting the correct format helper the helper with the highest specicifity will be used. Many formats extend the CF grid conventions, so the CF Grid format classes will match many datasets. However this match is very generic. A more specific implementation such as SHOC may be supported. The SHOC format helper should return a higher specicifity than the CF grid helper.
- Parameters
dataset (
xarray.Dataset
) – The dataset instance to inspect.- Returns
int
, optional – If this format implementation can handle this dataset some integer greater than zero is returned. The higher the number, the more specific the support. If the dataset does not match this format, None is returned.
Example
>>> import xarray as xr ... from emsarray.formats.shoc import ShocStandard ... from emsarray.formats.ugrid import UGrid ... dataset = xr.open_dataset("./tests/datasets/shoc_standard.nc") >>> ShocStandard.check_dataset(dataset) True >>> UGrid.check_dataset(dataset) False
- abstract get_time_name()#
Get the name of the time variable in this dataset.
- abstract get_depth_name()#
Get the name of the layer depth coordinate variable. For datasets with multiple depth variables, this should be the one that represents the centre of the layer, not the bounds.
Note that this is the name of the coordinate variable, not the name of the dimension, for datasets where these differ.
- abstract get_all_depth_names()#
Get the names of all depth layers. Some datasets include both a depth layer centre, and the depth layer ‘edges’.
Note that this is the names of the coordinate variables, not the names of the dimensions, for datasets where these differ.
- get_depths()#
Get the depth of each vertical layer in this dataset.
- Returns
numpy.ndarray
– An array of depths, one per vertical layer in the dataset.
- get_times()#
Get all timesteps in this dataset.
- Returns
numpy.ndarray
– An array of datetimes. The datetimes will be in what ever native format the dataset uses, likelynumpy.datetime64
.
- abstract ravel_index(index)#
Convert a format native index to a linear index.
Each format has a different native index type, read the specific format documentation for more information.
- Parameters
index (
Index
) – The native index to ravel.- Returns
int
– The linear index for that native index.
Example
If the dataset used the CF Grid conventions, across a (latitude, longitude) grid of size (30, 40):
>>> temp = dataset.data_vars['temp'] >>> temp.dims ('t', 'z', 'y', 'x') >>> temp.shape (10, 20, 30, 40) >>> dataset.ems.ravel_index((3, 4)) 124
Cell polygons are indexed in the same order as the linear indices for cells. To find the polygon for the cell with the native index
(3, 4)
:>>> index = (3, 4) >>> linear_index = dataset.ems.ravel_index(index) >>> polygon = dataset.ems.polygons[linear_index]
See also
- abstract unravel_index(linear_index, grid_kind=None)#
Convert a linear index to a format native index.
Each format has a different native index type, read the specific format documentation for more information.
- Parameters
linear_index – The linear index to unravel.
grid_kind (
GridKind
) – Used to indicate what kind of index is being unravelled, for formats with multiple grids. Optional, if not provided this will return the unravelled face index.
- Returns
Index
– The format native index for that same cell
Example
If the dataset used the CF Grid conventions, across a (latitude, longitude) grid of size (30, 40):
>>> temp = dataset.data_vars['temp'] >>> temp.dims ('t', 'z', 'y', 'x') >>> temp.shape (10, 20, 30, 40) >>> dataset.ems.unravel_index(124) (3, 4)
See also
- abstract property grid_kinds#
All of the
grid kinds
this dataset includes.
- abstract property default_grid_kind#
The default
grid kind
for this dataset. For most datasets this should be the face grid.
- abstract get_grid_kind_and_size(data_array)#
Determines the relevant index kind and the extent of the linear index space for this data array.
If the data array is not indexable using the native index types a ValueError is raised.
- Parameters
data_array – The data array to introspect
- Returns
- Raises
ValueError – If the data array passed in is not indexable using any native index type a ValueError is raised. Depth coordinates or time coordinates are examples of data arrays that will not be indexable and will raise an error.
Example
For a
UGRID
dataset with temperature data defined at the cell centres and current defined as flux through the cell edges:>>> dataset.ems.topology.face_count 4 >>> dataset.ems.topology.edge_count 9 >>> dataset.data_vars['temp'].dims ('time', 'depth', 'face') >>> dataset.data_vars['u1'].dims ('time', 'depth', 'edge') >>> dataset.ems.get_grid_kind_and_size(dataset.data_vars['temp']) (UGridKind.face, 4) >>> dataset.ems.get_grid_kind_and_size(dataset.data_vars['u1']) (UGridKind.edge, 9)
- abstract make_linear(data_array)#
Flatten the surface dimensions of a
DataArray
, returning a flatternp.ndarray
indexed in the same order as the linear index.For DataArrays with extra dimensions such as time or depth, only the surface dimensions are flattened. Other dimensions are left as is.
For datasets with multiple grids, with data defined on edges or vertices for example, this will flatten those data arrays in the correct linear order to be indexed by the relevant index type.
- Parameters
data_array – One of the data variables from this dataset.
- Returns
xarray.DataArray
– A new data array, where all the surface dimensions have been flattened in to one linear array. The values for each cell, in the same order as the linear index for this dataset. Any other dimensions, such as depth or time, will be retained.
- property data_crs#
The coordinate reference system that coordinates in this dataset are defined in. Used by
make_patch_collection()
andmake_quiver()
. Defaults tocartopy.crs.PlateCarree
.
- plot_on_figure(figure, scalar=None, vector=None, title=None)#
Plot values for a
DataArray
on amatplotlib
Figure
.The data array can either be passed in directly, or the name of a data array on this
Format.dataset
instance. The data array does not have to come from the same dataset, as long as the dimensions are the same.- Parameters
figure – The
Figure
instance to plot this on.scalar (
data array
) – TheDataArray
to plot, or the name of an existing DataArray in this Dataset. This method will only plot a single time step and depth layer. Callers are responsible for selecting a single slice.vector (
tuple
ofdata arrays
) – A tuple of the u and v components of a vector. The components should be aDataArray
, or the name of an existing DataArray in this Dataset. This method will only plot a single time step and depth layer. Callers are responsible for selecting a single slice.
See also
plot.plot_on_figure()
The underlying implementation
- plot(scalar=None, vector=None, title=None)#
Plot a data array and automatically display it.
This method is most useful when working in Jupyter notebooks which display figures automatically.
See also
- animate_on_figure(figure, scalar=None, vector=None, coordinate=None, title=None, **kwargs)#
Make an animated plot of a data array.
For real world examples, refer to the
examples/animation.ipynb
notebook.- Parameters
figure – The
matplotlib.figure.Figure
to plot the animation ondata_array – The
xarray.DataArray
to plot. If a string is passed in, the variable with that name is taken fromdataset
.coordinate – The coordinate to vary across the animation. Pass in either the name of a coordinate variable or coordinate variable itself. Optional, if not supplied the time coordinate from
get_time_name()
is used. Other appropriate coordinates to animate over include depth.
- Returns
matplotlib.animation.Animation
– The data variable plotted as an animation on the figure. CallAnimation.save()
to save animation to a file, or display it in a Notebook usingAnimation.to_jshtml()
.
See also
- make_patch_collection(data_array=None, **kwargs)#
Make a
PatchCollection
from the geometry of thisDataset
. This can be used to make custom matplotlib plots from your data.If a
DataArray
is passed in, the values of that are assigned to the PatchCollection array parameter.- Parameters
data_array (
str
orxarray.DataArray
, optional) – A data array, or the name of a data variable in this dataset. Optional. If given, the data array islinearised
and passed toPatchCollection.set_array()
. The data is used to colour the patches. Refer to the matplotlib documentation for more information on styling.**kwargs – Any keyword arguments are passed to the
PatchCollection
constructor.
- Returns
PatchCollection
– A PatchCollection constructed using the geometry of this dataset.
Example
import cartopy.crs as ccrs import matplotlib.pyplot as plt import emsarray figure = plt.figure(figsize=(10, 8)) axes = plt.subplot(projection=ccrs.PlateCarree()) axes.set_aspect(aspect='equal', adjustable='datalim') ds = emsarray.open_dataset("./tests/datasets/ugrid_mesh2d.nc") ds = ds.isel(record=0, Mesh2_layers=-1) patches = ds.ems.make_patch_collection('temp') axes.add_collection(patches) figure.colorbar(patches, ax=axes, location='right', label='meters') axes.set_title("Depth") axes.autoscale() figure.show()
- abstract property polygons#
A
np.ndarray
ofPolygon
instances representing the cells in this dataset.The order of the polygons in the list corresponds to the linear index of this dataset. Not all valid cell indices have a polygon, these holes are represented as
None
in the list. If you want a list of just polygons, apply themask
:dataset = emsarray.open_dataset("...") only_polygons = dataset.ems.polygons[dataset.ems.mask]
See also
- property face_centres#
A numpy
ndarray
of face centres, which are (x, y) pairs. The first dimension will be the same length and in the same order asFormat.polygons
, while the second dimension will always be of size 2.
- property mask#
A boolean
np.ndarray
indicating which cells have valid polygons. This can be used to select only items from linear arrays that have a corresponding polygon.dataset = emsarray.open_dataset("...") mask = dataset.ems.mask plottable_polygons = dataset.ems.polygons[mask] plottable_values = dataset.ems.make_linear("eta")[mask]
See also
- property spatial_index#
A shapely
strtree.STRtree
spatial index of all cells in this dataset. This allows for fast spatial lookups, querying which cells lie at a point, or which cells intersect a geometry.Querying the index with
strtree.STRtree.query_items()
will return a list ofSpatialIndexItem
instances. This will contain all cells which have envelopes overlapping the queried geometry. The caller must then refine the results further, by checking for intersection, cover, or contains for example.Example
To find the indices of all cells that overlap a given shape:
indices = [ item.index for item in dataset.ems.spatial_index.query(shape) if item.polygon.intersects(shape) ]
See also
- get_index_for_point(point)#
Find the index for a
point
in the dataset.- Parameters
point (
Point
) – The geographic point to query- Returns
SpatialIndexItem
, optional – TheSpatialIndexItem
for the point queried. This indicates the polygon that intersected the point and the index of that polygon in the dataset.If the point does not intersect the dataset, None is returned.
Notes
In the case where the point intersects multiple cells the cell with the lowest linear index is returned. This can happen if the point is exactly one of the cell vertices, or falls on a cell edge, or if the geometry of the dataset contains overlapping polygons.
- abstract selector_for_index(index)#
Convert a format native index into a selector that can be passed to
Dataset.isel
.- Parameters
index (
Index
) – A format native index- Returns
selector
– A dict suitable for passing toxarray.Dataset.isel()
that will select values at this index.
See also
- select_index(index)#
Return a new dataset that contains values only from a single index. This is much like doing a
xarray.Dataset.isel()
on an index, but works with format native index types.An index is associated with a grid kind. The returned dataset will only contain variables that were defined on this grid, with the single indexed point selected. For example, if the index of a face is passed in, the returned dataset will not contain any variables defined on an edge.
- Parameters
index (
Index
) – The index to select. The index must be for the default grid kind for this dataset.- Returns
xarray.Dataset
– A new dataset that is subset to the one index.
Notes
The returned dataset will most likely not have sufficient coordinate data to be used with a particular
Format
any more. Thedataset.ems
accessor will raise an error if accessed on the new dataset.
- select_point(point)#
Return a new dataset that contains values for a single point. This is a shortcut for
get_index_for_point()
andselect_index()
.If the point is not inside the dataset a
ValueError
is raised.- Parameters
point (
Point
) – The point to select- Returns
xarray.Dataset
– A dataset of values at the point
- make_geojson_geometry()#
Make a
geojson.FeatureCollection
out of the cells in this dataset, one feature per cell.Each feature will include the linear index and the native index of the corresponding cell in its properties.
- Returns
geojson.FeatureCollection
– The geometry of this dataset
- abstract make_clip_mask(clip_geometry)#
Make a new Dataset that can be used to clip this dataset to only the cells that intersect some geometry.
This dataset can be saved to a file to be reused to cut multiple datasets with identical shapes, such as a series of files representing multiple time series of a model.
The mask can be applied to this dataset (or other datasets identical in shape) using
apply_clip_mask()
.- Parameters
clip_geometry – The desired area to cut out. This can be any shapely geometry type, but will most likely be a polygon
- Returns
xarray.Dataset
– The mask
See also
- abstract apply_clip_mask(clip_mask, work_dir)#
Apply a clip mask to this dataset, and return a new dataset. Call
make_clip_mask()
to create a clip mask from a clip geometry.The
clip_mask
can be saved and loaded to disk if the mask needs to be reused across multiple datasets, such as multiple time series from one model.Depending on the implementation, the input dataset may be sliced in to multiple files during cutting, and the returned
Dataset
might be a multi-fileDataset
built from these temporary files. The caller must either load the dataset in to memory usingload()
orcompute()
, or save the dataset to disk somewhere outside of the working directory before the working directory is cleaned up.- Parameters
clip_mask – The mask, as made by
make_clip_mask()
.work_dir – A directory where temporary files can be written to. Callers must create and manage this temporary directory, perhaps using
tempfile.TemporaryDirectory
.
- Returns
A new
Dataset
clipped using the mask
- clip(clip_geomery, work_dir)#
Generates a clip mask and applies it in one step.
See the documentation for
make_clip_mask()
andapply_clip_mask()
for more details.- Parameters
clip_geomery – The desired area to cut out. This can be any shapely geometry type, but will most likely be a polygon
work_dir – A directory where temporary files can be written to. Callers must create and manage this temporary directory, perhaps using
tempfile.TemporaryDirectory
.
- Returns
A new
Dataset
clipped using the mask
- to_netcdf(path, **kwargs)#
Save this dataset to a netCDF file, and also fix up the time units to make the EMS compatible.
- ocean_floor()#
An alias for
operations.ocean_floor()
- normalize_depth_variables(positive_down=True, deep_to_shallow=True)#
An alias for
operations.normalize_depth_variables()
- class emsarray.formats.SpatialIndexItem(linear_index, index, polygon)#
Information about an item in the STRtree spatial index for a dataset.
See also
- linear_index#
The linear index of this cell
- index#
The native index of this cell
- polygon#
The geographic shape of this cell
- emsarray.formats._base.GridKind#
Some type that can enumerate the different grid types present in a dataset. This can be an
enum.Enum
listing each different kind of grid.Index
values will be included in the feature properties ofFormat.make_geojson_geometry()
. If the index type includes the grid kind, the grid kind needs to be JSON serializable. The easiest way to achieve this is to make your GridKind type subclassstr
:class MyGridKind(str, enum.Enum): face = 'face' edge = 'edge' node = 'node'
For cases where the format only supports a single grid, a singleton enum can be used.
More esoteric cases involving datasets with a potentially unbounded numbers of grids can use a type that supports this instead.
alias of TypeVar(‘GridKind’)
- emsarray.formats._base.Index#
An index to a specific point on a grid in this format. For formats with multiple grids (e.g. cells, edges, and nodes), this should be a tuple whos first element is
GridKind
. For formats with a single grid,GridKind
is not required.alias of TypeVar(‘Index’)