Supporting additional conventions
emsarray allows developers to add support for additional geometry conventions
by creating a new subclass of the emsarray.conventions.Convention class.
These additional conventions can be either packaged as part of your application
or distributed as reusable and installable plugins.
Creating a subclass
You’ve just made a new dataset geometry convention called
“Gigantic Revolutionary Awesome Spatial System” - GRASS for short.
You’re making a Python package called grass,
with a bunch of utilities to help developers and scientists
work with GRASS datasets.
To add support for GRASS to emsarray,
make a new Convention subclass.
For this example, we will make a new module named grass.convention.
The complete implementation of the Grass class is available.
The following is a guided walk through developing this class.
We will need the following imports:
import enum
from collections.abc import Hashable, Sequence
from functools import cached_property
import numpy
import xarray
from shapely.geometry import Polygon
from shapely.geometry.base import BaseGeometry
from emsarray.conventions import DimensionConvention, Specificity
from emsarray.masking import blur_mask
from emsarray.types import Pathish
Grids and indexes
A Convention must specify an enum of the different grids that it supports. If it only supports one grid, make an enum with a single member.
class GrassGridKind(enum.Enum):
    field = 'field'
    fence = 'fence'
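For a convention with only one grid, the enum shrinks to a single member. The following is a minimal sketch; `MeadowGridKind` and its `face` member are hypothetical names, not part of emsarray or GRASS.

```python
import enum


class MeadowGridKind(enum.Enum):
    """Hypothetical grid kinds for a convention that has only one grid."""
    face = 'face'


# The class attributes a convention would typically derive from this enum
grid_kinds = frozenset(MeadowGridKind)
default_grid_kind = MeadowGridKind.face
```

Using an enum even for a single grid keeps the convention consistent with those that have several grids.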
A Convention must specify the convention-native index type it uses. GRASS grids use two coordinates to index fields, and one coordinate to index fences:
GrassIndex = tuple[GrassGridKind, Sequence[int]]
class Grass(DimensionConvention[GrassGridKind, GrassIndex]):
    ...
The index type is only used for type checking; it is not referred to or enforced at runtime.
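The lack of runtime enforcement can be seen with a small standalone sketch. `ToyConvention`, `ToyKind` and `Toy` are hypothetical stand-ins written for this example and do not use emsarray at all.

```python
import enum
from collections.abc import Sequence
from typing import Generic, TypeVar

GridKind = TypeVar('GridKind')
Index = TypeVar('Index')


class ToyConvention(Generic[GridKind, Index]):
    """Hypothetical generic base class, standing in for a convention."""

    def describe(self, index: Index) -> str:
        return f'index={index!r}'


class ToyKind(enum.Enum):
    field = 'field'


ToyIndex = tuple[ToyKind, Sequence[int]]


class Toy(ToyConvention[ToyKind, ToyIndex]):
    pass


# A string is not a ToyIndex, but nothing checks that at runtime.
# A static type checker would flag this call instead.
description = Toy().describe('not an index')
```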
Convention class
Create an emsarray.conventions.Convention subclass named Grass,
and implement all the methods below.
class Grass(DimensionConvention[GrassGridKind, GrassIndex]):
    #: All the grid kinds this dataset has
    grid_kinds = frozenset(GrassGridKind)

    #: Indicates the grid kind of cells
    default_grid_kind = GrassGridKind.field
Convention.check_dataset() introspects an xarray.Dataset
and returns a value indicating whether this convention implementation can understand the dataset.
    @classmethod
    def check_dataset(cls, dataset: xarray.Dataset) -> int | None:
        # A Grass dataset is recognised by the 'Conventions' global attribute.
        # Use .get() so datasets without this attribute are rejected
        # instead of raising a KeyError.
        if dataset.attrs.get('Conventions') == 'Grass 1.0':
            return Specificity.HIGH
        return None
DimensionConvention.unpack_index() and DimensionConvention.pack_index()
transform between native index types and a grid kind and indexes.
The native representation must be representable as JSON for GeoJSON export support.
The simplest representation is a tuple of (grid_kind, indexes):
    def unpack_index(self, index: GrassIndex) -> tuple[GrassGridKind, Sequence[int]]:
        return index[0], list(index[1])

    def pack_index(self, grid_kind: GrassGridKind, indexes: Sequence[int]) -> GrassIndex:
        return (grid_kind, list(indexes))
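Python's json module cannot serialise a plain enum.Enum member, so one way to keep the native index representable as JSON is to mix str into the grid kind enum. The following sketch assumes that approach; `JsonGridKind` and `index_from_json` are hypothetical names, and how emsarray itself serialises indexes for GeoJSON export is not shown here.

```python
import enum
import json
from collections.abc import Sequence


class JsonGridKind(str, enum.Enum):
    """Hypothetical variant of GrassGridKind that also behaves as a str."""
    field = 'field'
    fence = 'fence'


JsonIndex = tuple[JsonGridKind, Sequence[int]]


def pack_index(grid_kind: JsonGridKind, indexes: Sequence[int]) -> JsonIndex:
    return (grid_kind, list(indexes))


def index_from_json(text: str) -> JsonIndex:
    """Rebuild a native index from its JSON form."""
    kind, indexes = json.loads(text)
    return pack_index(JsonGridKind(kind), indexes)


native = pack_index(JsonGridKind.field, (3, 4))
encoded = json.dumps(native)          # the enum member is written as its string value
decoded = index_from_json(encoded)
```

The tuple becomes a JSON array, so the decoding step has to turn the first element back into an enum member.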
DimensionConvention.grid_dimensions() specifies which dataset dimensions
each grid kind is defined on.
This method can introspect the dataset to determine the correct dimensions if required.
This method should be cached.
    @cached_property
    def grid_dimensions(self) -> dict[GrassGridKind, Sequence[Hashable]]:
        return {
            GrassGridKind.field: ['warp', 'weft'],
            GrassGridKind.fence: ['post'],
        }
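The dimension names are enough to work out the shape and size of each grid. The sketch below illustrates the idea with a plain dictionary of hypothetical dimension sizes in place of a real dataset. It is an illustration of the relationship, not emsarray's implementation.

```python
import math
from collections.abc import Hashable, Sequence

# Hypothetical dimension sizes, standing in for the sizes of a real dataset
sizes: dict[Hashable, int] = {'warp': 3, 'weft': 4, 'post': 5, 'time': 10}

# String keys are used here instead of GrassGridKind to keep the sketch short
grid_dimensions: dict[str, Sequence[Hashable]] = {
    'field': ['warp', 'weft'],
    'fence': ['post'],
}

# The shape of a grid is the size of each of its dimensions, in order
grid_shape = {
    kind: tuple(sizes[dim] for dim in dims)
    for kind, dims in grid_dimensions.items()
}

# The number of elements in a grid is the product of its shape
grid_size = {kind: math.prod(shape) for kind, shape in grid_shape.items()}
```

Dimensions that belong to no grid, such as `time` here, are simply ignored.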
Convention.polygons is an array of shapely.Polygon instances,
one for each face in the dataset.
If a cell does not have a valid polygon,
for example because the coordinates for that polygon have been dropped
or are outside of the valid region,
the entry at that index must be None.
It is strongly encouraged to use @cached_property for this property,
as it is typically slow to run.
    @cached_property
    def polygons(self) -> numpy.ndarray:
        def make_polygon_for_cell(warp: int, weft: int) -> Polygon:
            # Implementation left as an exercise for the reader
            return Polygon(...)

        # xarray exposes dimension lengths through Dataset.sizes
        return numpy.array([
            make_polygon_for_cell(warp, weft)
            for warp in range(self.dataset.sizes['warp'])
            for weft in range(self.dataset.sizes['weft'])
        ])
The last thing to implement is clipping datasets,
via the Convention.make_clip_mask()
and Convention.apply_clip_mask() methods.
Implementers are encouraged to look at existing Convention implementations
for concrete examples.
    def make_clip_mask(
        self,
        clip_geometry: BaseGeometry,
        buffer: int = 0,
    ) -> xarray.Dataset:
        # Find all the fields that intersect the clip geometry
        field_indexes = self.strtree.query(clip_geometry, predicate='intersects')

        # Find all the fences associated with each intersecting field
        fence_indexes = numpy.unique([
            self.ravel_index(fence_index)
            for field_index in field_indexes
            for fence_index in self.get_fences_around_field(field_index)
        ])

        # Make an array of which fields to keep
        keep_fields = xarray.DataArray(
            data=numpy.zeros(self.grid_size[GrassGridKind.field], dtype=bool),
            dims=['index'],
        )
        keep_fields.values[field_indexes] = True
        keep_fields = self.wind(keep_fields, grid_kind=GrassGridKind.field)

        # Same for fences
        keep_fences = xarray.DataArray(
            data=numpy.zeros(self.grid_size[GrassGridKind.fence], dtype=bool),
            dims=['index'],
        )
        keep_fences.values[fence_indexes] = True
        keep_fences = self.wind(keep_fences, grid_kind=GrassGridKind.fence)

        # Blur the masks a bit if the clip region needs buffering
        if buffer > 0:
            keep_fields.values = blur_mask(keep_fields.values, size=buffer)

        # Make a dataset out of these masks
        return xarray.Dataset(
            data_vars={
                'fields': keep_fields,
                'fences': keep_fences,
            },
        )

    def apply_clip_mask(self, clip_mask: xarray.Dataset, work_dir: Pathish) -> xarray.Dataset:
        # You're on your own, here.
        # This depends entirely on how the mask and datasets interact.
        pass
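The mask-building steps in make_clip_mask() follow a pattern: mark the intersecting cells in a flat array, reshape it to the grid, then grow it by the buffer. The pure-Python sketch below mimics that pattern on a small grid. The `wind` and `grow` functions here are hypothetical stand-ins for Convention.wind() and blur_mask(), and the assumption that buffering includes diagonal neighbours is ours rather than something stated above.

```python
def wind(flat: list[bool], n_rows: int, n_cols: int) -> list[list[bool]]:
    """Reshape a flat, row-major mask into rows."""
    return [flat[row * n_cols:(row + 1) * n_cols] for row in range(n_rows)]


def grow(mask: list[list[bool]], size: int) -> list[list[bool]]:
    """Mark every cell within `size` cells of a True cell, diagonals included."""
    n_rows, n_cols = len(mask), len(mask[0])
    return [
        [
            any(
                mask[r][c]
                for r in range(max(0, row - size), min(n_rows, row + size + 1))
                for c in range(max(0, col - size), min(n_cols, col + size + 1))
            )
            for col in range(n_cols)
        ]
        for row in range(n_rows)
    ]


n_rows, n_cols = 3, 4
intersecting = [5]  # linear indexes of the cells that intersect the clip geometry

# Start with nothing kept, then mark the intersecting cells
flat = [False] * (n_rows * n_cols)
for index in intersecting:
    flat[index] = True

keep = wind(flat, n_rows, n_cols)
buffered = grow(keep, size=1)
```

Linear index 5 in a grid with four columns is row 1, column 1, so a buffer of one cell keeps the 3 x 3 block around it.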
Registering as part of an application
If you are making an application that needs to support GRASS,
or just experimenting with a new convention type,
but don’t intend on distributing the new convention implementation as a plugin,
you can use the register_convention() function.
This will add the convention to the internal convention registry.
It can be used as a decorator or called directly:
from emsarray.conventions import Convention, Specificity, register_convention

@register_convention
class Grass(...):
    ...
The convention implementation will not be automatically discovered by emsarray,
so you must ensure that the Python file containing the Grass subclass is imported
before you attempt to use it.
This can be done in your application’s __init__.py as import grass.convention.
Distributing as a plugin
emsarray uses entry points
to find convention implementations distributed as plugins.
Users can install your plugin and emsarray will automatically find the included subclass.
If you have created a convention subclass called Grass
in the module grass.convention,
include the following entry point in your pyproject.toml:
[project.entry-points."emsarray.conventions"]
Grass = "grass.convention:Grass"
The name portion before the = is not used;
however, we suggest using the class name of your new convention implementation.
The value portion after the = is the import path to your class,
then a :, then the name of your class.
If your package contains multiple convention implementations, add one per line.
As a real-world example, emsarray defines the following entry points:
[project.entry-points."emsarray.conventions"]
ArakawaC = "emsarray.conventions.arakawa_c:ArakawaC"
CFGrid1D = "emsarray.conventions.grid:CFGrid1D"
CFGrid2D = "emsarray.conventions.grid:CFGrid2D"
ShocSimple = "emsarray.conventions.shoc:ShocSimple"
ShocStandard = "emsarray.conventions.shoc:ShocStandard"
UGrid = "emsarray.conventions.ugrid:UGrid"