Supporting additional conventions
emsarray allows developers to add support for additional geometry conventions
by creating a new subclass of the emsarray.conventions.Convention class.
These additional conventions can either be packaged as part of your application
or be distributed as reusable and installable plugins.
Creating a subclass
You’ve just made a new dataset geometry convention called
“Gigantic Revolutionary Awesome Spatial System”, or GRASS for short.
You’re making a Python package called grass
with a bunch of utilities
to help developers and scientists work with GRASS datasets.
To add support for GRASS to emsarray,
make a new Convention subclass.
For this example, we will make a new module named grass.convention.
The complete implementation of the Grass class is available.
The following is a guided walkthrough of developing this class.
We will need the following imports:
import enum
from functools import cached_property
from typing import Dict, Hashable, Optional, Sequence, Tuple
import numpy
import xarray
from shapely.geometry import Polygon
from shapely.geometry.base import BaseGeometry
from emsarray.conventions import DimensionConvention, Specificity
from emsarray.masking import blur_mask
from emsarray.types import Pathish
Grids and indexes
A Convention must specify an enum of the different grids that it supports. If it only supports one grid, make an enum with a single member.
class GrassGridKind(enum.Enum):
    field = 'field'
    fence = 'fence'
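The Grass class defined later sets grid_kinds = frozenset(GrassGridKind), which relies on iteration over an Enum class yielding every member. The quick stand-alone check below uses only the standard library. It also shows that a single-grid convention would still define an enum, just with one member. SingleGridKind is a hypothetical name used only for illustration.

```python
import enum


class GrassGridKind(enum.Enum):
    field = 'field'
    fence = 'fence'


# Iterating an Enum class yields its members,
# so frozenset(GrassGridKind) contains every grid kind.
all_kinds = frozenset(GrassGridKind)
print(sorted(kind.value for kind in all_kinds))  # ['fence', 'field']

# Members can also be looked up by their value.
assert GrassGridKind('fence') is GrassGridKind.fence


# A convention that supports only one grid still needs an enum,
# just with a single member (hypothetical example).
class SingleGridKind(enum.Enum):
    face = 'face'
```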
A Convention must also specify the native index type it uses. GRASS indexes fields with two coordinates and fences with one:
GrassIndex = Tuple[GrassGridKind, Sequence[int]]

class Grass(DimensionConvention[GrassGridKind, GrassIndex]):
    ...
The index type is only used for type checking; it is not referred to or enforced at runtime.
Convention class
Create an emsarray.conventions.Convention subclass named Grass
(this example subclasses DimensionConvention, which is itself a Convention subclass)
and implement all the methods below.
class Grass(DimensionConvention[GrassGridKind, GrassIndex]):
    #: All the grid kinds this dataset has
    grid_kinds = frozenset(GrassGridKind)

    #: Indicates the grid kind of cells
    default_grid_kind = GrassGridKind.field
Convention.check_dataset() introspects an xarray.Dataset
and returns a value indicating whether this convention implementation can understand the dataset:
a Specificity value if it can, or None if it cannot.
    @classmethod
    def check_dataset(cls, dataset: xarray.Dataset) -> Optional[int]:
        # A GRASS dataset is recognised by the 'Conventions' global attribute.
        # Use .get() so datasets without this attribute are rejected
        # instead of raising a KeyError.
        if dataset.attrs.get('Conventions') == 'Grass 1.0':
            return Specificity.HIGH
        return None
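The detection logic can be exercised without building a dataset by running it against a plain attribute mapping such as dataset.attrs. The sketch below makes two assumptions. First, Specificity is a local stand-in for emsarray.conventions.Specificity, and its numeric values are illustrative; only their ordering is meant to matter. Second, check_grass_attrs is a hypothetical helper name, not part of emsarray.

```python
import enum
from typing import Mapping, Optional


class Specificity(enum.IntEnum):
    # Stand-in for emsarray.conventions.Specificity.
    # These numeric values are assumptions for illustration only.
    LOW = 10
    MEDIUM = 20
    HIGH = 30


def check_grass_attrs(attrs: Mapping[str, object]) -> Optional[int]:
    """Return Specificity.HIGH if the attributes mark a GRASS 1.0 dataset, else None.

    Mirrors the logic of Grass.check_dataset, but takes a plain mapping
    (such as dataset.attrs) instead of an xarray.Dataset.
    """
    if attrs.get('Conventions') == 'Grass 1.0':
        return Specificity.HIGH
    return None


# A GRASS dataset is recognised...
grass_result = check_grass_attrs({'Conventions': 'Grass 1.0'})
# ...while a dataset without the attribute is rejected rather than raising KeyError.
missing_result = check_grass_attrs({'title': 'Some other dataset'})
```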
DimensionConvention.unpack_index() and DimensionConvention.pack_index()
convert between the native index type and a (grid kind, indices) pair.
The native representation must be representable as JSON for GeoJSON export support.
The simplest representation is a tuple of (grid_kind, indices):
    def unpack_index(self, index: GrassIndex) -> Tuple[GrassGridKind, Sequence[int]]:
        return index[0], list(index[1])

    def pack_index(self, grid_kind: GrassGridKind, indices: Sequence[int]) -> GrassIndex:
        return (grid_kind, list(indices))
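The stand-alone sketch below shows the same pack/unpack round trip together with the JSON requirement. A plain enum.Enum member cannot be encoded by json.dumps directly. This sketch therefore assumes the grid kind is written out by its .value and looked up again on the way back. How emsarray itself serializes indexes is not covered here, and index_to_json / index_from_json are hypothetical helpers.

```python
import enum
import json
from typing import Sequence, Tuple


class GrassGridKind(enum.Enum):
    field = 'field'
    fence = 'fence'


GrassIndex = Tuple[GrassGridKind, Sequence[int]]


def pack_index(grid_kind: GrassGridKind, indices: Sequence[int]) -> GrassIndex:
    return (grid_kind, list(indices))


def unpack_index(index: GrassIndex) -> Tuple[GrassGridKind, Sequence[int]]:
    return index[0], list(index[1])


def index_to_json(index: GrassIndex) -> str:
    # Hypothetical helper: json.dumps cannot encode an Enum member,
    # so store the grid kind by its string value instead.
    grid_kind, indices = unpack_index(index)
    return json.dumps([grid_kind.value, list(indices)])


def index_from_json(text: str) -> GrassIndex:
    # Hypothetical helper: the inverse of index_to_json.
    kind_value, indices = json.loads(text)
    return pack_index(GrassGridKind(kind_value), indices)


field_index = pack_index(GrassGridKind.field, (3, 4))
encoded = index_to_json(field_index)
print(encoded)  # ["field", [3, 4]]
```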
DimensionConvention.grid_dimensions specifies which dataset dimensions
each grid kind is defined on.
It can introspect the dataset to determine the correct dimensions if required,
and it should be cached, for example with @cached_property:
    @cached_property
    def grid_dimensions(self) -> Dict[GrassGridKind, Sequence[Hashable]]:
        return {
            GrassGridKind.field: ['warp', 'weft'],
            GrassGridKind.fence: ['post'],
        }
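The dimensions listed for a grid kind also determine how many elements that grid has, and how a multi-dimensional index maps to a single linear index. The stand-alone sketch below illustrates that relationship. It assumes row-major (C) order, the default order of numpy.ravel_multi_index. The helpers grid_sizes and ravel are hypothetical and are not emsarray's own DimensionConvention.ravel_index.

```python
from math import prod
from typing import Dict, Hashable, Mapping, Sequence

# Dimension names per grid kind, as in grid_dimensions above
# (plain string keys instead of the enum, to keep the sketch short).
GRID_DIMENSIONS: Dict[str, Sequence[Hashable]] = {
    'field': ['warp', 'weft'],
    'fence': ['post'],
}


def grid_sizes(dimension_sizes: Mapping[Hashable, int]) -> Dict[str, int]:
    """Number of elements in each grid: the product of its dimension sizes."""
    return {
        kind: prod(dimension_sizes[dim] for dim in dims)
        for kind, dims in GRID_DIMENSIONS.items()
    }


def ravel(kind: str, indices: Sequence[int], dimension_sizes: Mapping[Hashable, int]) -> int:
    """Flatten multi-dimensional indices to a linear index in row-major (C) order."""
    linear = 0
    for dim, i in zip(GRID_DIMENSIONS[kind], indices):
        linear = linear * dimension_sizes[dim] + i
    return linear


# Hypothetical dataset with 3 warps, 4 wefts and 10 posts
sizes = {'warp': 3, 'weft': 4, 'post': 10}
print(grid_sizes(sizes))  # {'field': 12, 'fence': 10}
print(ravel('field', (2, 1), sizes))  # 9
```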
Convention.polygons is an array of shapely.Polygon instances,
one for each face in the dataset.
If a cell does not have a valid polygon
(for example, if the coordinates for that polygon have been dropped
or are outside of the valid region),
the value at that index must be None.
Using @cached_property for this property is strongly encouraged,
as computing the polygons is typically slow.
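    @cached_property
    def polygons(self) -> numpy.ndarray:
        def make_polygon_for_cell(warp: int, weft: int) -> Optional[Polygon]:
            # Implementation left as an exercise for the reader:
            # build a Polygon from this cell's coordinates,
            # or return None if the cell has no valid polygon.
            raise NotImplementedError

        # Iterate warp-major (weft varies fastest).
        # dtype=object keeps the geometries and any Nones as-is.
        return numpy.array([
            make_polygon_for_cell(warp, weft)
            for warp in range(self.dataset.sizes['warp'])
            for weft in range(self.dataset.sizes['weft'])
        ], dtype=object)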
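How make_polygon_for_cell builds each shape depends on how GRASS stores coordinates. The dependency-free sketch below assumes a hypothetical regular grid with a fixed origin and cell size, none of which comes from the GRASS description above. It computes the corner coordinates of each cell in the same warp-major order as the polygons property. Each corner list could then be passed to shapely.geometry.Polygon. Cells listed as missing get None, following the rule for invalid polygons.

```python
from typing import FrozenSet, List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical regular grid: origin at (X0, Y0), each cell DX wide and DY tall.
X0, Y0 = 150.0, -35.0
DX, DY = 0.5, 0.25


def cell_vertices(warp: int, weft: int) -> List[Point]:
    """Corners of one rectangular cell, counter-clockwise from the lower left.

    Assumes warp steps along y and weft steps along x.
    """
    x_min = X0 + weft * DX
    y_min = Y0 + warp * DY
    x_max = x_min + DX
    y_max = y_min + DY
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]


def all_cell_vertices(
    n_warp: int,
    n_weft: int,
    missing: FrozenSet[Tuple[int, int]] = frozenset(),
) -> List[Optional[List[Point]]]:
    # Same loop order as the polygons property: warp outer, weft inner.
    # Cells in `missing` get None, like cells without a valid polygon.
    return [
        None if (warp, weft) in missing else cell_vertices(warp, weft)
        for warp in range(n_warp)
        for weft in range(n_weft)
    ]
```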
The last thing to implement is clipping datasets,
via the Convention.make_clip_mask()
and Convention.apply_clip_mask()
methods.
Implementers are encouraged to look at existing Convention implementations
for concrete examples.
    def make_clip_mask(
        self,
        clip_geometry: BaseGeometry,
        buffer: int = 0,
    ) -> xarray.Dataset:
        # Find all the fields that intersect the clip geometry
        field_indexes = self.strtree.query(clip_geometry, predicate='intersects')

        # Find all the fences associated with each intersecting field.
        # get_fences_around_field() is a GRASS-specific helper, not shown here.
        # dtype=int keeps the result usable as an index even when it is empty.
        fence_indexes = numpy.unique(numpy.array([
            self.ravel_index(fence_index)
            for field_index in field_indexes
            for fence_index in self.get_fences_around_field(field_index)
        ], dtype=int))

        # Make an array of which fields to keep
        keep_fields = xarray.DataArray(
            data=numpy.zeros(self.grid_size[GrassGridKind.field], dtype=bool),
            dims=['index'],
        )
        keep_fields.values[field_indexes] = True
        keep_fields = self.wind(keep_fields, grid_kind=GrassGridKind.field)

        # Same for fences
        keep_fences = xarray.DataArray(
            data=numpy.zeros(self.grid_size[GrassGridKind.fence], dtype=bool),
            dims=['index'],
        )
        keep_fences.values[fence_indexes] = True
        keep_fences = self.wind(keep_fences, grid_kind=GrassGridKind.fence)

        # Blur the masks a bit if the clip region needs buffering
        if buffer > 0:
            keep_fields.values = blur_mask(keep_fields.values, size=buffer)

        # Make a dataset out of these masks
        return xarray.Dataset(
            data_vars={
                'fields': keep_fields,
                'fences': keep_fences,
            },
        )
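The buffer step widens the kept region by a number of cells. The dependency-free sketch below illustrates what buffering a boolean mask means by dilating a 2D mask over a square neighbourhood. The neighbourhood shape is an assumption, and dilate_mask is a hypothetical helper. It is not the implementation of emsarray.masking.blur_mask.

```python
from typing import List

Mask = List[List[bool]]


def dilate_mask(mask: Mask, size: int) -> Mask:
    """Return a new mask where every cell within `size` cells of a True cell is True.

    Uses a square neighbourhood, clipped to the edges of the grid.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    result = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Mark the square neighbourhood around this kept cell
            for rr in range(max(0, r - size), min(rows, r + size + 1)):
                for cc in range(max(0, c - size), min(cols, c + size + 1)):
                    result[rr][cc] = True
    return result


mask = [
    [False, False, False, False],
    [False, True, False, False],
    [False, False, False, False],
]
for row in dilate_mask(mask, 1):
    print(''.join('#' if cell else '.' for cell in row))
# ###.
# ###.
# ###.
```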
    def apply_clip_mask(self, clip_mask: xarray.Dataset, work_dir: Pathish) -> xarray.Dataset:
        # You're on your own here.
        # This depends entirely on how the mask and datasets interact.
        raise NotImplementedError
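The right approach depends on the convention. One possible strategy for a structured grid like the GRASS fields is to crop the dataset to the bounding box of the kept cells. This strategy is an assumption, not something taken from emsarray. The dependency-free sketch below computes that box from a 2D boolean mask, and bounding_slices is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def bounding_slices(mask: List[List[bool]]) -> Optional[Tuple[slice, slice]]:
    """Smallest (row, column) slices containing every True cell, or None if none are True."""
    n_cols = len(mask[0]) if mask else 0
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(n_cols) if any(row[c] for row in mask)]
    if not rows:
        return None
    return slice(rows[0], rows[-1] + 1), slice(cols[0], cols[-1] + 1)


mask = [
    [False, False, False, False],
    [False, True, False, False],
    [False, False, True, False],
]
print(bounding_slices(mask))  # (slice(1, 3, None), slice(1, 3, None))
```

The resulting slices could then be applied with xarray's Dataset.isel, for example dataset.isel(warp=row_slice, weft=col_slice). Cells inside the box that were not kept would still need to be masked out separately.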
Registering as part of an application
If you are making an application that needs to support GRASS,
or just experimenting with a new convention type,
but don’t intend to distribute the new convention implementation as a plugin,
you can use the register_convention()
function.
This will add the convention to the internal convention registry.
It can be used as a decorator or called directly:
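from emsarray.conventions import DimensionConvention, Specificity, register_convention

@register_convention
class Grass(DimensionConvention[GrassGridKind, GrassIndex]):
    ...

# Alternatively, omit the decorator and register the class directly:
# register_convention(Grass)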
The convention implementation will not be automatically discovered by emsarray,
so you must ensure that the Python file containing the Grass subclass is imported
before you attempt to use it.
This can be done in your application’s __init__.py
with import grass.convention.
Distributing as a plugin
emsarray uses entry points
to find convention implementations distributed as plugins.
Users can install your plugin and emsarray
will automatically find the included subclass.
If you have created a convention subclass called Grass
in the module grass.convention,
include the following entry point in your setup.cfg:
[options.entry_points]
emsarray.conventions =
    Grass = grass.convention:Grass
The name (the portion before the =) is not used,
but we suggest using the same name as your convention class.
The value (the portion after the =) is the import path of the module containing your class,
followed by a : and then the name of your class.
If your package contains multiple convention implementations, add one per line.
As a real-world example, emsarray defines the following entry points:
[options.entry_points]
emsarray.conventions =
    ArakawaC = emsarray.conventions.arakawa_c:ArakawaC
    CFGrid1D = emsarray.conventions.grid:CFGrid1D
    CFGrid2D = emsarray.conventions.grid:CFGrid2D
    ShocSimple = emsarray.conventions.shoc:ShocSimple
    ShocStandard = emsarray.conventions.shoc:ShocStandard
    UGrid = emsarray.conventions.ugrid:UGrid
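After installing your package, you can check that its entry point is visible using the standard library's importlib.metadata. This only lists what is registered in the group; it does not import or validate the classes. The function name installed_conventions is our own.

```python
import sys
from importlib.metadata import entry_points
from typing import Dict


def installed_conventions(group: str = 'emsarray.conventions') -> Dict[str, str]:
    """Map each entry point name in `group` to its 'module:attribute' value."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)
    else:
        # Before Python 3.10, entry_points() returns a dict keyed by group name.
        eps = entry_points().get(group, [])
    return {ep.name: ep.value for ep in eps}


for name, value in sorted(installed_conventions().items()):
    print(f'{name} = {value}')
```

With the grass package installed, the output would be expected to include a line like Grass = grass.convention:Grass, alongside emsarray's own conventions.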