Supporting additional conventions#

emsarray allows developers to add support for additional geometry conventions by creating a new subclass of the emsarray.conventions.Convention class. These additional conventions can be either packaged as part of your application or distributed as reusable and installable plugins.

Creating a subclass#

You’ve just made a new dataset geometry convention called “Gigantic Revolutionary Awesome Spatial System” - GRASS for short. You’re making a Python package with a bunch of utilities to help developers and scientists work with GRASS datasets, called grass.

To add support for GRASS to emsarray, make a new Convention subclass. For this example, we will make a new module named grass.convention. The complete implementation of the Grass class is available. What follows is a guided walkthrough of developing this class.

We will need the following imports:

import enum
from functools import cached_property
from typing import Dict, Hashable, Optional, Sequence, Tuple

import numpy
import xarray
from shapely.geometry import Polygon
from shapely.geometry.base import BaseGeometry

from emsarray.conventions import DimensionConvention, Specificity
from emsarray.masking import blur_mask
from emsarray.types import Pathish

Grids and indexes#

A Convention must specify an enum of the different grids that it supports. If it only supports one grid, make an enum with a single member.

class GrassGridKind(enum.Enum):
    field = 'field'
    fence = 'fence'
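For a convention with only one grid, the enum has exactly one member. A minimal sketch, assuming a hypothetical convention called Meadow whose only grid kind is `cell` (neither name is part of emsarray):

```python
import enum


class MeadowGridKind(enum.Enum):
    # Hypothetical convention with a single grid kind
    cell = 'cell'


# Every member of the enum is a supported grid kind,
# and the only member doubles as the default.
grid_kinds = frozenset(MeadowGridKind)
default_grid_kind = MeadowGridKind.cell
```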

A Convention must specify the convention-native index type it uses. GRASS indexes use two coordinates for fields and one coordinate for fences:

GrassIndex = Tuple[GrassGridKind, Sequence[int]]


class Grass(DimensionConvention[GrassGridKind, GrassIndex]):

The index type is used only for type checking; it is not referred to or enforced at runtime.
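To illustrate that the index type is not enforced at runtime, here is a small standalone sketch. It repeats the enum and alias so that it runs on its own; the variable names are illustrative only.

```python
import enum
from typing import Sequence, Tuple


class GrassGridKind(enum.Enum):
    field = 'field'
    fence = 'fence'


GrassIndex = Tuple[GrassGridKind, Sequence[int]]

# Fields are addressed by two coordinates, fences by one
field_index: GrassIndex = (GrassGridKind.field, [3, 4])
fence_index: GrassIndex = (GrassGridKind.fence, [7])

# A static type checker such as mypy would reject this assignment,
# but Python itself runs it without complaint.
bad_index: GrassIndex = ('meadow', 'first')
```

Because nothing checks the annotation at runtime, code that accepts indexes from untrusted input still needs its own validation.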

Convention class#

Create an emsarray.conventions.Convention subclass named Grass, and implement all the methods below.

class Grass(DimensionConvention[GrassGridKind, GrassIndex]):

    #: All the grid kinds this dataset has
    grid_kinds = frozenset(GrassGridKind)

    #: Indicates the grid kind of cells
    default_grid_kind = GrassGridKind.field

Convention.check_dataset() introspects an xarray.Dataset and returns a value indicating whether this convention implementation can understand the dataset.

    @classmethod
    def check_dataset(cls, dataset: xarray.Dataset) -> Optional[int]:
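        # A Grass dataset is recognised by the 'Conventions' global attribute.
        # Use .get() so a dataset without this attribute returns None
        # instead of raising a KeyError.
        if dataset.attrs.get('Conventions') == 'Grass 1.0':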
            return Specificity.HIGH
        return None
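Since check_dataset() decides whether a dataset is understood, it seems reasonable to expect it to be handed datasets that are not GRASS datasets, including ones with no 'Conventions' attribute at all. The sketch below isolates the attribute test. The helper name has_grass_conventions is hypothetical, and a plain dict stands in for dataset.attrs so that the example runs without xarray:

```python
from typing import Any, Mapping


def has_grass_conventions(attrs: Mapping[str, Any]) -> bool:
    """Return True if the global attributes mark a GRASS dataset."""
    # .get() returns None for a missing key, so unrelated datasets
    # are rejected quietly instead of raising a KeyError.
    return attrs.get('Conventions') == 'Grass 1.0'


grass_attrs = {'Conventions': 'Grass 1.0', 'title': 'Example'}
other_attrs = {'title': 'No Conventions attribute at all'}

is_grass = has_grass_conventions(grass_attrs)
is_other = has_grass_conventions(other_attrs)
```

In check_dataset(), a True result would map to returning Specificity.HIGH and a False result to returning None.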

DimensionConvention.unpack_index() and DimensionConvention.pack_index() transform between native index types and a grid kind and indices. The native representation must be representable as JSON for GeoJSON export support. The simplest representation is a tuple of (grid_kind, indices):

    def unpack_index(self, index: GrassIndex) -> Tuple[GrassGridKind, Sequence[int]]:
        return index[0], list(index[1])

    def pack_index(self, grid_kind: GrassGridKind, indices: Sequence[int]) -> GrassIndex:
        return (grid_kind, list(indices))
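A plain enum.Enum member cannot be serialised by Python's json module, so the grid kind needs some care if indexes are to be written as JSON. One possible approach, shown here as an assumption rather than something this guide prescribes, is to mix str into the enum. The sketch uses module-level functions in place of the methods so that it runs on its own:

```python
import enum
import json
from typing import Sequence, Tuple


class GrassGridKind(str, enum.Enum):
    # Assumption: mixing in str lets json.dumps() encode members directly
    field = 'field'
    fence = 'fence'


GrassIndex = Tuple[GrassGridKind, Sequence[int]]


def pack_index(grid_kind: GrassGridKind, indices: Sequence[int]) -> GrassIndex:
    return (grid_kind, list(indices))


def unpack_index(index: GrassIndex) -> Tuple[GrassGridKind, Sequence[int]]:
    return index[0], list(index[1])


index = pack_index(GrassGridKind.field, (3, 4))
encoded = json.dumps(index)

# Decoding yields plain strings and lists,
# so rebuild the enum member from its value.
raw_kind, raw_indices = json.loads(encoded)
decoded = pack_index(GrassGridKind(raw_kind), raw_indices)
```

The round trip returns an index equal to the original, which is the property that matters for export and re-import.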

DimensionConvention.grid_dimensions() specifies which dataset dimensions each grid kind is defined on. This method can introspect the dataset to determine the correct dimensions if required. This method should be cached.

    @cached_property
    def grid_dimensions(self) -> Dict[GrassGridKind, Sequence[Hashable]]:
        return {
            GrassGridKind.field: ['warp', 'weft'],
            GrassGridKind.fence: ['post'],
        }
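The effect of caching can be seen in a small standalone sketch. FakeConvention is a hypothetical stand-in, not an emsarray class, and the counter exists only to show how often the body runs:

```python
from functools import cached_property
from typing import Dict, Hashable, Sequence


class FakeConvention:
    """Hypothetical stand-in for a convention; not an emsarray class."""

    def __init__(self) -> None:
        self.lookups = 0

    @cached_property
    def grid_dimensions(self) -> Dict[str, Sequence[Hashable]]:
        # Pretend this inspects the dataset, which may be slow
        self.lookups += 1
        return {'field': ['warp', 'weft'], 'fence': ['post']}


convention = FakeConvention()
first = convention.grid_dimensions
second = convention.grid_dimensions
```

The body runs once and later reads return the stored value. Note that a cached_property is read as an attribute, without parentheses.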

Convention.polygons is an array of shapely.Polygon instances, one for each face in the dataset. If a cell does not have a valid polygon — for example, if the coordinates for that polygon have been dropped or are outside of the valid region — that index must be None. It is strongly encouraged to use @cached_property for this property, as it is typically slow to run.

    @cached_property
    def polygons(self) -> numpy.ndarray:
        def make_polygon_for_cell(warp: int, weft: int) -> Polygon:
            # Implementation left as an exercise for the reader
            return Polygon(...)

        return numpy.array([
            make_polygon_for_cell(warp, weft)
            for warp in range(self.dataset.sizes['warp'])
            for weft in range(self.dataset.sizes['weft'])
        ])
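How a cell becomes a polygon depends entirely on the convention. As a rough sketch of the rule that invalid cells must be None, assume a hypothetical regular grid whose cell edges are stored in two one-dimensional coordinate lists. Plain coordinate tuples stand in for shapely polygons so that the example runs without shapely; each non-None ring could be passed to shapely.geometry.Polygon.

```python
import math
from typing import List, Optional, Sequence, Tuple

Ring = List[Tuple[float, float]]


def cell_ring(
    x_edges: Sequence[float],
    y_edges: Sequence[float],
    warp: int,
    weft: int,
) -> Optional[Ring]:
    """Corner coordinates of one cell, or None if any edge is missing."""
    x0, x1 = x_edges[warp], x_edges[warp + 1]
    y0, y1 = y_edges[weft], y_edges[weft + 1]
    if any(math.isnan(value) for value in (x0, x1, y0, y1)):
        # No valid polygon can be built for this cell
        return None
    # Counter-clockwise ring; the first corner is repeated to close it
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]


# The last x edge is missing, so the second cell has no polygon
x_edges = [0.0, 1.0, float('nan')]
y_edges = [10.0, 11.0]
rings = [
    cell_ring(x_edges, y_edges, warp, weft)
    for warp in range(len(x_edges) - 1)
    for weft in range(len(y_edges) - 1)
]
```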

The last thing to implement is clipping datasets, via the Convention.make_clip_mask() and Convention.apply_clip_mask() methods. Implementers are encouraged to look at existing Convention implementations for concrete examples.

    def make_clip_mask(
        self,
        clip_geometry: BaseGeometry,
        buffer: int = 0,
    ) -> xarray.Dataset:
        # Find all the fields that intersect the clip geometry
        field_indexes = self.strtree.query(clip_geometry, predicate='intersects')
        # Find all the fences associated with each intersecting field
        fence_indexes = numpy.unique([
            self.ravel_index(fence_index)
            for field_index in field_indexes
            for fence_index in self.get_fences_around_field(field_index)
        ])

        # Make an array of which fields to keep
        keep_fields = xarray.DataArray(
            data=numpy.zeros(self.grid_size[GrassGridKind.field], dtype=bool),
            dims=['index'],
        )
        keep_fields.values[field_indexes] = True
        keep_fields = self.wind(keep_fields, grid_kind=GrassGridKind.field)

        # Same for fences
        keep_fences = xarray.DataArray(
            data=numpy.zeros(self.grid_size[GrassGridKind.fence], dtype=bool),
            dims=['index'],
        )
        keep_fences.values[fence_indexes] = True
        keep_fences = self.wind(keep_fences, grid_kind=GrassGridKind.fence)

        # Blur the masks a bit if the clip region needs buffering
        if buffer > 0:
            keep_fields.values = blur_mask(keep_fields.values, size=buffer)

        # Make a dataset out of these masks
        return xarray.Dataset(
            data_vars={
                'fields': keep_fields,
                'fences': keep_fences,
            },
        )

    def apply_clip_mask(self, clip_mask: xarray.Dataset, work_dir: Pathish) -> xarray.Dataset:
        # You're on your own, here.
        # This depends entirely on how the mask and datasets interact.
        pass
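The mask-building step in make_clip_mask() starts from a flat boolean array, switches on the selected linear indexes, and then restores the grid shape. The sketch below shows that idea with plain numpy. The grid shape and index values are made up, and a row-major reshape stands in for the wind() call; this is an assumption about what winding amounts to on a simple two-dimensional grid, not a description of emsarray's implementation.

```python
import numpy

# Hypothetical field grid with 3 warp rows and 4 weft columns
shape = (3, 4)

# Linear indexes of the fields that intersect the clip geometry,
# as a spatial index query might return them
field_indexes = numpy.array([1, 5, 6])

# Start with nothing selected, then switch on the intersecting fields
keep_flat = numpy.zeros(shape[0] * shape[1], dtype=bool)
keep_flat[field_indexes] = True

# Restore the grid shape (row-major order)
keep_fields = keep_flat.reshape(shape)
```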

Registering as part of an application#

If you are making an application that needs to support GRASS, or are just experimenting with a new convention type, and don’t intend to distribute the convention implementation as a plugin, you can use the register_convention() function. This adds the convention to the internal convention registry. It can be used as a decorator or called directly:

from emsarray.conventions import Convention, Specificity, register_convention

@register_convention
class Grass(...):

The convention implementation will not be automatically discovered by emsarray, so you must ensure that the Python file containing the Grass subclass is imported before you attempt to use it. This can be done in your application’s __init__.py as import grass.convention.

Distributing as a plugin#

emsarray uses entry points to find convention implementations distributed as plugins. Users can install your plugin and emsarray will automatically find the included subclass.

If you have created a convention subclass called Grass in the module grass.convention, include the following entry point in your setup.cfg:

[options.entry_points]
emsarray.conventions =
    Grass = grass.convention:Grass

The name portion before the = is not used; however, we suggest using the class name of your new convention implementation. The value portion after the = is the import path of the module containing your class, then a :, then the name of your class. If your package contains multiple convention implementations, add one per line.

As a real world example, emsarray defines the following entry points:

[entry_points]
emsarray.conventions =
    ArakawaC = emsarray.conventions.arakawa_c:ArakawaC
    CFGrid1D = emsarray.conventions.grid:CFGrid1D
    CFGrid2D = emsarray.conventions.grid:CFGrid2D
    ShocSimple = emsarray.conventions.shoc:ShocSimple
    ShocStandard = emsarray.conventions.shoc:ShocStandard
    UGrid = emsarray.conventions.ugrid:UGrid