emsarray.masking

Common functions for working with dataset masks. Masks are used when clipping a dataset to a smaller geographic subset, for example by Convention.clip().

mask_grid_dataset(dataset, mask, work_dir, **kwargs)

Apply a mask to a two-dimensional grid dataset, such as CFGrid1D and CFGrid2D, or to a dataset with multiple grids, such as ArakawaC.

Parameters:

dataset – The Dataset instance to mask
mask – The mask to apply. Different types of datasets need different masks.
work_dir – An empty directory where temporary files can be stored while applying the mask. The returned dataset will be built from files inside this directory, so callers must save the returned dataset before deleting this directory.
kwargs – Any extra kwargs are passed to open_mfdataset when assembling the new, clipped dataset.

Returns:

Dataset – The masked dataset

mask_grid_data_array(mask, data_array)

Apply a mask to a single data array. A mask dataset contains one or more mask data arrays. The mask to apply is selected by comparing dimensions: the first mask whose dimensions are a subset of the data array dimensions is used.

Parameters:

mask (xarray.Dataset) – The mask dataset
data_array (xarray.DataArray) – The DataArray to mask

Returns:

xarray.DataArray – A new DataArray with any masked values replaced with _FillValue. The returned data array will be the same shape as the input data array. If no appropriate mask is found, the original data array is returned unmodified.
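
The dimension-matching rule described above can be sketched in plain Python. This is only an illustration, not the emsarray implementation: `select_mask`, `mask_dims`, and the variable and dimension names are all hypothetical, and the mask dataset is stood in for by a dict mapping each mask variable name to its dimension names.

```python
def select_mask(mask_dims, data_dims):
    """Return the name of the first mask whose dims are a subset of data_dims.

    mask_dims is a hypothetical stand-in for a mask dataset: a dict mapping
    each mask variable name to its tuple of dimension names.
    """
    data_dims = set(data_dims)
    for name, dims in mask_dims.items():
        if set(dims) <= data_dims:
            return name
    # No applicable mask: the data array would be returned unmodified.
    return None


# Hypothetical mask variables for a dataset with two grids.
mask_dims = {
    "face_mask": ("j_face", "i_face"),
    "node_mask": ("j_node", "i_node"),
}

chosen = select_mask(mask_dims, ("time", "depth", "j_node", "i_node"))
skipped = select_mask(mask_dims, ("time",))
```

Here `chosen` is `"node_mask"` and `skipped` is `None`. With a real xarray.Dataset, the same comparison could be made against the `dims` attribute of each mask variable.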

find_fill_value(data_array)

Find a fill value appropriate for masking a data array. Float-typed variables can easily be masked: if they don’t already have a fill value, they can be masked using NaN without issue. However, some int-typed variables without a fill value can’t be automatically masked.

Parameters:

data_array (xarray.DataArray) – The DataArray to find an appropriate _FillValue for.

Returns:

fill value – A numpy scalar value appropriate to use as the fill value in the data array.

For masked arrays, this will be numpy.ma.masked. Note that xarray itself does not use masked arrays, but is compatible with them.
For data arrays that already have a _FillValue used by xarray, numpy.nan is returned. xarray replaces every _FillValue with numpy.nan when opening files.
For data arrays that have been opened with mask_and_scale=False, the existing _FillValue is returned.
If the data array has a float dtype, numpy.nan is returned.
If none of the above are true, a ValueError is raised.
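
The rules above can be read as a chain of checks. The sketch below is a guess at that logic, not the emsarray source. It assumes that a `_FillValue` handled by xarray is recorded in the array's `encoding`, and that one left in place by `mask_and_scale=False` stays in `attrs`. The `fake` helper is a hypothetical stand-in for a data array so that the snippet runs with numpy alone.

```python
from types import SimpleNamespace

import numpy as np


def find_fill_value_sketch(data_array):
    """Illustrative only: pick a fill value using the rules listed above."""
    values = data_array.values
    if isinstance(values, np.ma.MaskedArray):
        return np.ma.masked
    # Assumption: xarray moved the decoded _FillValue into `encoding`.
    if "_FillValue" in data_array.encoding:
        return np.nan
    # Assumption: an undecoded _FillValue is still present in `attrs`.
    if "_FillValue" in data_array.attrs:
        return data_array.attrs["_FillValue"]
    if np.issubdtype(values.dtype, np.floating):
        return np.nan
    raise ValueError("No appropriate fill value found")


def fake(values, encoding=None, attrs=None):
    """Hypothetical stand-in exposing values, encoding and attrs."""
    return SimpleNamespace(
        values=values, encoding=encoding or {}, attrs=attrs or {})


float_fill = find_fill_value_sketch(fake(np.array([1.0, 2.0])))
raw_fill = find_fill_value_sketch(
    fake(np.array([1, 2]), attrs={"_FillValue": -999}))
```

In this sketch `float_fill` is NaN and `raw_fill` is `-999`. An int-typed array with no fill value anywhere falls through to the ValueError.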

calculate_grid_mask_bounds(mask)

Calculate the included bounds of a mask dataset for each dimension.

Parameters:

mask (xarray.Dataset) – The mask dataset. It should contain one or more boolean data arrays.

Returns:

dict – A dict of {dimension_name: slice(min_index, max_index)}. This dict can be passed directly to an xarray.Dataset.isel() call to crop a dataset to the bounds of the mask.
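
One way to compute such bounds is sketched below using numpy only. It is not the emsarray implementation. `mask_bounds_sketch` is a hypothetical name, each mask is represented as a `(dims, values)` pair instead of an xarray data array, and the slice stop is assumed to be one past the last included index so that slicing keeps every included cell.

```python
import numpy as np


def mask_bounds_sketch(masks):
    """Illustrative only: per-dimension bounds of the True cells.

    masks is an iterable of (dims, values) pairs, where dims is a tuple of
    dimension names and values is a boolean array with one axis per name.
    """
    bounds = {}
    for dims, values in masks:
        values = np.asarray(values, dtype=bool)
        for axis, dim in enumerate(dims):
            # Collapse every other axis: is anything included at each index?
            other_axes = tuple(i for i in range(values.ndim) if i != axis)
            included = np.flatnonzero(values.any(axis=other_axes))
            if included.size == 0:
                continue
            # Assumption: stop is exclusive, one past the last included index.
            start, stop = int(included[0]), int(included[-1]) + 1
            if dim in bounds:
                # Widen bounds already found for this dimension.
                start = min(start, bounds[dim].start)
                stop = max(stop, bounds[dim].stop)
            bounds[dim] = slice(start, stop)
    return bounds


values = np.zeros((4, 5), dtype=bool)
values[1:3, 2:4] = True
bounds = mask_bounds_sketch([(("j", "i"), values)])
# bounds == {'j': slice(1, 3), 'i': slice(2, 4)}
```

For a real mask dataset, the dimension names and values would come from each data array's `dims` and `values` attributes.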

smear_mask(arr, pad_axes)

Take a boolean numpy array and a list indicating which axes to smear along. Return a new array, expanded along the axes, with the boolean values smeared accordingly.

This is a half-baked convolution operator, where the pad_axes parameter is used to build the kernel.

Parameters:

arr (numpy.ndarray) – A boolean numpy.ndarray.
pad_axes (list of bool) – A list of booleans, indicating which axes to smear along.

Returns:

numpy.ndarray – The smeared array. For every axis where pad_axes was True, the array will be one element larger.

Examples

>>> arr
array([[0, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [1, 0, 0, 0, 1]])

Smear along the second axis (axis 1) only:

>>> smear_mask(arr, [False, True])
array([[0, 0, 1, 1, 0, 0],
       [0, 1, 1, 1, 1, 0],
       [1, 1, 0, 0, 1, 1]])

Smear along both axes:

>>> smear_mask(arr, [True, True])
array([[0, 0, 1, 1, 0, 0],
       [0, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 1],
       [1, 1, 0, 0, 1, 1]])
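
The smearing shown in these examples can be reproduced with a short numpy sketch. `smear_mask_sketch` is a hypothetical name and this is not necessarily how emsarray implements it. For each padded axis, it ORs together two copies of the array, one padded at the end of the axis and one padded at the start.

```python
import numpy as np


def smear_mask_sketch(arr, pad_axes):
    """Illustrative only: grow arr by one cell along each padded axis."""
    out = np.asarray(arr, dtype=bool)
    for axis, pad in enumerate(pad_axes):
        if not pad:
            continue
        pad_after = [(0, 0)] * out.ndim
        pad_before = [(0, 0)] * out.ndim
        pad_after[axis] = (0, 1)   # one False cell appended to this axis
        pad_before[axis] = (1, 0)  # one False cell prepended to this axis
        # OR the two shifted copies: each True value spreads to its neighbour.
        out = np.pad(out, pad_after) | np.pad(out, pad_before)
    return out


arr = np.array([
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
], dtype=bool)

smeared = smear_mask_sketch(arr, [True, True])
# smeared.astype(int) is:
# [[0 0 1 1 0 0]
#  [0 1 1 1 1 0]
#  [1 1 1 1 1 1]
#  [1 1 0 0 1 1]]
```

Processing one axis at a time keeps the sketch short. The result has shape (4, 6), one element larger than the input along each smeared axis.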

blur_mask(arr, size=1)

Take a boolean numpy array and blur it, such that all indices neighbouring a True value in the input array are True in the output array. The output array will have the same shape as the input array.

Parameters:

arr (numpy.ndarray) – A boolean array to blur
size (int) – The kernel size to use when blurring. In the output array, a cell is true if the input array has a true value within size cells of it along every axis, diagonal neighbours included.

Returns:

numpy.ndarray – The blurred array

Examples

>>> arr = numpy.array([
...     [1, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 0],
...     [0, 0, 0, 0, 1]])
>>> blur_mask(arr)
array([[1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [0, 0, 1, 1, 1],
       [0, 0, 1, 1, 1]])
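
The same result can be obtained with a small numpy sketch that spreads True values along one axis at a time. `blur_mask_sketch` is a hypothetical name, and emsarray may implement the blur differently. Spreading along each axis in turn covers the full square neighbourhood, including diagonals.

```python
import numpy as np


def blur_mask_sketch(arr, size=1):
    """Illustrative only: spread True values to neighbours within `size`."""
    out = np.asarray(arr, dtype=bool)
    for axis in range(out.ndim):
        # Pad with False on both sides so the output keeps the input shape.
        pad_width = [(0, 0)] * out.ndim
        pad_width[axis] = (size, size)
        padded = np.pad(out, pad_width)
        length = out.shape[axis]
        spread = np.zeros_like(out)
        # OR together every window offset from -size to +size on this axis.
        for offset in range(2 * size + 1):
            window = np.arange(offset, offset + length)
            spread |= np.take(padded, window, axis=axis)
        out = spread
    return out


arr = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=bool)

blurred = blur_mask_sketch(arr)
# blurred.astype(int) is:
# [[1 1 0 0 0]
#  [1 1 1 1 1]
#  [0 0 1 1 1]
#  [0 0 1 1 1]]
```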
