emsarray.operations.point_extraction

Subset a dataset at a set of points.

extract_dataframe() takes a pandas DataFrame, subsets the dataset at the point specified in each row, and merges the remaining dataframe columns into the result. The extracted points form the coordinates of the new dataset.

extract_points() takes a list of Points, subsets the dataset at these points, and returns a new dataset without any associated geometry. This is useful if you want to add your own metadata to the subset dataset.

emsarray extract-points is a command line interface to extract_dataframe().

Functions

extract_dataframe(dataset, dataframe, coordinate_columns, *, point_dimension='point', missing_points='error', fill_value=<NA>)

Extract the points listed in a pandas DataFrame, and merge the remaining columns into the Dataset.

Parameters:
  • dataset (xarray.Dataset) – The dataset to extract point data from.

  • dataframe (pandas.DataFrame) – A dataframe with longitude and latitude columns, and possibly other columns.

  • coordinate_columns (tuple of str, str) – The names of the longitude and latitude columns in the dataframe.

  • point_dimension (Hashable, optional) – The name of the new dimension to create in the dataset. Defaults to “point”.

  • missing_points ({'error', 'drop', 'fill'}, default 'error') – How to handle points that do not intersect the dataset geometry:

    • ‘error’ will raise a NonIntersectingPoints exception.

    • ‘drop’ will drop those points from the dataset.

    • ‘fill’ will include those points but all data variables will be filled with an appropriate fill value such as numpy.nan for float values.

  • fill_value – Passed to xarray.Dataset.merge() when missing_points is ‘fill’. See the documentation for that method for all options. Defaults to a sensible fill value for each variable's dtype.

Returns:

xarray.Dataset – A new dataset that only contains data at the given points, plus any new columns present in the dataframe. The point_dimension dimension will have a coordinate with the same name whose values match the row numbers of the dataframe. This is useful when missing_points is “drop” to find out which points were dropped.

Example

import emsarray
import pandas
from emsarray.operations import point_extraction

ds = emsarray.tutorial.open_dataset('gbr4')
df = pandas.DataFrame({
    'lon': [152.807, 152.670, 153.543],
    'lat': [-24.9595, -24.589, -25.488],
    'name': ['a', 'b', 'c'],
})
point_data = point_extraction.extract_dataframe(
    ds, df, ['lon', 'lat'])
point_data
<xarray.Dataset>
Dimensions:  (k: 47, point: 3, time: 1)
Coordinates:
    zc       (k) float32 ...
  * time     (time) datetime64[ns] 2022-05-11T14:00:00
  * point    (point) int64 0 1 2
    lon      (point) float64 152.8 152.7 153.5
    lat      (point) float64 -24.96 -24.59 -25.49
Dimensions without coordinates: k
Data variables:
    botz     (point) float32 ...
    eta      (time, point) float32 ...
    salt     (time, k, point) float32 ...
    temp     (time, k, point) float32 ...
    name     (point) object 'a' 'b' 'c'
Attributes: (12/14)
    ...
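
The point coordinate described under Returns can be used to work out which rows were discarded when missing_points is 'drop'. The following is a minimal sketch rather than part of emsarray: dropped_rows is a hypothetical helper, and point_data is a hand-built stand-in for the result of extract_dataframe(ds, df, ('lon', 'lat'), missing_points='drop'), assuming the third row lies outside the dataset.

```python
import numpy
import pandas
import xarray


def dropped_rows(dataframe, point_data, point_dimension="point"):
    """Return the rows of `dataframe` that have no entry in `point_data`.

    Hypothetical helper. It relies on the documented behaviour that the
    point coordinate holds the row numbers of the original dataframe.
    """
    kept = point_data[point_dimension].values
    row_numbers = numpy.arange(len(dataframe))
    return dataframe.iloc[~numpy.isin(row_numbers, kept)]


df = pandas.DataFrame({
    "lon": [152.807, 152.670, 0.0],
    "lat": [-24.9595, -24.589, 0.0],
    "name": ["a", "b", "c"],
})

# Stand-in for the extracted dataset: rows 0 and 1 were kept,
# row 2 is assumed to have been dropped.
point_data = xarray.Dataset(
    {"name": ("point", ["a", "b"])},
    coords={"point": [0, 1]},
)

missing = dropped_rows(df, point_data)
print(missing)
```

With a real dataset, point_data would come from extract_dataframe() itself and the helper would be called the same way.
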
extract_points(dataset, points, *, point_dimension='point', missing_points='error')

Drop all data except for cells that intersect the given points. Return a new dataset with a new dimension named point_dimension, with the same size as the number of points, containing only data at those points.

The returned dataset has no coordinate information.

Parameters:
  • dataset (xarray.Dataset) – The dataset to extract point data from.

  • points (list of shapely.Point) – The points to select.

  • point_dimension (Hashable, optional) – The name of the new dimension to index points along. Defaults to "point".

  • missing_points ({'error', 'drop'}, default 'error') – How to handle points which do not intersect the dataset.

    • If ‘error’, a NonIntersectingPoints exception is raised.

    • If ‘drop’, the points are dropped from the returned dataset.

Returns:

xarray.Dataset – A subset of the input dataset that only contains data at the given points. The dataset will only contain the values, without any geometry coordinates. The point_dimension dimension will have a coordinate with the same name whose values match the indices of the points list. This is useful when missing_points is ‘drop’ to find out which points were dropped.
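
Because the returned dataset carries no geometry, any per-point metadata has to be attached by the caller. The sketch below shows one possible way to do that with xarray's assign_coords(). The label_points helper and the station coordinate name are hypothetical, and point_data is a hand-built stand-in for the result of extract_points(ds, points, missing_points='drop'), assuming the second of three points was dropped.

```python
import xarray


def label_points(point_data, names, point_dimension="point"):
    """Attach a name to each extracted point.

    Hypothetical helper. `names` is ordered like the original points list,
    so the point coordinate is used to look up the name of each point
    that survived extraction.
    """
    indices = point_data[point_dimension].values
    labels = [names[i] for i in indices]
    return point_data.assign_coords({"station": (point_dimension, labels)})


# Names for the three requested points, in the order they were passed in.
names = ["north", "reef", "south"]

# Stand-in for the extracted dataset: points 0 and 2 were kept.
point_data = xarray.Dataset(
    {"temp": ("point", [21.5, 19.0])},
    coords={"point": [0, 2]},
)

labelled = label_points(point_data, names)
print(labelled["station"].values)
```

Looking names up through the point coordinate, rather than by position in the result, keeps the labels aligned even when some points were dropped.
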

Exceptions

exception NonIntersectingPoints(indices, points)

Raised when a point to extract does not intersect the dataset geometry.

indices

The indices of the points that do not intersect the dataset geometry.

points

The non-intersecting points.