emsarray.operations.point_extraction
Subset a dataset at a set of points.
extract_dataframe() takes a pandas DataFrame, subsets the dataset at the point specified in each row, and merges the dataset with the dataframe. The points extracted will form the coordinates for the new dataset.
extract_points() takes a list of Points, subsets the dataset at these points, and returns a new dataset without any associated geometry. This is useful if you want to add your own metadata to the subset dataset.
emsarray extract-points is a command line interface to extract_dataframe().
Functions
- extract_dataframe(dataset, dataframe, coordinate_columns, *, point_dimension='point', missing_points='error', fill_value=<NA>)
Extract the points listed in a pandas DataFrame, and merge the remaining columns into the Dataset.
- Parameters:
dataset (xarray.Dataset) – The dataset to extract point data from.
dataframe (pandas.DataFrame) – A dataframe with longitude and latitude columns, and possibly other columns.
coordinate_columns (tuple of str, str) – The names of the longitude and latitude columns in the dataframe.
point_dimension (Hashable, optional) – The name of the new dimension to create in the dataset. Defaults to ‘point’.
missing_points ({'error', 'drop', 'fill'}, default 'error') – How to handle points that do not intersect the dataset geometry:
  - ‘error’ will raise a NonIntersectingPoints exception.
  - ‘drop’ will drop those points from the dataset.
  - ‘fill’ will include those points but all data variables will be filled with an appropriate fill value such as numpy.nan for float values.
fill_value – Passed to xarray.Dataset.merge() when missing_points is ‘fill’. See the documentation for that method for all options. Defaults to a sensible fill value for each variable's dtype.
- Returns:
xarray.Dataset – A new dataset that only contains data at the given points, plus any new columns present in the dataframe. The point_dimension dimension will have a coordinate with the same name whose values match the row numbers of the dataframe. This is useful when missing_points is ‘drop’ to find out which points were dropped.
Example

    import emsarray
    import pandas
    from emsarray.operations import point_extraction

    ds = emsarray.tutorial.open_dataset('gbr4')
    df = pandas.DataFrame({
        'lon': [152.807, 152.670, 153.543],
        'lat': [-24.9595, -24.589, -25.488],
        'name': ['a', 'b', 'c'],
    })
    point_data = point_extraction.extract_dataframe(
        ds, df, ['lon', 'lat'])
    point_data

    <xarray.Dataset>
    Dimensions:  (k: 47, point: 3, time: 1)
    Coordinates:
        zc       (k) float32 ...
      * time     (time) datetime64[ns] 2022-05-11T14:00:00
      * point    (point) int64 0 1 2
        lon      (point) float64 152.8 152.7 153.5
        lat      (point) float64 -24.96 -24.59 -25.49
    Dimensions without coordinates: k
    Data variables:
        botz     (point) float32 ...
        eta      (time, point) float32 ...
        salt     (time, k, point) float32 ...
        temp     (time, k, point) float32 ...
        name     (point) object 'a' 'b' 'c'
    Attributes: (12/14)
        ...
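The return value description notes that the point coordinate can be used to find out which rows were dropped when missing_points is ‘drop’. The following is a minimal sketch of that idea, not part of emsarray itself: the helper names find_dropped_rows and extract_and_report are hypothetical, and the sketch assumes the default point_dimension of ‘point’ and a dataframe whose rows are numbered from zero.

```python
def find_dropped_rows(row_numbers, point_values):
    """Return the row numbers that are absent from the extracted point coordinate."""
    kept = set(point_values)
    return [row for row in row_numbers if row not in kept]


def extract_and_report(dataset, dataframe, coordinate_columns=('lon', 'lat')):
    """Extract points, dropping any that miss the dataset, and list the dropped rows.

    Hypothetical convenience wrapper around extract_dataframe().
    """
    # Imported here so find_dropped_rows() can be used without emsarray installed.
    from emsarray.operations import point_extraction

    point_data = point_extraction.extract_dataframe(
        dataset, dataframe, coordinate_columns, missing_points='drop')

    # The 'point' coordinate holds the dataframe row numbers that were kept.
    kept = point_data['point'].values.tolist()
    dropped = find_dropped_rows(range(len(dataframe)), kept)
    return point_data, dropped
```

With the ds and df from the example above, `point_data, dropped = extract_and_report(ds, df)` would return the subset dataset together with the row numbers of any points that fell outside the dataset geometry.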
- extract_points(dataset, points, *, point_dimension='point', missing_points='error')
Drop all data except for cells that intersect the given points. Return a new dataset with a new dimension named point_dimension, with the same size as the number of points, containing only data at those points. The returned dataset has no coordinate information.
- Parameters:
dataset (xarray.Dataset) – The dataset to extract point data from.
points (list of shapely.Point) – The points to select.
point_dimension (Hashable, optional) – The name of the new dimension to index points along. Defaults to "point".
missing_points ({'error', 'drop'}, default 'error') – How to handle points which do not intersect the dataset.
  - If ‘error’, a NonIntersectingPoints exception is raised.
  - If ‘drop’, the points are dropped from the returned dataset.
- Returns:
xarray.Dataset – A subset of the input dataset that only contains data at the given points. The dataset will only contain the values, without any geometry coordinates. The point_dimension dimension will have a coordinate with the same name whose values match the indices of the points list. This is useful when missing_points is ‘drop’ to find out which points were dropped.
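Because extract_points() returns no geometry coordinates, you may want to attach your own labels to the result. The sketch below shows one possible way to do this. The helper names names_for_points and extract_named_points and the stations mapping are hypothetical and not part of emsarray. It assumes the default point_dimension of ‘point’ and uses the point coordinate to keep labels aligned when some points are dropped.

```python
def names_for_points(names, point_indices):
    """Pick the names belonging to the points that survived extraction."""
    return [names[index] for index in point_indices]


def extract_named_points(dataset, stations, missing_points='drop'):
    """Extract data at named stations and attach the names as a coordinate.

    `stations` is assumed to map a station name to a (lon, lat) pair.
    """
    # Imported here so names_for_points() can be used on its own.
    from shapely.geometry import Point
    from emsarray.operations import point_extraction

    names = list(stations.keys())
    points = [Point(lon, lat) for lon, lat in stations.values()]

    point_data = point_extraction.extract_points(
        dataset, points, missing_points=missing_points)

    # The 'point' coordinate holds the indices into `points` that were kept.
    kept = point_data['point'].values.tolist()
    return point_data.assign_coords(
        name=('point', names_for_points(names, kept)))
```

For example, `extract_named_points(ds, {'a': (152.807, -24.9595), 'b': (152.670, -24.589)})` would return the extracted values with a name coordinate along the point dimension.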
See also