Data Provider
The icesat2db.IceSat2Provider class is the core interface in icesat2db for accessing structured ICESat-2 data and metadata stored in a TileDB database. With it, you can run spatial and temporal queries on ICESat-2 data, retrieve only the variables you need, and get the results back as xarray or pandas objects ready for geospatial analysis. This makes the large volume of data produced by the ICESat-2 mission much easier to work with.
Key capabilities
Spatial Queries: Query ICESat-2 data based on specific spatial boundaries, enabling analyses within defined regions.
Temporal Queries: Filter data by date range to focus on specific time periods.
Variable Selection: Retrieve only the data variables needed for your analysis to optimize performance.
Quality Filters: Apply additional quality filters to refine data retrieval based on specific conditions.
Reference Point Query: Query ICESat-2 data based on a reference point and get the nearest shots within a defined radius.
Flexible Output Formats: Export results as either xarray.Dataset for multi-dimensional data or pandas.DataFrame for tabular data.
Available variables
The database includes a wide range of variables from the ATL08 land and vegetation product, covering terrain elevation, canopy height metrics, quality flags, and ancillary data. Below is a table of commonly used variables:
| Variable Name | Description | Units | Category |
|---|---|---|---|
| h_canopy | 98th percentile of relative canopy heights above estimated terrain | meters | Canopy |
| h_max_canopy | Maximum relative canopy height within segment (equivalent to RH100) | meters | Canopy |
| h_mean_canopy | Mean relative canopy height within segment | meters | Canopy |
| h_te_best_fit | Best-fit terrain elevation at the mid-point of each 100 m segment | meters | Terrain |
| h_te_mean | Mean terrain photon height above the WGS84 ellipsoid within segment | meters | Terrain |
| canopy_h_metrics | Canopy height metrics at 10–95th percentiles (18 values per segment) | meters | Canopy |
| snr | Signal-to-noise ratio of geolocated photons | dimensionless | Land Segment |
| night_flag | Day/night flag derived from solar elevation (0=day, 1=night) | dimensionless | Land Segment |
| layer_flag | Consolidated cloud/blowing snow flag (0=absent, 1=likely present) | dimensionless | Land Segment |
| segment_snowcover | Daily snow/ice cover flag (0=ice-free water; 1=snow-free; 2=snow; 3=ice) | dimensionless | Land Segment |
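Flag variables such as night_flag, layer_flag and segment_snowcover are stored as integer codes. A small lookup table built from the code meanings listed in the table above makes them readable. This is a plain-Python sketch; FLAG_LABELS and decode_flag are illustrative names, not part of icesat2db.

```python
# Code meanings copied from the variable table; names here are illustrative only.
FLAG_LABELS = {
    "night_flag": {0: "day", 1: "night"},
    "layer_flag": {0: "absent", 1: "likely present"},
    "segment_snowcover": {0: "ice-free water", 1: "snow-free", 2: "snow", 3: "ice"},
}


def decode_flag(variable, values):
    """Map integer flag codes to readable labels; codes not in the table become 'unknown'."""
    lookup = FLAG_LABELS[variable]
    return [lookup.get(int(v), "unknown") for v in values]


print(decode_flag("segment_snowcover", [1, 2, 3, 0]))
# ['snow-free', 'snow', 'ice', 'ice-free water']
```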
For the complete list of available variables, see TileDB Global Database for ICESat-2 ATL08 Data or call provider.get_available_variables().
Retrieving ICESat-2 data with the ICESat-2 provider
The icesat2db.IceSat2Provider class is your main tool for querying ICESat-2 data from the TileDB database. The examples below show how to configure the provider, run a basic query, and then refine it with additional quality filters.
Basic query example
```python
import geopandas as gpd
import icesat2db as isdb

# Load region of interest
region_of_interest = gpd.read_file('./data/geojson/region.geojson')

# Instantiate the IceSat2Provider against a local TileDB database
provider = isdb.IceSat2Provider(storage_type='local',
                                local_path="/path/to/your/database")

# Define the variables to query
variables = ["h_canopy", "h_te_best_fit"]

# Spatial + temporal query, returned as an xarray.Dataset
dataset = provider.get_data(variables=variables,
                            geometry=region_of_interest,
                            start_time="2019-01-01",
                            end_time="2024-12-31",
                            return_type='xarray')
```
Parameters for get_data()
variables: List of variables (columns) to retrieve from the database. Profile and sub-segment variables (e.g. canopy_h_metrics, h_canopy_20m) return all values per segment by default. To fetch a single element by label and save bandwidth, use the "variable:label" syntax, e.g. "canopy_h_metrics:50" (50th percentile only) or "h_canopy_20m:50" (centre 20 m bin only).
geometry: (Optional) GeoPandas geometry for spatial filtering.
start_time: (Optional) Start date for temporal filtering (format: “YYYY-MM-DD”).
end_time: (Optional) End date for temporal filtering (format: “YYYY-MM-DD”).
return_type: Format of the returned data, either xarray.Dataset (“xarray”) or pandas.DataFrame (“dataframe”). The default is “xarray”.
query_type: (Optional) Type of query to execute, either “nearest” or “bounding_box”. For “nearest”, a reference point must also be provided (default: “bounding_box”).
point: (Optional) Reference point for nearest query, required if query_type is “nearest” (format: Tuple[longitude, latitude]).
num_shots: (Optional) Number of shots to retrieve if the query_type is “nearest” (default: 10).
radius: (Optional) Radius in degrees around the point if the query_type is “nearest” (default: 0.1).
quality_filters: (Optional) Additional quality filters, passed as keyword arguments that map a variable name to a condition string (for example, by unpacking a dict with **).
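To make the point, num_shots and radius parameters of a “nearest” query more concrete, the sketch below shows one plausible selection rule: keep shots whose distance from the reference point is within radius degrees, then return the num_shots closest. It is a plain-Python illustration, not icesat2db's implementation, and it assumes a simple planar distance in degree space, which the provider may compute differently.

```python
import math


def nearest_shots(shots, point, num_shots=10, radius=0.1):
    """Return up to num_shots (lon, lat) shots closest to point, within radius degrees.

    Assumption: distance is Euclidean in degree space, matching the units of
    `radius`. This ignores the shrinking of longitude degrees toward the poles.
    """
    lon0, lat0 = point
    candidates = []
    for lon, lat in shots:
        distance = math.hypot(lon - lon0, lat - lat0)
        if distance <= radius:
            candidates.append((distance, (lon, lat)))
    # Sort by distance and keep the closest num_shots
    candidates.sort(key=lambda item: item[0])
    return [shot for _, shot in candidates[:num_shots]]


shots = [(10.45, 51.23), (10.46, 51.24), (10.60, 51.30), (10.43, 51.21)]
print(nearest_shots(shots, point=(10.45, 51.23), num_shots=2))
# [(10.45, 51.23), (10.46, 51.24)]
```

A geodesic distance (for example, haversine) would be more accurate away from the equator; the planar version is used here only because radius is documented in degrees.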
The returned data is formatted according to the return_type parameter, making it ready for further analysis.
Applying additional quality filters
You can refine a query further by specifying additional quality filters, which restrict the results to segments whose variables meet given conditions. The filters are passed as keyword arguments in the form of field-value conditions.
Example with additional quality filters
In the following example, we filter for night-time acquisitions with no cloud/blowing snow contamination:
```python
import geopandas as gpd
import icesat2db as isdb

# Instantiate the IceSat2Provider against a local TileDB database
provider = isdb.IceSat2Provider(storage_type='local',
                                local_path="/path/to/your/database")

# Load region of interest
region_of_interest = gpd.read_file('./data/geojson/region.geojson')

# Define variables and quality filters:
# night-time acquisitions with no cloud/blowing snow detected
variables = ["h_canopy", "h_te_best_fit", "snr"]
quality_filters = {
    'night_flag': "== 1",
    'layer_flag': "== 0",
}

# The quality filters are unpacked into keyword arguments of get_data()
icesat2_data = provider.get_data(variables=variables,
                                 geometry=region_of_interest,
                                 start_time="2019-01-01",
                                 end_time="2024-12-31",
                                 return_type='xarray',
                                 **quality_filters)
```
Quality filters are passed as key-value pairs, where the key is the variable name and the value is the condition string (e.g., 'night_flag': "== 1"). This lets you restrict the results to segments that meet specific acquisition or quality criteria.
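A condition string such as "== 1" combines a comparison operator with a value. How icesat2db parses these strings internally is not described here; the hypothetical helpers below (parse_condition, apply_filters) only illustrate the idea by turning each string into a predicate and keeping rows that satisfy every condition. Both the set of supported operators and the all-conditions-must-hold combination rule are assumptions.

```python
import operator

# Assumed operator set. Two-character operators come first so "<=" is not read as "<".
_OPERATORS = [
    ("==", operator.eq),
    ("!=", operator.ne),
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_condition(condition):
    """Turn a string like '== 1' or '>= 0.5' into a predicate function."""
    text = condition.strip()
    for symbol, func in _OPERATORS:
        if text.startswith(symbol):
            threshold = float(text[len(symbol):])  # float() tolerates surrounding spaces
            return lambda value: func(value, threshold)
    raise ValueError(f"Unsupported condition: {condition!r}")


def apply_filters(rows, quality_filters):
    """Keep rows (dicts) that satisfy every condition in quality_filters (assumed AND)."""
    predicates = {name: parse_condition(cond) for name, cond in quality_filters.items()}
    return [row for row in rows
            if all(pred(row[name]) for name, pred in predicates.items())]


rows = [
    {"night_flag": 1, "layer_flag": 0, "h_canopy": 18.4},
    {"night_flag": 0, "layer_flag": 0, "h_canopy": 22.1},
    {"night_flag": 1, "layer_flag": 1, "h_canopy": 9.8},
]
kept = apply_filters(rows, {"night_flag": "== 1", "layer_flag": "== 0"})
print([row["h_canopy"] for row in kept])  # [18.4]
```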
Supported output formats
The icesat2db.IceSat2Provider supports the following output formats, allowing you to choose the structure that best suits your analysis:
xarray.Dataset: Ideal for multi-dimensional data with labeled dimensions, suitable for advanced numerical and geospatial analysis.
pandas.DataFrame: Well suited to tabular data and smaller datasets, allowing quick manipulation and export to CSV or other formats.
Below is an example of how the dataset looks in the xarray.Dataset format:
```
<xarray.Dataset>
Dimensions:           (segment_id: 284305, percentile: 18)
Coordinates:
  * segment_id        (segment_id) int64 2MB 131271604800 ... 131271952640
  * percentile        (percentile) int32 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95
    latitude          (segment_id) float32 1MB 51.23 51.24 ... 47.89 47.90
    longitude         (segment_id) float32 1MB 10.45 10.45 ... 14.12 14.13
    time              (segment_id) datetime64[ns] 2MB 2021-06-15 ... 2021-06-15
Data variables:
    h_canopy          (segment_id) float32 1MB 18.4 22.1 ... 5.3 7.8
    h_te_best_fit     (segment_id) float32 1MB 312.1 315.6 ... 198.4 201.2
    canopy_h_metrics  (segment_id, percentile) float32 20MB 4.2 ... 17.9
```
The dataset includes multiple dimensions and variables:
Dimensions: segment_id (unique ID for each 100 m land segment) and percentile (coordinate axis for profile variables such as canopy_h_metrics, with values 10–95 in steps of 5). Sub-segment variables use along_track_offset_m (values 10, 30, 50, 70, 90 m).
Coordinates: time, latitude, and longitude, describing each segment's spatial and temporal context.
Data Variables: Variables such as h_canopy (98th percentile canopy height above terrain) and h_te_best_fit (best-fit terrain elevation).
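The "variable:label" syntax described under get_data() picks one element along these label axes. The hypothetical helper below (resolve_selector, not an icesat2db function) shows how a label maps to a position, using the coordinate values documented here: percentiles 10 to 95 in steps of 5, and along-track offsets of 10, 30, 50, 70 and 90 m. Which variable uses which axis is hard-coded for illustration only.

```python
# Label axes as documented for profile and sub-segment variables.
PERCENTILES = list(range(10, 100, 5))          # 10, 15, ..., 95 -> 18 values
ALONG_TRACK_OFFSETS_M = [10, 30, 50, 70, 90]   # assumed centres of five 20 m bins

# Illustrative mapping from variable to its label axis (not read from the database).
LABEL_AXES = {
    "canopy_h_metrics": PERCENTILES,
    "h_canopy_20m": ALONG_TRACK_OFFSETS_M,
}


def resolve_selector(selector):
    """Split 'variable:label' and return (variable, index along its label axis).

    A selector without ':' returns (variable, None), meaning all values.
    """
    name, sep, label = selector.partition(":")
    if not sep:
        return name, None
    axis = LABEL_AXES[name]
    return name, axis.index(int(label))  # raises ValueError for an unknown label


print(resolve_selector("canopy_h_metrics:50"))  # ('canopy_h_metrics', 8)
print(resolve_selector("h_canopy_20m:50"))      # ('h_canopy_20m', 2)
print(resolve_selector("h_canopy"))             # ('h_canopy', None)
```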
Below is an example of how the dataset looks in the pandas.DataFrame format:
```
         latitude  longitude        time  h_canopy  h_te_best_fit
0       51.231842  10.453218  2021-06-15     18.40         312.10
1       51.240115  10.453501  2021-06-15     22.10         315.60
2       51.248388  10.453784  2021-06-15      9.80         318.30
3       51.256661  10.454067  2021-06-15     15.60         320.90
4       51.264934  10.454350  2021-06-15      7.20         323.40
...           ...        ...         ...       ...            ...
284300  47.898234  14.121456  2021-06-15      3.10         195.20
284301  47.890011  14.121739  2021-06-15      5.30         198.40
284302  47.881788  14.122022  2021-06-15      7.80         201.20

[284305 rows x 5 columns]
```