Tutorial
Intake-stac simply provides a thin interface that combines pystac and intake. It’s basic usage is shown below:
To begin, import intake:
In [1]: import intake
Loading a catalog
You can load data from a STAC Catalog by providing the URL to valid STAC Catalog (>1.0):
In [2]: url = 'https://raw.githubusercontent.com/radiantearth/stac-spec/v1.0.0/examples/catalog.json'
In [3]: catalog = intake.open_stac_catalog(url)
In [4]: list(catalog)
Out[4]: ['extensions-collection', 'sentinel-2', 'CS3-20160503_132131_08']
Intake-Stac uses pystac to parse
STAC objects. You can also pass pystac
objects (e.g.
pystac.Catalog
) directly to the Intake-stac constructors:
In [5]: import pystac
In [6]: root_url = 'https://raw.githubusercontent.com/relativeorbit/aws-rtc-12SYJ/main'
In [7]: pystac_cat = pystac.read_file(f'{root_url}/catalog.json')
In [8]: cat = intake.open_stac_catalog(pystac_cat)
You can also point to STAC Collections or Items. Each constructor returns a Intake Catalog with the top level corresponding to the STAC object used for initialization:
In [9]: stac_cat = intake.open_stac_catalog(
...: f'{root_url}/catalog.json',
...: )
...:
In [10]: collection_cat = intake.open_stac_collection(
....: f'{root_url}/sentinel1-rtc-aws/collection.json',
....: )
....:
In [11]: item_cat = intake.open_stac_item(
....: f'{root_url}/sentinel1-rtc-aws/12SYJ/2021/S1A_20210105_12SYJ_DSC/S1A_20210105_12SYJ_DSC.json'
....: )
....:
Using the catalog
Once you have a catalog, you can display its entries by iterating through its contents:
In [12]: print(list(stac_cat))
['sentinel1-rtc-aws']
In [13]: cat = stac_cat['sentinel1-rtc-aws']
In [14]: print(list(cat))
['12SYJ']
In [15]: subcat = cat['12SYJ']
In [16]: print(list(subcat))
['2016', '2017', '2018', '2019', '2020', '2021']
In [17]: subsubcat = subcat['2021']
In [18]: print(list(subsubcat)[:3])
['S1A_20210105_12SYJ_DSC', 'S1A_20210110_12SYJ_DSC', 'S1A_20210117_12SYJ_DSC']
When you locate an item of interest, you have access to metadata and methods to load assets into Python objects
In [19]: item = subsubcat['S1A_20210105_12SYJ_DSC']
In [20]: print(type(item))
<class 'intake_stac.catalog.StacItem'>
In [21]: print(item.metadata)
{'providers': [{'name': 'ESA', 'roles': ['licensor', 'producer'], 'url': 'https://sentinel.esa.int/web/sentinel/missions/sentinel-1'}, {'name': 'Indigo Ag Inc.', 'roles': ['licensor', 'processor', 'host'], 'url': 'https://registry.opendata.aws/sentinel-1-rtc-indigo', 'processing:level': 'L3', 'processing:lineage': 'https://sentinel-s1-rtc-indigo-docs.s3-us-west-2.amazonaws.com/methodology.html', 'processing:software': {'S1TBX': '7.0.2'}}], 'constellation': 'sentinel-1', 'platform': 'sentinel-1a', 'instruments': ['c-sar'], 'gsd': 20, 'start_datetime': '2021-01-05T13:09:50Z', 'end_datetime': '2021-01-05T13:10:40Z', 'created': '2021-11-09T19:00:26.415280Z', 'mgrs:utm_zone': '12', 'mgrs:latitude_band': 'S', 'mgrs:grid_square': 'YJ', 'sentinel:mgrs': '12SYJ', 'sentinel:product_ids': ['S1A_IW_GRDH_1SDV_20210105T130950_20210105T131015_036003_0437D1_6979', 'S1A_IW_GRDH_1SDV_20210105T131015_20210105T131040_036003_0437D1_A833'], 'sar:frequency_band': 'C', 'sar:center_frequency': 5.405, 'sar:observation_direction': 'right', 'sar:instrument_mode': 'IW', 'sar:product_type': 'RTC', 'sar:polarizations': ['VV', 'VH'], 'sar:resolution_range': 20.3, 'sar:resolution_azimuth': 22.6, 'sar:pixel_spacing_range': 10, 'sar:pixel_spacing_azimuth': 10, 'sar:looks_equivalent_number': 4.3, 'sar:looks_range': 5, 'sar:looks_azimuth': 1, 'sat:orbit_state': 'descending', 'sat:relative_orbit': 56, 'sat:absolute_orbit': 36003, 'proj:epsg': 32612, 'proj:transform': [20.0, 0.0, 699960.0, 0.0, -20.0, 4400040.0, 0.0, 0.0, 1.0], 'proj:shape': [5490, 5490], 'datetime': datetime.datetime(2021, 1, 5, 13, 10, 15, tzinfo=tzlocal()), 'bbox': [-108.69936, 38.70664, -107.38769, 39.72682], 'geometry': {'type': 'Polygon', 'coordinates': [[[-107.93008, 38.72054], [-107.93004, 38.72162], [-107.87213, 39.18727], [-107.74341, 39.69052], [-107.73879, 39.70444], [-107.38764, 39.69403], [-107.43793, 38.70656], [-107.93008, 38.72054]]]}, 'date': None, 'catalog_dir': ''}
In [22]: assets = list(item)
In [23]: print(assets)
['gamma0_vv', 'gamma0_vh', 'incidence']
Loading a dataset
Once you have identified a dataset, you can load it into a xarray.DataArray
using Intake’s to_dask() method. This reads only metadata, and streams values over the network when required by computations or visualizations:
In [24]: da = item['gamma0_vv'].to_dask()
In [25]: display(da)
<xarray.DataArray (band: 1, y: 5490, x: 5490)>
dask.array<open_rasterio-16fd4b50b84bbdf40a6058ffc660d2f0<this-array>, shape=(1, 5490, 5490), dtype=float32, chunksize=(1, 5490, 5490), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 4.4e+06 4.4e+06 4.4e+06 ... 4.29e+06 4.29e+06 4.29e+06
* x (x) float64 7e+05 7e+05 7e+05 ... 8.097e+05 8.097e+05 8.098e+05
Attributes:
transform: (20.0, 0.0, 699960.0, 0.0, -20.0, 4400040.0)
crs: +init=epsg:32612
res: (20.0, 20.0)
is_tiled: 1
nodatavals: (0.0,)
scales: (1.0,)
offsets: (0.0,)
descriptions: ('Gamma0_VV',)
ABSOLUTE_ORBIT_NUMBER: 36003
AREA_OR_POINT: Area
DATE: 2021-01-05
MISSION_ID: S1A
NUMBER_SCENES: 2
ORBIT_DIRECTION: descending
OVR_RESAMPLING_ALG: AVERAGE
SCENES: S1A_IW_GRDH_1SDV_20210105T130950_20210105T131015_...
SCENE_1_METADATA: {"title": "S1A_IW_GRDH_1SDV_20210105T130950_20210...
SCENE_1_PRODUCT_INFO: {"id": "S1A_IW_GRDH_1SDV_20210105T130950_20210105...
SCENE_2_METADATA: {"title": "S1A_IW_GRDH_1SDV_20210105T131015_20210...
SCENE_2_PRODUCT_INFO: {"id": "S1A_IW_GRDH_1SDV_20210105T131015_20210105...
TILE_ID: 12SYJ
VALID_PIXEL_PERCENT: 33.501
Working with pystac-client
Intake-stac integrates with pystac-client to faciliate dynamic search and discovery of assets through a STAC-API. To begin, construct a search query using pystac-client:
In [26]: import pystac_client
In [27]: URL = "https://earth-search.aws.element84.com/v0"
In [28]: catalog = pystac_client.Client.open(URL)
In [29]: results = catalog.search(
....: collections=["sentinel-s2-l2a-cogs"],
....: bbox = [35.48, -3.24, 35.58, -3.14],
....: datetime="2020-07-01/2020-08-15")
....:
In [30]: items = results.get_all_items()
In [31]: print(len(items))
18
In the code section above, items is a pystac.ItemsCollection object. Intake-stac can turn this object into an Intake catalog:
In [32]: catalog = intake.open_stac_item_collection(items)
In [33]: list(catalog)
Out[33]:
['S2A_36MYB_20200814_0_L2A',
'S2A_36MYB_20200811_0_L2A',
'S2B_36MYB_20200809_0_L2A',
'S2B_36MYB_20200806_0_L2A',
'S2A_36MYB_20200804_0_L2A',
'S2A_36MYB_20200801_0_L2A',
'S2B_36MYB_20200730_0_L2A',
'S2B_36MYB_20200727_0_L2A',
'S2A_36MYB_20200725_0_L2A',
'S2A_36MYB_20200722_0_L2A',
'S2B_36MYB_20200720_0_L2A',
'S2B_36MYB_20200717_0_L2A',
'S2A_36MYB_20200715_0_L2A',
'S2A_36MYB_20200712_0_L2A',
'S2B_36MYB_20200710_0_L2A',
'S2B_36MYB_20200707_0_L2A',
'S2A_36MYB_20200705_0_L2A',
'S2A_36MYB_20200702_0_L2A']
Using xarray-assets
Intake-stac uses the xarray-assets STAC extension to automatically use the appropriate keywords to load a STAC asset into a data container.
Intake-stac will automatically use the keywords from the xarray-assets STAC extension, if present, when loading data into a container.
For example, the STAC collection at <https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-annual-hi> defines an
asset zarr-https
with the metadata "xarray:open_kwargs": {"consolidated": true}"
to indicate that this dataset should be
opened with the consolidated=True
keyword argument. This will be used automatically by .to_dask()
>>> collection = intake.open_stac_collection(
... "https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-annual-hi"
... )
>>> source = collection.get_asset("zarr-https")
>>> source.to_dask()
<xarray.Dataset>
Dimensions: (nv: 2, time: 41, x: 284, y: 584)
Coordinates:
lat (y, x) float32 dask.array<chunksize=(584, 284), meta=np.ndarray>
lon (y, x) float32 dask.array<chunksize=(584, 284), meta=np.ndarray>
* time (time) datetime64[ns] 1980-07-01T12:00:00 ... 20...
* x (x) float32 -5.802e+06 -5.801e+06 ... -5.519e+06
* y (y) float32 -3.9e+04 -4e+04 ... -6.21e+05 -6.22e+05
Dimensions without coordinates: nv
Data variables:
lambert_conformal_conic int16 ...
prcp (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
swe (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
time_bnds (time, nv) datetime64[ns] dask.array<chunksize=(1, 2), meta=np.ndarray>
tmax (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
tmin (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
vp (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
Attributes:
Conventions: CF-1.6
Version_data: Daymet Data Version 4.0
Version_software: Daymet Software Version 4.0
citation: Please see http://daymet.ornl.gov/ for current Daymet ...
references: Please see http://daymet.ornl.gov/ for current informa...
source: Daymet Software Version 4.0
start_year: 1980