Tutorial

Intake-stac simply provides a thin interface that combines pystac and intake. It’s basic usage is shown below:

To begin, import intake:

In [1]: import intake

Loading a catalog

You can load data from a STAC Catalog by providing the URL to valid STAC Catalog (>1.0):

In [2]: url = 'https://raw.githubusercontent.com/radiantearth/stac-spec/v1.0.0/examples/catalog.json'

In [3]: catalog = intake.open_stac_catalog(url)

In [4]: list(catalog)
Out[4]: ['extensions-collection', 'sentinel-2', 'CS3-20160503_132131_08']

Intake-Stac uses pystac to parse STAC objects. You can also pass pystac objects (e.g. pystac.Catalog) directly to the Intake-stac constructors:

In [5]: import pystac

In [6]: root_url = 'https://raw.githubusercontent.com/relativeorbit/aws-rtc-12SYJ/main'

In [7]: pystac_cat = pystac.read_file(f'{root_url}/catalog.json')

In [8]: cat = intake.open_stac_catalog(pystac_cat)

You can also point to STAC Collections or Items. Each constructor returns a Intake Catalog with the top level corresponding to the STAC object used for initialization:

In [9]: stac_cat = intake.open_stac_catalog(
   ...:     f'{root_url}/catalog.json',
   ...: )
   ...: 

In [10]: collection_cat = intake.open_stac_collection(
   ....:     f'{root_url}/sentinel1-rtc-aws/collection.json',
   ....: )
   ....: 

In [11]: item_cat = intake.open_stac_item(
   ....:     f'{root_url}/sentinel1-rtc-aws/12SYJ/2021/S1A_20210105_12SYJ_DSC/S1A_20210105_12SYJ_DSC.json'
   ....: )
   ....: 

Using the catalog

Once you have a catalog, you can display its entries by iterating through its contents:

In [12]: print(list(stac_cat))
['sentinel1-rtc-aws']

In [13]: cat = stac_cat['sentinel1-rtc-aws']

In [14]: print(list(cat))
['12SYJ']

In [15]: subcat = cat['12SYJ']

In [16]: print(list(subcat))
['2016', '2017', '2018', '2019', '2020', '2021']

In [17]: subsubcat = subcat['2021']

In [18]: print(list(subsubcat)[:3])
['S1A_20210105_12SYJ_DSC', 'S1A_20210110_12SYJ_DSC', 'S1A_20210117_12SYJ_DSC']

When you locate an item of interest, you have access to metadata and methods to load assets into Python objects

In [19]: item = subsubcat['S1A_20210105_12SYJ_DSC']

In [20]: print(type(item))
<class 'intake_stac.catalog.StacItem'>

In [21]: print(item.metadata)
{'providers': [{'name': 'ESA', 'roles': ['licensor', 'producer'], 'url': 'https://sentinel.esa.int/web/sentinel/missions/sentinel-1'}, {'name': 'Indigo Ag Inc.', 'roles': ['licensor', 'processor', 'host'], 'url': 'https://registry.opendata.aws/sentinel-1-rtc-indigo', 'processing:level': 'L3', 'processing:lineage': 'https://sentinel-s1-rtc-indigo-docs.s3-us-west-2.amazonaws.com/methodology.html', 'processing:software': {'S1TBX': '7.0.2'}}], 'constellation': 'sentinel-1', 'platform': 'sentinel-1a', 'instruments': ['c-sar'], 'gsd': 20, 'start_datetime': '2021-01-05T13:09:50Z', 'end_datetime': '2021-01-05T13:10:40Z', 'created': '2021-11-09T19:00:26.415280Z', 'mgrs:utm_zone': '12', 'mgrs:latitude_band': 'S', 'mgrs:grid_square': 'YJ', 'sentinel:mgrs': '12SYJ', 'sentinel:product_ids': ['S1A_IW_GRDH_1SDV_20210105T130950_20210105T131015_036003_0437D1_6979', 'S1A_IW_GRDH_1SDV_20210105T131015_20210105T131040_036003_0437D1_A833'], 'sar:frequency_band': 'C', 'sar:center_frequency': 5.405, 'sar:observation_direction': 'right', 'sar:instrument_mode': 'IW', 'sar:product_type': 'RTC', 'sar:polarizations': ['VV', 'VH'], 'sar:resolution_range': 20.3, 'sar:resolution_azimuth': 22.6, 'sar:pixel_spacing_range': 10, 'sar:pixel_spacing_azimuth': 10, 'sar:looks_equivalent_number': 4.3, 'sar:looks_range': 5, 'sar:looks_azimuth': 1, 'sat:orbit_state': 'descending', 'sat:relative_orbit': 56, 'sat:absolute_orbit': 36003, 'proj:epsg': 32612, 'proj:transform': [20.0, 0.0, 699960.0, 0.0, -20.0, 4400040.0, 0.0, 0.0, 1.0], 'proj:shape': [5490, 5490], 'datetime': datetime.datetime(2021, 1, 5, 13, 10, 15, tzinfo=tzlocal()), 'bbox': [-108.69936, 38.70664, -107.38769, 39.72682], 'geometry': {'type': 'Polygon', 'coordinates': [[[-107.93008, 38.72054], [-107.93004, 38.72162], [-107.87213, 39.18727], [-107.74341, 39.69052], [-107.73879, 39.70444], [-107.38764, 39.69403], [-107.43793, 38.70656], [-107.93008, 38.72054]]]}, 'date': None, 'catalog_dir': ''}

In [22]: assets = list(item)

In [23]: print(assets)
['gamma0_vv', 'gamma0_vh', 'incidence']

Loading a dataset

Once you have identified a dataset, you can load it into a xarray.DataArray using Intake’s to_dask() method. This reads only metadata, and streams values over the network when required by computations or visualizations:

In [24]: da = item['gamma0_vv'].to_dask()

In [25]: display(da)
<xarray.DataArray (band: 1, y: 5490, x: 5490)>
dask.array<open_rasterio-16fd4b50b84bbdf40a6058ffc660d2f0<this-array>, shape=(1, 5490, 5490), dtype=float32, chunksize=(1, 5490, 5490), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 4.4e+06 4.4e+06 4.4e+06 ... 4.29e+06 4.29e+06 4.29e+06
  * x        (x) float64 7e+05 7e+05 7e+05 ... 8.097e+05 8.097e+05 8.098e+05
Attributes:
    transform:              (20.0, 0.0, 699960.0, 0.0, -20.0, 4400040.0)
    crs:                    +init=epsg:32612
    res:                    (20.0, 20.0)
    is_tiled:               1
    nodatavals:             (0.0,)
    scales:                 (1.0,)
    offsets:                (0.0,)
    descriptions:           ('Gamma0_VV',)
    ABSOLUTE_ORBIT_NUMBER:  36003
    AREA_OR_POINT:          Area
    DATE:                   2021-01-05
    MISSION_ID:             S1A
    NUMBER_SCENES:          2
    ORBIT_DIRECTION:        descending
    OVR_RESAMPLING_ALG:     AVERAGE
    SCENES:                 S1A_IW_GRDH_1SDV_20210105T130950_20210105T131015_...
    SCENE_1_METADATA:       {"title": "S1A_IW_GRDH_1SDV_20210105T130950_20210...
    SCENE_1_PRODUCT_INFO:   {"id": "S1A_IW_GRDH_1SDV_20210105T130950_20210105...
    SCENE_2_METADATA:       {"title": "S1A_IW_GRDH_1SDV_20210105T131015_20210...
    SCENE_2_PRODUCT_INFO:   {"id": "S1A_IW_GRDH_1SDV_20210105T131015_20210105...
    TILE_ID:                12SYJ
    VALID_PIXEL_PERCENT:    33.501

Working with pystac-client

Intake-stac integrates with pystac-client to faciliate dynamic search and discovery of assets through a STAC-API. To begin, construct a search query using pystac-client:

In [26]: import pystac_client

In [27]: URL = "https://earth-search.aws.element84.com/v0"

In [28]: catalog = pystac_client.Client.open(URL)

In [29]: results = catalog.search(
   ....:     collections=["sentinel-s2-l2a-cogs"],
   ....:     bbox = [35.48, -3.24, 35.58, -3.14],
   ....:     datetime="2020-07-01/2020-08-15")
   ....: 

In [30]: items = results.get_all_items()

In [31]: print(len(items))
18

In the code section above, items is a pystac.ItemsCollection object. Intake-stac can turn this object into an Intake catalog:

In [32]: catalog = intake.open_stac_item_collection(items)

In [33]: list(catalog)
Out[33]: 
['S2A_36MYB_20200814_0_L2A',
 'S2A_36MYB_20200811_0_L2A',
 'S2B_36MYB_20200809_0_L2A',
 'S2B_36MYB_20200806_0_L2A',
 'S2A_36MYB_20200804_0_L2A',
 'S2A_36MYB_20200801_0_L2A',
 'S2B_36MYB_20200730_0_L2A',
 'S2B_36MYB_20200727_0_L2A',
 'S2A_36MYB_20200725_0_L2A',
 'S2A_36MYB_20200722_0_L2A',
 'S2B_36MYB_20200720_0_L2A',
 'S2B_36MYB_20200717_0_L2A',
 'S2A_36MYB_20200715_0_L2A',
 'S2A_36MYB_20200712_0_L2A',
 'S2B_36MYB_20200710_0_L2A',
 'S2B_36MYB_20200707_0_L2A',
 'S2A_36MYB_20200705_0_L2A',
 'S2A_36MYB_20200702_0_L2A']

Using xarray-assets

Intake-stac uses the xarray-assets STAC extension to automatically use the appropriate keywords to load a STAC asset into a data container.

Intake-stac will automatically use the keywords from the xarray-assets STAC extension, if present, when loading data into a container. For example, the STAC collection at <https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-annual-hi> defines an asset zarr-https with the metadata "xarray:open_kwargs": {"consolidated": true}" to indicate that this dataset should be opened with the consolidated=True keyword argument. This will be used automatically by .to_dask()

>>> collection = intake.open_stac_collection(
...     "https://planetarycomputer.microsoft.com/api/stac/v1/collections/daymet-annual-hi"
... )

>>> source = collection.get_asset("zarr-https")
>>> source.to_dask()
<xarray.Dataset>
Dimensions:                  (nv: 2, time: 41, x: 284, y: 584)
Coordinates:
    lat                      (y, x) float32 dask.array<chunksize=(584, 284), meta=np.ndarray>
    lon                      (y, x) float32 dask.array<chunksize=(584, 284), meta=np.ndarray>
  * time                     (time) datetime64[ns] 1980-07-01T12:00:00 ... 20...
  * x                        (x) float32 -5.802e+06 -5.801e+06 ... -5.519e+06
  * y                        (y) float32 -3.9e+04 -4e+04 ... -6.21e+05 -6.22e+05
Dimensions without coordinates: nv
Data variables:
    lambert_conformal_conic  int16 ...
    prcp                     (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
    swe                      (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
    time_bnds                (time, nv) datetime64[ns] dask.array<chunksize=(1, 2), meta=np.ndarray>
    tmax                     (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
    tmin                     (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
    vp                       (time, y, x) float32 dask.array<chunksize=(1, 584, 284), meta=np.ndarray>
Attributes:
    Conventions:       CF-1.6
    Version_data:      Daymet Data Version 4.0
    Version_software:  Daymet Software Version 4.0
    citation:          Please see http://daymet.ornl.gov/ for current Daymet ...
    references:        Please see http://daymet.ornl.gov/ for current informa...
    source:            Daymet Software Version 4.0
    start_year:        1980