Title: | NASA 'EarthData' Access Utilities |
---|---|
Description: | Provides easy, portable access to NASA 'EarthData' products through the use of bearer tokens. Many of NASA's public data catalogs, hosted and maintained by its 12 Distributed Active Archive Centers ('DAACs'), are now available on Amazon Web Services 'S3' storage. However, access to this data through the standard 'S3' API is restricted to compute resources running inside the 'us-west-2' data center in Portland, Oregon, which allows NASA to avoid being charged data egress rates. This package provides public access to the data from any networked device by using the 'EarthData' login application programming interface (API), <https://www.earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/earthdata-login>, providing convenient authentication and access to cloud-hosted NASA 'EarthData' products. This makes access to a wide range of earth observation data from any location straightforward and compatible with R packages that are widely used with cloud-native earth observation data (such as 'terra', 'sf', etc.). |
Authors: | Carl Boettiger [aut, cre, cph], Luis López [aut], Yuvi Panda [aut], Bri Lind [aut], Andy Teucher [ctb], Openscapes [fnd] |
Maintainer: | Carl Boettiger <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.2.99 |
Built: | 2024-11-03 04:36:26 UTC |
Source: | https://github.com/boettiger-lab/earthdatalogin |
rstac::collections() query
By default, NASA STAC catalogue collection queries return only 10 collections at a time. This function pages through the collections and returns them all.
collections_fetch(collections, ...)
collections |
an object of class |
... |
Optional arguments passed on to |
A doc_collections object with all of the collections from the catalogue specified in the rstac::collections() query.
rstac::stac("https://cmr.earthdata.nasa.gov/stac/LPCLOUD") |> rstac::collections() |> rstac::get_request() |> collections_fetch()
Replace https URLs with S3 URIs
edl_as_s3(href, prefix = "s3://")
href |
a https URL from an EarthData Cloud address |
prefix |
the preferred s3 prefix, e.g. |
a URI with the protocol and base URL stripped and the prefix prepended
href <- lpdacc_example_url()
edl_as_s3(href)
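The conversion can be pictured as a simple string substitution. The sketch below is a base R stand-in, not the package's implementation: it assumes the function drops the https scheme and host name and puts the prefix in their place. The helper name as_s3_sketch() and the example URL are hypothetical.

```r
# Hypothetical stand-in for edl_as_s3(); the real implementation may differ.
as_s3_sketch <- function(href, prefix = "s3://") {
  # Replace "https://" plus the host name (everything up to the first "/")
  # with the requested prefix.
  sub("^https://[^/]+/", prefix, href)
}

# Hypothetical EarthData Cloud address, used only for illustration.
href <- "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/example/file.tif"
s3_uri <- as_s3_sketch(href)
print(s3_uri)
```

Keep in mind that S3 URIs of this kind are only usable from compute resources inside us-west-2, as described above.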
NOTE: This should be used primarily as a fallback mechanism! EarthData Cloud resources are often best accessed directly over HTTPS without download. This allows subsets to be extracted instead of downloading unnecessary bits. Unfortunately, certain formats (e.g. HDF4) do not support such HTTP-based range requests and require that the asset be downloaded to a local POSIX filesystem first.
edl_download( href, dest = basename(href), auth = "netrc", method = "httr", username = default("user"), password = default("password"), netrc_path = edl_netrc_path(), cookie_path = edl_cookie_path(), quiet = TRUE, ... )
href |
the https URL of the asset |
dest |
local destination |
auth |
the authentication method ("token" for Bearer tokens or "netrc" for netrc.) |
method |
The download method, either "httr" or "curl". |
username |
EarthData Login User |
password |
EarthData Login Password |
netrc_path |
Path to the .netrc file to be created. Defaults to the
appropriate R package configuration location given by |
cookie_path |
Path to the file where cookies will be stored. Defaults
to the appropriate R package configuration location given by
|
quiet |
logical, default TRUE. Set to FALSE to show download progress. |
... |
additional arguments to |
the dest path, invisibly
href <- lpdacc_example_url()
edl_download(href)
NOTE this function uses heuristic rules to extract data from edl_search(). Users are strongly encouraged to rely on STAC searches instead.
edl_extract_urls(items)
items |
the content object from edl_search |
a character vector of URLs
items <- edl_search(short_name = "MUR-JPL-L4-GLOB-v4.1", temporal = c("2020-01-01", "2021-12-31"), parse_results = FALSE)
urls <- edl_extract_urls(items)
This function creates a .netrc file with Earthdata Login (EDL) credentials (username and password) and sets the necessary environment variables for GDAL to use the .netrc file.
edl_netrc( username = default("user"), password = default("password"), netrc_path = edl_netrc_path(), cookie_path = edl_cookie_path(), cloud_config = TRUE )
username |
EarthData Login User |
password |
EarthData Login Password |
netrc_path |
Path to the .netrc file to be created. Defaults to the
appropriate R package configuration location given by |
cookie_path |
Path to the file where cookies will be stored. Defaults
to the appropriate R package configuration location given by
|
cloud_config |
set |
The function sets the environment variables GDAL_HTTP_NETRC and GDAL_HTTP_NETRC_FILE to enable GDAL to use the .netrc file for EDL authentication. GDAL_HTTP_COOKIEFILE and GDAL_HTTP_COOKIEJAR are also set so that access cookies can be stored and read during authentication.
Additionally, it creates a symbolic link to the .netrc file if the GDAL version is older than 3.7.0 (and thus does not support the GDAL_HTTP_NETRC_FILE location).
TRUE invisibly if successful
edl_netrc()
url <- lpdacc_example_url()
terra::rast(url, vsi = TRUE)
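To make the mechanism concrete, here is a minimal sketch of the kind of setup described above, written in base R. It is not the package's code: the credentials are placeholders, the files are written to temporary paths rather than the package's configuration location, and the host name urs.earthdata.nasa.gov (the Earthdata Login service) is an assumption about which machine entry is needed.

```r
# Sketch only: edl_netrc() performs the real setup. Values are placeholders.
netrc_path  <- tempfile("netrc")
cookie_path <- tempfile("cookies")

# A .netrc entry maps a host name to a login and password.
# The host name below is an assumption made for this illustration.
writeLines(
  "machine urs.earthdata.nasa.gov login my_user password my_password",
  netrc_path
)
# Restrict permissions, since the file holds a password (no-op on some systems).
Sys.chmod(netrc_path, mode = "600")

# Point GDAL at the .netrc file and at a cookie file for the session.
Sys.setenv(
  GDAL_HTTP_NETRC      = "YES",
  GDAL_HTTP_NETRC_FILE = netrc_path,
  GDAL_HTTP_COOKIEFILE = cookie_path,
  GDAL_HTTP_COOKIEJAR  = cookie_path
)
```

Because a .netrc entry is matched by host name, these settings do not affect requests to other domains.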
Users can have at most 2 active tokens at any time. You don't need to keep track of a token, since earthdatalogin can retrieve your tokens with your user name and password. However, should you want to revoke a token, you can do so with this function.
edl_revoke_token( username = default("user"), password = default("password"), token_number = 1 )
username |
EarthData Login User |
password |
EarthData Login Password |
token_number |
Which token (1 or 2) |
API response (invisibly)
edl_revoke_token()
Note that these S3 credentials will only work:
edl_s3_token( daac = "https://data.lpdaac.earthdatacloud.nasa.gov", username = default("user"), password = default("password"), prompt_for_netrc = interactive() )
daac |
the base URL for the DAAC |
username |
EarthDataLogin user |
password |
EarthDataLogin Password |
prompt_for_netrc |
Often netrc is preferable, so this function will by default prompt the user to switch. Set to FALSE to silence this. |
On an AWS instance in the us-west-2 region
Only for one hour before they expire
Only on the DAAC requested
Please consider using edl_netrc() to avoid these limitations.
list of access key, secret key, session token and expiration, invisibly. Also sets the corresponding AWS environmental variables.
edl_s3_token()
NOTE: Use as a fallback method only! Users are strongly encouraged to rely on the STAC endpoints for NASA EarthData, as shown in the package vignettes. STAC is a metadata standard widely used by both NASA and many other providers, and can be searched using the feature-rich rstac package. Items returned by STAC can also be parsed more easily.
edl_search( short_name = NULL, version = NULL, doi = NULL, daac = NULL, provider = NULL, temporal = NULL, bounding_box = NULL, page_size = 2000, recurse = TRUE, parse_results = TRUE, username = default("user"), password = default("password"), netrc_path = edl_netrc_path(), cookie_path = edl_cookie_path(), ... )
short_name |
dataset short name e.g. ATL08 |
version |
dataset version |
doi |
DOI for a dataset |
daac |
NSIDC or PODAAC |
provider |
particular to each DAAC, e.g. POCLOUD, LPDAAC etc. |
temporal |
c("yyyy-mm-dd", "yyyy-mm-dd") |
bounding_box |
c(lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat) |
page_size |
maximum number of results to return per query. |
recurse |
If a query returns more than page_size results, should we make recursive calls to return all results? |
parse_results |
logical, default TRUE. Calls |
username |
EarthData Login User |
password |
EarthData Login Password |
netrc_path |
Path to the .netrc file to be created. Defaults to the
appropriate R package configuration location given by |
cookie_path |
Path to the file where cookies will be stored. Defaults
to the appropriate R package configuration location given by
|
... |
additional query parameters |
A character vector of data URLs matching the search criteria, if parse_results = TRUE (default). Otherwise, returns a response object of the returned search information if parse_results = FALSE.
items <- edl_search(short_name = "MUR-JPL-L4-GLOB-v4.1", temporal = c("2002-01-01", "2021-12-31"), recurse = TRUE, parse_results = FALSE)
urls <- edl_extract_urls(items)
This function will ping the EarthData API for any available tokens. If a token is not found, it will request one. You may only have two active tokens at any given time. Use edl_revoke_token() to remove unwanted tokens. By default, the function will also set an environmental variable for the active R session to store the token. This allows popular R packages that use GDAL to immediately authenticate any HTTP addresses pointing to NASA EarthData assets.
edl_set_token( username = default("user"), password = default("password"), token_number = 1, set_env_var = TRUE, format = c("token", "header", "file"), prompt_for_netrc = interactive() )
username |
EarthData Login User |
password |
EarthData Login Password |
token_number |
Which token (1 or 2) |
set_env_var |
Should we set the GDAL_HTTP_HEADER_FILE environmental variable? logical, default TRUE. |
format |
One of "token", "header" or "file." "header" adds the prefix used by http headers to the return string. "file" returns |
prompt_for_netrc |
Often netrc is preferable, so this function will by default prompt the user to switch. Set to FALSE to silence this. |
IMPORTANT: it is necessary to unset this token using edl_unset_token() before trying to access HTTP resources that are not part of EarthData, as setting this token will cause those calls to fail! Alternatively, simply use edl_netrc() to authenticate without facing this issue.
NOTE: GDAL >= 3.6.1 is required to recognize GDAL_HTTP_HEADERS, but all versions recognize GDAL_HTTP_HEADER_FILE. We therefore write the Bearer token to a temporary file and provide its path as GDAL_HTTP_HEADER_FILE to improve compatibility with older versions.
A text string containing only the token (format=token), or a token with the header prefix included, Authorization: Bearer <token>
edl_set_token()
edl_unset_token()
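The header-file approach described in the note above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's code: the token is a placeholder, and the temporary file name is chosen here for the example.

```r
# Sketch only: edl_set_token() obtains a real token from the EarthData API.
token <- "PLACEHOLDER_TOKEN"

# GDAL reads "key: value" HTTP headers from the file named by
# GDAL_HTTP_HEADER_FILE.
header <- paste("Authorization: Bearer", token)
header_file <- tempfile("edl_header", fileext = ".txt")
writeLines(header, header_file)
Sys.setenv(GDAL_HTTP_HEADER_FILE = header_file)

# ... read EarthData assets with a GDAL-based package here ...

# Unset before touching non-EarthData URLs, which would otherwise
# receive the same Authorization header.
Sys.unsetenv("GDAL_HTTP_HEADER_FILE")
```

The final step mirrors what edl_unset_token() is documented to do: once the variable is removed, requests to other hosts no longer carry the header.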
Helper function for extracting URLs from STAC
edl_stac_urls(items, assets = "data")
items |
an items list from rstac |
assets |
name(s) of assets to extract |
a vector of hrefs for all discovered assets.
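As a rough picture of what such a helper does, the sketch below walks a STAC-style item list and collects the href of each requested asset. It assumes the usual STAC layout (a features list whose entries each carry a named assets list with an href field); the mock items object, the example.org URLs, and the name stac_urls_sketch() are all hypothetical, and the package's own function may behave differently.

```r
# Mock of the structure returned by an rstac item search (illustrative only).
items <- list(features = list(
  list(assets = list(
    data   = list(href = "https://example.org/a.tif"),
    browse = list(href = "https://example.org/a.png")
  )),
  list(assets = list(
    data = list(href = "https://example.org/b.tif")
  ))
))

# Hypothetical stand-in for edl_stac_urls().
stac_urls_sketch <- function(items, assets = "data") {
  hrefs <- lapply(items$features, function(feature) {
    # Keep only the requested assets that this feature actually has.
    found <- feature$assets[intersect(assets, names(feature$assets))]
    vapply(found, function(asset) asset$href, character(1))
  })
  unlist(hrefs, use.names = FALSE)
}

urls <- stac_urls_sketch(items)
print(urls)
```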
Unsets environmental variables set by edl_netrc() and removes configuration files set by edl_netrc().
edl_unset_netrc( netrc_path = edl_netrc_path(), cookie_path = edl_cookie_path(), cloud_config = TRUE )
netrc_path |
Path to the .netrc file to be created. Defaults to the
appropriate R package configuration location given by |
cookie_path |
Path to the file where cookies will be stored. Defaults
to the appropriate R package configuration location given by
|
cloud_config |
set |
Note that this function should rarely be necessary, as unlike bearer token-based auth, netrc is mapped by domain name and will not interfere with access to non-earthdata-based URLs. It may still be necessary to deactivate in order to use one of the other earthdatalogin authentication methods.
To unset environmental variables without removing files, set that file path argument to "" (see examples).
Note that GDAL_HTTP_NETRC defaults to YES.
invisible TRUE, if successful (even if no env is set.)
edl_unset_netrc()
# unset environmental variables only
edl_unset_netrc("", "")
The function uses Sys.unsetenv() to remove the specified environment variables.
edl_unset_s3()
This function unsets the AWS S3-related environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN.
edl_unset_s3()
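The set-and-unset cycle for temporary S3 credentials can be illustrated with base R alone. In this sketch the credential values are placeholders and the list element names (access_key, secret_key, session_token) are hypothetical; the actual names in the list returned by edl_s3_token() may differ.

```r
# Placeholder credentials standing in for what edl_s3_token() would return.
creds <- list(
  access_key    = "PLACEHOLDER_KEY",
  secret_key    = "PLACEHOLDER_SECRET",
  session_token = "PLACEHOLDER_SESSION"
)

# Expose the credentials through the standard AWS environment variables.
Sys.setenv(
  AWS_ACCESS_KEY_ID     = creds$access_key,
  AWS_SECRET_ACCESS_KEY = creds$secret_key,
  AWS_SESSION_TOKEN     = creds$session_token
)

aws_vars <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")
set_before <- Sys.getenv(aws_vars)

# Remove them again once the S3 session is no longer needed.
Sys.unsetenv(aws_vars)
set_after <- Sys.getenv(aws_vars)
```

After the call to Sys.unsetenv(), the three variables read back as empty strings, so later requests no longer pick up the expired or unwanted credentials.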
External sources that don't need the token may error if the token is set. Call edl_unset_token() before accessing non-EarthData URLs.
edl_unset_token()
Unsets the token environmental variable (no return object)
edl_unset_token()
Sets GDAL environmental variables to recommended optimum settings for cloud-based access.
gdal_cloud_config()
Based on https://gdalcubes.github.io/source/concepts/config.html#recommended-settings-for-cloud-access
sets recommended environmental variables and returns invisible TRUE if successful.
gdal_cloud_config()
# remove settings:
gdal_cloud_unconfig()
Unsets GDAL environmental variables set by gdal_cloud_config()
gdal_cloud_unconfig()
invisible TRUE if successful.
gdal_cloud_config()
# remove settings:
gdal_cloud_unconfig()
URL for an example of an LP DAAC COG file
lpdacc_example_url()
The URL to a Cloud-Optimized GeoTIFF file from the LP DAAC.
lpdacc_example_url()
Expose any GDAL_* or VSI_* environmental variables to gdalcubes, which calls GDAL in an isolated environment and does not respect the global environmental variables.
with_gdalcubes(env = Sys.getenv())
env |
a named vector of set environmental variables. Default is usually best, which will configure all relevant global environmental variables for gdalcubes. |
NULL, invisibly.
with_gdalcubes()
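The selection step implied above, picking out the GDAL_* and VSI_* variables from the session environment, might look like the following base R sketch. It only shows the filtering; how the package then hands the values to gdalcubes is not shown here, and the variable set at the top is just an example so the result is non-empty.

```r
# Set one GDAL option so the example has something to find.
Sys.setenv(GDAL_HTTP_MAX_RETRY = "3")

# Named character vector of all environment variables in this session.
env <- unclass(Sys.getenv())

# Keep only the variables whose names start with GDAL_ or VSI_.
gdal_env <- env[grepl("^(GDAL_|VSI_)", names(env))]
print(names(gdal_env))
```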