// EARTH DATA ANALYTICS SERVICE (EDAS)
NOTICE: The NCCS has decided to disable our public facing analytics service, EDAS, while we revisit our vision for analytics support. If you are interested in information on future work, please contact us at support@nccs.nasa.gov. Thank you.
BIG DATA ANALYTICS FRAMEWORK
As the availability and volume of Earth data grow, researchers spend more time downloading and processing their data than doing science. The NCCS has developed the Earth Data Analytics Service (EDAS), a high-performance big data analytics framework built on Dask/xarray, to allow researchers to leverage our compute power to analyze large datasets located at the NCCS through a web-based interface, thereby eliminating the need to download the data.
EDAS provides access to a suite of "canonical operations" (min, max, sum, difference, average, root mean square, anomaly, and standard deviation) that researchers can combine into workflows. EDAS uses a dynamic caching architecture, a custom framework, and a streaming parallel in-memory workflow to process huge datasets efficiently within limited memory at interactive response times. These operations and datasets can be accessed via a Web Processing Service (WPS) API from applications written by the user.
EDAS allows users to compute close to the data. Performance tests of commonly used workflows produced results 15 to 50 times faster than standard tools in our environment. EDAS is a local NCCS implementation of the Earth System Grid Federation's (ESGF) Compute Working Team (CWT) project to expose ESGF distributed compute resources via an API and a set of analytical operations.
// AVAILABLE OPERATIONS
The NCCS has created a set of operations available through EDAS:
Operation Type | Description | EDAS Kernel Name or Workflow |
---|---|---|
Min | Computes the minimum of the array elements along the given axes | xarray.min |
Max | Computes the maximum of the array elements along the given axes | xarray.max |
Sum | Computes the sum of the array elements along the given axes | xarray.sum |
Difference | Computes the point-by-point differences of pairs of arrays | xarray.eDiff |
Average | Computes the area-weighted average of the array elements along the given axes | xarray.ave |
Mean | Computes the unweighted average of the array elements along the given axes | xarray.mean |
Variance | Computes the variance of the array elements along the given axes | xarray.var |
Median | Computes the median of the array elements along the given axes | xarray.med |
Normalization | Normalizes input arrays by centering (computing anomaly) and then dividing by the standard deviation along the given axes | xarray.norm |
Anomaly | Centers the input arrays by subtracting off the mean along the given axes | xarray.anomaly |
Standard Deviation | Computes the standard deviation of the array elements along the given axes | xarray.std |
Decycle | Removes the seasonal cycle from the temporal dynamics | xarray.decycle |
Lowpass | Smooths the input arrays by applying a 1D convolution (lowpass) filter along the given axes | xarray.lowpass |
Detrend | Detrends input arrays by subtracting the result of applying a 1D convolution (lowpass) filter along the given axes | xarray.detrend |
Teleconnection | Produces teleconnection map by computing covariances at each point (in roi) with location specified by 'lat' and 'lon' parameters | xarray.telemap |
EoF | Computes PCs and EOFs along the time axis | xarray.eof |
Filter | Filters input arrays, currently only supports subsetting by month(s) | xarray.filter |
Cache | Cache kernel used to cache input ROIs for low-latency access by subsequent requests | xarray.cache |
Subset | NoOp kernel used to return (subsetted) inputs | xarray.subset |
NoOp | NoOp kernel used to output intermediate products in workflow | xarray.noop |
Run this WPS GetCapabilities call to get a dynamic list of operations:
https://edas.nccs.nasa.gov/wps/cwt?request=GetCapabilities
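The GetCapabilities call above is an ordinary URL with query parameters, so it can be assembled programmatically. The following is a minimal sketch, assuming Python 3 and only the standard library; the helper name `get_capabilities_url` is hypothetical, and the optional `identifier` parameter follows the `identifier=coll` form used for collections later in this document. No request is sent, since the public service is currently disabled.

```python
from urllib.parse import urlencode

# Endpoint quoted in this document.
EDAS_WPS_URL = "https://edas.nccs.nasa.gov/wps/cwt"


def get_capabilities_url(identifier=None, base_url=EDAS_WPS_URL):
    """Return a WPS GetCapabilities URL, optionally narrowed by identifier."""
    params = [("request", "GetCapabilities")]
    if identifier is not None:
        params.append(("identifier", identifier))
    return base_url + "?" + urlencode(params)


operations_url = get_capabilities_url()         # list of operations
collections_url = get_capabilities_url("coll")  # list of collections
print(operations_url)
print(collections_url)
```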
// EDAS INSTRUCTIONAL
Table of Contents
How do I install the EDAS API?
How do I use the EDAS API?
EDAS Collections
EDAS Operations
Plotting Results
Downloading Results
Example Code
When can I use it?
Errors and Messages
How do I install the EDAS API?
The EDAS client, esgf-compute-api, is installed via conda.
1) If you don't already have Anaconda installed on your local system, start by installing the correct version for your environment.
2) Run conda to install the EDAS client and the necessary dependencies (detailed instructions are available here):
conda create -n edas -c cdat/label/v81 -c conda-forge -c cdat python=2.7 cdat pyzmq psutil lxml
This should update your path so that you can access the commands but some shells may need to be updated manually, for example:
export PATH=${HOME}/anaconda2/bin:${PATH} # for [ba]sh
setenv PATH ${HOME}/anaconda2/bin:${PATH} # for [t]csh
The conda environment is based on the one used for LLNL's UV-CDAT. While these directions should be sufficient for setting up conda in your environment, additional guidance is available on their site.
3) Initialize the shell environment for EDAS and add the branch of the CWT API package that contains the modifications for the NCCS version of the API:
source activate edas
git clone https://github.com/ESGF/esgf-compute-api.git
cd esgf-compute-api
git checkout updates_for_EDAS
git pull
python setup.py install
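After the installation steps above, it can be useful to confirm that the packages are importable from the activated environment. The sketch below is one hedged way to do that; it assumes that esgf-compute-api installs under the import name `cwt` and that pyzmq imports as `zmq`. The helper `module_available` is hypothetical and is not part of the EDAS client.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported in this environment."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# "cwt" is assumed to be the import name provided by esgf-compute-api.
for name in ("cwt", "zmq", "psutil", "lxml"):
    status = "ok" if module_available(name) else "MISSING"
    print("{}: {}".format(name, status))
```

Run it with the `edas` environment active; any line reporting MISSING suggests that the corresponding install step did not complete.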
How do I use the EDAS API?
Start a Jupyter Notebook to access the API (the Jupyter Notebook software is installed in step 1 of "How do I install the EDAS API?").
Activate the EDAS conda environment:
source activate edas
and then run:
jupyter notebook
Alternatively, execute the EDAS commands from any Python script.
The EDAS API calls can be made to this address: https://edas.nccs.nasa.gov/wps/cwt
Essentially, the EDAS API accepts three parameters: variable, domain, and operation. The client software is documented in a number of places.
The general ESGF CWT API is described in two documents:
API Description (describes Inputs (variable, domain, operation) and Outputs)
ESGF WPS EXTENSION API Summary (Original Definition Document)
Module documentation: Detailed documentation will be available Nov 20, 2017.
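To make the three inputs (variable, domain, operation) more concrete, the sketch below groups them in a plain Python dictionary. It is only an illustration: the field names, the `d0` domain label, the variable name `tas`, and the `axes` value are assumptions made for this example, and the authoritative request format is the one defined in the API documents listed above. The collection and kernel names are taken from the tables in this document.

```python
import json


def build_request(collection, var_name, domain, kernel, axes):
    """Group the three EDAS inputs (variable, domain, operation) in one dict.

    The field names are illustrative assumptions, not the documented
    wire format.
    """
    return {
        "variable": [{"collection": collection, "name": var_name, "domain": "d0"}],
        "domain": [dict(domain, name="d0")],
        "operation": [{"name": kernel, "input": var_name, "axes": axes}],
    }


request = build_request(
    collection="cip_merra2_mth",  # collection name listed in this document
    var_name="tas",               # hypothetical variable name
    domain={"time": {"start": 0, "end": 100, "crs": "indices"}},
    kernel="xarray.ave",          # kernel name from the operations table
    axes="xy",                    # hypothetical axes specification
)
print(json.dumps(request, indent=2, sort_keys=True))
```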
EDAS Collections
The NCCS has made a subset of its Earth data holdings available through EDAS. This list will grow substantially as the service matures and resources become available.
Run this WPS GetCapabilities call to get a dynamic list of collections: https://edas.nccs.nasa.gov/wps/cwt?request=GetCapabilities&identifier=coll
V1.0 Collections:
Collection Name | Description |
---|---|
cip_merra_mth | CREATE-IP NASA GMAO MERRA Monthly |
cip_merra2_mth | CREATE-IP NASA GMAO MERRA2 Monthly |
cip_eraint_mth | CREATE-IP ECMWF ERA-Interim Monthly |
cip_cfsr_mth | CREATE-IP NOAA NCEP CFSR Monthly |
cip_jra25_mth | CREATE-IP JMA JRA-25 Monthly |
cip_jra55_mth | CREATE-IP JMA JRA-55 Monthly |
cip_20crv2c_mth | CREATE-IP NOAA ESRL 20CRv2c Monthly |
cip_jra55_6hr | CREATE-IP JMA JRA-55 6-hourly |
cip_merra_6hr | CREATE-IP NASA GMAO MERRA 6-hourly |
cip_merra2_6hr | CREATE-IP NASA GMAO MERRA2 6-hourly |
cip_eraint_6hr | CREATE-IP ECMWF ERA-Interim 6-hourly |
cip_cfsr_6hr | CREATE-IP NOAA NCEP CFSR 6-hourly |
giss_E2-H_hist_r1i1p1 | GISS CMIP5 Historical Climate Projection |
giss_E2-R_rcp26_r1i1p1 | GISS CMIP5 Climate Projection RCP2.6 |
giss_E2-R_rcp45_r1i1p1 | GISS CMIP5 Climate Projection RCP4.5 |
giss_E2-R_rcp60_r1i1p1 | GISS CMIP5 Climate Projection RCP6.0 |
giss_E2-R_rcp85_r1i1p1 | GISS CMIP5 Climate Projection RCP8.5 |
iap-ua_era40_tas1hr | IAP-UA Reprocessed ERA-40 1-hr Surface Temperature |
iap-ua_eraint_tas1hr | IAP-UA Reprocessed ERA-Interim 1-hr Surface Temperature |
iap-ua_merra_tas1hr | IAP-UA Reprocessed MERRA 1-hr Surface Temperature |
iap-ua_nra_tas1hr | IAP-UA Reprocessed NRA 1-hr Surface Temperature |
merra2_inst1_2d_asm_Nx | MERRA2 2D Hourly Single-Level Diagnostics - M2I1NXASM |
merra2_inst1_2d_int_Nx | MERRA2 2D Hourly Vertically Integrated Diagnostics - M2I1NXINT |
nex-dcp30-hist | NEX-DCP30 Downscaled Climate Projections Historical |
nex-dcp30-rcp26 | NEX-DCP30 Downscaled Climate Projections RCP2.6 |
nex-dcp30-rcp45 | NEX-DCP30 Downscaled Climate Projections RCP4.5 |
nex-dcp30-rcp60 | NEX-DCP30 Downscaled Climate Projections RCP6.0 |
nex-dcp30-rcp85 | NEX-DCP30 Downscaled Climate Projections RCP8.5 |
nex-gddp-hist | NEX-GDDP Global Daily Downscaled Climate Projections Historical |
nex-gddp-rcp45 | NEX-GDDP Global Daily Downscaled Climate Projections RCP4.5 |
nex-gddp-rcp85 | NEX-GDDP Global Daily Downscaled Climate Projections RCP8.5 |
C-GLORSv5 | CREATE-IP CMCC C-GLORSv5 Ocean Reanalysis |
ECDAv31 | CREATE-IP GFDL ECDAv31 Ocean Reanalysis |
GODAS | CREATE-IP NCEP GODAS Ocean Reanalysis |
MOVE-G2i | CREATE-IP MRI MOVE-G2i Ocean Reanalysis |
ORAP5 | CREATE-IP ECMWF ORAP5 Ocean Reanalysis |
ORAS4 | CREATE-IP ECMWF ORAS4 Ocean Reanalysis |
CLIVAR-GSOP | CREATE-IP CLIVAR-GSOP Ocean Reanalysis Ensemble |
Additional information on the above datasets is available:
CREATE-IP
CREATE-IP Datasets
EDAS Operations
The NCCS has made the following initial set of operations available through EDAS. This list will grow substantially as the service matures and resources become available.
Run this WPS GetCapabilities call to get a dynamic list of operations: https://edas.nccs.nasa.gov/wps/cwt?request=GetCapabilities
Operation Type | Description | EDAS Kernel Name or Workflow |
---|---|---|
Min | Computes the minimum of the array elements along the given axes | xarray.min |
Max | Computes the maximum of the array elements along the given axes | xarray.max |
Sum | Computes the sum of the array elements along the given axes | xarray.sum |
Difference | Computes the point-by-point differences of pairs of arrays | xarray.eDiff |
Average | Computes the area-weighted average of the array elements along the given axes | xarray.ave |
Mean | Computes the unweighted average of the array elements along the given axes | xarray.mean |
Variance | Computes the variance of the array elements along the given axes | xarray.var |
Median | Computes the median of the array elements along the given axes | xarray.med |
Normalization | Normalizes input arrays by centering (computing anomaly) and then dividing by the standard deviation along the given axes | xarray.norm |
Anomaly | Centers the input arrays by subtracting off the mean along the given axes | xarray.anomaly |
Standard Deviation | Computes the standard deviation of the array elements along the given axes | xarray.std |
Decycle | Removes the seasonal cycle from the temporal dynamics | xarray.decycle |
Lowpass | Smooths the input arrays by applying a 1D convolution (lowpass) filter along the given axes | xarray.lowpass |
Detrend | Detrends input arrays by subtracting the result of applying a 1D convolution (lowpass) filter along the given axes | xarray.detrend |
Teleconnection | Produces teleconnection map by computing covariances at each point (in roi) with location specified by 'lat' and 'lon' parameters | xarray.telemap |
EoF | Computes PCs and EOFs along the time axis | xarray.eof |
Filter | Filters input arrays, currently only supports subsetting by month(s) | xarray.filter |
Cache | Cache kernel used to cache input ROIs for low-latency access by subsequent requests | xarray.cache |
Subset | NoOp kernel used to return (subsetted) inputs | xarray.subset |
NoOp | NoOp kernel used to output intermediate products in workflow | xarray.noop |
Workflows are simply a combination of kernels that are run in succession. See the Example Code section below.
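As a rough local illustration of two ideas from this section, the sketch below contrasts an unweighted mean with an area-weighted average, and chains two operations the way a workflow chains kernels. It is plain Python and does not call EDAS. Using cos(latitude) as the area weight is an assumption about how the Average kernel weights grid cells, and the sample values are made up.

```python
import math


def mean(values):
    """Unweighted average, comparable to the Mean operation."""
    return sum(values) / len(values)


def area_weighted_average(values, lats_deg):
    """Average weighted by cos(latitude), an assumed stand-in for Average."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def anomaly(values):
    """Center a series by subtracting its mean, like the Anomaly operation."""
    m = mean(values)
    return [v - m for v in values]


def run_workflow(values, steps):
    """Apply each step to the output of the previous one, in order."""
    result = values
    for step in steps:
        result = step(result)
    return result


temps = [280.0, 290.0, 300.0]  # made-up values at three latitudes
lats = [60.0, 30.0, 0.0]

print(mean(temps))                          # every point counts equally
print(area_weighted_average(temps, lats))   # low latitudes count for more
print(run_workflow(temps, [anomaly, max]))  # anomaly followed by max
```

With these sample values the weighted result is larger than the unweighted one, because the warmest point sits at the equator, where the weight is greatest.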
Plotting Results
The EDAS client provides some rudimentary plotting and printing routines:
mpl_timeplot: Plot a time series.
mpl_spaceplot: Plot a lat/long image.
print_Mdata: Print the resultant metadata.
Downloading Results
The output files are downloaded to the client's system and placed in /tmp.
You will get an output message, containing the word "HREFS", that gives the name of the downloaded file and the location of the same file on our THREDDS server. Additional OPeNDAP functionality may be available in future releases.
Example Code
Hints for easier usage:
Time ranges can be specified either by indices, where each increment represents a timestep: "time": {"start": 0, "end": 100, "crs": "indices"}, or by dates: "time": {"start": "1980-01-01T00:00:00", "end": "1980-12-31T23:00:00", "crs": "timestamps"}
Start and end times must go from earlier to later.
Lat and long values must go from smaller to larger.
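The ordering rules in these hints can be checked before a request is submitted. The helpers below are a minimal sketch with hypothetical names (`time_range`, `spatial_range`); the "indices" and "timestamps" labels come from the hints above, while the "values" label used for latitude and longitude is an assumption.

```python
def time_range(start, end, crs):
    """Build a time range; crs is 'indices' or 'timestamps' as in the hints."""
    if crs not in ("indices", "timestamps"):
        raise ValueError("crs must be 'indices' or 'timestamps'")
    # ISO 8601 strings with the same layout sort chronologically, so one
    # comparison covers both integer indices and timestamps.
    if start > end:
        raise ValueError("start must not be later than end")
    return {"start": start, "end": end, "crs": crs}


def spatial_range(low, high):
    """Build a lat or lon range; the 'values' crs label is an assumption."""
    if low > high:
        raise ValueError("range must go from smaller to larger")
    return {"start": low, "end": high, "crs": "values"}


by_index = time_range(0, 100, "indices")
by_date = time_range("1980-01-01T00:00:00", "1980-12-31T23:00:00", "timestamps")
lat = spatial_range(25.0, 50.0)  # hypothetical bounds
print(by_index, by_date, lat)
```

Raising an error locally for a reversed range is a design choice for this sketch; it catches the mistake before any job is queued on the server.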
Compute Average of Yearly US Cloud Cover for Solar Panel Installation
Compute Max to Find High Precipitation Events
Compute Sum of Precipitation Events
Compute Maximum Temperature to find Heat Waves
Compute Maximum Temperature to Investigate 2003 European Heat Wave
Compute Minimum Temperature in Rocky Mountain National Park
When can I use it?
Version 1.0 of the EDAS API is now available.
Errors and Messages
EDAS is currently somewhat verbose. Jupyter Notebooks have been known to repeat EDAS messages if cells aren't cleared and the kernel restarted, making the verbosity more noticeable.
Important Informational Messages:
[2017-10-26 18:28:43,031][wps.py[execute:476]] HREFS: - This message will provide you with the location of the output file on the NCCS THREDDS server.
[2017-10-12 13:41:36,866][wps.py[download_result:416]] STATUS: QUEUED - The EDAS server is busy running a previously submitted command. Your command has been queued and will run shortly. If your command is queued for an excessively long period of time (10 minutes or more), please contact the NCCS at support@nccs.nasa.gov. Please include "EDAS" in the subject of your email.
[2017-10-12 13:41:36,866][wps.py[download_result:416]] STATUS: EXECUTING - The EDAS server is executing your command.
[2017-10-12 13:41:37,880][wps.py[download_result:416]] STATUS: COMPLETED - Your job has completed and your results are available.
Syntax errors are provided as needed.
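Because EDAS output is verbose, it can help to pick out only the STATUS and HREFS lines. The sketch below parses messages with the layout shown above using a regular expression. The pattern is inferred from the sample messages in this section rather than from a documented log format, and the helper names are hypothetical.

```python
import re

# Layout inferred from the samples above: [time][module[function:line]] KEY: value
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\]"
    r"\[(?P<module>[^\[\]]+)\[(?P<func>[^:\]]+):(?P<line>\d+)\]\]\s+"
    r"(?P<key>STATUS|HREFS):\s*(?P<value>.*)$"
)


def parse_message(line):
    """Return (key, value) for a STATUS or HREFS line, or None otherwise."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("key"), match.group("value").strip()


def job_finished(lines):
    """True once any line reports STATUS: COMPLETED."""
    return any(parse_message(line) == ("STATUS", "COMPLETED") for line in lines)


sample = [
    "[2017-10-12 13:41:36,866][wps.py[download_result:416]] STATUS: QUEUED",
    "[2017-10-12 13:41:36,866][wps.py[download_result:416]] STATUS: EXECUTING",
    "[2017-10-12 13:41:37,880][wps.py[download_result:416]] STATUS: COMPLETED",
]
for line in sample:
    print(parse_message(line))
print(job_finished(sample))
```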
// SUPPORT
E-Mail us at support@nccs.nasa.gov with subject line: EDAS.