// EARTH DATA ANALYTICS SERVICE (EDAS) 

BIG DATA ANALYTICS FRAMEWORK


As the availability and volume of Earth data grow, researchers spend more time downloading and processing their data than doing science. The NCCS has developed the Earth Data Analytics Service (EDAS), a high-performance big data analytics framework built on Dask/xarray, to allow researchers to leverage our compute power to analyze large datasets located at the NCCS through a web-based interface, thereby eliminating the need to download the data.

EDAS provides access to a suite of “canonical operations” (min, max, sum, difference, average, root mean square, anomaly, and standard deviation) that researchers can combine to develop various workflows. EDAS uses a dynamic caching architecture, a custom framework, and a streaming parallel in-memory workflow to process huge datasets efficiently within limited memory at interactive response times. These operations and datasets can be accessed through a Web Processing Service (WPS) API from applications written by the user.

EDAS allows users to compute close to the data. Performance tests of commonly used workflows produced results 15 to 50 times faster than standard tools in our environment. EDAS is a local NCCS implementation of the Earth System Grid Federation's (ESGF) Compute Working Team (CWT) project to expose ESGF distributed compute resources via an API and a set of analytical operations.

// ARCHITECTURAL COMPONENTS  

 
  1. Client-run software, either Jupyter Notebook or Python scripts, that is installed on the user's system via conda.
  2. A WAR file that runs in Tomcat. It listens for external requests, parses them, and submits them to the analytics server.
  3. Backend analytics code that utilizes Dask/xarray to partition the work to the available worker nodes.

The user invokes the client to access the Tomcat server, which forwards the request to the backend analytics server. Results are returned to the user's system as NetCDF files.


//  AVAILABLE OPERATIONS

The NCCS has created a set of operations available through EDAS:

| Operation Type | Description | EDAS Kernel Name or Workflow |
| --- | --- | --- |
| Min | Computes the minimum of the array elements along the given axes | xarray.min |
| Max | Computes the maximum of the array elements along the given axes | xarray.max |
| Sum | Computes the sum of the array elements along the given axes | xarray.sum |
| Difference | Computes the point-by-point differences of pairs of arrays | xarray.eDiff |
| Average | Computes the area-weighted average of the array elements along the given axes | xarray.ave |
| Mean | Computes the unweighted average of the array elements along the given axes | xarray.mean |
| Variance | Computes the variance of the array elements along the given axes | xarray.var |
| Median | Computes the median of the array elements along the given axes | xarray.med |
| Normalization | Normalizes input arrays by centering (computing anomaly) and then dividing by the standard deviation along the given axes | xarray.norm |
| Anomaly | Centers the input arrays by subtracting off the mean along the given axes | xarray.anomaly |
| Standard Deviation | Computes the standard deviation of the array elements along the given axes | xarray.std |
| Decycle | Removes the seasonal cycle from the temporal dynamics | xarray.decycle |
| Lowpass | Smooths the input arrays by applying a 1D convolution (lowpass) filter along the given axes | xarray.lowpass |
| Detrend | Detrends input arrays by subtracting the result of applying a 1D convolution (lowpass) filter along the given axes | xarray.detrend |
| Teleconnection | Produces teleconnection map by computing covariances at each point (in roi) with location specified by 'lat' and 'lon' parameters | xarray.telemap |
| EoF | Computes PCs and EOFs along the time axis | xarray.eof |
| Filter | Filters input arrays, currently only supports subsetting by month(s) | xarray.filter |
| Cache | Cache kernel used to cache input rois for low latency access by subsequent requests | xarray.cache |
| Subset | NoOp kernel used to return (subsetted) inputs | xarray.subset |
| NoOp | NoOp kernel used to output intermediate products in workflow | xarray.noop |
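
The difference between the Average (area-weighted) and Mean (unweighted) operations can be illustrated with a small sketch. It assumes that area weighting means weighting each value by the cosine of its latitude, a common convention on regular latitude grids; whether EDAS uses exactly this weighting is an assumption, and the function names below are illustrative, not EDAS kernels.

```python
import math

def unweighted_mean(values):
    """Plain arithmetic mean, as described for the Mean operation."""
    return sum(values) / float(len(values))

def area_weighted_average(values, latitudes_deg):
    """Average weighted by cos(latitude).

    The cosine weighting is an assumed convention, not confirmed for EDAS.
    """
    weights = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)

# Two grid cells: one on the equator, one at 60 degrees north.
temps = [10.0, 20.0]
lats = [0.0, 60.0]
print(unweighted_mean(temps))              # 15.0
print(area_weighted_average(temps, lats))  # about 13.33
```

The equatorial cell carries twice the weight of the cell at 60 degrees, so the weighted result is pulled toward its value.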

Run this WPS GetCapabilities call to get a dynamic list of operations: https://edas.nccs.nasa.gov/wps/cwt?request=GetCapabilities


// EDAS COLLECTIONS


See the Earth data holdings available through EDAS


Data Collections

// USING EDAS 


Everything you need to know about how to work with ADAPT in one place.

  • Getting started
  • Best practices
  • Datasets
  • Instructionals

// EARTH DATA ANALYTICS SERVICE (EDAS) 

EDAS is a local NCCS implementation of the Earth System Grid Federation's (ESGF) Compute Working Team (CWT) project to expose ESGF distributed compute resources via an API and a set of analytical operations.

// EDAS INSTRUCTIONAL 

Table of Contents

How do I install the EDAS API?
How do I use the EDAS API?
EDAS Collections
EDAS Operations
Plotting Results
Downloading Results
Example Code
When can I use it?
Errors and Messages


How do I install the EDAS API?

The EDAS client, esgf-compute-api, is installed via conda.

1) If you don't already have Anaconda installed on your local system, start by installing the correct version for your environment.

2) Run conda to install the EDAS client and the necessary dependencies. Detailed instructions are available here.

conda create -n edas -c cdat/label/v81 -c conda-forge -c cdat python=2.7 cdat pyzmq psutil lxml


This should update your path so that you can access the commands, but in some shells the path may need to be updated manually, for example:
export PATH=${HOME}/anaconda2/bin:${PATH} # for [ba]sh
setenv PATH ${HOME}/anaconda2/bin:${PATH} # for [t]csh
The conda environment is based on the one used for LLNL's UV-CDAT.  While these directions should be sufficient for setting up conda in your environment, additional guidance is available on their site.

3) Initialize the shell environment for EDAS and add the branch of the CWT API package that contains the modifications for the NCCS version of the API:

source activate edas
git clone https://github.com/ESGF/esgf-compute-api.git
cd esgf-compute-api
git checkout updates_for_EDAS
git pull
python setup.py install

How do I use the EDAS API?

Start a Jupyter Notebook to access the API (Jupyter Notebook software is installed in step 1 of "How do I install the EDAS API?").

Activate the EDAS conda environment:
source activate edas
and then start the notebook:
jupyter notebook
Alternatively, execute the EDAS commands from any Python script.

The EDAS API calls can be made to this address: https://edas.nccs.nasa.gov/wps/cwt

Essentially, the EDAS API accepts three parameters: variable, domain, and operation. The client software is documented in a number of places.

The general ESGF CWT API is described in two documents:
API Description (describes Inputs (variable, domain, operation) and Outputs)
ESGF WPS EXTENSION API Summary (Original Definition Document) 
       

Module documentation:  Detailed documentation will be available Nov 20, 2017.
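
To make the three request parameters named above (variable, domain, operation) concrete, here is a minimal sketch that groups them as plain Python data. It is not the esgf-compute-api object model or the WPS wire format: every field name, the variable name "tas", and the "axes" value are hypothetical placeholders chosen for illustration. Only the collection name, the kernel name, and the time range layout come from this page. Consult the API Description for the real structures.

```python
import json

# Hypothetical grouping of the three EDAS request parameters.
request = {
    # variable: which data to read (field names and "tas" are placeholders)
    "variable": {"collection": "cip_merra2_mth", "name": "tas"},
    # domain: the region of interest, here a range of timestep indices
    "domain": {"time": {"start": 0, "end": 100, "crs": "indices"}},
    # operation: which kernel to run (the "axes" field is a placeholder)
    "operation": {"name": "xarray.max", "axes": "t"},
}

# Serialising to JSON shows that the request is plain, nestable data.
encoded = json.dumps(request, sort_keys=True)
print(encoded)
```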

EDAS Collections

The NCCS has made a subset of its Earth data holdings available through EDAS. This list will grow substantially as the service matures and resources become available.

Run this WPS GetCapabilities call to get a dynamic list of collections: https://edas.nccs.nasa.gov/wps/cwt?request=GetCapabilities&identifier=coll
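
The two GetCapabilities URLs given on this page can also be built programmatically. The sketch below only constructs the URLs and makes no network call; the helper name `capabilities_url` is illustrative and not part of the EDAS client.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2.7, as in the edas environment

EDAS_WPS_URL = "https://edas.nccs.nasa.gov/wps/cwt"

def capabilities_url(identifier=None):
    """Build a GetCapabilities URL; identifier='coll' asks for collections."""
    params = [("request", "GetCapabilities")]
    if identifier is not None:
        params.append(("identifier", identifier))
    return EDAS_WPS_URL + "?" + urlencode(params)

print(capabilities_url())        # URL for the list of operations
print(capabilities_url("coll"))  # URL for the list of collections
```

The resulting URLs can be opened in a browser or fetched with any HTTP client.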

V1.0 Collections:

| Collection Name | Description |
| --- | --- |
| cip_merra_mth | CREATE-IP NASA GMAO MERRA Monthly |
| cip_merra2_mth | CREATE-IP NASA GMAO MERRA2 Monthly |
| cip_eraint_mth | CREATE-IP ECMWF ERA-Interim Monthly |
| cip_cfsr_mth | CREATE-IP NOAA NCEP CFSR Monthly |
| cip_jra25_mth | CREATE-IP JMA JRA-25 Monthly |
| cip_jra55_mth | CREATE-IP JMA JRA-55 Monthly |
| cip_20crv2c_mth | CREATE-IP NOAA ESRL 20CRv2c Monthly |
| cip_jra55_6hr | CREATE-IP JMA JRA-55 6-hourly |
| cip_merra_6hr | CREATE-IP NASA GMAO MERRA 6-hourly |
| cip_merra2_6hr | CREATE-IP NASA GMAO MERRA2 6-hourly |
| cip_eraint_6hr | CREATE-IP ECMWF ERA-Interim 6-hourly |
| cip_cfsr_6hr | CREATE-IP NOAA NCEP CFSR 6-hourly |
| giss_E2-H_hist_r1i1p1 | GISS CMIP5 Historical Climate Projection |
| giss_E2-R_rcp26_r1i1p1 | GISS CMIP5 Climate Projection RCP2.6 |
| giss_E2-R_rcp45_r1i1p1 | GISS CMIP5 Climate Projection RCP4.5 |
| giss_E2-R_rcp60_r1i1p1 | GISS CMIP5 Climate Projection RCP6.0 |
| giss_E2-R_rcp85_r1i1p1 | GISS CMIP5 Climate Projection RCP8.5 |
| iap-ua_era40_tas1hr | IAP-UA Reprocessed ERA-40 1-hr Surface Temperature |
| iap-ua_eraint_tas1hr | IAP-UA Reprocessed ERA-Interim 1-hr Surface Temperature |
| iap-ua_merra_tas1hr | IAP-UA Reprocessed MERRA 1-hr Surface Temperature |
| iap-ua_nra_tas1hr | IAP-UA Reprocessed NRA 1-hr Surface Temperature |
| merra2_inst1_2d_asm_Nx | MERRA2 2D Hourly Single-Level Diagnostics - M2I1NXASM |
| merra2_inst1_2d_int_Nx | MERRA2 2D Hourly Vertically Integrated Diagnostics - M2I1NXINT |
| nex-dcp30-hist | NEX-DCP30 Downscaled Climate Projections Historical |
| nex-dcp30-rcp26 | NEX-DCP30 Downscaled Climate Projections RCP2.6 |
| nex-dcp30-rcp45 | NEX-DCP30 Downscaled Climate Projections RCP4.5 |
| nex-dcp30-rcp60 | NEX-DCP30 Downscaled Climate Projections RCP6.0 |
| nex-dcp30-rcp85 | NEX-DCP30 Downscaled Climate Projections RCP8.5 |
| nex-gddp-hist | NEX-GDDP Global Daily Downscaled Climate Projections Historical |
| nex-gddp-rcp45 | NEX-GDDP Global Daily Downscaled Climate Projections RCP4.5 |
| nex-gddp-rcp85 | NEX-GDDP Global Daily Downscaled Climate Projections RCP8.5 |
| C-GLORSv5 | CREATE-IP CMCC C-GLORSv5 Ocean Reanalysis |
| ECDAv31 | CREATE-IP GFDL ECDAv31 Ocean Reanalysis |
| GODAS | CREATE-IP NCEP GODAS Ocean Reanalysis |
| MOVE-G2i | CREATE-IP MRI MOVE-G2i Ocean Reanalysis |
| ORAP5 | CREATE-IP ECMWF ORAP5 Ocean Reanalysis |
| ORAS4 | CREATE-IP ECMWF ORAS4 Ocean Reanalysis |
| CLIVAR-GSOP | CREATE-IP CLIVAR-GSOP Ocean Reanalysis Ensemble |


Additional information on the above datasets is available:
CREATE-IP
CREATE-IP Datasets

EDAS Operations

The NCCS has made the following initial set of operations available through EDAS.  This list will grow substantially as the service matures and resources become available.

Run this WPS GetCapabilities call to get a dynamic list of operations: https://edas.nccs.nasa.gov/wps/cwt?request=GetCapabilities

| Operation Type | Description | EDAS Kernel Name or Workflow |
| --- | --- | --- |
| Min | Computes the minimum of the array elements along the given axes | xarray.min |
| Max | Computes the maximum of the array elements along the given axes | xarray.max |
| Sum | Computes the sum of the array elements along the given axes | xarray.sum |
| Difference | Computes the point-by-point differences of pairs of arrays | xarray.eDiff |
| Average | Computes the area-weighted average of the array elements along the given axes | xarray.ave |
| Mean | Computes the unweighted average of the array elements along the given axes | xarray.mean |
| Variance | Computes the variance of the array elements along the given axes | xarray.var |
| Median | Computes the median of the array elements along the given axes | xarray.med |
| Normalization | Normalizes input arrays by centering (computing anomaly) and then dividing by the standard deviation along the given axes | xarray.norm |
| Anomaly | Centers the input arrays by subtracting off the mean along the given axes | xarray.anomaly |
| Standard Deviation | Computes the standard deviation of the array elements along the given axes | xarray.std |
| Decycle | Removes the seasonal cycle from the temporal dynamics | xarray.decycle |
| Lowpass | Smooths the input arrays by applying a 1D convolution (lowpass) filter along the given axes | xarray.lowpass |
| Detrend | Detrends input arrays by subtracting the result of applying a 1D convolution (lowpass) filter along the given axes | xarray.detrend |
| Teleconnection | Produces teleconnection map by computing covariances at each point (in roi) with location specified by 'lat' and 'lon' parameters | xarray.telemap |
| EoF | Computes PCs and EOFs along the time axis | xarray.eof |
| Filter | Filters input arrays, currently only supports subsetting by month(s) | xarray.filter |
| Cache | Cache kernel used to cache input rois for low latency access by subsequent requests | xarray.cache |
| Subset | NoOp kernel used to return (subsetted) inputs | xarray.subset |
| NoOp | NoOp kernel used to output intermediate products in workflow | xarray.noop |
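
The Anomaly, Standard Deviation, and Normalization descriptions build on one another, which a short pure-Python sketch on a one-dimensional series can make explicit. These functions are local illustrations, not the EDAS kernels, and dividing by N (population standard deviation) rather than N-1 is an assumption.

```python
import math

def mean(values):
    return sum(values) / float(len(values))

def anomaly(values):
    """Center the series by subtracting its mean."""
    m = mean(values)
    return [v - m for v in values]

def standard_deviation(values):
    """Population standard deviation (dividing by N is an assumption)."""
    return math.sqrt(mean([a * a for a in anomaly(values)]))

def normalize(values):
    """Anomaly divided by the standard deviation."""
    sd = standard_deviation(values)
    return [a / sd for a in anomaly(values)]

series = [1.0, 2.0, 3.0]
print(anomaly(series))    # [-1.0, 0.0, 1.0]
print(normalize(series))  # roughly [-1.22, 0.0, 1.22]
```

A constant series has zero standard deviation, so `normalize` would raise a ZeroDivisionError for it; this sketch does not guard against that case.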

Workflows are simply a combination of kernels that are run in succession.  See the Example Code section below.
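
The idea of running kernels in succession can be sketched with ordinary Python functions standing in for kernels. Nothing here calls EDAS: `lowpass`, `detrend`, and `run_workflow` are hypothetical local helpers, the filter is a simple moving average, and the rule that the window shrinks at the series edges is an assumption.

```python
def lowpass(values, window=3):
    """Centered moving average; the window shrinks at the edges (assumed rule)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(segment) / float(len(segment)))
    return smoothed

def detrend(values, window=3):
    """Subtract the lowpass-filtered series from the input series."""
    return [v - s for v, s in zip(values, lowpass(values, window))]

def run_workflow(values, kernels):
    """Apply each kernel to the output of the previous one."""
    for kernel in kernels:
        values = kernel(values)
    return values

series = [1.0, 2.0, 3.0, 4.0, 5.0]
# Detrend the series, then take the largest remaining departure.
print(run_workflow(series, [detrend, max]))  # 0.5
```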

Plotting Results

The EDAS client provides some rudimentary plotting and printing routines:
mpl_timeplot:  Plot a time series.
mpl_spaceplot:  Plot a lat/long image.
print_Mdata:  Print the resultant metadata.

Downloading Results

The output files are downloaded to the client's system and placed in /tmp.

An output message gives the name of the file that has been downloaded and the location of the same file on our THREDDS server. The message contains the word "HREFS". Additional OPeNDAP functionality may be available in future releases.

Example Code

Demo Scripts

Hints for easier usage:
Time ranges can be specified either by indices, where each increment represents a timestep:
'time': {'start': 0, 'end': 100, 'crs': 'indices'}
or by dates:
"time": {"start": "1980-01-01T00:00:00", "end": "1980-12-31T23:00:00", "crs": "timestamps"}

Start and end times must go from earlier to later.

Latitude and longitude values must go from smaller to larger.
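
The ordering rules in these hints can be checked before a request is sent. The sketch below builds the two time range dictionaries in the form shown in the hints and rejects reversed bounds. The helper names are hypothetical and not part of the EDAS client, and treating equal start and end values as valid is an assumption.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"

def time_by_index(start, end):
    """Time range given as timestep indices."""
    if start > end:
        raise ValueError("start index must not be after end index")
    return {"start": start, "end": end, "crs": "indices"}

def time_by_date(start, end):
    """Time range given as timestamps such as 1980-01-01T00:00:00."""
    first = datetime.strptime(start, TIMESTAMP_FORMAT)
    last = datetime.strptime(end, TIMESTAMP_FORMAT)
    if first > last:
        raise ValueError("start time must not be after end time")
    return {"start": start, "end": end, "crs": "timestamps"}

def check_ascending(name, low, high):
    """For latitude and longitude bounds, which must go from smaller to larger."""
    if low > high:
        raise ValueError("%s bounds must go from smaller to larger" % name)
    return low, high

domain = {"time": time_by_date("1980-01-01T00:00:00", "1980-12-31T23:00:00")}
check_ascending("lat", 25.0, 50.0)
print(domain)
```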

Compute Average of Yearly US Cloud Cover for Solar Panel Installation

Compute Max to Find High Precipitation Events

Compute Sum of Precipitation Events

Compute Maximum Temperature to find Heat Waves

Compute Maximum Temperature to Investigate 2003 European Heat Wave

Compute Minimum Temperature in Rocky Mountain National Park

When can I use it?

Version 1.0 of the EDAS API is now available.

Errors and Messages

EDAS is currently somewhat verbose.  Jupyter Notebooks have been known to repeat EDAS messages if cells aren't cleared and the kernel restarted, making the verbosity more noticeable.

Important informational messages:

[2017-10-26 18:28:43,031][wps.py[execute:476]] HREFS:
This message provides the location of the output file on the NCCS THREDDS server.

[2017-10-12 13:41:36,866][wps.py[download_result:416]] STATUS: QUEUED
The EDAS server is busy running a previously submitted command. Your command has been queued and will run shortly. If your command is queued for an excessively long period of time (10 minutes or more), please contact the NCCS at support@nccs.nasa.gov and include "EDAS" in the subject of your email.

[2017-10-12 13:41:36,866][wps.py[download_result:416]] STATUS: EXECUTING
The EDAS server is executing your command.

[2017-10-12 13:41:37,880][wps.py[download_result:416]] STATUS: COMPLETED
Your job has completed and your results are available.
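
Because EDAS output is verbose, it can help to pick out just the STATUS and HREFS messages. The sketch below parses lines in the format shown above with a regular expression. The pattern is inferred from these sample lines only, so other message layouts may not match, and `parse_message` is a hypothetical helper, not part of the EDAS client. The text that follows "HREFS:" is not reproduced on this page, so the example leaves it empty.

```python
import re

# Inferred from the sample lines, for example:
# [2017-10-12 13:41:36,866][wps.py[download_result:416]] STATUS: QUEUED
MESSAGE_PATTERN = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"           # [2017-10-12 13:41:36,866]
    r"\[(?P<source>[^\[\]]+\[[^\]]+\])\]"   # [wps.py[download_result:416]]
    r"\s+(?P<keyword>STATUS|HREFS):\s*(?P<value>.*)$"
)

def parse_message(line):
    """Return (timestamp, keyword, value), or None if the line does not match."""
    match = MESSAGE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("timestamp"), match.group("keyword"), match.group("value")

log_lines = [
    "[2017-10-12 13:41:36,866][wps.py[download_result:416]] STATUS: QUEUED",
    "[2017-10-12 13:41:36,866][wps.py[download_result:416]] STATUS: EXECUTING",
    "[2017-10-12 13:41:37,880][wps.py[download_result:416]] STATUS: COMPLETED",
    "[2017-10-26 18:28:43,031][wps.py[execute:476]] HREFS:",
]
for line in log_lines:
    print(parse_message(line))
```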

Syntax error messages are reported as they occur.

// SUPPORT 

Email us at support@nccs.nasa.gov with the subject line: EDAS.