// EARTH DATA ANALYTICS SERVICE (EDAS) 

BIG DATA ANALYTICS FRAMEWORK


Please note, EDAS is currently unavailable due to a hardware problem. We expect the service to be back online by August 29, 2019. Thank you for your understanding.

As the availability and volume of Earth data grow, researchers spend more time downloading and processing their data than doing science. The NCCS has developed the Earth Data Analytics Service (EDAS), a high-performance big data analytics framework built on Dask/xarray. Through a web-based interface, EDAS lets researchers use NCCS compute power to analyze large datasets located at the NCCS, eliminating the need to download the data.

EDAS provides access to a suite of “canonical operations” (min, max, sum, difference, average, root mean square, anomaly, and standard deviation) that researchers can combine into workflows. EDAS uses a dynamic caching architecture, a custom framework, and a streaming, parallel, in-memory workflow to process huge datasets within limited memory at interactive response times. These operations and datasets can be accessed through a Web Processing Service (WPS) API from applications written by the user.
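To make the idea of processing a large dataset within limited memory concrete, here is a minimal sketch of a streaming reduction in plain Python. It is not EDAS code: the function names (`chunks`, `streaming_stats`) and the sample values are hypothetical, and the real service relies on Dask/xarray rather than hand-written loops. The sketch only shows that several of the canonical operations can be computed from a few running totals while the data arrives block by block.

```python
import math
from typing import Dict, Iterable, Iterator, List


def chunks(values: List[float], size: int) -> Iterator[List[float]]:
    """Yield successive fixed-size blocks, standing in for blocks read from disk."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def streaming_stats(blocks: Iterable[List[float]]) -> Dict[str, float]:
    """Reduce a stream of blocks to min, max, sum, average, and root mean square.

    Only a handful of running totals are kept; the full dataset is never held at once.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    lowest = math.inf
    highest = -math.inf
    for block in blocks:
        count += len(block)
        total += sum(block)
        total_sq += sum(v * v for v in block)
        lowest = min(lowest, min(block))
        highest = max(highest, max(block))
    if count == 0:
        raise ValueError("no data to reduce")
    return {
        "min": lowest,
        "max": highest,
        "sum": total,
        "average": total / count,
        "rms": math.sqrt(total_sq / count),
    }


# Hypothetical sample values, processed two at a time.
data = [3.0, 4.0, 0.0, -5.0, 12.0, 1.0]
stats = streaming_stats(chunks(data, size=2))
```

Operations such as the median do not reduce to running totals this simply, which is one reason a framework with caching and partitioning is useful.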

EDAS allows users to compute close to the data. Performance tests of commonly used workflows produced results 15 to 50 times faster than standard tools in our environment. EDAS is a local NCCS implementation of the Earth System Grid Federation's (ESGF) Compute Working Team (CWT) project to expose ESGF distributed compute resources via an API and a set of analytical operations.

// ARCHITECTURAL COMPONENTS  

 
  1. Client software, either a Jupyter Notebook or Python scripts, installed on the user's system via conda.
  2. A WAR file that runs in Tomcat. It listens for external requests, parses them, and submits them to the analytics server.
  3. Backend analytics code that uses Dask/xarray to partition the work across the available worker nodes.

The user invokes the client to access the Tomcat server, which forwards the request to the backend server. Results are returned to the user's system as NetCDF files.
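The request path described above can be sketched as a toy pipeline. Everything here is an illustrative assumption: the `key=value` request string is not the actual WPS request syntax, `tas` is a made-up variable name, the function names are hypothetical, and a thread pool stands in for the Dask worker nodes. The sketch also stops at a numeric result rather than writing a NetCDF file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def parse_request(raw: str) -> Dict[str, str]:
    """Gateway step: turn a 'key=value&key=value' string into a dict."""
    return dict(pair.split("=", 1) for pair in raw.split("&") if pair)


def partition(values: List[float], n_parts: int) -> List[List[float]]:
    """Split the input into at most n_parts contiguous slices of similar size."""
    size = max(1, -(-len(values) // n_parts))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def run_backend(operation: str, values: List[float], n_workers: int = 3) -> float:
    """Backend step: reduce each partition on a worker, then combine the partial results."""
    reducers = {"min": min, "max": max, "sum": sum}
    reduce_fn = reducers[operation]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(reduce_fn, partition(values, n_workers)))
    # min, max, and sum are associative, so reducing the partials gives the full answer.
    return reduce_fn(partials)


# Client step: submit a request (hypothetical format and data).
request = parse_request("operation=max&variable=tas")
temperatures = [271.5, 280.1, 265.0, 290.3, 288.8, 275.2, 269.9]
result = run_backend(request["operation"], temperatures)
```

The two-stage reduce (per partition, then over the partial results) works only for operations whose partial results can be combined directly; an average, for example, would need each worker to return a sum and a count.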


// AVAILABLE OPERATIONS

The NCCS has created a set of operations available through EDAS:

| Operation Type | Description | EDAS Kernel Name or Workflow |
|---|---|---|
| Min | Computes the minimum of the array elements along the given axes | xarray.min |
| Max | Computes the maximum of the array elements along the given axes | xarray.max |
| Sum | Computes the sum of the array elements along the given axes | xarray.sum |
| Difference | Computes the point-by-point differences of pairs of arrays | xarray.eDiff |
| Average | Computes the area-weighted average of the array elements along the given axes | xarray.ave |
| Mean | Computes the unweighted average of the array elements along the given axes | xarray.mean |
| Variance | Computes the variance of the array elements along the given axes | xarray.var |
| Median | Computes the median of the array elements along the given axes | xarray.med |
| Normalization | Normalizes input arrays by centering (computing the anomaly) and then dividing by the standard deviation along the given axes | xarray.norm |
| Anomaly | Centers the input arrays by subtracting the mean along the given axes | xarray.anomaly |
| Standard Deviation | Computes the standard deviation of the array elements along the given axes | xarray.std |
| Decycle | Removes the seasonal cycle from the temporal dynamics | xarray.decycle |
| Lowpass | Smooths the input arrays by applying a 1D convolution (lowpass) filter along the given axes | xarray.lowpass |
| Detrend | Detrends input arrays by subtracting the result of applying a 1D convolution (lowpass) filter along the given axes | xarray.detrend |
| Teleconnection | Produces a teleconnection map by computing covariances at each point (in the ROI) with the location specified by the 'lat' and 'lon' parameters | xarray.telemap |
| EOF | Computes PCs and EOFs along the time axis | xarray.eof |
| Filter | Filters input arrays; currently supports only subsetting by month(s) | xarray.filter |
| Cache | Cache kernel used to cache input ROIs for low-latency access by subsequent requests | xarray.cache |
| Subset | NoOp kernel used to return (subsetted) inputs | xarray.subset |
| NoOp | NoOp kernel used to output intermediate products in a workflow | xarray.noop |
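The difference between Average (area-weighted) and Mean (unweighted), and the way Normalization builds on Anomaly and Standard Deviation, can be illustrated with a short plain-Python sketch. Two assumptions are not stated in the source: grid cells are weighted by the cosine of their latitude, a common approximation on a regular latitude-longitude grid, and the standard deviation is the population form (divide by N). The function names are hypothetical and are not the EDAS kernels themselves.

```python
import math
from typing import List


def mean(values: List[float]) -> float:
    """Unweighted average (the 'Mean' operation)."""
    return sum(values) / len(values)


def area_weighted_average(values: List[float], lats_deg: List[float]) -> float:
    """Average weighted by cos(latitude); the weighting scheme is an assumption."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def anomaly(values: List[float]) -> List[float]:
    """Center the values by subtracting their mean."""
    m = mean(values)
    return [v - m for v in values]


def std(values: List[float]) -> float:
    """Population standard deviation (divide by N); an assumption about the convention."""
    return math.sqrt(mean([a * a for a in anomaly(values)]))


def normalize(values: List[float]) -> List[float]:
    """Center, then divide by the standard deviation."""
    s = std(values)
    return [a / s for a in anomaly(values)]


# Two hypothetical grid cells: one on the equator, one at 60 degrees north.
values = [10.0, 20.0]
lats = [0.0, 60.0]
unweighted = mean(values)
weighted = area_weighted_average(values, lats)
normalized = normalize([1.0, 2.0, 3.0])
```

With these inputs the cell at 60 degrees counts about half as much as the equatorial cell, so the weighted average falls below the unweighted mean of 15.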

Run this WPS GetCapabilities call to get a dynamic list of operations: https://edas.nccs.nasa.gov/wps/cwt?request=GetCapabilities
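As a rough sketch of how a client might consume that call, the Python below builds the GetCapabilities URL and extracts process identifiers from a capabilities document. It assumes the response follows the OGC WPS 1.0.0 schema (`wps:Process` elements containing `ows:Identifier`); the actual EDAS response may be laid out differently. The `SAMPLE` document is hand-written for illustration and is not real server output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

EDAS_WPS_ENDPOINT = "https://edas.nccs.nasa.gov/wps/cwt"

# Namespace URIs from the OGC WPS 1.0.0 and OWS 1.1 schemas (assumed to apply here).
NS = {
    "wps": "http://www.opengis.net/wps/1.0.0",
    "ows": "http://www.opengis.net/ows/1.1",
}


def capabilities_url(endpoint: str = EDAS_WPS_ENDPOINT) -> str:
    """Build the GetCapabilities request URL."""
    return endpoint + "?" + urlencode({"request": "GetCapabilities"})


def list_process_ids(xml_text: str) -> List[str]:
    """Return the identifier of every process advertised in a capabilities document."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.findall(".//wps:Process/ows:Identifier", NS)
        if el.text
    ]


# Hand-written stand-in for a server response, for illustration only.
SAMPLE = """<wps:Capabilities xmlns:wps="http://www.opengis.net/wps/1.0.0"
                  xmlns:ows="http://www.opengis.net/ows/1.1">
  <wps:ProcessOfferings>
    <wps:Process><ows:Identifier>xarray.min</ows:Identifier></wps:Process>
    <wps:Process><ows:Identifier>xarray.max</ows:Identifier></wps:Process>
  </wps:ProcessOfferings>
</wps:Capabilities>"""

url = capabilities_url()
process_ids = list_process_ids(SAMPLE)
```

Against a live server, the text passed to `list_process_ids` would come from fetching `url`, for example with `urllib.request.urlopen(url).read()`, provided the service is reachable.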


// EDAS COLLECTIONS

See the Earth data holdings available through EDAS under Data Collections.

// USING EDAS 


Everything you need to know about how to work with ADAPT in one place.

  • Getting started
  • Best practices
  • Datasets
  • Instructionals
