// ASTG Python Courses

The Advanced Software Technology Group (ASTG) provides a number of software and hardware support services to the NASA Goddard community. ASTG services include Level 2 help desk support, user training, code migration, performance tuning, parallelization, algorithmic development, software engineering, and code modernization. ASTG works closely with NCCS to assess code performance and system configuration as new hardware systems are integrated. Additionally, ASTG supports developmental projects to research beneficial impacts of emerging technologies on Earth science code performance and advancing software tools to enhance community use of NASA models.

ASTG aims to provide training opportunities in areas such as high-level computing languages, debugging tools, and parallelization of codes. Beginning in September 2020, ASTG plans to provide a series of Python training classes that focus on:

  1. Knowledge of the language
  2. Data manipulation and visualization
  3. Data Science
  4. Machine Learning
  5. Data Parallelism

All the classes are meant for people who are already familiar with another programming language and want to learn Python quickly. Some of the courses can be taken for SATERN credit; registration for those courses is handled through SATERN.

Fall 2021 Python Courses

// Python Beginner Course for Programmers



This course is designed for participants who want to quickly learn the basic concepts of the Python language and be able to use Python-related tools for their work. We cover the following topics: data types, conditional statements, loops, functions, modules, the datetime module, and basic I/O with text files.

Prerequisites: Ability to program in another language (C, C++, Fortran, Java, Matlab, IDL, etc.). Participants are also expected to be able to use a web browser, open a command prompt or terminal window, and edit text files. A Gmail account is needed.

Materials for this course are available at: https://astg606.github.io/py_courses/beginner_python/

// Introduction to Numpy

SATERN Course ID: GSFC-600-ITNUMPY

When: Tuesday, September 7, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

NumPy (Numerical Python) is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. It is meant to provide an array object that is at least an order of magnitude faster than traditional Python lists. Using NumPy, mathematical and logical operations on arrays can be performed efficiently. This course introduces the structure of NumPy arrays and shows various ways to create them to facilitate numerical calculations. We will also present how to perform array slicing, in-place arithmetic, etc., and how to use built-in mathematical functions for faster computations.

Prerequisites: Familiarity with Python. You will be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).
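
As a small taste of the topics named in the NumPy description (array creation, slicing, in-place arithmetic, built-in mathematical functions), here is a minimal sketch. It is not taken from the course materials; the variable names and values are made up for illustration, and it assumes NumPy is installed.

```python
import numpy as np

# Create arrays in a few common ways.
a = np.arange(10)          # integers 0..9
grid = np.zeros((2, 3))    # 2x3 array filled with zeros

# Basic slicing returns a view on the same data, not a copy.
evens = a[::2]             # every second element of a

# In-place arithmetic updates the existing array instead of creating a new one.
a += 100

# Built-in vectorized functions operate on whole arrays at once.
values = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(values)

print(evens)   # the view reflects the in-place update made to a
print(roots)
```

Because `evens` is a view, the in-place update to `a` is visible through it; use `a[::2].copy()` when an independent array is wanted.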

// Introduction to Pandas

SATERN Course ID: GSFC-600-ITP

When: Tuesday, September 14, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

Pandas is an open-source Python library providing high-performance, easy-to-use data structures and data analysis tools. It is built on top of NumPy and is one of the most important Python tools for data analysis. In this course, we will introduce Pandas Series and DataFrames and show how to manipulate them. We will also show how to use Pandas to manipulate and visualize real datasets.

Prerequisites: Numpy and Datetime module. You will be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).
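
To make the two data structures named in the Pandas description concrete, here is a minimal sketch of a Series and a DataFrame. It is not part of the course materials; the station names and temperatures are invented for illustration, and it assumes Pandas is installed.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
temps = pd.Series([21.5, 23.0, 19.5], index=["Mon", "Tue", "Wed"])

# A DataFrame is a table of labeled columns (made-up values for illustration).
df = pd.DataFrame(
    {
        "station": ["A", "A", "B", "B"],
        "temp_c": [21.5, 23.0, 19.5, 20.5],
    }
)

# Add a derived column, then compute a per-station mean.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32
mean_by_station = df.groupby("station")["temp_c"].mean()

print(temps["Tue"])
print(mean_by_station)
```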

// Scalable Computing with Dask

When: Tuesday, September 21, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

Dask is a flexible parallel computing library that provides multi-core and distributed parallel execution on larger-than-memory datasets. It distributes data across multiple cores and provides ways to scale Pandas, Scikit-Learn and NumPy workflows natively. We will introduce the basic concepts of Dask and learn how it can be used to accelerate calculations with NumPy and Pandas objects through Dask dataframes. We will also show how to use its dynamic task scheduling.

Prerequisites: Numpy, Pandas. You will be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Interactive Visualization with HoloViews

SATERN Course ID: GSFC-600-IVH

When: Tuesday, September 28, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. It helps users understand their data better, by letting them work seamlessly with both the data and its graphical representation. HoloViews focuses on bundling users’ data together with the appropriate metadata to support both analysis and visualization, making users’ raw data and its visualization equally accessible at all times. In this course, we will learn how to create various plots (line plot, scatter plot, histogram, etc.) with HoloViews.

Prerequisites: Numpy and Pandas. You will be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Manipulating Geolocated Data with Xarray

SATERN Course ID: GSFC-600-MGX

When: Tuesday, October 5, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

Xarray is an open-source library providing high-level, easy-to-use data structures and analysis tools for working with multidimensional labeled datasets and arrays in Python. Xarray combines the convenience of labeled data structures inspired by Pandas with the multidimensional arrays of NumPy and parallel out-of-core computation from Dask to provide an intuitive, powerful and scalable platform for scientific analysis. Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays. It is particularly tailored to working with netCDF files. In this course, we will cover the main concepts of Xarray and learn how to read multidimensional data from files in scientific data formats, manipulate the data and perform visualizations.

Prerequisites: Numpy and Pandas. You will be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Accessing Web Resources with Python

SATERN Course ID: GSFC-600-ARP

When: Tuesday, October 12, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

In this presentation, we use the Python packages Requests and Beautiful Soup to access and manipulate data from web pages. The Requests module lets you integrate your Python programs with web services, while the Beautiful Soup module is designed for pulling data out of HTML and XML files. We will explain how to access a web page, read and collect data, and work with the textual information available there. We will also introduce the JavaScript Object Notation (JSON) and learn how to get JSON objects from the web. We provide examples of how to access both publicly available data and data requiring authentication.

Prerequisites: Familiarity with Python programming and basic HTML syntax. You will be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).
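
The course itself uses Requests and Beautiful Soup. The sketch below is a stand-in that uses only the Python standard library (`html.parser` and `json`) and hard-coded sample text instead of a real web request, so it runs without a network connection. The HTML snippet, the JSON payload and the `LinkCollector` class are hypothetical and exist only for illustration.

```python
import json
from html.parser import HTMLParser

# Hard-coded stand-ins for content that would normally come from a web request.
SAMPLE_HTML = """
<html><body>
  <a href="/data/file1.nc">File 1</a>
  <a href="/data/file2.nc">File 2</a>
</body></html>
"""
SAMPLE_JSON = '{"dataset": "example", "files": 2}'


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)

# json.loads turns JSON text into Python dicts, lists, strings and numbers.
record = json.loads(SAMPLE_JSON)
print(record["files"])
```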

// Hackathon

SATERN Course ID: GSFC-600-ARP

When: Thursday, October 28, 2021 - 09:00 to 16:00 US EST
Where: Virtual


In this event, we will write Python applications to retrieve remote NASA data files, read the files, manipulate the data and perform visualizations.

Prerequisites: Familiarity with Python programming, Numpy, Pandas, Xarray, web scraping, HoloViews.

// Seaborn and Exploratory Data Analysis

SATERN Course ID: GSFC-SEDA

When: Tuesday, November 9, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

Exploratory Data Analysis (EDA) is the process of preparing your data for modeling. EDA aims to find trends in the data and to answer initial questions. In this presentation, we will first introduce Seaborn (a plotting tool for statistical analysis). We will then define the main principles of EDA and apply those principles to find insights in a dataset.

Prerequisites: Numpy, Matplotlib, Pandas. You will be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Basic Machine Learning Modeling with Scikit-Learn

SATERN Course ID: GSFC-600-BMLMS

When: Tuesday, November 23, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

Scikit-Learn (sklearn) is a general-purpose Machine Learning (ML) library for Python. Its versatility makes it a good starting place for most ML problems. In this presentation, we first introduce the basic concepts of ML. We then use the dataset from the Exploratory Data Analysis lecture to build two regression models. Finally, we implement k-fold cross-validation to determine the best ML algorithm for the available data.

Prerequisites: Pandas, Exploratory Data Analysis. You will be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Basic Machine Learning Modeling with TensorFlow

When: Tuesday, December 7, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

TensorFlow is an open-source software library for Machine Learning (ML) across a range of tasks. It is a symbolic math library and is also used to build and train neural networks that detect and decipher patterns and correlations, analogous to human learning and reasoning. In this presentation, we will use real datasets and follow the steps of an ML workflow to create models, validate them and make predictions.

Prerequisites: Pandas, Exploratory Data Analysis. You will be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).



Online Tutorials


Getting started with TensorFlow ML on NCCS platforms


These Jupyter Notebook tutorials introduce the steps required to enable TensorFlow machine learning on NCCS resources. The links open in Google Colaboratory, so you will need to log in to your Google account.

Tutorial                          Interactive Link
TensorFlow on Discover            Open In Colab
TensorFlow on ADAPT JupyterHub    Open In Colab
TensorFlow on ADAPT               Open In Colab


Past Python Classes

Spring 2021 Python Courses

// Python Beginner Course for Programmers



This course is designed for participants who want to quickly learn the basic concepts of the Python language and be able to use Python-related tools for their work. We cover the following topics: data types, conditional statements, loops, functions, modules, the datetime module, and basic I/O with text files.

Prerequisites: Ability to program in another language (C, C++, Fortran, Java, Matlab, IDL, etc.). Participants are also expected to be able to use a web browser, open a command prompt or terminal window, and edit text files. A Gmail account is needed.

Materials for this course are available at: https://astg606.github.io/py_courses/beginner_python/

// Python Scripting and Packaging

SATERN Course ID: GSFC-600-PSP

When: Tuesday, January 26, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

In this course, we will create a Python application from scratch to solve a specific problem. The process will involve writing Python scripts that will evolve into modules. Then, the modules will be combined to form a package. We will use a YAML file to pass parameters to the Python application. Participants will learn the steps needed to create their own Python package.

Prerequisites: Familiarity with Python programming, ability to use a file editor. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Object Oriented Programming with Python

SATERN Course ID: GSFC-600-OOOP

When: Tuesday, February 9, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

Python was designed as an object-oriented language. Because of this, creating and using classes and objects is straightforward. This course presents the principles of object-oriented programming (OOP) using Python. We use simple examples to learn the benefits of OOP and see how the main OOP concepts (such as Encapsulation, Data Abstraction, Polymorphism and Inheritance) can be used in Python applications.

Prerequisites: Familiarity with Python programming. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).
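
The four OOP concepts listed in the description can be illustrated with a small sketch. The `Shape`, `Circle` and `Rectangle` classes below are hypothetical examples chosen for illustration, not code from the course.

```python
import math


class Shape:
    """Base class defining the interface every shape shares (abstraction)."""

    def __init__(self, name):
        # Leading underscore: internal by convention (encapsulation).
        self._name = name

    @property
    def name(self):
        return self._name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"


class Circle(Shape):  # Circle inherits from Shape (inheritance)
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


# Polymorphism: the same describe() call works on any Shape subclass.
shapes = [Circle(1.0), Rectangle(2.0, 3.0)]
descriptions = [shape.describe() for shape in shapes]
print(descriptions)
```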

// Accessing Web Resources with Python

SATERN Course ID: GSFC-600-ARP

When: Tuesday, February 23, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

In this presentation, we use the Python packages Requests and Beautiful Soup in order to access and manipulate data from web pages. The Requests module lets you integrate your Python programs with web services, while the Beautiful Soup module is designed for pulling data out of HTML and XML files. We will explain how to access a web page, read and collect data, and work with the textual information available there. We will also introduce the JavaScript Object Notation (JSON) and learn how to get JSON objects from the web.

Prerequisites: Familiarity with Python programming and basic HTML syntax.

// Introduction to Numpy

SATERN Course ID: GSFC-600-ITNUMPY

When: Tuesday, March 9, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

NumPy (Numerical Python) is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. It is meant to provide an array object that is at least an order of magnitude faster than traditional Python lists. Using NumPy, mathematical and logical operations on arrays can be performed efficiently. This course introduces the structure of NumPy arrays and shows various ways to create them to facilitate numerical calculations. We will also present how to perform array slicing, in-place arithmetic, etc., and how to use built-in mathematical functions for faster computations.

Prerequisites: Familiarity with Python. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Introduction to Pandas

SATERN Course ID: GSFC-600-ITP

When: Tuesday, March 16, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

Pandas is an open-source Python library providing high-performance, easy-to-use data structures and data analysis tools. It is built on top of NumPy and is one of the most important Python tools for data analysis. In this course, we will introduce Pandas Series and DataFrames and show how to manipulate them. We will also show how to use Pandas to manipulate and visualize real datasets.

Prerequisites: Numpy and Datetime module. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Interactive Data Visualization with HoloViews

SATERN Course ID: GSFC-600-IVH

When: Thursday, March 25, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. It helps users understand their data better, by letting them work seamlessly with both the data and its graphical representation. HoloViews focuses on bundling users’ data together with the appropriate metadata to support both analysis and visualization, making users’ raw data and its visualization equally accessible at all times. In this course, we will learn how to create various plots (line plot, scatter plot, histogram, etc.) with HoloViews.

Prerequisites: Numpy and Pandas.

// Manipulating Geolocated Data with Xarray

SATERN Course ID: GSFC-600-MGX

When: Tuesday, April 13, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

Xarray is an open-source library providing high-level, easy-to-use data structures and analysis tools for working with multidimensional labeled datasets and arrays in Python. Xarray combines the convenience of labeled data structures inspired by Pandas with the multidimensional arrays of NumPy and parallel out-of-core computation from Dask to provide an intuitive, powerful and scalable platform for scientific analysis. Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays. It is particularly tailored to working with netCDF files. In this course, we will cover the main concepts of Xarray and learn how to read multidimensional data from files in scientific data formats, manipulate the data and perform visualizations.

Prerequisites: Numpy and Pandas.

// RAPIDS CuPy, CuDF and CuML

SATERN Course ID: GSFC-600-RCCC

When: Tuesday, April 27, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

RAPIDS is open-source software for GPU acceleration, focusing on data science workflows. Some of the main components of RAPIDS are CuPy (an implementation of a NumPy-compatible multidimensional array on CUDA), CuDF (a GPU DataFrame library that provides a pandas-like API for loading, joining, aggregating, filtering, and manipulating data) and CuML (which provides GPU versions of machine learning functions in Scikit-Learn). All these tools make it simple to port Python codes to RAPIDS. In this course, we will provide simple examples to learn how to use CuPy, CuDF and CuML on GPUs.

Prerequisites: Numpy and Pandas.

// Accelerating (Numba) and Scaling (Dask) Python Codes on CPUs

SATERN Course ID: GSFC-600-SDCPUS

When: Tuesday, May 4, 2021 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

Numba is a Python tool used to speed up Python applications that make heavy use of NumPy arrays. Dask is a flexible parallel computing library that provides multi-core and distributed parallel execution on larger-than-memory datasets. In this course, we will present how the two tools can be used to speed up and scale Python applications on CPUs.

Prerequisites: Numpy and Pandas.

// Accelerating (Numba) and Scaling (Dask) Python Codes on GPUs

SATERN Course ID: GSFC-600-SDGPUS

When: Tuesday, May 18, 2021 - 13:00 to 16:00 US EST
Where: Virtual

Numba is a Python tool used to speed up Python applications that make heavy use of NumPy arrays. Dask is a flexible parallel computing library that provides multi-core and distributed parallel execution on larger-than-memory datasets. In this course, we will present how the two tools can be used to speed up and scale Python applications on GPUs.

Prerequisites: Numpy and Pandas.

Fall 2020 Virtual Python Classes

// Introduction to Version Control with Git

SATERN Course ID: GSFC-600-IVCG

When: Thursday, September 10, 2020 - 13:00 to 16:00 US EST
Where: Virtual through Microsoft Teams (link will be provided)
Registration: Click HERE

Version control is a system that allows you to keep track of changes made to your code over time. It is useful for reverting to specific versions of your code and for collaborating on the same work with other contributors. Git is a popular distributed version control system. In this lecture, we will give a quick introduction to version control and present the features of Git. Through examples, we will learn how to create a local Git repository. Finally, we will create remote repositories on GitHub.com and perform common operations (clone, stage, commit, push, pull, etc.).

Prerequisites: Participants are expected to be able to use a web browser, open a command prompt or terminal window, and edit text files. A Gmail account is needed.

// Python Beginner Course for Programmers

SATERN Course ID: GSFC-600-PBCP

When: Tuesday, September 15, 2020 - 08:30 to 17:30 US EST
Where: Virtual through Microsoft Teams (link will be provided)
Registration: Click HERE

This accelerated course is designed for participants who want to quickly learn the basic concepts of the Python language and be able to use Python-related tools for their work. We will cover the following topics: data types, conditional statements, loops, functions, modules, the datetime module, and basic I/O with text files.

Prerequisites: Ability to program in another language (C, C++, Fortran, Java, Matlab, IDL, etc.). Participants are also expected to be able to use a web browser, open a command prompt or terminal window, and edit text files. A Gmail account is needed.

// Introduction to Numpy and Matplotlib

SATERN Course ID: GSFC-600-INM

When: Monday, September 21, 2020 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

NumPy (Numerical Python) is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. It is meant to provide an array object that is at least an order of magnitude faster than traditional Python lists. Using NumPy, mathematical and logical operations on arrays can be performed efficiently. This course introduces the structure of NumPy arrays and shows various ways to create them to facilitate numerical calculations. We will also present how to perform array slicing, in-place arithmetic, etc., and how to use built-in mathematical functions for faster computations. Matplotlib is one of the most popular Python packages used for data visualization. It is a cross-platform library for making 2D plots from data in arrays. It can be used in Python scripts, the shell, web application servers and other graphical user interface toolkits. In this course, we will present the anatomy of a Matplotlib figure and show how to produce various 2D plots with Matplotlib.

Prerequisites: Familiarity with Python. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Introduction to Pandas

SATERN Course ID: GSFC-600-ITP

When: Wednesday, September 23, 2020 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Click HERE

Pandas is an open-source Python library providing high-performance, easy-to-use data structures and data analysis tools. It is built on top of NumPy and is one of the most important Python tools for data analysis. In this course, we will introduce Pandas Series and DataFrames and show how to manipulate them. We will also show how to use Pandas to manipulate and visualize real datasets.

Prerequisites: Numpy and Datetime module. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Seaborn and Exploratory Data Analysis

SATERN Course ID: GSFC-SEDA

When: Friday, September 25, 2020 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

Exploratory Data Analysis (EDA) is the process of preparing your data for modeling. EDA aims to find trends in the data and to answer initial questions. In this presentation, we will first introduce Seaborn (a plotting tool for statistical analysis). We will then define the main principles of EDA and apply those principles to find insights in a dataset.

Prerequisites: Numpy, Matplotlib, Pandas. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Basic Machine Learning Modeling with Scikit-Learn

SATERN Course ID: GSFC-600-BMLMS

When: Tuesday, October 13, 2020 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

Scikit-Learn (sklearn) is a general-purpose Machine Learning (ML) library for Python. Its versatility makes it a good starting place for most ML problems. In this presentation, we first introduce the basic concepts of ML. We then use the dataset from the Exploratory Data Analysis lecture to build two regression models. Finally, we implement k-fold cross-validation to determine the best ML algorithm for the available data.

Prerequisites: Pandas, Exploratory Data Analysis. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Basic Machine Learning Modeling with TensorFlow

When: Tuesday, October 27, 2020 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

TensorFlow is an open-source software library for Machine Learning (ML) across a range of tasks. It is a symbolic math library and is also used to build and train neural networks that detect and decipher patterns and correlations, analogous to human learning and reasoning. In this presentation, we will use real datasets and follow the steps of an ML workflow to create models, validate them and make predictions.

Prerequisites: Pandas, Exploratory Data Analysis. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Interactive Visualization with Bokeh

When: Thursday, November 5, 2020 - 13:00 to 16:00 US EST
Where: Virtual
Registration: Will start on October 22.

Bokeh is a Python library for creating interactive data visualizations in a web browser. Bokeh renders its plots using HTML and JavaScript, which makes it well suited to developing web-based dashboards. Unlike Matplotlib and Seaborn, which produce static plots, Bokeh creates interactive plots that change when the user interacts with them. We will learn how to create various interactive plots with Bokeh.

Prerequisites: Numpy, Matplotlib. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Accelerating Data Science Workflows with RAPIDS

When: Thursday, November 5, 2020 - 10:00 to 15:00 US EST
Where: Virtual
Registration: CLICK HERE

Data science workflows have traditionally been slow and cumbersome when it comes to loading, filtering and manipulating data, as well as ML training itself. These processes were constrained to slow, CPU-based compute, which resulted in lengthy cycle times that hurt data science productivity. RAPIDS delivers GPU-accelerated machine learning and data analytics libraries, deployed on NVIDIA GPU platforms, for maximized data science productivity, performance and insights. Bring the power of GPU acceleration to your research and decrease your time to new discoveries with RAPIDS. In this course, you will learn how to GPU-accelerate your data science applications by:

  • Utilizing key RAPIDS libraries like cuDF (GPU-enabled Pandas-like dataframes) and cuML (GPU-accelerated machine learning algorithms)
  • Learning techniques and approaches to end-to-end data science, made possible by rapid iteration cycles created by GPU acceleration


Prerequisites: Python, Numpy, Pandas, Scikit-Learn.

// Scalable Computing with Dask

When: Tuesday, November 10, 2020 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

Dask is a flexible parallel computing library that provides multi-core and distributed parallel execution on larger-than-memory datasets. It distributes data across multiple cores and provides ways to scale Pandas, Scikit-Learn and NumPy workflows natively. We will introduce the basic concepts of Dask and learn how it can be used to accelerate calculations with NumPy and Pandas objects through Dask dataframes. We will also show how to use its dynamic task scheduling.

Prerequisites: Numpy, Pandas, Bokeh. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

// Introduction to Numba

When: Tuesday, December 1, 2020 - 13:00 to 16:00 US EST
Where: Virtual
Registration: CLICK HERE

Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions, and loops. The most common way to use Numba is through its collection of decorators, which can be applied to your functions to instruct Numba to compile them. In this lecture, we plan to present the main features of Numba and show how it can be used to speed up Python applications that make heavy use of NumPy arrays.

Prerequisites: Numpy. You might be asked to provide your Gmail user ID (two days before the beginning of the class) in order to be granted access to the NASA Center for Climate Simulation (NCCS) Science Data Managed Cloud Environment (SMCE).

Spring 2020 Virtual Python Classes

// Introduction to Pandas

When: Tuesday, March 24, 2020 - 10:00 to 11:30

Pandas is a Python library that provides extensive means for data analysis. Pandas makes it very convenient to load, process, and analyze tabular data using SQL-like queries. In this presentation, we will introduce the two main Pandas data structures (Series and DataFrames) and show examples of how Pandas is used to read CSV data files, manipulate the data, and do basic visualizations.

Prerequisites: Numpy and Matplotlib

// Exploratory Data Analysis with Pandas

When: Thursday, March 26, 2020 - 10:00 to 11:30

Exploratory Data Analysis (EDA) is the process of preparing your data for modeling. EDA aims to find trends in the data and to answer initial questions. In this presentation, we will define the main principles of EDA and apply those principles to find insights in a dataset.

Prerequisites: Numpy, Matplotlib and Pandas

// Serialization and Deserialization with Python

When: Tuesday, March 31, 2020 - 10:00 to 11:30

Serialization is the process of converting an object into a sequence of bytes that can be persisted to a disk or database, or sent through streams. The reverse process of creating an object from a sequence of bytes is called deserialization. In this presentation, we will show how to serialize and deserialize objects using the Python modules pickle and json. We will also explain the context in which each of the modules should be used to perform serialization and deserialization.

Prerequisites: Python programming
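
The round trip described above can be sketched with the standard-library `json` and `pickle` modules. The `record` and `payload` objects are made-up examples, not data from the presentation. The sketch also hints at when each module fits: `json` produces portable text but handles only basic types, while `pickle` handles most Python objects but is Python-specific.

```python
import json
import pickle
from datetime import date

record = {"name": "run_01", "steps": [1, 2, 3], "converged": True}

# JSON: text-based and language-independent, limited to basic types.
json_text = json.dumps(record)
from_json = json.loads(json_text)

# pickle: binary and Python-specific, handles most Python objects.
# Only unpickle data that comes from a source you trust.
payload = {"record": record, "created": date(2020, 3, 31)}
pickled_bytes = pickle.dumps(payload)
from_pickle = pickle.loads(pickled_bytes)

# The json module cannot serialize a date object without extra help.
try:
    json.dumps(payload)
    json_handles_date = True
except TypeError:
    json_handles_date = False

print(from_json == record, from_pickle == payload, json_handles_date)
```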

// Accessing Web Resources with Python

When: Thursday, April 2, 2020 - 10:00 to 11:30

In this presentation, we use the Python packages Requests and Beautiful Soup in order to access and manipulate data from web pages. The Requests module lets you integrate your Python programs with web services, while the Beautiful Soup module is designed for pulling data out of HTML and XML files. We will explain how to access a web page, read and collect data, and work with the textual information available there.

Prerequisites: Python programming and basic HTML syntax

// Basic Machine Learning Modeling with Scikit-Learn

When: Tuesday, April 7, 2020 - 10:00 to 11:30

Scikit-Learn (sklearn) is Python’s general-purpose Machine Learning (ML) library. Its versatility makes it a good starting place for most ML problems. In this presentation, we first introduce the basic concepts of ML. We then use the dataset from the Exploratory Data Analysis lecture to build two regression models. Finally, we implement k-fold cross-validation to determine the best ML algorithm for the available data.

Prerequisites: Numpy, Pandas

// Basic Machine Learning Modeling with TensorFlow

When: Thursday, April 9, 2020 - 10:00 to 11:30

TensorFlow is an open-source software library for machine learning across a range of tasks. It is a symbolic math library and is also used as a system for building and training neural networks to detect and decipher patterns and correlations, analogous to human learning and reasoning. In this presentation, we will create a synthetic dataset, split the data into two sets (training and testing), use TensorFlow to create a machine learning model, test it, and make predictions.

Prerequisites: Numpy, Pandas

// Introduction to OOP with Python

When: Tuesday, April 14, 2020 - 10:00 to 11:30

Python was designed as an object-oriented language. Because of this, creating and using classes and objects is straightforward. This lecture presents the principles of object-oriented programming (OOP) using Python. We use simple examples to introduce the main OOP concepts such as Encapsulation, Data Abstraction, Polymorphism and Inheritance.

Prerequisites: Familiarity with Python programming
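
The four concepts named above can be shown in one small sketch. The `Shape`, `Rectangle`, and `Circle` classes are hypothetical and are not taken from the lecture materials.

```python
import math


class Shape:
    """Base class defining the common interface (data abstraction)."""

    def __init__(self, name):
        # Leading underscore: internal by convention (encapsulation).
        self._name = name

    @property
    def name(self):
        # Read-only access to the internal attribute.
        return self._name

    def area(self):
        raise NotImplementedError("Subclasses must implement area()")

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"


class Rectangle(Shape):
    """Inherits name and describe() from Shape."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width = width
        self._height = height

    def area(self):
        return self._width * self._height


class Circle(Shape):
    """Inherits name and describe() from Shape."""

    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):
        return math.pi * self._radius ** 2


# Polymorphism: the same call works on any Shape subclass.
shapes = [Rectangle(2, 3), Circle(1)]
descriptions = [shape.describe() for shape in shapes]
print(descriptions)
```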

// Python Decorators

When: Thursday, April 16, 2020 - 10:00 to 11:30

A decorator is a function that takes another function and extends the behavior of the latter without explicitly modifying it. Decorators allow you to define reusable building blocks that can change or extend the behavior of other functions, and they let you do so without permanently modifying the wrapped function itself. The function’s behavior changes only when it is decorated. In this lecture, we will use a step-by-step approach to create decorators and show how they can be used in Python applications.

Prerequisites: OOP with Python
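
A minimal sketch of the idea described above: `count_calls` is a hypothetical decorator that records how many times a function is called. It is applied once by hand, which leaves the original function unchanged, and once with the `@` syntax.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times the wrapped function is called."""

    @functools.wraps(func)  # preserve the name and docstring of func
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


def add(a, b):
    """Return the sum of a and b."""
    return a + b


# Decorating by hand: the original add function is left untouched.
counted_add = count_calls(add)


# The @ syntax is shorthand for multiply = count_calls(multiply).
@count_calls
def multiply(a, b):
    """Return the product of a and b."""
    return a * b


counted_add(1, 2)
counted_add(3, 4)
multiply(2, 5)
print(counted_add.calls, multiply.calls)
```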

// Command Line Interface with Click

When: Tuesday, April 21, 2020 - 10:00 to 11:30

Click is a Python package for creating beautiful command line interfaces in a composable way with as little code as necessary. It is a highly configurable and user-friendly tool that defines commands through decorators. Values are passed to the commands via options or arguments. In this lecture, we use Click to pass parameters to Python applications through the command line.

Prerequisites: Familiarity with Python programming and Decorators

// Configuration File Parser (ConfigParser)

When: Thursday, April 23, 2020 - 10:00 to 11:30

ConfigParser is a Python package that manages user-editable configuration files for an application. The configuration files are organized into sections, and each section contains name-value pairs for configuration data. We will learn how to create configuration files and how to read the parameters set in the files to drive examples of Python applications.

Prerequisites: Familiarity with Python programming and Click
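
The section and name-value layout described above can be sketched with the standard-library `configparser` module. The section names, option names, and values below are invented for the example, and the configuration is read from a string so that the snippet is self-contained.

```python
import configparser

# Hypothetical configuration text; a real application would typically
# call parser.read() with the path of a configuration file instead.
config_text = """
[grid]
nlat = 91
nlon = 144

[output]
directory = /tmp/results
verbose = yes
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

# Values are stored as strings; typed getters convert them.
nlat = parser.getint("grid", "nlat")
nlon = parser["grid"].getint("nlon")
verbose = parser.getboolean("output", "verbose")
directory = parser.get("output", "directory")

# fallback supplies a default when an option is missing.
frequency = parser.get("output", "frequency", fallback="daily")

print(parser.sections(), nlat, nlon, verbose, directory, frequency)
```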

// List Comprehension

When: Tuesday, April 28, 2020 - 10:00 to 11:30

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition. In this lecture, we will present different methods of creating lists. We will introduce the lambda, filter, reduce and map functions as effective tools to create lists.

Prerequisites: Familiarity with Python programming
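
The approaches mentioned above can be compared side by side. This is only a sketch using an arbitrary list of integers; each variant builds its result from the same input.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Explicit loop: the baseline that a comprehension replaces.
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

# List comprehension: an operation applied to each member of a sequence.
squares = [n * n for n in numbers]

# Comprehension with a condition: only elements that satisfy it are kept.
even_squares = [n * n for n in numbers if n % 2 == 0]

# map and filter with lambda produce iterators, so wrap them in list().
squares_map = list(map(lambda n: n * n, numbers))
evens_filter = list(filter(lambda n: n % 2 == 0, numbers))

# reduce folds a sequence into a single value (here, the sum).
total = reduce(lambda acc, n: acc + n, numbers, 0)

print(squares, even_squares, evens_filter, total)
```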

// Optimizing your Python Application

When: Thursday, April 30, 2020 - 10:00 to 11:30

This course is meant for Python programmers who want to learn techniques for speeding up their Python applications. We will first introduce profiling tools and then present programming tips (some of which are not specific to Python) for accelerating loops, performing arithmetic operations, creating lists, manipulating Numpy arrays quickly, reducing the memory footprint, etc.

Prerequisites: Familiarity with Python programming

Registration is closed.
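
For the optimization course above, one simple way to compare two implementations is to time them. The sketch below uses the standard-library `timeit` module as a lightweight stand-in for a full profiler such as `cProfile`; the course description does not say which tools are covered, so this choice is an assumption. The two functions are hypothetical, and the measured times vary from machine to machine.

```python
import timeit


def squares_with_loop(n):
    """Build a list of squares by appending inside an explicit loop."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


# Time 200 calls of each implementation on the same problem size.
loop_time = timeit.timeit(lambda: squares_with_loop(1000), number=200)
comp_time = timeit.timeit(lambda: squares_with_comprehension(1000), number=200)

print(f"loop: {loop_time:.4f} s, comprehension: {comp_time:.4f} s")
```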

// Introduction to Numba

When: Tuesday, May 5, 2020 - 10:00 to 11:30

Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays, NumPy functions, and loops. The most common way to use Numba is through its collection of decorators that can be applied to your functions to instruct Numba to compile them. In this lecture, we plan to present the main features of Numba and show how it can be used to speed up Python applications that heavily use Numpy arrays.

Prerequisites: Familiarity with Python programming, Numpy and Decorators

Registration is closed.

// Datetime Module

When: Thursday, May 7, 2020 - 10:00 to 11:30

The datetime module provides various classes and functions for date and time parsing, formatting, and arithmetic. In this lecture, we will learn how to create datetime objects, extract components such as the year, month, and day from them, and compute the difference between two dates and times. We will also learn how datetime objects can be combined with Pandas dataframes to build time series data.

Prerequisites: Familiarity with Python programming

Registration is closed.
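
The datetime operations listed in the description above can be sketched with the standard library alone. The dates are arbitrary, and the Pandas time series part is not covered here.

```python
from datetime import datetime, timedelta

# Create datetime objects directly or by parsing a string.
start = datetime(2020, 5, 7, 10, 0)
end = datetime.strptime("2020-05-12 11:30", "%Y-%m-%d %H:%M")

# Extract individual components.
year, month, day = start.year, start.month, start.day

# Subtracting two datetime objects gives a timedelta.
elapsed = end - start

# Adding a timedelta to a datetime gives a new datetime.
later = start + timedelta(hours=1, minutes=30)

# Format a datetime as text.
label = later.strftime("%Y-%m-%d %H:%M")

print(year, month, day)
print(elapsed.days, elapsed.seconds)
print(label)
```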

// Introduction to Version Control with Git

When: Tuesday, May 12, 2020 - 10:00 to 11:30

Version control is a system that allows you to keep track of changes made to your code over time. It is useful for reverting to specific versions of your code and for collaborating on the same work with other contributors. Git is a popular distributed version control system. In this lecture, we will give a quick introduction to version control and present the features of Git. Through examples, we will learn how to create a local Git repository. Finally, we will create remote repositories on GitHub.com and perform some operations (clone, stage, commit, push, pull, etc.).

Prerequisites: None

Registration is now closed.


For further information please contact either Bruce Van Aartsen (bruce.vanaartsen@nasa.gov) or Jules Kouatchou (jules.kouatchou@nasa.gov).

Read More About Instructors

We want to thank Brent Smith and Megan Damon who contributed to the creation of some of the materials used here.