// ASTG Python Courses

The Advanced Software Technologies Group (ASTG) provides a number of software and hardware support services to the NASA Goddard community. ASTG services include Level 2 help desk support, user training, code migration, performance tuning, parallelization, algorithmic development, software engineering, and code modernization. ASTG works closely with NCCS to assess code performance and system configuration as new hardware systems are integrated. Additionally, ASTG supports developmental projects that research the benefits of emerging technologies for Earth science code performance and that advance software tools to enhance community use of NASA models.

ASTG aims to provide training opportunities in areas such as high-level computing languages, debugging tools, and code parallelization. In May 2019, ASTG conducted a Code 600-wide survey to estimate training needs, and the results showed strong interest in Python-related topics. In response, from September 2019 to May 2020, ASTG is providing a series of Python training classes that focus on three main areas:

  1. Knowledge of the language
  2. Data manipulation and visualization
  3. Best coding practices

All the classes are meant for people who are already familiar with another programming language and want to learn Python quickly. Every other month we will offer an accelerated one-day Python course for beginners, and every month, a special-topic course. If you have never been exposed to Python, we strongly recommend that you take one of the accelerated classes (offered on Sep. 24, 2019, Nov. 12, 2019, Jan. 14, 2020, Mar. 17, 2020, and May 12, 2020) before taking any other class.

ASTG is partnering with outside organizations to offer webinars and face-to-face classes on advanced Python topics. For instance, beginning in October 2019, Intel plans to provide a series of webinars on Machine Learning.

The goal of this first series is to establish strong Python foundations within the Goddard community. Later, we intend to add more building blocks by offering courses in areas such as Big Data, Machine Learning, Web Programming, and Parallel Programming.

Special Virtual Python Classes

Due to the current COVID-19 situation, we will be offering virtual courses. All classes will be held in the Microsoft Teams environment.

Please note the following:

  • Activate your Microsoft Teams account using your NASA credentials. We will only contact participants using their NASA email addresses. For each presentation, we will create an internal Microsoft Teams team for the virtual classroom.
  • Each presentation will start at 10:00 am and will last at most 90 minutes. Open the Teams application and go to the team "ASTG (Code 606) Virtual Classes" to join the meeting.
  • You can find a link to the course materials under the Wiki tab of the ASTG Virtual Classes team.
  • Have a Gmail account. We will use Google Colaboratory (a cloud-based Jupyter Notebook environment) to present the materials and run examples.
  • Registration opens shortly after the previous class ends and closes 30 minutes prior to each presentation.


// Introduction to Pandas

When: Tuesday, March 24, 2020 - 10:00 to 11:30

Pandas is a Python library that provides extensive tools for data analysis. Pandas makes it very convenient to load, process, and analyze tabular data using SQL-like queries. In this presentation, we will introduce the two main Pandas data structures (Series and DataFrame) and show examples of how Pandas is used to read CSV data files, manipulate the data, and produce basic visualizations.

Prerequisites: NumPy and Matplotlib
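As a rough illustration of the load, filter, and aggregate workflow described above, here is a minimal sketch. It assumes pandas is installed. The CSV content and the column names (`station`, `month`, `temp_c`) are hypothetical, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real data file.
CSV_TEXT = """station,month,temp_c
A,1,2.5
A,2,3.1
B,1,-1.0
B,2,0.4
"""

# read_csv accepts a path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Selecting a single column returns a Series, the other core pandas structure.
temps = df["temp_c"]

# SQL-like WHERE clause: keep only rows above freezing.
above_freezing = df.query("temp_c > 0")

# SQL-like GROUP BY: mean temperature per station.
mean_by_station = df.groupby("station")["temp_c"].mean()

print(above_freezing)
print(mean_by_station)
```

For a quick plot of the same data, `df.plot()` wraps Matplotlib, which is why Matplotlib is listed as a prerequisite.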

// Exploratory Data Analysis with Pandas

When: Thursday, March 26, 2020 - 10:00 to 11:30

Exploratory Data Analysis (EDA) is the process of preparing your data for modeling. EDA aims to find trends in the data and to answer initial questions. In this presentation, we will define the main principles of EDA and apply those principles to find insights in a dataset.

Prerequisites: NumPy, Matplotlib, and Pandas
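A first EDA pass usually checks the structure of the data, its missing values, its distributions, and the relationships between variables. The sketch below walks through those four checks. It assumes pandas is installed, and the small dataset with two deliberately missing values is hypothetical. It is one possible checklist, not the exact procedure used in the presentation.

```python
import io

import pandas as pd

# Hypothetical daily observations; two values are intentionally missing.
CSV_TEXT = """date,temp_c,precip_mm
2020-01-01,2.0,0.0
2020-01-02,3.5,1.2
2020-01-03,,4.0
2020-01-04,5.0,0.5
2020-01-05,6.5,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# 1. Structure: number of rows and columns, and the inferred column types.
n_rows, n_cols = df.shape
column_types = df.dtypes

# 2. Data quality: count missing values in each column.
missing_per_column = df.isna().sum()

# 3. Distributions: count, mean, std, min, quartiles, and max of numeric columns.
summary = df.describe()

# 4. Relationships: pairwise correlation (NaN rows are skipped for each pair).
correlation = df[["temp_c", "precip_mm"]].corr()

print(summary)
print(correlation)
```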

// Serialization and Deserialization with Python

When: Tuesday, March 31, 2020 - 10:00 to 11:30

Serialization is the process of converting an object into a sequence of bytes that can be persisted to a disk or database or sent through streams. The reverse process, recreating an object from a sequence of bytes, is called deserialization. In this presentation, we will show how to serialize and deserialize objects using the Python modules pickle and json. We will also explain the context in which each module should be used.

Prerequisites: Python programming
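The difference between the two modules can be seen in a short round trip. In this sketch, the `record` dictionary is a hypothetical example. `pickle` produces Python-specific bytes and preserves Python types, while `json` produces portable text but supports only basic types. For example, a tuple comes back as a list.

```python
import json
import pickle
import tempfile
from pathlib import Path

# A hypothetical record to persist.
record = {"name": "sensor-1", "values": [1.5, 2.0, 3.25], "active": True}

# pickle: Python-specific binary format that handles most Python objects.
pickled_bytes = pickle.dumps(record)
from_pickle = pickle.loads(pickled_bytes)

# json: language-neutral text format limited to dicts, lists, strings,
# numbers, booleans, and None.
json_text = json.dumps(record)
from_json = json.loads(json_text)

# Types JSON lacks are not preserved: a tuple stays a tuple with pickle
# but comes back from JSON as a list.
point = (1, 2)
point_via_pickle = pickle.loads(pickle.dumps(point))
point_via_json = json.loads(json.dumps(point))

# Persisting to disk uses the file-based dump/load variants.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "record.pkl"
    with open(path, "wb") as f:
        pickle.dump(record, f)
    with open(path, "rb") as f:
        from_file = pickle.load(f)
```

One practical rule of thumb is to use json when the data must be read by other languages or by people, and pickle for Python-only persistence. The pickle documentation warns never to unpickle data from an untrusted source.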

// Accessing Web Resources with Python

When: Thursday, April 2, 2020 - 10:00 to 11:30

In this presentation, we will use the Python packages Requests and Beautiful Soup to access and manipulate data from web pages. Requests lets you integrate your Python programs with web services, while Beautiful Soup is designed for pulling data out of HTML and XML files. We will explain how to access a web page, read and collect data, and work with the textual information available there.

Prerequisites: Python programming and basic HTML syntax


Registration is now closed.
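To show what "pulling data out of HTML" involves, the sketch below collects the links and visible text from a page. This is a deliberate substitution: it uses the standard-library `html.parser` module instead of Beautiful Soup so that it runs without extra packages. The `LinkCollector` class and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags and non-blank text fragments."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only text that is not pure whitespace.
        stripped = data.strip()
        if stripped:
            self.text_chunks.append(stripped)


# Hypothetical page content; in practice this would be downloaded first.
PAGE = """<html><body>
<h1>Course list</h1>
<p>See <a href="https://example.com/pandas">Pandas</a> and
<a href="https://example.com/eda">EDA</a>.</p>
</body></html>"""

parser = LinkCollector()
parser.feed(PAGE)
print(parser.links)
print(parser.text_chunks)
```

With the packages used in the presentation, the page would typically come from `requests.get(url).text`, and `BeautifulSoup(page, "html.parser").find_all("a")` would replace the hand-written parser class.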

// Basic Machine Learning Modeling with Scikit-Learn

When: Tuesday, April 7, 2020 - 10:00 to 11:30

Scikit-Learn (sklearn) is Python’s general-purpose Machine Learning (ML) library. Its versatility makes it a good starting place for most ML problems. In this presentation, we first introduce the basic concepts of ML. We then use the dataset from the Exploratory Data Analysis lecture to build two regression models. Finally, we apply k-fold cross-validation to determine which ML algorithm best suits the available data.

Prerequisites: NumPy, Pandas


Registration is now active through 9:30 am Tuesday, April 7.
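The idea behind k-fold cross-validation is to split the data into k folds. Each fold takes one turn as the test set while the model trains on the rest, and the k test errors are averaged. The sketch below implements this in plain Python so that each step is visible. It is not scikit-learn's API, which provides `KFold` and `cross_val_score` in `sklearn.model_selection`. The two candidate models and the synthetic data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_val_mse(xs, ys, k, fit, predict):
    """Average over folds of the test-fold mean squared error."""
    fold_errors = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


# Hypothetical candidate 1: always predict the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


# Hypothetical candidate 2: ordinary least-squares straight line.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Synthetic data that is exactly linear, so the line model should win.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

mse_mean = cross_val_mse(xs, ys, 5, fit_mean, predict_mean)
mse_line = cross_val_mse(xs, ys, 5, fit_line, predict_line)
best = "line" if mse_line < mse_mean else "mean"
print(best, mse_line, mse_mean)
```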

// Basic Machine Learning Modeling with TensorFlow

When: Thursday, April 9, 2020 - 10:00 to 11:30

TensorFlow is an open-source software library for machine learning across a range of tasks. It is a symbolic math library that is also used to build and train neural networks that detect and decipher patterns and correlations, analogous to human learning and reasoning. In this presentation, we will create a synthetic dataset, split the data into two sets (training and testing), use TensorFlow to create a machine learning model, test it, and make predictions.

Prerequisites: NumPy, Pandas


Registration will become active at noon on Tuesday, April 7.


For further information please contact either Bruce Van Aartsen (bruce.vanaartsen@nasa.gov) or Jules Kouatchou (jules.kouatchou@nasa.gov).



Upcoming Face-to-Face Python Classes



Important Information:
  • Face-to-face classes will be rescheduled when appropriate.
  • Registration will begin one week prior to the start of any given course. Registration is only allowed for the next available course.
  • There is no waiting list. Registration will automatically close when the targeted number of participants is reached.
  • Participants are expected to bring their own laptops and have Gmail accounts.


// Python Beginner Course for Programmers

When: Tuesday, March 17, 2020 - 08:30 to 17:30
Where: Building 28, Room E210
Registration Open: Tuesday, March 10, 2020 - 12:00 (noon) through March 16

This accelerated course is designed for participants who want to quickly learn the basic concepts of the Python language and be able to use Python-related tools in their work. In the morning, we will cover the foundations of Python (data types, conditional statements, loops, functions, modules). In the afternoon, we will present various Python tools (netCDF4, h5py, Matplotlib, Pandas, etc.) that are used to read and write files in different formats, manipulate data, and perform visualization.

Prerequisites: Ability to program in another language (C, C++, Fortran, Java, Matlab, IDL, etc.) and knowledge of at least one file format (such as csv, hdf, netcdf). Participants are also expected to be able to use a web browser, open a command prompt or terminal window, and edit text files.


*** This class has been cancelled, due to the Coronavirus situation. We hope to reschedule this class for a later date. ***

// Hands on with Pangeo: Cloud-native Open-Source Python Tools for Scalable Analysis of Big Data in the Geosciences

Speaker: Joe Hamman, National Center for Atmospheric Research
When: Wednesday, March 18, 2020 - 13:30 to 16:30
Where: TBD
Registration: By invitation only

Pangeo is both a community and an integrated open source software ecosystem. For the past three years, the Pangeo community has been working to promote open, reproducible, and scalable science (read more at https://pangeo.io). Participants in this three-hour tutorial will learn how to employ Pangeo and the greater open source scientific Python ecosystem to analyze large geoscientific datasets in the cloud. Particular emphasis will be given to the Jupyter, Xarray, Dask, and Intake software libraries for analysis of multidimensional modeling and remote sensing datasets. Participants will practice writing code in Jupyter Notebooks that run on cloud-hosted clusters, bypassing the common bottleneck of downloading ever-increasing volumes of remote sensing or modeling data. The tutorial will include both introductory materials and advanced examples used in peer-reviewed research in the fields of oceanography, hydrology, and solid earth geophysics.

Agenda

  • 13:30-14:00: Introduction to Pangeo project, software ecosystem, and data access
  • 14:00-15:30: Hands-on interactive tutorial of Python tools: Xarray, Dask, and Intake.
  • 15:30-16:30: Example scientific workflows with satellite imagery, climate, and hydrologic modeling.


Prerequisites: Familiarity with Python, NumPy

// Passing Parameters to Python Applications

When: Tuesday, March 24, 2020 - 13:00 to 16:00
Where: Building 28, Room E210
Registration Open: Tuesday, March 17, 2020 - 12:00 (noon) through March 23

Many Python science and engineering applications need command-line arguments and/or configuration files to run. In this course, we will learn how to use the Python packages click and configparser to write user-friendly command-line interfaces and to create user-editable files that set the parameters driving Python applications. We will begin with a quick introduction to creating decorators in Python.

Prerequisites: Familiarity with Python


*** This class has been cancelled, due to the Coronavirus situation. We hope to reschedule this class for a later date. ***
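The two ingredients of this course fit together in a short example. A decorator wraps a function to add behavior, and configparser reads an INI-style parameter file. In the sketch below, the `log_call` decorator, the section and key names, and the parameter values are all hypothetical. Only the standard-library `configparser` module is used, so click is not needed here.

```python
import configparser
import functools


def log_call(func):
    """Hypothetical decorator that records each call's name and arguments."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((func.__name__, args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


# Hypothetical user-editable parameter file (normally read from disk).
CONFIG_TEXT = """
[grid]
nx = 64
ny = 32

[run]
dt = 0.5
output = results.nc
"""


@log_call
def load_parameters(text):
    """Parse the INI text and convert values to the expected types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # use parser.read(path) for a real file
    return {
        "nx": parser.getint("grid", "nx"),
        "ny": parser.getint("grid", "ny"),
        "dt": parser.getfloat("run", "dt"),
        "output": parser.get("run", "output"),
    }


params = load_parameters(CONFIG_TEXT)
print(params)
```

click relies on the same decorator mechanism. A command-line entry point is declared with `@click.command()` and its flags with `@click.option(...)` stacked on the function.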

// Packaging and Deploying your Python Code

When: Tuesday, April 14, 2020 - 13:00 to 16:00
Where: Building 28, Room E210
Registration Open: Tuesday, April 7, 2020 - 12:00 (noon) through April 13

We provide the steps needed to package a Python application. We show how to add the necessary files and structure to create a package, how to build a package, and how to deploy it on your local system and to the Python Package Index (PyPI).

Prerequisites: Familiarity with Python


Registration opens April 7 at 12:00 noon.

// Python Beginner Course for Programmers

When: Tuesday, May 12, 2020 - 08:30 to 17:30
Where: Building 28, Room E210
Registration Open: Tuesday, May 5, 2020 - 12:00 (noon) through May 11

This accelerated course is designed for participants who want to quickly learn the basic concepts of the Python language and be able to use Python-related tools in their work. In the morning, we will cover the foundations of Python (data types, conditional statements, loops, functions, modules). In the afternoon, we will present various Python tools (netCDF4, h5py, Matplotlib, Pandas, etc.) that are used to read and write files in different formats, manipulate data, and perform visualization.

Prerequisites: Ability to program in another language (C, C++, Fortran, Java, Matlab, IDL, etc.) and knowledge of at least one file format (such as csv, hdf, netcdf). Participants are also expected to be able to use a web browser, open a command prompt or terminal window, and edit text files.


Registration opens May 5 at 12:00 noon.

// Optimizing your Python Application

When: Tuesday, May 19, 2020 - 13:00 to 16:00
Where: Building 28, Room E210
Registration Open: Tuesday, May 12, 2020 - 12:00 (noon) through May 18

This course is meant for Python programmers who want to learn techniques to speed up their Python applications. We will first introduce profiling tools and then present programming tips (not all of them specific to Python) for accelerating loops, arithmetic operations, and list creation, quickly manipulating NumPy arrays, reducing memory footprint, etc.

Prerequisites: Familiarity with Python programming


Registration opens May 12 at 12:00 noon.
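The usual workflow is to profile first and then optimize. The sketch below uses the standard-library `cProfile`/`pstats` modules to find where time is spent, and `timeit` to compare two hypothetical implementations of the same list creation. Timings depend on the machine and the Python version. List comprehensions are often faster than explicit `append` loops, but that should be measured rather than assumed.

```python
import cProfile
import io
import pstats
import timeit


def squares_loop(n):
    """Build a list of squares with an explicit loop and append."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


# timeit calls each function `number` times and returns total seconds.
loop_time = timeit.timeit(lambda: squares_loop(10_000), number=200)
comp_time = timeit.timeit(lambda: squares_comprehension(10_000), number=200)

# cProfile records call counts and time per function; pstats formats it.
profiler = cProfile.Profile()
profiler.enable()
squares_loop(10_000)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()

print(f"loop: {loop_time:.4f} s, comprehension: {comp_time:.4f} s")
print(report)
```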

Past Python Courses


// Python Beginner Course for Programmers

When: Tuesday, September 24, 2019 - 08:30 to 17:30
Where: Building 28, Room E210

This accelerated course is designed for participants who want to quickly learn the basic concepts of the Python language and be able to use Python-related tools in their work. In the morning, we will cover the foundations of Python (data types, conditional statements, loops, functions, modules). In the afternoon, we will present various Python tools (netCDF4, h5py, Matplotlib, Pandas, etc.) that are used to read and write files in different formats, manipulate data, and perform visualization.

Prerequisites: Ability to program in another language (C, C++, Fortran, Java, Matlab, IDL, etc.) and knowledge of at least one file format (such as csv, hdf, netcdf). Participants are also expected to be able to use a web browser, open a command prompt or terminal window, and edit text files.

Course Agenda

// Introduction to Numpy and SciPy

When: Monday, September 30, 2019 - 13:00 to 16:00
Where: Building 28, Room E210
Registration Open: Tuesday, September 24, 2019 - 12:00 (noon) through Sept. 30

NumPy (Numeric Python) and SciPy (Scientific Python) are add-on modules to Python that provide common mathematical and numerical routines as pre-compiled, fast functions. NumPy supplies basic routines for manipulating large arrays and matrices of numeric data, and it is the fundamental package for working with numerical data in Python. SciPy extends the functionality of NumPy with a substantial collection of useful algorithms, such as minimization, Fourier transformation, regression, and other applied mathematical techniques.

In this course, we introduce NumPy and SciPy so that participants become familiar with the broad functionality of the two modules and their use in real applications.

Prerequisites: Python programming language

Course Agenda
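The core NumPy idea is to operate on whole arrays at once instead of looping in Python. The sketch below shows vectorized arithmetic, axis reductions, boolean masking, and a small linear solve. It assumes NumPy is installed, and the temperature values are hypothetical. SciPy routines such as `scipy.optimize.minimize` take and return these same array objects.

```python
import numpy as np

# Hypothetical temperatures: 3 time steps x 4 grid points, in kelvin.
temps_k = np.array([
    [280.0, 282.5, 279.0, 281.0],
    [283.0, 284.5, 280.0, 282.0],
    [285.0, 286.5, 281.0, 283.0],
])

# Vectorized arithmetic: convert every element at once, no Python loop.
temps_c = temps_k - 273.15

# Reduction along an axis: mean over time (axis 0) for each grid point.
mean_per_point = temps_c.mean(axis=0)

# Boolean masking: select only the values above 10 degrees C.
warm = temps_c[temps_c > 10.0]

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8.
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = np.linalg.solve(a, b)

print(mean_per_point)
print(warm)
print(solution)
```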

// Webinar: Introduction to Intel Distribution for Python

When: Tuesday, October 8, 2019 - 13:00 to 15:00
Where: Online

Achieving native-code performance in Python may sound nearly impossible; however, runtime optimization libraries such as Intel MKL and Intel TBB make it possible to squeeze the most performance out of the hardware and enable high performance in Python applications. In this presentation, we will go over the Intel Distribution for Python, which leverages Intel’s optimized libraries to accelerate AI applications “out-of-the-box” at scale. Topics include:

  • Introduction to the accelerated numeric and scientific computing libraries NumPy and SciPy.
  • Introduction to Intel data analytics library Intel® DAAL and Intel-optimized scikit-learn.

Join Skype Meeting
Join by Phone: +1(916)356-2663 (or your local bridge access #) Choose bridge 5.,,448468685# (Global)
Conference ID: 448468685

A room (Section 1) has been reserved in the Goddard Library (Building 21) for people who want to attend the event as a group.


Click HERE to access the slides (access code: 0VDe0Sai).

// Special Python Beginner Course for Programmers (GISS employees only)

When: Monday, October 21, 2019 - Tuesday, October 22, 2019
Where: GISS

This accelerated 1.5-day course is designed for participants who want to quickly learn the basic concepts of the Python language and be able to use Python-related tools in their work. We will cover the foundations of Python (data types, conditional statements, loops, functions, modules, basic I/O) and the fundamental package for scientific computing with Python (NumPy). We will present Python tools to manipulate scientific data format files (netCDF4) and to perform visualization (Matplotlib and Cartopy). If time permits, we will also present topics such as Packaging & Deployment and Creating & Maintaining a Python Distribution.

Prerequisites: Ability to program in another language (C, C++, Fortran, Java, Matlab, IDL, etc.) and knowledge of at least one file format (such as csv, hdf, netcdf). Participants are also expected to be able to use a web browser, open a command prompt or terminal window, and edit text files.


// Advanced Visualization with Bokeh and Plotly

When: Tuesday, October 29, 2019 - 13:00 to 16:00
Where: Building 28, Room E210
Registration Open: Tuesday, October 22, 2019 - 12:00 (noon) through October 28

Bokeh and Plotly are Python tools for interactive data visualization. They render their graphics using JSON, HTML, and JavaScript. This facilitates the creation of high-quality custom plots and the building of web-based dashboards and applications. In this module, you will learn how to prepare your data, customize your visualizations, and add interactivity.

Prerequisites: NumPy, Pandas


Course Agenda and Materials



// Manipulating Scientific Data Format Files with Python

When: Tuesday, November 19, 2019 - 13:00 to 16:00
Where: Building 28, Room E210
Registration Open: Tuesday, November 12, 2019 - 12:00 (noon) through November 18

Many scientific applications require scientific data format files (containing metadata and multi-dimensional arrays) such as netCDF and HDF5. This course will teach participants how to manipulate such files. We will learn how to use the Python tools netCDF4 and h5py to read and create files.

Prerequisites: NumPy and knowledge of the netCDF and HDF5 file formats.

Course Agenda and Materials

// Python Beginner Course for Programmers

When: Tuesday, January 14, 2020 - 08:30 to 17:30
Where: Building 28, Room E210
Registration Open: Tuesday, January 7, 2020 - 12:00 (noon) through January 13

This accelerated course is designed for participants who want to quickly learn the basic concepts of the Python language and be able to use Python-related tools in their work. In the morning, we will cover the foundations of Python (data types, conditional statements, loops, functions, modules). In the afternoon, we will present various Python tools (netCDF4, h5py, Matplotlib, Pandas, etc.) that are used to read and write files in different formats, manipulate data, and perform visualization.

Prerequisites: Ability to program in another language (C, C++, Fortran, Java, Matlab, IDL, etc.) and knowledge of at least one file format (such as csv, hdf, netcdf). Participants are also expected to be able to use a web browser, open a command prompt or terminal window, and edit text files.


Course Agenda and Materials


// Data Retrieval with Python

When: Tuesday, January 21, 2020 - 13:00 to 16:00
Where: Building 28, Room E210
Registration Open: Tuesday, January 14, 2020 - 12:00 (noon) through January 20

This course is designed for those who want to learn how to retrieve remote files and data within the Python framework, much as utilities such as ftp and wget do. We will learn how to use the Python modules ftplib, urllib, request, json, etc. to transfer files, download files from the Internet, manipulate JavaScript Object Notation (JSON) data, and perform web scraping. We will first learn how to do serialization and deserialization with pickle.

Prerequisites: Basic concepts of Python programming


Course Agenda and Materials
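A wget-style download and a JSON fetch can both be written with the standard-library `urllib.request` module. In the sketch below, the helper names `download` and `fetch_json` and the catalog content are hypothetical. The demo uses a `file://` URL so that it runs without network access. An `http://` or `https://` URL goes through the same calls, and FTP transfers are handled separately by `ftplib.FTP`.

```python
import json
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url, destination):
    """Hypothetical helper: stream the resource at `url` into a local file."""
    with urllib.request.urlopen(url, timeout=30) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all at once
    return Path(destination)


def fetch_json(url):
    """Hypothetical helper: download a JSON document and decode it."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)  # json accepts UTF-8 bytes directly


# Demo with a local file:// URL, so no network connection is required.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "catalog.json"
    source.write_text(json.dumps({"files": ["a.nc", "b.nc"]}))
    url = source.as_uri()

    catalog = fetch_json(url)
    copy_path = download(url, Path(tmp) / "copy.json")
    copied_text = copy_path.read_text()

print(catalog)
```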


// Plotting Geolocated Data with Cartopy

When: Tuesday, January 28, 2020 - 13:00 to 16:00
Where: Building 28, Room E210
Registration Open: Tuesday, January 21, 2020 - 12:00 (noon) through January 27

Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses. The key features of Cartopy are its projection definitions and its ability to transform points, lines, vectors, polygons, and images between those projections. In this course, we will learn how to select map projections, make contour plots, select color maps (or create your own), create colorbars, etc. We will work with NetCDF files and shapefiles.

Prerequisites: NumPy, Matplotlib


Course Materials


// Python Coding Standards

When: Tuesday, February 18, 2020 - 13:00 to 16:00
Where: Building 28, Room E210
Registration Open: Tuesday, February 11, 2020 - 12:00 (noon) through February 17

This course is for programmers who want to use Python coding standards to make their applications more readable, maintainable, and sharable. We use the Python Enhancement Proposal 8 (PEP 8) document to provide guidelines and best practices for writing Python code. PEP 8 addresses topics such as naming conventions, code layout, indentation, and comments. At the end of the session, we will apply the standard to a simple application using a Python GUI editor.

Prerequisites: Python programming

// Webinar: Intel-optimized TensorFlow and Intel Data Analytics Library

When: Wednesday, February 26, 2020 - 13:00 to 15:00
Where: Online

Intel has been delivering software optimizations not only for HPC but also for AI workloads. Key libraries include the Intel Distribution for Python, Intel-optimized TensorFlow, and the Intel Data Analytics Library. In this session, we will walk through the basics of these libraries and show, with live demos, how to enable out-of-the-box performance for machine learning and deep learning use cases on Intel Second Generation processors. Additionally, we will give a brief introduction to oneAPI, Intel’s new unified programming model for diverse AI workloads.

AI Center of Excellence Kick-Off Meeting at the Goddard Library (Building 21)

  • 12:00-13:00 Machine Learning community meet-and-greet lunch hour
  • 13:00-15:00 Webinar: Intel-optimized TensorFlow and Intel Data Analytics Library on the Big Screen at the Library
  • 15:00-15:30 Post-webinar discussion: Q&A, discuss collaborative ideas, propose future webinar topics and ideas for activities