NCCS Task Farming: Tool for Running Independent Tasks in Parallel

NCCS Task Farming (NCCS-TF) is a Python application that allows users to execute independent tasks concurrently across nodes on multicore clusters. The package consists of a set of Python scripts that work together through two simple text-based interfaces. NCCS-TF does not require any knowledge of the individual tasks (serial or even parallel) and makes no assumptions about the underlying applications. In fact, the tasks to be executed can come from different applications. It can be seen as a task parallelism tool in which multiple independent tasks are executed concurrently.

NCCS-TF consists of two independent front-end Python scripts through which a user provides a list of tasks to be performed. The front ends are:

  • gpa_tf.py: Relies on GNU Parallel. This is best if you have more independent tasks to run than available cores.
  • srun_tf.py: Relies on the native SLURM srun command. Consider this option if you reserve at least as many cores as the number of independent tasks to be executed.

Regardless of the front-end used, NCCS-TF determines the list of nodes reserved by the user and connects to individual nodes to distribute the workload (independent tasks). If tasks are available, each node receives as many of them as it has cores (if the user chooses to employ all the cores within the node).
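The distribution described above can be pictured with a minimal sketch. This is not the NCCS-TF implementation; the function name assign_tasks and the node names are hypothetical, and the sketch only covers a single wave in which each node receives at most as many tasks as it has cores.

```python
from typing import Dict, List


def assign_tasks(tasks: List[str], nodes: List[str],
                 cores_per_node: int) -> Dict[str, List[str]]:
    """Give each node up to cores_per_node tasks, in file order (one wave)."""
    assignment = {}
    for i, node in enumerate(nodes):
        start = i * cores_per_node
        # Slicing past the end of the list simply yields fewer (or no) tasks.
        assignment[node] = tasks[start:start + cores_per_node]
    return assignment


# Hypothetical reservation: 2 nodes, 3 cores used on each, 5 tasks to run.
tasks = [f"/full/path/to/executable/task{i}" for i in range(1, 6)]
plan = assign_tasks(tasks, ["node01", "node02"], cores_per_node=3)
```

Here node01 would receive the first three tasks and node02 the remaining two.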

How to Use NCCS-TF

To use NCCS-TF, users are expected to first create an ASCII execution file where they list (one task per line) the independent tasks to execute in parallel. For instance, the content of such a file can be:

/full/path/to/executable/task1 arg11 arg12
/full/path/to/executable/task2 arg21
/full/path/to/executable/task3
/full/path/to/executable/task4 arg41 arg42 arg43

It is important to provide the full path to each script/executable you plan to run. A relative path may prevent the tool from accessing the files it needs.
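If the task list is generated by a script, the full-path requirement can be enforced while the execution file is written. The following is a minimal sketch, not part of NCCS-TF: the helper write_execution_file and the task names are hypothetical, and it assumes that arguments contain no whitespace, since quoting rules for the execution file are not described here.

```python
import os
import tempfile


def write_execution_file(path, tasks):
    """Write one task per line, resolving each executable to an absolute path.

    `tasks` is a list of lists: [executable, arg1, arg2, ...].
    """
    lines = []
    for executable, *args in tasks:
        lines.append(" ".join([os.path.abspath(executable), *args]))
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


# Hypothetical tasks given as relative paths; abspath resolves them
# against the current working directory.
tasks = [["task1", "arg11", "arg12"], ["task3"]]
exec_path = os.path.join(tempfile.mkdtemp(), "execution_file")
written = write_execution_file(exec_path, tasks)
```

The resulting file has the same one-task-per-line layout as the example above and can be passed to either front end.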

Command Line Arguments

The main command line arguments of NCCS-TF are:

  • -x or --exec: Execution file listing the independent tasks, one task per line.
  • -n or --cores_per_node: Integer representing the number of cores per node.
  • -m or --multi-prog: Option to use the SLURM srun --multi-prog feature (srun_tf.py only; pass Y to enable it).

Command Line Examples

/usr/local/other/nccs_taskfarming/nccs_taskfarming/gpa_tf.py -x execution_file [-n num_cores_per_node]
/usr/local/other/nccs_taskfarming/nccs_taskfarming/srun_tf.py -x execution_file [-n num_cores_per_node] [-m Y]
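When a front end is launched from another Python script, the command lines above can be assembled as an argument list. This is only a sketch: the helper build_command is hypothetical, it uses just the options shown above, and it builds the command without running it.

```python
def build_command(front_end, execution_file, cores_per_node=None, multi_prog=False):
    """Return the argument list for gpa_tf.py or srun_tf.py."""
    cmd = [front_end, "-x", execution_file]
    if cores_per_node is not None:
        cmd += ["-n", str(cores_per_node)]
    if multi_prog:
        # "-m Y" is only shown for srun_tf.py in the examples above.
        cmd += ["-m", "Y"]
    return cmd


srun_cmd = build_command(
    "/usr/local/other/nccs_taskfarming/nccs_taskfarming/srun_tf.py",
    "execution_file",
    cores_per_node=15,
    multi_prog=True,
)
```

A list built this way could then be passed to subprocess.run from inside a batch job.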

Note that with srun_tf.py, the number of available cores must be at least equal to the number of tasks to run. For instance, if you reserve 2 nodes and select 15 cores per node, your execution file may contain at most 30 tasks (2 times 15), because all the tasks are distributed at once. If the "-m Y" command line option is included, the tool uses "srun --multi-prog".
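The core-count constraint for srun_tf.py can be checked before a job is submitted. The sketch below is an illustration under stated assumptions, not something the tool provides: the function names are hypothetical, and blank lines in the execution file are assumed not to count as tasks.

```python
def count_tasks(lines):
    """Count non-empty lines, treating each one as a task."""
    return sum(1 for line in lines if line.strip())


def fits_srun_reservation(num_tasks, num_nodes, cores_per_node):
    """True if every task can get its own core in a single wave."""
    return num_tasks <= num_nodes * cores_per_node


# The example from the text: 2 nodes with 15 cores each allow at most 30 tasks.
example_lines = [f"/full/path/to/executable/task{i}\n" for i in range(1, 31)]
ok = fits_srun_reservation(count_tasks(example_lines), num_nodes=2, cores_per_node=15)
too_many = fits_srun_reservation(31, num_nodes=2, cores_per_node=15)
```

With 30 tasks the check passes; with 31 it fails, in which case gpa_tf.py is the front end described above for having more tasks than cores.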