Large Resource Request

Issue to be resolved


My job requires more time or resources than are currently allowed.

Solution


Email the NCCS at support@nccs.nasa.gov and describe the resource constraint(s) that prevent your job from running. Include an estimate of the resources your work would require: compute (CPU or GPU), memory (RAM), and/or wall time. We will likely contact you to discuss your use case so that we can determine how best to meet your needs.
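When you put together the estimate for your email, it can help to total the per-node figures across the whole job. The sketch below is a hypothetical helper, not an NCCS tool or a required request format. The class name, field names, and example numbers are all illustrative assumptions. It multiplies node count, cores, GPUs, memory, and wall time into core-hours, GPU-hours, and total memory.

```python
from dataclasses import dataclass


@dataclass
class ResourceEstimate:
    """Hypothetical summary of one job's needs; field names are illustrative only."""
    nodes: int
    cores_per_node: int
    gpus_per_node: int
    mem_gb_per_node: float
    wall_hours: float

    def core_hours(self) -> float:
        # Every allocated core is held for the full wall time.
        return self.nodes * self.cores_per_node * self.wall_hours

    def gpu_hours(self) -> float:
        # Same reasoning for GPUs; zero for CPU-only jobs.
        return self.nodes * self.gpus_per_node * self.wall_hours

    def total_mem_gb(self) -> float:
        # Aggregate RAM across all nodes in the job.
        return self.nodes * self.mem_gb_per_node

    def summary(self) -> str:
        # Plain-text block that could be pasted into a support email.
        return (
            f"Nodes: {self.nodes}\n"
            f"Cores per node: {self.cores_per_node}\n"
            f"GPUs per node: {self.gpus_per_node}\n"
            f"Memory per node: {self.mem_gb_per_node:g} GB "
            f"(total {self.total_mem_gb():g} GB)\n"
            f"Wall time: {self.wall_hours:g} h\n"
            f"Core-hours: {self.core_hours():g}\n"
            f"GPU-hours: {self.gpu_hours():g}"
        )


if __name__ == "__main__":
    # Example numbers only; substitute figures measured from your own test runs.
    est = ResourceEstimate(nodes=4, cores_per_node=40, gpus_per_node=0,
                           mem_gb_per_node=180, wall_hours=36)
    print(est.summary())
```

Basing the numbers on a shorter test run of your job, scaled up, generally gives a more defensible estimate than a guess.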

Note: the NCCS places constraints on jobs for several reasons, chief among them:
1. To prevent runaway jobs from consuming all available resources until systems staff can kill the offending job
2. To prevent users from running days-long jobs with no restart capability that fail due to an external cause, losing days of compute and forcing the job to be restarted from the beginning
3. To prevent a large set of jobs from dominating the queue, thereby preventing other users from accessing the resources for days at a time

For further information:
Slurm Best Practices on Discover

