How do I run jobs in the queue?
From the login node you can submit jobs using Slurm, which will then run on the batch worker nodes.
Writing a Job Script
The commands you want to run, as well as the queue parameters for your job, can be saved as a simple text file (for example, a file called jobscript.sh).
Useful job parameters include:
--partition : Queue name (eg, batch, gpu)
--job-name : A short name for this job
--ntasks : Number of CPUs allocated to the job
- default: 1
--mem : Memory required for the whole job
- if the job goes over this memory, the job will be killed
- default: 10G per CPU requested
- example values: 750M, 250G, 1T
- most nodes have ~250GB. Larger jobs are allocated to nodes with more memory (up to ~1TB)
--time : Total time requested for your job
- if the job goes over this time, the job will be killed
- default: 7 days
- example values (dd-hh:mm:ss)
- 0-00:05:00 ie, 5 minutes
- 0-12:00:00 ie, 12 hours
- 2-00:00:00 ie, 2 days
--output : Filename to send STDOUT
- use "%j" (job ID) and "%x" (job name) to build the filename, eg. "%j_%x.out"
--error : Filename to send STDERR
- use "%j" (job ID) and "%x" (job name) to build the filename, eg. "%j_%x.err"
--mail-user : Email address for notifications
--mail-type : When to send notifications (eg, begin, end, fail, all)
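As an illustration of the output/error filename patterns above: Slurm substitutes %j with the numeric job ID and %x with the job name. A sketch (the job ID and name below are made-up examples):

```shell
# Sketch of Slurm's filename substitution: %j -> job ID, %x -> job name.
# The ID and name here are made-up examples.
jobid=12345
jobname=bioinf_job
# With --output=%j_%x.out and --error=%j_%x.err, the files would be named:
echo "${jobid}_${jobname}.out"   # 12345_bioinf_job.out
echo "${jobid}_${jobname}.err"   # 12345_bioinf_job.err
```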
A typical job script might look like this:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=bioinf_job
#SBATCH --ntasks=1
#SBATCH --mem=10G
#SBATCH --time=00-12:00:00
#SBATCH --output=%j_%x.out
#SBATCH --error=%j_%x.err
#SBATCH --mail-user=firstname.lastname@example.org
#SBATCH --mail-type=end,fail

cd /path/to/your/working/dir
module load your_required_module
your_command(s)
Submitting a job
Basic job submission is with
'sbatch', so a simple minimal job submission could be just:
$ sbatch ./jobscript.sh
but you can also specify a partition (queue), number of nodes and amount of memory - if you have not already done so inside the script itself - like so:
$ sbatch -p batch --ntasks=1 --mem=10G ./jobscript.sh
The standard nodes have ~250GB of memory and 24 cores, so asking for 120GB of memory means only two such jobs can run on a node at a time.
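The packing arithmetic works out as follows (a quick sketch using the figures above):

```shell
# How many 120GB, 1-CPU jobs fit on a standard node (~250GB RAM, 24 cores)?
# The scheduler packs by whichever resource runs out first.
node_mem_gb=250
node_cores=24
job_mem_gb=120
job_cores=1
by_mem=$(( node_mem_gb / job_mem_gb ))   # 2 jobs fit by memory
by_cpu=$(( node_cores / job_cores ))     # 24 jobs fit by CPU
fit=$(( by_mem < by_cpu ? by_mem : by_cpu ))
echo "${fit} jobs per node"              # memory is the limiting resource here
```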
Viewing the Queue Status
Once your job is submitted, you can monitor its progress by looking at the queue.
The queue status can be accessed using the
'squeue' command, and looks something like this (illustrative output; the job IDs, names and users are made up):
 JOBID PARTITION       NAME   USER ST     TIME NODES NODELIST(REASON)
  1234     batch bioinf_job user01  R  1:23:45     1 cbrgwn002p
  1235     batch bioinf_job user01 PD     0:00     1 (Priority)
  1236     batch    big_job user02 PD     0:00     1 (PartitionConfig)
Interesting job states are
'R' for running and
'PD' for pending. The last column shows either the node the job is running on, or the reason it's not running (yet). In this example
'Priority' is OK - the job is just waiting either for free space to run, or behind a higher-priority job, but
'PartitionConfig' means the job has asked for something it can't have (like 25 cores).
To see just your own jobs (and not the whole queue):
$ squeue --me
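To summarise your jobs by state, the standard sort | uniq -c idiom works on squeue output. A sketch (the printf stands in for real squeue output; on the cluster you would pipe `squeue --me --noheader -o "%T"` instead, where `-o "%T"` prints just the state column):

```shell
# Count jobs by state. The printf stands in for real output; on the
# cluster, replace it with:  squeue --me --noheader -o "%T"
printf 'RUNNING\nRUNNING\nPENDING\n' | sort | uniq -c
```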
Cancelling a Job
To stop a job, use
'scancel' and the relevant JOBID (as shown by squeue), eg:
$ scancel 1234
To see a list of the available resources and partitions (queues), use the
'sinfo' command:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test         up  infinite     1  idle cbrgwn000p
gpu          up  infinite     1  mix  cbrgwngpu01
batch*       up  infinite    14  mix  cbrgbigmem01p,cbrgbigmem02p,cbrgwn002p,cbrgwn004p,cbrgwn005p,cbrgwn006p,cbrgwn007p,cbrgwn008p,cbrgwn009p,cbrgwn010p,cbrgwn011p,cbrgwn013p,cbrgwn014p,cbrgwn021p
batch*       up  infinite     6  idle cbrgwn012p,cbrgwn015p,cbrgwn017p,cbrgwn019p,cbrgwn022p,cbrgwn023p
There's a handy quick reference comparing commands across batch systems at https://slurm.schedmd.com/rosetta.pdf
The Slurm user guide is available at https://slurm.schedmd.com/quickstart.html