How do I run jobs in the queue?
From the login node you can submit jobs using Slurm, which will then run on the batch worker nodes.
To get you started, you can see a list of the available resources and partitions (queues) with the 'sinfo' command:
PARTITION AVAIL  TIMELIMIT  NODES  STATE  NODELIST
test      up     infinite       1  idle   cbrgwn000p
gpu       up     infinite       1  mix    cbrgwngpu01
batch*    up     infinite      14  mix    cbrgbigmem01p,cbrgbigmem02p,cbrgwn002p,cbrgwn004p,cbrgwn005p,cbrgwn006p,cbrgwn007p,cbrgwn008p,cbrgwn009p,cbrgwn010p,cbrgwn011p,cbrgwn013p,cbrgwn014p,cbrgwn021p
batch*    up     infinite       6  idle   cbrgwn012p,cbrgwn015p,cbrgwn017p,cbrgwn019p,cbrgwn022p,cbrgwn023p
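If you only want to check one partition, 'sinfo' can be restricted to it, for example:

$ sinfo -p batch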
Writing a Job Script
The commands you want to run, as well as the queue parameters for your job, can be saved in a simple text file (for example, a file called jobscript.sh).
Useful job parameters include:

--partition
- Queue name (eg, batch, gpu)

--job-name
- A short name for this job

--cpus-per-task
- Number of CPUs allocated per task
- default: 1

--nodes
- Number of nodes required
- default: 1

--mem
- Memory required for the whole job
- if the job goes over this memory, the job will be killed
- default: 30M (ie 30MB)
- example values: 750M, 250G, 1T
- most nodes have ~250GB. Larger jobs are allocated to nodes with more memory (up to ~1TB)

--time
- Total time requested for your job
- if the job goes over this time, the job will be killed
- default: 7 days
- example values (dd-hh:mm:ss):
  - 0-00:05:00 ie, 5 minutes
  - 0-12:00:00 ie, 12 hours
  - 2-00:00:00 ie, 2 days

--output
- Filename to send STDOUT to
- use "%j" (job ID) and "%x" (job name) to build the filename, eg "%j_%x.out"

--error
- Filename to send STDERR to
- use "%j" (job ID) and "%x" (job name) to build the filename, eg "%j_%x.err"

--mail-user
- Email address for notifications

--mail-type
- When to send notifications (eg, begin, end, fail, all)
A typical job script might look like this:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=bioinf_job
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
#SBATCH --mem=128G
#SBATCH --time=00-12:00:00
#SBATCH --output=%j_%x.out
#SBATCH --error=%j_%x.err
#SBATCH --mail-user=email@example.com
#SBATCH --mail-type=end,fail

cd /path/to/your/working/dir
module load your_required_module
your_command(s)
Submitting a Job
Basic job submission is done with 'sbatch', so a minimal job submission could be just:
$ sbatch ./jobscript.sh
but you can also specify a partition (queue), number of CPUs and amount of memory on the command line - if you have not already set them inside the script itself - like so:
$ sbatch -p batch --cpus-per-task=1 --mem=120G ./jobscript.sh
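In either case, 'sbatch' replies with the ID of the new job, which you will need later if you want to monitor or cancel it, for example (hypothetical job ID):

Submitted batch job 123456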
The standard nodes have ~250GB of memory and 24 cores, so asking for 120GB of memory means at most two such jobs will fit on a node at a time, even if most of its cores are still free.
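If you also want your CPU request to keep pace with your memory request on such a node (so both limits are used up at roughly the same rate), you could ask for about half the cores as well - these are illustrative values, not a site requirement:

$ sbatch -p batch --cpus-per-task=12 --mem=120G ./jobscript.sh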
Viewing the Queue Status
Once your job is submitted, you can monitor its progress by looking at the queue.
The queue status can be viewed with the 'squeue' command, and looks something like this:
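An illustrative listing (the job IDs, job names and usernames here are made up):

$ squeue
 JOBID PARTITION     NAME   USER ST   TIME NODES NODELIST(REASON)
123457     batch  align_2  userb PD   0:00     1 (Priority)
123458     batch  big_job  userc PD   0:00     1 (PartitionConfig)
123456     batch  align_1  usera  R  12:34     1 cbrgwn002p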
Interesting job states are 'R' for running and 'PD' for pending. The last column shows either the node the job is running on, or the reason it is not running (yet). In this example, jobs pending with 'Priority' are OK - they are just waiting for free space to run, or waiting behind a higher-priority job - but 'PartitionConfig' means the job has asked for something it can't have (like 25 cores on a 24-core node).
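On a busy cluster it can be easier to restrict the listing to just your own jobs, for example:

$ squeue -u your_username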
Cancelling a Job
To stop a job, use 'scancel' and the relevant JOBID:
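For example (hypothetical job ID):

$ scancel 123456

You can also cancel all of your own jobs at once with:

$ scancel -u your_username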
There's a handy quick reference comparing commands across different batch systems at https://slurm.schedmd.com/rosetta.pdf
The Slurm user guide is available at https://slurm.schedmd.com/quickstart.html
On deva (note: soon to be retired)
The CCB servers use the GridEngine batch-queuing system for distributed resource management. If you intend to run any jobs using NGS data, or any other long-running jobs, you should submit them to the queue.
In brief, to submit a job you need to use a shell script. First copy the default script to the directory from which you want to run the job:
cp /package/cbrg/templates/qsub.sh .
Edit this file:
- change the username to your CCB username
- add in the command you want to run
To submit your job to the queue type:
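With GridEngine this is normally done with 'qsub', for example (assuming you kept the template's default name):

$ qsub qsub.sh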
To see the status of the queue, type:
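With GridEngine the queue is normally viewed with 'qstat', for example:

$ qstat

or, to see only your own jobs:

$ qstat -u your_username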
Click for a full explanation of how to run jobs in the queue (pdf).