About Slurm

From the login node you can submit jobs using Slurm, which will then run on the batch worker nodes.

To get started, you can list the available resources and partitions (queues) with the 'sinfo' command:

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST 
test      up    infinite  1     idle  cbrgwn000p 
gpu       up    infinite  1     mix   cbrgwngpu01 
batch*    up    infinite  14    mix   cbrgbigmem01p,cbrgbigmem02p,cbrgwn002p,cbrgwn004p,cbrgwn005p,cbrgwn006p,cbrgwn007p,
                                        cbrgwn008p,cbrgwn009p,cbrgwn010p,cbrgwn011p,cbrgwn013p,cbrgwn014p,cbrgwn021p
batch*    up    infinite  6     idle  cbrgwn012p,cbrgwn015p,cbrgwn017p,cbrgwn019p,cbrgwn022p,cbrgwn023p
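
For more detail on individual nodes (CPU counts, memory and state), 'sinfo' also has a node-oriented long listing:

$ sinfo -N -l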

Writing a Job Script

The commands you want to run, along with the queue parameters for your job, can be saved in a simple text file (for example, a file called jobscript.sh).

Useful job parameters include: 

--partition Queue name (eg, batch, gpu)
--job-name A short name for this job
--nodes Number of nodes required
- default: 1
--mem Memory required for the whole job
- if the job goes over this memory, the job will be killed
- default: 30M (ie 30MB)
- example values: 750M, 250G, 1T
- most nodes have ~250GB. Larger jobs are allocated to nodes with more memory (up to ~1TB)
--time Total time requested for your job
- if the job goes over this time, the job will be killed
- default: 7 days
- example values (dd-hh:mm:ss)
   - 0-00:05:00 ie, 5 minutes
   - 0-12:00:00 ie, 12 hours
   - 2-00:00:00 ie, 2 days
--output Filename to send STDOUT
   - use "%j" (job ID) and "%x" (job name) in the filename, eg. "%j_%x.out"
--error Filename to send STDERR
   - use "%j" (job ID) and "%x" (job name) in the filename, eg. "%j_%x.err"
--mail-user Email address for notifications
--mail-type When to send notifications (eg, begin, end, fail, all)

A typical job script might look like this:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=bioinf_job
#SBATCH --nodes=1
#SBATCH --mem=128G
#SBATCH --time=00-12:00:00
#SBATCH --output=%j_%x.out
#SBATCH --error=%j_%x.err
#SBATCH --mail-user=your_email_address@gmail.com
#SBATCH --mail-type=end,fail

cd /path/to/your/working/dir

module load your_required_module

your_command(s)
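
If your command is multi-threaded, you can also reserve cores on the node with an additional directive alongside the others (the value 8 below is just illustrative):

#SBATCH --cpus-per-task=8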


Submitting a job

Basic job submission is with 'sbatch', so a minimal submission could be just:

$ sbatch ./jobscript.sh
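
sbatch replies with the ID assigned to your job, which you'll need later for monitoring or cancelling it:

Submitted batch job 342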

but you can also specify a partition (queue), number of tasks (cores) and amount of memory on the command line - if you have not already done so inside the script itself - like so:

$ sbatch -p batch -n 12 --mem=120G ./jobscript.sh

The standard nodes have ~250GB of memory and 24 cores, so asking for 120GB of memory means at most two such jobs can run on a node at a time.
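
If job accounting is enabled on the cluster, 'sacct' will report how much time and memory a finished (or running) job actually used, which helps when choosing sensible --time and --mem values; for example, for job 342:

$ sacct -j 342 --format=JobID,JobName,Elapsed,MaxRSS,State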

Viewing the Queue Status

Once your job is submitted, you can monitor its progress by looking at the queue. 

The queue status can be viewed with the 'squeue' command, and looks something like this:

[Screenshot: example output of the 'squeue' command]

Interesting job states are 'R' for running and 'PD' for pending. The last column shows either the node the job is running on or, for pending jobs, the reason it is not running yet. In this example 'Resources' and 'Priority' are fine - the job is just waiting for free space, or waiting behind a higher-priority job - but 'PartitionConfig' means the job has asked for something it can't have (such as 25 cores on a 24-core node).
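
By default 'squeue' lists everyone's jobs; to see just your own, or one particular job:

$ squeue -u your_username
$ squeue -j 342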

Cancelling a Job

To stop a job, use 'scancel' with the relevant JOBID:
scancel 342
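
You can also cancel all of your own jobs in one go:

scancel -u your_username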

Further Information

There's a handy quick reference comparing commands across batch systems at https://slurm.schedmd.com/rosetta.pdf

The Slurm user guide is available at https://slurm.schedmd.com/quickstart.html

On deva (note: soon to be retired)

The CCB servers use the GridEngine batch-queuing system for distributed resource management. If you intend to run any jobs using NGS data, or any other long-running jobs, you should submit them to the queue.

In brief, to submit a job you need to use a shell script. First copy the default script to the directory from which you want to run the job:
cp /package/cbrg/templates/qsub.sh .

Edit this file:

  • change the username to your CCB username
  • add in the command you want to run
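
For reference, such a script typically looks something like the sketch below; the actual template in /package/cbrg/templates/qsub.sh may differ, and the job name, email address and command are placeholders:

#!/bin/bash
#$ -cwd                          # run the job from the current working directory
#$ -N my_job                     # a short name for the job (placeholder)
#$ -M your_username@example.com  # email address for notifications (placeholder)
#$ -m e                          # send mail when the job ends

your_command(s)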

To submit your job to the queue type:
qsub qsub.sh

To see the status of the queue, type:
qstat
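
As with 'squeue' on Slurm, you can restrict the listing to your own jobs:

qstat -u your_username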

A full explanation of how to run jobs in the queue is available as a PDF.