
About Slurm

From the login node you can submit jobs using Slurm, which will then run on the batch worker nodes.

To get started, use the 'sinfo' command to see a list of the available resources and partitions (queues):

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST 
test      up    infinite  1     idle  cbrgwn000p 
gpu       up    infinite  1     mix   cbrgwngpu01 
batch*    up    infinite  14    mix   cbrgbigmem01p,cbrgbigmem02p,cbrgwn002p,cbrgwn004p,cbrgwn005p,cbrgwn006p,cbrgwn007p,
                                        cbrgwn008p,cbrgwn009p,cbrgwn010p,cbrgwn011p,cbrgwn013p,cbrgwn014p,cbrgwn021p
batch*    up    infinite  6     idle  cbrgwn012p,cbrgwn015p,cbrgwn017p,cbrgwn019p,cbrgwn022p,cbrgwn023p

 

  

Writing a Job Script

The commands you want to run, as well as the queue parameters for your job, can be saved in a simple text file (for example, a file called jobscript.sh).

Useful job parameters include: 

--partition Queue name (eg, batch, gpu)
--job-name A short name for this job
--nodes Number of nodes required
- default: 1
--mem Memory required for the whole job
- if the job goes over this memory, the job will be killed
- default: 30M (ie 30MB)
- example values: 750M, 250G, 1T
- most nodes have ~250GB. Larger jobs are allocated to nodes with more memory (up to ~1TB)
--time Total time requested for your job
- if the job goes over this time, the job will be killed
- default: 7 days
- example values (dd-hh:mm:ss)
   - 0-00:05:00 ie, 5 minutes
   - 0-12:00:00 ie, 12 hours
   - 2-00:00:00 ie, 2 days
--output Filename to send STDOUT
   - use "%j" (job ID) and "%x" (job name) to build the filename, eg. "%j_%x.out"
--error Filename to send STDERR
   - use "%j" (job ID) and "%x" (job name) to build the filename, eg. "%j_%x.err"
--mail-user Email address for notifications
--mail-type When to send notifications (eg, begin, end, fail, all)
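
The dd-hh:mm:ss format used by --time can be built from a number of minutes. The following is a minimal sketch only; the function name slurm_time is hypothetical and is not part of Slurm.

```shell
# Hypothetical helper (not part of Slurm): convert whole minutes
# into the dd-hh:mm:ss form used in the --time examples above.
slurm_time() {
    local total_min=$1
    local days=$(( total_min / 1440 ))
    local hours=$(( (total_min % 1440) / 60 ))
    local mins=$(( total_min % 60 ))
    printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
}

slurm_time 5      # 5 minutes
slurm_time 720    # 12 hours
slurm_time 2880   # 2 days
```

The result can be passed straight to the --time parameter, either in the job script or on the sbatch command line.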

 

A typical job script might look like this:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=bioinf_job
#SBATCH --nodes=1
#SBATCH --mem=128G
#SBATCH --time=00-12:00:00
#SBATCH --output=%j_%x.out
#SBATCH --error=%j_%x.err
#SBATCH --mail-user=your_email_address@gmail.com
#SBATCH --mail-type=end,fail

cd /path/to/your/working/dir

module load your_required_module

your_command(s)


Submitting a job

Basic job submission is with 'sbatch', so a simple minimal job submission could be just:

$ sbatch ./jobscript.sh

but you can also specify a partition (queue), number of tasks and amount of memory on the command line - if you have not already done so inside the script itself - like so (note that -n sets the number of tasks; the number of nodes is set with -N):

$ sbatch -p batch -n 12 --mem=120G ./jobscript.sh

The standard nodes have ~250GB of memory and 24 cores, so a job that asks for 120GB of memory leaves room for only one other job of that size on the same node.
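
The arithmetic behind this can be sketched as follows. This is a rough estimate only: it assumes the approximate node figures quoted above (250GB, 24 cores) and ignores any memory reserved for the system. The function name jobs_per_node is hypothetical.

```shell
# Approximate figures for a standard node, taken from the text above.
NODE_MEM_GB=250
NODE_CORES=24

# Hypothetical helper: how many identical jobs fit on one standard node,
# limited by whichever of memory or cores runs out first.
jobs_per_node() {
    local mem_gb=$1 cores=$2
    local by_mem=$(( NODE_MEM_GB / mem_gb ))
    local by_cores=$(( NODE_CORES / cores ))
    if [ "$by_mem" -lt "$by_cores" ]; then
        echo "$by_mem"
    else
        echo "$by_cores"
    fi
}

jobs_per_node 120 12   # the sbatch example above: 2 jobs per node
jobs_per_node 30 4     # a smaller request: 6 jobs per node
```

Requesting only the memory and cores a job really needs lets more jobs share a node, which generally shortens the wait in the queue.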

 

Viewing the Queue Status

Once your job is submitted, you can monitor its progress by looking at the queue. 

The queue status can be accessed using the command, 'squeue', and looks something like this:

[Image: squeue_cmd.png, example 'squeue' output]

Interesting job states are 'R' for running and 'PD' for pending. The last column shows either the node the job is running on, or the reason it is not running (yet). In this example 'Resources' and 'Priority' are OK: the job is just waiting for free resources, or waiting behind a higher-priority job. 'PartitionConfig', however, means the job has asked for something it can't have (like 25 cores).
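
As a minimal sketch, the reasons described above can be turned into short hints for your own pending jobs. The helper name explain_reason is hypothetical; the squeue call is skipped on machines where squeue is not installed.

```shell
# Hypothetical helper: map the reason codes discussed above to a short hint.
explain_reason() {
    case "$1" in
        Resources)       echo "waiting for free resources" ;;
        Priority)        echo "waiting behind a higher-priority job" ;;
        PartitionConfig) echo "request cannot be met by this partition; check the job parameters" ;;
        *)               echo "see the squeue documentation" ;;
    esac
}

# List your own pending jobs (-t PD) without a header (-h),
# printing the job ID (%i) and the reason (%r) for each.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" -t PD -h -o "%i %r" |
    while read -r jobid reason; do
        echo "$jobid: $reason ($(explain_reason "$reason"))"
    done
fi
```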

 

Cancelling a Job

To stop a job, use 'scancel' followed by the relevant JOBID:
scancel 342

 

Further Information

There's a handy multi batch system quick reference available at https://slurm.schedmd.com/rosetta.pdf

The Slurm user guide is available at https://slurm.schedmd.com/quickstart.html

 


 

On deva (note: soon to be retired)

The CCB servers use the GridEngine batch-queuing system for distributed resource management. If you intend to run any jobs using NGS data, or any other long-running job, you should submit your job to the queue.

In brief, to submit a job you need to use a shell script. First copy the default script to the directory from which you want to run the job:
cp /package/cbrg/templates/qsub.sh .

Edit this file:

  • change the username to your CCB username
  • add in the command you want to run

To submit your job to the queue type:
qsub qsub.sh

To see the status of the queue, type:
qstat

Click for a full explanation of how to run jobs in the queue (pdf).


Many common programs are pre-loaded on the CCB system, but some need to be loaded before you use them. You can use the module utility to do this.

module is a utility, which is used to manage the working environment in preparation for running applications. By loading the module for a certain installed application, the environment variables that are relevant for that application are automatically defined or modified.

Display the modules available on CCB by typing:
module avail

You should see (something like) the following:
[Image: module_avail.PNG]

Load a module by typing:
module add module-name
For example...
module add bowtie

The above command will load the default version of an application. To load a particular version of an application use:
module add module-name/version
For example...
module add bowtie2/2.3.2

Modules that are already loaded by users can be displayed with the command:
module list

A module can be "unloaded" with the commands: unload or rm, for example:
module unload bowtie
..or:
module rm bowtie

You can display the help for module by typing:
module help
You should see the following:
[Image: module_help.PNG]

If you have any queries on how to use this utility please email genmail.


BASIC USAGE

If you just want to get up and running with our curated set of commonly used bioinformatics packages, you can do so with a single command:
$ module load python-cbrg
or
$ module load R-cbrg

Note that the Spyder IDE is included in the standard Python installations, and that R-Studio is a separate module.
To see all modules available for use, use the command:
$ module avail

 

ADVANCED USAGE

The setup uses the following system:

  • python-base and R-base contain fixed, unchanging installations of the base languages. This is for safety – they cannot be accidentally overwritten causing unexpected changes of behaviour.
  • python-cbrg and R-cbrg contain separate package and library repositories for each version of Python and R. Because packages and library versions also change over time, we take a snapshot of the state on a monthly basis and then lock this to prevent changes causing unexpected behaviour. A single current version for each provides a continual rolling ‘head’ where changes are applied.

 

Loading the python-cbrg or R-cbrg module will automatically pull in the latest stable base and all packages or libraries:
$ module load python-cbrg
Loading python-cbrg/current
  Loading requirement: python-base/3.8.3

$ module list
Currently Loaded Modulefiles:

  1) python-base/3.8.3(default) 2) python-cbrg/current(default)

 

However, if you want to use a different version of the base, you can do that by loading it manually first:
$ module load python-base/3.6.10
$ module load python-cbrg

$ module list
Currently Loaded Modulefiles:
   1) python-base/3.6.10 2) python-cbrg/current(default)
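
In a job script, the same two-step load can be written with the versions held in variables, so that the versions used for an analysis are recorded in one place. This is a sketch only; the variable names are illustrative, and the module calls are skipped on machines where the module command is not available.

```shell
# Base version taken from the example above; change as required.
PY_BASE="python-base/3.6.10"
PY_PKGS="python-cbrg"

if type module >/dev/null 2>&1; then
    module load "$PY_BASE"   # load the chosen base first
    module load "$PY_PKGS"   # then the package repository on top of it
    module list              # record what was actually loaded
else
    echo "module command not found; run this on the CCB servers" >&2
fi
```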


To find information concerning your quota, type:
getquota

You should see something resembling:

       user/group     ||           size          ||    chunk files    
      name     |  id  ||    used    |    hard    ||  used   |  hard   
 --------------|------||------------|------------||---------|---------
        jbsmith|  0001||   1.69 TiB |   3.00 TiB ||     8268| 31876689
          

The key information is contained in the middle two columns: this indicates a 3 TiB total quota, of which the user has so far used 1.69 TiB.
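
As a small worked example, the percentage of the quota in use can be computed from a row of that output. The helper name quota_percent is hypothetical, and the sketch assumes the "used" and "hard" sizes are reported in the same unit, as in the sample above.

```shell
# Hypothetical helper: read one getquota data row on stdin and print the
# percentage of the size quota in use. Assumes both sizes share a unit.
quota_percent() {
    awk -F'|' '{ printf "%.1f\n", ($4 + 0) / ($5 + 0) * 100 }'
}

# The sample row shown above: 1.69 TiB used of 3.00 TiB.
echo '        jbsmith|  0001||   1.69 TiB |   3.00 TiB ||     8268| 31876689' | quota_percent
```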


Option 1. - OxFile
OxFile is an easy way for members of Oxford University to share files with other members, and with people outside the University. Individual files up to 25GB in size may be uploaded and kept on OxFile for up to 30 days.
OxFile can be found at: https://oxfile.ox.ac.uk/oxfile/

 

Option 2. - public space at CCB
We have public directories that are available on the web, so any file placed in:
/public/username/
...will be available online, at: http://userweb.molbiol.ox.ac.uk/public/username/

For example, if the user, "manager" wanted to share a file called example.txt, it could be placed on the system at:
/public/manager/example.txt
...and be available online at:
http://userweb.molbiol.ox.ac.uk/public/manager/example.txt

Important - rather than copying files into the public directory, we generally recommend making symbolic links to avoid duplicating the file contents. This is particularly useful for large files, such as BAM and BigWig files, when used to display tracks in the UCSC genome browser.
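
A minimal sketch of linking rather than copying is shown below. It runs in a scratch directory standing in for /public/username/; the helper name share_by_link is hypothetical.

```shell
# Hypothetical helper: link a file into a public directory instead of copying it.
# Pass an absolute path for the source so the link resolves from anywhere.
share_by_link() {
    local src=$1 public_dir=$2
    ln -s "$src" "$public_dir/$(basename "$src")"
}

# Demonstration in a scratch directory standing in for /public/username/.
scratch=$(mktemp -d)
mkdir -p "$scratch/data" "$scratch/public"
echo "example" > "$scratch/data/example.txt"
share_by_link "$scratch/data/example.txt" "$scratch/public"
ls -l "$scratch/public"
```

On the CCB servers the second argument would be your own public directory, for example /public/manager.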

Public directories are not made automatically when your account is created - if you need access to a public directory, please email CCB to request the creation of a public directory for your account.

We do not have directory indexing enabled so you cannot automatically see a list of available files on the web. If you want people to be able to download multiple files they will need a list of them all, perhaps as an extra file in your public space.
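
One way to produce such a list is sketched below. The helper name list_public_urls and the list file name index.txt are illustrative assumptions; the demonstration uses a scratch directory in place of a real public directory.

```shell
# Hypothetical helper: print one download URL per file in a public directory.
# A file called index.txt (an assumed name for the list itself) is skipped.
list_public_urls() {
    local public_dir=$1 base_url=$2 f
    for f in "$public_dir"/*; do
        [ -e "$f" ] || continue
        [ "$(basename "$f")" = "index.txt" ] && continue
        printf '%s/%s\n' "$base_url" "$(basename "$f")"
    done
}

# Demonstration with a scratch directory standing in for /public/manager/.
pub=$(mktemp -d)
touch "$pub/a.bw" "$pub/b.bam"
list_public_urls "$pub" "http://userweb.molbiol.ox.ac.uk/public/manager" > "$pub/index.txt"
cat "$pub/index.txt"
```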


Files can be transferred to and from the server using FileZilla.

Download FileZilla from: https://filezilla-project.org

 

Scenario 1: moving files between your local machine (PC or Mac) and cbrglogin1.

Once installed, enter the following information to establish a 'Quick Connection' to the server, cbrglogin1:

[Image: filezilla_cbrglogin1.PNG]

  • Host: cbrglogin1.molbiol.ox.ac.uk
  • Username: [your CCB username]
  • Password: [your CCB password]
  • Port: 22

Click the button shown in the image (filezilla_2.png) to establish the connection to cbrglogin1.

The files in your home directory on cbrglogin1 will be displayed in the lower right panel of FileZilla. These can be dragged to the left panel in order to copy files from cbrglogin1 to your local machine. Similarly, dragging files from the left panel to the right panel will copy them from your local machine to a directory on cbrglogin1.

 

 Scenario 2: moving files between cbrglogin1 and a third-party file server.

FileZilla is already installed on cbrglogin1 and can therefore be used to transfer files between the CCB server and any other remote file server (for example, to transfer fastq files from a sequencing facility).

  • Open a connection to cbrglogin1 using RDP
  • Click the 'Applications' menu  /  click 'Internet'  /  click 'FileZilla'

[Image: filezilla_3.png]

 

Alternatively, FileZilla can be opened from a terminal within RDP:

  • Type on the command line: filezilla

 Once FileZilla is open, enter the connection details you have been given for the remote server in the Quickconnect settings as above.


iGenomes is a collection of reference sequences and annotation files for commonly analyzed organisms. The files were originally generated by Illumina.

The files have been downloaded from Ensembl, NCBI, or UCSC, and chromosome names have been changed to be simple and consistent with their download source.

On the CCB servers these files can be found at:
/databank/igenomes/

The Ensembl and UCSC annotation and sequence files for the following organisms are available under these directories:

 

Human /databank/igenomes/Homo_sapiens/
Mouse /databank/igenomes/Mus_musculus/
Rat /databank/igenomes/Rattus_norvegicus/
Zebrafish /databank/igenomes/Danio_rerio/
Drosophila /databank/igenomes/Drosophila_melanogaster/
Chicken /databank/igenomes/Gallus_gallus/
Pig /databank/igenomes/Sus_scrofa/
Chimpanzee /databank/igenomes/Pan_troglodytes/

In these directories you will find sub-directories for Ensembl and UCSC annotation and sequence for different builds, for example
/databank/igenomes/Homo_sapiens/UCSC/hg19/Annotation
/databank/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa

Indices for Bowtie, Bowtie2 & BWA, and fasta format files of sequence are all in the Sequence directory:
/databank/igenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index
/databank/igenomes/Homo_sapiens/UCSC/hg19/Sequence/BowtieIndex
/databank/igenomes/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex

Annotation files, e.g. GTF files, can be found in the Annotation directory:
/databank/igenomes/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf
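
Because the layout is consistent, paths can be built from the organism, source and build. The following is a sketch only; the helper name igenomes_fasta is hypothetical, and not every combination of organism, source and build necessarily exists, so the result is checked before use.

```shell
IGENOMES_ROOT=/databank/igenomes

# Hypothetical helper: build the whole-genome FASTA path for an organism,
# annotation provider and build, following the layout shown above.
igenomes_fasta() {
    local organism=$1 provider=$2 build=$3
    echo "$IGENOMES_ROOT/$organism/$provider/$build/Sequence/WholeGenomeFasta/genome.fa"
}

fasta=$(igenomes_fasta Homo_sapiens UCSC hg19)
echo "$fasta"

# Check that the file actually exists before using it in a job.
[ -e "$fasta" ] || echo "not found on this machine: $fasta" >&2
```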

If you have any queries about these files, please email genmail.


Using BaseMount it is possible to mount your BaseSpace account on the CCB file system. You can find full details how to do this at the BaseSpace HelpCenter.

The BaseMount application is already installed on the CCB servers, so watch the tutorial videos from 'Step 3. First Time Launch' onwards, or go straight to 'Mounting Your BaseSpace Account' on the same page.

Once mounted, you can copy the files you want to the directory of your choice. Since you only have 1TB of space available in BaseSpace, we recommend doing this after each run to free up space; otherwise you may be charged for excessive BaseSpace usage.


The Gene Expression Omnibus (GEO) is a public functional genomics data repository supporting MIAME-compliant data submissions.

Click for illustrated instructions on how to download fastq format data from GEO (pdf) using a GEO accession code.

 


Unix is a command line environment, which means that you primarily enter commands on the keyboard instead of using a point-and-click mouse.

Unix will only do something if you tell it to by giving it a command.

The basic structure of a Unix command is:

command [options...] [arguments...]


options (also called flags) are generally single characters which in some way modify the action of the command. There may be no options or there may be several acting on the same command.

Options are preceded by a hyphen character ( - ) but there is no consistent rule among Unix commands as to how options should be grouped. Some commands allow a list of options with just a single hyphen at the beginning of the list (e.g. -apfg). Other commands require that each option is introduced by its own hyphen (e.g. -a -p -f -g).

Some options allow a value, often a filename, to be given following the option. Again, there is no consistent manner in which this is allowed: some options require the value to be placed immediately after the option letter, while others expect a space between the option letter and the value.

Unix is case-sensitive throughout. The exact combination of upper and lower case letters used in a command, option or filename is important.
For example, the options -p and -P in the same command will have different meanings.
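
These points can be tried out safely in a scratch directory. The sketch below uses ls and head, which accept both grouped and separate options; other commands may be stricter, as described above.

```shell
# Scratch directory with one visible and one hidden file.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/.hidden"

# ls accepts grouped options (-la) or separate ones (-l -a).
grouped=$(ls -la "$demo_dir")
separate=$(ls -l -a "$demo_dir")
[ "$grouped" = "$separate" ] && echo "-la and -l -a give the same listing"

# Case matters: -a lists "." and "..", while -A leaves them out.
ls -a "$demo_dir"
ls -A "$demo_dir"

# An option that takes a value: head accepts it with or without a space.
printf 'one\ntwo\n' | head -n 1
printf 'one\ntwo\n' | head -n1
```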

 


There are man pages associated with (almost) every Unix command (man stands for manual).

To read a man page, simply type man followed by the command name. For example, to learn about the Unix command "ls", type:
man ls

This will bring up all sorts of information associated with that command. Much of it may not be of any interest to you, but it will tell you what the command does and what flags are available, and will give an example (usually near the bottom).

If the man page scrolls past so that you cannot read it, use the pipe symbol ("|") to "pipe through less" - this will allow you to scroll with the up and down arrow keys, e.g.
man ls | less

 


There are a few core Unix commands that are used routinely - after a while you find that these come as second nature and you no longer have to think "What's the command for that?". These common commands involve moving around your account, creating, copying and deleting files and directories. It is a good idea for the novice Unix user to keep a list of common commands at hand until you have learnt them:

http://mally.stanford.edu/~sr/computing/basic-unix.html

 

There are many tricks to saving time on the command line and cutting down the number of keystrokes. Here are just a few.

Moving around:

cd                   move to home directory
cd ..                move up one directory
cd ../..             move up two directories
cd -                 move back to the directory I was just in
cd ../another_dir    move up one directory and then down to another_dir
cd ~/another_dir     move to home directory and then down to another_dir
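
The relative moves can be practised in a scratch directory, as in the sketch below. The directory names project, data and results are made up for the demonstration.

```shell
# Build a small scratch tree: project/data and project/results.
base=$(mktemp -d)
mkdir -p "$base/project/data" "$base/project/results"

cd "$base/project/data"
cd ..                # up one directory: now in project
cd data              # back down into data
cd ../results        # up one directory and then down to results
here=$(pwd)
cd - >/dev/null      # back to the directory we were just in (data)
back=$(pwd)

echo "went to: $(basename "$here")"
echo "returned to: $(basename "$back")"
```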

Use the Tab button to fill in filenames

When typing on the command line you can use the keyboard tab button to fill in the rest of a filename for you. Examples:

  1. If I had a file called humamph1.tfa and I had no other file beginning with "hum" in the directory I am in presently, I can type:
    less hum
    and then press the tab button. As there are no other files beginning with "hum" the machine knows the name of the rest of the filename and fills it in... giving me:
    less humamph1.tfa
  2. If I have two files called humamph1.tfa and humamph2.tfa, and I type:
    less hum
    and press the tab button, the machine will fill in up to the point that it knows e.g.
    less humamph
    It then stops because it does not know whether I want 1 or 2.
    If I then type 2 and press tab again it will finish off the file name for me....
    less humamph2.tfa
