How to use (FAQ)
From the login node you can submit jobs using Slurm, which will then run on the batch worker nodes.
Writing a Job Script
The commands you want to run, as well as the queue parameters for your job can be saved as a simple text file (for example, to a file called: jobscript.sh).
Useful job parameters include:
--partition : Queue name (eg, batch, gpu)
--job-name : A short name for this job
--ntasks : Number of CPUs allocated to the job
- default: 1
--mem : Memory required for the whole job
- if the job goes over this memory, the job will be killed
- default: 10G per CPU requested
- example values: 750M, 250G, 1T
- most nodes have ~250GB. Larger jobs are allocated to nodes with more memory (up to ~1TB)
--time : Total time requested for your job
- if the job goes over this time, the job will be killed
- default: 7 days
- example values (dd-hh:mm:ss)
- 0-00:05:00 ie, 5 minutes
- 0-12:00:00 ie, 12 hours
- 2-00:00:00 ie, 2 days
--output : Filename to send STDOUT to
- use "%j" (job ID) and "%x" (job name) to build the filename, eg. "%j_%x.out"
--error : Filename to send STDERR to
- use "%j" (job ID) and "%x" (job name) to build the filename, eg. "%j_%x.err"
--mail-user : Email address for notifications
--mail-type : When to send notifications (eg, begin, end, fail, all)
A typical job script might look like this:
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=bioinf_job
#SBATCH --ntasks=1
#SBATCH --mem=10G
#SBATCH --time=00-12:00:00
#SBATCH --output=%j_%x.out
#SBATCH --error=%j_%x.err
#SBATCH --mail-user=email@example.com
#SBATCH --mail-type=end,fail

cd /path/to/your/working/dir
module load your_required_module
your_command(s)
Submitting a job
Basic job submission is with
'sbatch', so a simple minimal job submission could be just:
$ sbatch ./jobscript.sh
but you can also specify a partition (queue), number of tasks and amount of memory - if you have not already done so inside the script itself - like so:
$ sbatch -p batch --ntasks=1 --mem=10G ./jobscript.sh
The standard nodes have ~250GB of memory and 24 cores, so asking for 120GB of memory means at most two such jobs can run on a node at a time.
Viewing the Queue Status
Once your job is submitted, you can monitor its progress by looking at the queue. The queue status can be viewed with the command 'squeue'.
Interesting job states are 'R' for running and 'PD' for pending. The last column of the squeue output shows either the node the job is running on, or the reason it is not running (yet). A reason of 'Priority' is OK - the job is just waiting for free space to run, or waiting behind a higher-priority job - but 'PartitionConfig' means the job has asked for something it can't have (like 25 cores on a 24-core node).
To see just your own jobs (and not the whole queue):
$ squeue --me
Cancelling a Job
To stop a job, use 'scancel' with the relevant JOBID.
To see a list of the available resources and partitions (queues), use the 'sinfo' command:

PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
test       up     infinite       1  idle   cbrgwn000p
gpu        up     infinite       1  mix    cbrgwngpu01
batch*     up     infinite      14  mix    cbrgbigmem01p,cbrgbigmem02p,cbrgwn002p,cbrgwn004p,cbrgwn005p,cbrgwn006p,cbrgwn007p,cbrgwn008p,cbrgwn009p,cbrgwn010p,cbrgwn011p,cbrgwn013p,cbrgwn014p,cbrgwn021p
batch*     up     infinite       6  idle   cbrgwn012p,cbrgwn015p,cbrgwn017p,cbrgwn019p,cbrgwn022p,cbrgwn023p
There's a handy multi batch system quick reference available at https://slurm.schedmd.com/rosetta.pdf
The Slurm user guide is available at https://slurm.schedmd.com/quickstart.html
Once you know how to submit jobs to the cluster using Slurm, the trickiest thing to work out is how to get your science done as effectively as possible. This isn’t a trivial task – firstly you need to know what your jobs are doing so that you can understand where the bottlenecks are and then you need to figure out how to eliminate those bottlenecks. Thankfully, we provide a simple, powerful tool to help you do just that.
All jobs on our cluster are automatically run through a lightweight job profiler. Job profiling is a technique in which a small helper program tracks how much time, CPU and memory your job uses and provides statistics when it completes. You’ll find these statistics automatically added to the end of your job output. Using this, you can submit a single job with your best guesses, look at the output, and then have a reasonable idea of what to use for subsequent jobs of a similar type.
Slurm job profiling - an introductory guide
Once you understand the output, you can move on to optimising your jobs for time, CPU and memory. This is something that varies depending on the programs you’re using and the data they’re processing, so you’ll probably end up being your own expert in the long run. At the same time, we've been able to put together a set of general guidelines (see: Slurm top tips) that you can apply to get your work done as speedily as possible whilst also making sure that you don't accidentally ask for resources that don't get used. These should make a good starting point.
Many common programs are pre-loaded on the CCB system, but some need to be loaded before you use them. You can use the module utility to do this.
module is a utility used to manage the working environment in preparation for running applications. Loading the module for an installed application automatically defines or modifies the environment variables relevant to that application.
Display the modules available on CCB by typing:
module avail
You should see (something like) the following:
Load a module by typing:
module add module-name
module add bowtie
The above command will load the default version of an application. To load a particular version of an application use:
module add module-name/version
module add bowtie2/2.3.2
Modules that are already loaded can be displayed with the command:
module list
A module can be "unloaded" with the commands unload or rm, for example:
module unload bowtie
module rm bowtie
You can display the help for module by typing:
module help
You should see the following:
If you have any queries on how to use this utility please email genmail.
If you just want to get up and running with our curated set of commonly used bioinformatics packages, you can do so with a single command:
$ module load python-cbrg
$ module load R-cbrg
The setup uses the following system:
python-base and R-base contain fixed, unchanging installations of the base languages. This is for safety - they cannot be accidentally overwritten, causing unexpected changes of behaviour.
python-cbrg and R-cbrg contain separate package and library repositories for each version of Python and R. Because package and library versions also change over time, we take a snapshot of the state on a monthly basis and then lock this to prevent changes causing unexpected behaviour. A single 'current' version of each provides a continually rolling 'head' where changes are applied.
Loading a python-cbrg or R-cbrg module will automatically pull in the latest stable base and all packages or libraries:
$ module load python-cbrg
Loading requirement: python-base/3.8.3
$ module list
Currently Loaded Modulefiles:
1) python-base/3.8.3(default) 2) python-cbrg/current(default)
However, if you want to use a different version of the base, you can do that by loading it manually first:
$ module load python-base/3.6.10
$ module load python-cbrg
$ module list
Currently Loaded Modulefiles:
1) python-base/3.6.10 2) python-cbrg/current(default)
To find information concerning your quota, type
You should see something resembling:
user/group        ||    size                  ||    chunk files
name     |    id  ||    used     |    hard    ||    used  |    hard
---------|--------||-------------|------------||----------|----------
jbsmith  |   0001 ||    1.69 TiB |   3.00 TiB ||     8268 | 31876689
The key information is contained in the middle two columns - this indicates a 3TB total quota, of which the user has so far used 1.69TB.
Files can be transferred to and from the server using Filezilla.
Download FileZilla from: https://filezilla-project.org
Scenario 1: moving files between your local machine (PC or Mac) and cbrglogin1.
Once installed, enter the following information to establish a 'Quick Connection' to the server, cbrglogin1:
Username: [your CCB username]
Password: [your CCB password]

Click 'Quickconnect' to establish the connection to cbrglogin1.
The files in your home directory on cbrglogin1 will be displayed in the lower right panel of FileZilla. These can be dragged to the left panel in order to copy files from cbrglogin1 to your local machine. Similarly, dragging files from the left panel to the right panel will copy them from your local machine to a directory on cbrglogin1.
Scenario 2: moving files between cbrglogin1 and a third-party file server.
FileZilla is already installed on cbrglogin1 and can therefore be used to transfer files between the CCB server and any other remote file server (for example, to transfer fastq files from a sequencing facility).
- Open a connection to cbrglogin1 using RDP
- Click the 'Applications' menu / click 'Internet' / click 'FileZilla'
Alternatively, FileZilla can be opened from a terminal within RDP:
- Type 'filezilla' on the command line
Once FileZilla is open, enter the connection details you have been given for the remote server in the Quickconnect settings as above.
iGenomes is a collection of reference sequences and annotation files for commonly analyzed organisms. The files were originally generated by Illumina.
The files have been downloaded from Ensembl, NCBI, or UCSC, and chromosome names have been changed to be simple and consistent with their download source.
On the CCB servers these files can be found at:
The Ensembl and UCSC annotation and sequence files for the following organisms are available under these directories:
In these directories you will find sub-directories for Ensembl and UCSC annotation and sequence for different builds, for example
Indices for Bowtie, Bowtie2 & BWA, and fastq format files of sequence are all in the Sequence directory:
Annotation files eg gtf files can be found in the Annotation directory:
If you have any queries on how to use this utility please email genmail
Using BaseMount it is possible to mount your BaseSpace account on the CCB file system. You can find full details how to do this at the BaseSpace HelpCenter.
The BaseMount application is installed on the CCB servers so watch the tutorial videos from Step 3. First Time Launch and onwards or go straight to Mounting Your BaseSpace Account on the same page.
Once mounted, you can copy the files you want to the directory of your choice. Since you only have 1TB space in BaseSpace available, it is recommended you do this after each run to free up space otherwise you may be charged for excessive BaseSpace usage.
Unix is a command-line environment, which means that you primarily enter commands at the keyboard instead of using a point-and-click mouse interface.
Unix will only do something if you tell it to by giving it a command.
The basic structure of a Unix command is:
command [options...] [arguments...]
options (also called flags) are generally single characters which in some way modify the action of the command. There may be no options or there may be several acting on the same command.
Options are preceded by a hyphen character ( - ) but there is no consistent rule among Unix commands as to how options should be grouped. Some commands allow a list of options with just a single hyphen at the beginning of the list (e.g. -apfg). Other commands require that each option is introduced by its own hyphen (e.g. -a -p -f -g)
Some options allow a value, often a filename, to be given following the option. Again, there is no consistent manner in which this is done: some options require the value to be placed immediately after the option letter, while others expect a space between the option letter and the value.
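For many commands, including ls, grouped and separately hyphenated options behave identically. A quick sketch you can try in a scratch directory (all the file and directory names here are just placeholders):

```shell
# Work in a throwaway directory so the listings are predictable
cd "$(mktemp -d)"
mkdir demo_dir out_dir
touch demo_dir/example.txt demo_dir/.hidden

# Grouped options with a single hyphen...
ls -la demo_dir > out_dir/grouped.out
# ...behave the same as each option given with its own hyphen
ls -l -a demo_dir > out_dir/separate.out

# cmp exits with status 0 when the two listings are identical
cmp out_dir/grouped.out out_dir/separate.out
```

Note that this equivalence is a convention of each individual command, not a guarantee of the shell, which is exactly why the inconsistency described above exists.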
Unix is case-sensitive throughout. The exact combination of upper and lower case letters used in a command, option or filename is important.
For example, the options -p and -P in the same command will have different meanings.
There are man pages associated with (almost) every Unix command (man stands for manual)
To read a man page simply type man followed by the command name, e.g. to learn about the Unix command "ls", we could type:
man ls
This will bring up all sorts of information associated with that command. Much of it may not be of any interest to you, but it will tell you what the command does, what flags are available, and will give an example (usually near the bottom).
If you find that the man pages scroll up so you cannot read them then remember to use the pipe symbol ("|") and "pipe through less" - this will allow you to scroll with the up-down arrows keys e.g.
man ls | less
There are a few core Unix commands that are used routinely - after a while you find that these come as second nature and you no longer have to think "What's the command for that?". These common commands involve moving around your account, creating, copying and deleting files and directories. It is a good idea for the novice Unix user to keep a list of common commands at hand until you have learnt them:
There are many tricks to saving time on the command line and cutting down the number of keystrokes. Here are just a few.
cd                    move to home directory
cd ..                 move up one directory
cd ../..              move up two directories
cd -                  move back to the directory I was just in
cd ../another_dir     move up one directory and then down to another_dir
cd ~/another_dir      move to home directory and then down to another_dir
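These shortcuts can be tried out safely in a throwaway directory; projects, analysis and another_dir are just example names made up for this sketch:

```shell
# Build a small directory tree to navigate around in
cd "$(mktemp -d)"
mkdir -p projects/analysis another_dir

cd projects/analysis    # start two levels down
cd ..                   # up one directory (now in projects)
cd ../another_dir       # up one, then down into another_dir
cd -                    # back to the directory I was just in (projects)
pwd
```

Note that cd - also prints the directory it switches to, which is a handy confirmation of where you have landed.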
Use the Tab button to fill in filenames
When typing on the command line you can use the keyboard tab button to fill in the rest of a filename for you. Examples:
- If I had a file called humamph1.tfa and I had no other file beginning with "hum" in the directory I am in presently, I can type:
hum
and then press the tab button. As there are no other files beginning with "hum", the machine knows the rest of the filename and fills it in, giving me:
humamph1.tfa
- If I have two files called humamph1.tfa and humamph2.tfa and I type:
hum
and press the tab button, the machine will fill in up to the point where the names diverge, e.g.
humamph
It then stops because it does not know whether I want 1 or 2. If I then type 2 and press tab again, it will finish off the filename for me:
humamph2.tfa
The shell treats spaces in filenames as argument separators, so filenames containing spaces cause problems (and a forward slash cannot appear in a filename at all, as it separates directories). If you have filenames with spaces in them, then you are advised to change the name of the file to something without spaces. To do this, you need to use the
mv command, and surround your filename containing spaces with quotation marks. e.g.
mv "a filename with spaces" a_filename_without_spaces
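A quick sketch of this rename in a scratch directory (the filenames are the same placeholders as above):

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"

# Quotes make the shell treat the whole name as a single argument
touch "a filename with spaces"

# Rename it to something safer; the quotes around the old name are essential
mv "a filename with spaces" a_filename_without_spaces
ls
```

Without the quotes, mv would see each word as a separate argument and complain that the files do not exist.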
Files that start with a dot
These files are usually special files and are hidden, i.e. they are not listed when you use the
ls command. To see hidden files in your directory use the
-a flag with the
ls command (
ls -a). These files are usually important and it is a good idea to leave these files alone unless you are sure of what you are doing. If you have any doubt please contact CCB before editing or deleting any of these dot files. Examples of dot files include:
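A small demonstration of hidden files (both filenames here are invented for the example):

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"
touch visible_file .hidden_file

ls        # lists only visible_file
ls -a     # also shows ., .. and .hidden_file
```

This is why a directory that looks empty to ls can still contain important configuration files.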
You can imagine a Unix machine as a large file folder containing other folders and documents. Analogous to real filing systems, these folders can contain both documents and other folders. The Unix word for folder is "directory" and the term for document is "file".
A diagram of the way files are stored on a Unix system looks like a tree. The bottom of the tree is called root and is represented by a forward slash ( / ).
For the machine to be able to find different documents (files) or directories (file folders), it sometimes has to be told explicitly where they are with reference to the bottom of the tree. i.e. root.
So, for example, the full path to the directory called "sue" is
/
  usr
    users
      user1
        sue
That is, first you start at root (
/), then go through the directory usr, then the directory users, then the directory user1, and then you find the directory sue.
Unix doesn't like spaces though, so to describe this full path on a Unix machine, you separate each term with a forward slash (/). So, the full path to "sue" becomes:
/usr/users/user1/sue
If you don't know the full path to where you are in your account, just type
pwd (Present Working Directory) on the command line. This returns the full path to the directory from which you typed the command.
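The example path above can be recreated inside a scratch directory to see pwd in action (the scratch directory itself will appear at the start of the printed path):

```shell
# Recreate the usr/users/user1/sue tree under a throwaway directory
cd "$(mktemp -d)"
mkdir -p usr/users/user1/sue
cd usr/users/user1/sue

# pwd prints the full path from root down to the current directory
pwd
```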
There are various ways to view a text file. To view the file contents page by page, use the command less.
To view graphical files, except for postscript files, try using the command
To view a postscript file, try the command
To remove a file, you need to use the
rm (Remove) command. For example:
rm filename
You will then be prompted to see if you really want to delete this file.
To delete a number of files that all have something in common, you can employ "wildcards". For example, to delete all files that end in the letters "seq", you could type:
rm *seq
You will then be prompted for each file individually to see if you really want to delete it.
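A sketch of wildcard deletion in a scratch directory (filenames invented for the example). Note that the per-file prompt described above comes from rm being aliased to 'rm -i' in the login shell; in a plain script, as here, the files are removed silently:

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"
touch sample1.seq sample2.seq notes.txt

# Delete every file whose name ends in "seq"; the shell expands
# *seq to the matching filenames before rm ever sees it
rm *seq
ls    # only notes.txt remains
```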
If you have a large number of files to delete, and you are SURE you want to delete them, you can use a backslash before the
rm command. This removes the safety feature of being prompted about each file to be deleted (be careful using this option!!), e.g.
\rm *seq
If you wish to remove a directory that contains no files in it, you can use the
rmdir command, e.g.
rmdir directoryname
If you wish to remove a directory containing files, and remove all files within that directory, you must use the
-rf flag with the
rm command e.g.
rm -rf directoryname
You will then be presented with the name of each file inside the directory, and asked to confirm you wish to delete it.
This can be quite tedious. So if you are really SURE you wish to delete this directory and its contents, you can forgo this safety mechanism by preceding the command with a backslash (be VERY careful using this option!!), e.g.
\rm -rf directoryname
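A minimal sketch in a throwaway directory (the directory and file names are just examples):

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"
mkdir doomed_dir
touch doomed_dir/a.seq doomed_dir/b.seq

# Recursively remove the directory and everything inside it; the
# leading backslash bypasses any 'rm -i' alias, so nothing prompts
\rm -rf doomed_dir
```

After this, doomed_dir and its contents are gone with no way to recover them, which is why the warnings above are worth taking seriously.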
To rename a file or directory, use the mv command:
mv old_filename new_filename
mv old_directoryname new_directoryname
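Both forms can be sketched in a scratch directory (the names are placeholders):

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"
touch old_filename
mkdir old_directoryname

# mv renames files and directories with exactly the same syntax
mv old_filename new_filename
mv old_directoryname new_directoryname
ls
```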
If the file called "new_filename" already exists, you will be asked whether you really want to overwrite it.

Editing text files
If you need to edit a text file, there are a number of text editors on our system. A user-friendly, graphical editor is nedit. Type:
nedit
...and type some text in the window that appears. You can cut and paste this text and you can save it in whichever folder you like (under your account space) by using the save as option from the file drop down menu.
This editor also lets you open any other plain text files for editing. It does not open Word documents, as these are full of binary code. We also have other text editors, such as pico (a simple terminal editor with on-screen commands), vi (entirely command-line driven), and xemacs (probably best left to programmers).