How do I optimise my cluster jobs to speed up my research?
Slurm job profiling and top tips.
Once you know how to submit jobs to the cluster using Slurm, the trickiest thing to work out is how to get your science done as effectively as possible. This isn’t a trivial task: first you need to know what your jobs are doing so that you can see where the bottlenecks are, and then you need to work out how to eliminate them. Thankfully, we provide a simple, powerful tool to help you do just that.
All jobs on our cluster are automatically run through a lightweight job profiler. Job profiling is a technique in which a small helper program tracks how much time, CPU and memory your job uses and reports statistics when the job completes. You’ll find these statistics automatically added to the end of your job output. Using them, you can submit a single job with your best-guess resource requests, look at the output, and then have a reasonable idea of what to request for subsequent jobs of a similar type.
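To make the idea of a "small helper program" concrete, here is a minimal Python sketch of a wrapper that runs a command and reports its wall-clock time, CPU time and peak memory. It is an illustration only and is not the profiler used on the cluster; the function name `profile_command` and the output field names are assumptions made for this example.

```python
import resource
import subprocess
import sys
import time


def profile_command(argv):
    """Run argv as a child process and return simple usage statistics.

    Hypothetical helper for illustration; not the cluster's profiler.
    """
    start = time.monotonic()
    completed = subprocess.run(argv)
    wall_seconds = time.monotonic() - start

    # Accumulated usage of all child processes that have finished so far.
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu_seconds = usage.ru_utime + usage.ru_stime

    return {
        "exit_code": completed.returncode,
        "wall_seconds": wall_seconds,
        "cpu_seconds": cpu_seconds,
        # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
        "peak_memory_kb": usage.ru_maxrss,
    }


if __name__ == "__main__":
    # Profile the command given on the command line, or a tiny default job.
    command = sys.argv[1:] or [sys.executable, "-c", "sum(range(10**6))"]
    for name, value in profile_command(command).items():
        print(f"{name}: {value}")
```

Saved under a filename of your choice, the script can be run as `python that_file.py your_program arg1 arg2` to print the four statistics once the program exits. The real profiler's output will look different, but the quantities it reports are of the same kind.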
Slurm job profiling - an introductory guide
Once you understand the output, you can move on to optimising your jobs for time, CPU and memory. The right settings vary with the programs you’re using and the data they’re processing, so you’ll probably end up being your own expert in the long run. That said, we’ve put together a set of general guidelines (see: Slurm top tips) that you can apply to get your work done as quickly as possible whilst also making sure that you don’t accidentally ask for resources that don’t get used. These should make a good starting point.
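As a rough illustration of spotting resources that "don't get used", the Python sketch below turns profiled numbers into two figures: the fraction of the requested CPU time that was actually used, and a memory request with some headroom above the observed peak. The function names and the 20% default headroom are assumptions chosen for this example, not values taken from the Slurm top tips.

```python
import math


def cpu_efficiency(cpu_seconds, wall_seconds, cores_requested):
    """Fraction of the requested CPU time that the job actually used."""
    return cpu_seconds / (wall_seconds * cores_requested)


def suggest_memory_mb(peak_memory_mb, headroom_percent=20):
    """Observed peak plus a safety margin, rounded up to a whole megabyte.

    The 20% default is an illustrative assumption, not a site policy.
    """
    return math.ceil(peak_memory_mb * (100 + headroom_percent) / 100)


# A job that asked for 4 cores, ran for 30 minutes of wall-clock time
# and used 1 hour of CPU time kept its cores busy half of the time.
print(cpu_efficiency(cpu_seconds=3600, wall_seconds=1800, cores_requested=4))  # 0.5

# A job that peaked at 2000 MB could request 2400 MB next time.
print(suggest_memory_mb(2000))  # 2400
```

An efficiency well below 1.0 suggests that the job could request fewer cores, while a memory request far above the observed peak ties up memory that other jobs could be using.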