Running Jobs

Once logged into Hamilton, the Linux commands you type in at the prompt are run on one of the service's two login nodes. Although these are relatively powerful computers, they are a resource shared between all the users using Hamilton and should not be used for running demanding programs. Light interactive work, downloading and compiling software, and short test runs using a few CPU cores are all acceptable.

Care should be taken not to overload the login nodes: we reserve the right to stop programs that interfere with other people's use of the service.

The majority of the CPU cores and RAM are only available by interacting with Hamilton's queuing system, 'Slurm', and packaging your work into units called jobs.

The basic idea is that a shell script is written using a text editor such as nano, containing the commands the job will run in sequence. In addition, some specially formatted comment lines are added to the file, describing how much CPU, RAM, time, and other resources the job needs. This is called a job submission script and, once it has been submitted to the queuing system, Hamilton will run the job as resources become free when other jobs finish. A number of example job submission scripts can be found on the Example job scripts tab.
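
To sketch the format: a job submission script is an ordinary shell script, with the resource requests carried in #SBATCH comment lines at the top (the options here are purely illustrative; complete examples follow in the numbered sections below):

#!/bin/bash
#SBATCH -c 1          # CPU cores required
#SBATCH --mem=1G      # memory required
#SBATCH --time=1:0:0  # time limit

# Ordinary commands follow the #SBATCH lines and are run in sequence:
./my_program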

When a job script is submitted using the sbatch command, the system will provide you with a job number, or job id. This number is how the system identifies your job; it can be used to see if the job has completed running yet, to cancel it, etc. If you need to contact us about a problem with a job, please include this number as it is essential when diagnosing problems.

Using the example job script for a serial job (see Example job scripts), and a fictional user account foobar22:

[foobar22@hamilton2 ~]$ sbatch my_serial_job.sh
Submitted batch job 3141717

[foobar22@hamilton2 ~]$ squeue -u foobar22
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           3141717    shared my_seria   foobar22 PD       0:00      1 (Resources)

The fifth column (ST) shows what state the job is in. R means that the job is running and PD means the job is Pending, i.e. waiting for its turn in the queue. While it is pending, the NODELIST(REASON) column will show why it is not running:

  • (Resources) - normal. The job is waiting for nodes to become free and allow it to run
  • (Priority) - normal. The job is waiting in the queue as there are higher-priority jobs ahead of it
  • (PartitionNodeLimit) - job will not run. The job submission script has asked for too many resources for the queue

When the job has started running, a file called slurm-<jobid>.out will be created. This contains any output printed by the commands in your job script. If the batch scheduler has to kill your job, for example because it tried to use more time or memory than requested, this will be noted at the bottom of this file.

Once the job has finished running, it will no longer appear in the output of squeue. Details about a finished job can be obtained from the command sacct -j <jobid>.
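
For example, using the job id from the session above, the job's output can be followed while it runs, the job can be cancelled, or its details inspected after completion:

[foobar22@hamilton2 ~]$ tail -f slurm-3141717.out   # watch output as it is written
[foobar22@hamilton2 ~]$ scancel 3141717             # remove the job from the queue
[foobar22@hamilton2 ~]$ sacct -j 3141717            # details of the finished job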

Useful commands

The core commands to interact with the Slurm scheduling system are:

  • sfree - show what resources are available
  • sinfo - summary of the system and status
  • sbatch <jobscript> - submit a job to the queue
  • squeue -u <username> - see the status of jobs in the queue
  • scancel <jobid> - remove jobs from the queue
  • sacct -j <jobid> - show details of a job that has finished
  • srun - start an interactive job

Available queues and job limits

Compute nodes are organised into queues (also known as partitions). Hamilton currently has 5 queues:

Queue    Description                                             Node Type     Node Quantity   Job Limits
shared   Default queue, intended for jobs that can share nodes   Standard      119 nodes(*)    3 days
multi    For jobs requiring one or more whole nodes              Standard      119 nodes(*)    3 days
long     For jobs requiring >3 days to run                       Standard      1 node(*)       7 days
bigmem   For jobs requiring >250GB memory                        High-memory   2 nodes         3 days
test     For short test jobs                                     Standard      1 node          15 minutes

(*) The shared, multi and long queues share a single pool of 119 nodes.

Types of compute node:

  • Standard - 128 CPU cores, 400GB temporary disk space, 250GB RAM
  • High-memory - 128 CPU cores, 400GB temporary disk space, 1.95TB RAM

Most work on Hamilton is done in the form of batch jobs, but it is also possible to run interactive jobs via the 'srun' command. Both types of job can be submitted to any queue.
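
As a sketch, an interactive shell on the test queue could be requested with srun's --pty option (the resource values shown are illustrative):

[foobar22@hamilton2 ~]$ srun -p test -c 1 --mem=1G -t 0:15:0 --pty bash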

Job resources and options

Unless you specify otherwise, jobs will be submitted to the shared queue and allocated the following resources:

  • 1 hour (15 minutes for the test queue)
  • 1 CPU core
  • 1GB memory
  • 1GB temporary disk space ($TMPDIR)
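
For example, a script containing no #SBATCH lines at all receives exactly this default allocation when submitted:

#!/bin/bash
# No #SBATCH options: 1 CPU core, 1GB memory, 1GB $TMPDIR, 1 hour, shared queue
./my_program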

Further resources can be allocated using sbatch or srun options, which can be included either on the command line (e.g. sbatch -n 1 <job_script>) or by embedding them in your job script (e.g. adding the line #SBATCH -n 1). If both are done, the command line takes precedence. Useful options include:

Option                  Description
-p <QUEUE>              Submit job to <QUEUE> (also known as a partition)
-t <TIME>               Run job for a maximum time of <TIME>, in the format dd-hh:mm:ss
-N <NODES>              Allocate <NODES> compute nodes to the job
-c <CORES>              For multi-threaded jobs: allocate <CORES> CPU cores to the job
-n <CORES>              For multi-process jobs: allocate <CORES> CPU cores to the job
--mem=<MEM>             Allocate <MEM> RAM to the job, e.g. 1G
--gres=tmp:<TMPSPACE>   Allocate <TMPSPACE> temporary disk space on the compute node(s)
--array=<START>-<END>   Run job several times, from indexes <START> to <END>
--mail-user=<EMAIL>     Send job notifications to email address <EMAIL> (for batch jobs only)
--mail-type=<TYPE>      Types of job notifications to send, e.g. BEGIN, END, FAIL, ALL
                        (recommended: END,FAIL; for batch jobs only)
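
Several of these options can be combined on a single command line; for example (the values and email address are illustrative):

[foobar22@hamilton2 ~]$ sbatch -p shared -t 0-2:0:0 -c 4 --mem=8G --mail-type=END,FAIL --mail-user=foobar22@example.com my_job.sh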

Environment variables

Slurm sets a number of environment variables that can be helpful to, for example, match the behaviour of a job to its resource allocation. These are detailed on the sbatch and srun man pages.

The four additional environment variables below are set to match the value given in '#SBATCH -c <number>', to help automate the behaviour of multi-threaded programs. This should be reasonable in most cases, but the values can be changed in job scripts if desired.

  • OMP_NUM_THREADS
  • OPENBLAS_NUM_THREADS
  • MKL_NUM_THREADS
  • BLIS_NUM_THREADS
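
For instance, a multi-threaded job script can rely on these variables directly, or override them before launching the program (a sketch; my_threaded_program is a placeholder):

#SBATCH -c 8    # Slurm sets OMP_NUM_THREADS=8 (and the other three variables) to match

echo "Running with $OMP_NUM_THREADS threads"
# To use a different value, set the variable explicitly:
# export OMP_NUM_THREADS=4
./my_threaded_program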

1) Serial jobs (1 CPU core)

Programs that aren't parallel, which includes most programs, are known as serial or sequential programs. They only use one CPU core at a time, and so many can run at the same time on one of Hamilton's multi-core compute nodes.

An example job script to run a program called my_serial_program would be:

#!/bin/bash

# Request resources:
#SBATCH -c 1           # 1 CPU core
#SBATCH --mem=1G       # memory required, up to 250G on standard nodes
#SBATCH --gres=tmp:1G  # temporary disk space required on the compute node ($TMPDIR), up to 400G
#SBATCH --time=1:0:0   # time limit for job (format: days-hours:minutes:seconds)

# Run in the 'shared' queue (job may share node with other jobs)
#SBATCH -p shared

# Commands to be run:
module load my_module
./my_serial_program

If saved in a file called my_serial_job.sh, this can be submitted to the queue with the command sbatch my_serial_job.sh

2) Shared memory job (multiple CPU cores on one node)

Some programs can use more than one CPU core at a time, but are limited to a single compute node. These typically use programming techniques such as OpenMP or threading to achieve this. We call them shared memory programs, because the parallelisation requires that all CPU cores have access to the same RAM/memory.

An example job script to run a program called my_sharedmemory_program would be:

#!/bin/bash

# Request resources:
#SBATCH -c 2          # number of CPU cores, one per thread, up to 128
#SBATCH --mem=1G      # memory required, up to 250G on standard nodes
#SBATCH --gres=tmp:1G # temporary disk space required on the compute node ($TMPDIR), up to 400G
#SBATCH --time=1:0:0  # time limit for job (format: days-hours:minutes:seconds)

# Run in the 'shared' queue (job may share node with other jobs)
#SBATCH -p shared

# Commands to be run:
module load my_module
./my_sharedmemory_program

If saved in a file called my_shared_job.sh, this can be submitted to the queue with the command sbatch my_shared_job.sh

3) High memory job

Jobs that require >250GB memory (per node) should run in the 'bigmem' queue. The nodes in this queue each have 1.95TB memory. An example job script my_bigmem_job.sh might be:

#!/bin/bash

# Request resources:
#SBATCH -c 1            # number of CPU cores, up to 128 for shared-memory programs
#SBATCH --mem=260G      # memory required, up to 1.95T
#SBATCH --gres=tmp:1G   # temporary disk space required on the compute node ($TMPDIR), up to 400G
#SBATCH -t 1:0:0        # time limit for job (format: days-hours:minutes:seconds)

# Run in the 'bigmem' queue (job may share node with other jobs)
#SBATCH -p bigmem

# Commands to be run:
module load my_module
./my_bigmem_program

If saved in a file called my_bigmem_job.sh, this can be submitted to the queue with the command sbatch my_bigmem_job.sh

4) Distributed memory job (multiple CPUs across one or more nodes)

Programs can be written to take advantage of CPU cores and memory spread across multiple compute nodes. They typically use the low-level library called MPI (Message Passing Interface) to allow communication between many copies of the same program, each with access to its own CPU core and memory. We call this a distributed memory programming model.

An example job script to run an MPI program called my_mpi_program would be:

#!/bin/bash

# Request resources:
#SBATCH -n 1           # number of MPI ranks (1 per CPU core)
#SBATCH --mem=1G       # memory required per node, in units M, G or T
#SBATCH --gres=tmp:1G  # temporary disk space required on each allocated compute node ($TMPDIR)
#SBATCH -N 1           # number of compute nodes
#SBATCH -t 1:0:0       # time limit for job (format: days-hours:minutes:seconds)

# Smaller jobs can run in the shared queue.
# Larger jobs that will occupy one or more whole nodes should use the multi queue.
#SBATCH -p shared

# Commands to be run.
# Note that mpirun will automatically launch the number of ranks specified above
module load my_module
mpirun ./my_mpi_program

If saved in a file called my_dist_job.sh, this can be submitted to the queue with the command sbatch my_dist_job.sh

5) Hybrid distributed and shared memory job (multiple CPUs across one or more nodes)

Writers of distributed memory programs have found that a mixed MPI/OpenMP model has its benefits (for example, reducing the memory and computation dedicated to halo exchanges between different processes in grid-based codes).

For these codes, we recommend running one MPI rank per CPU socket (two MPI ranks per compute node on Hamilton). An example job script would be:

#!/bin/bash

# Request resources:
#SBATCH -n 1                    # number of MPI ranks
#SBATCH -c 1                    # number of threads per rank (one thread per CPU core)
#SBATCH --ntasks-per-socket=1   # number of MPI ranks per CPU socket
#SBATCH -N 1                    # number of compute nodes
#SBATCH --mem=1G                # memory required per node, in units M, G or T
#SBATCH --gres=tmp:1G           # temporary disk space required on each allocated compute node ($TMPDIR)
#SBATCH -t 1:0:0                # time limit for job (format: days-hours:minutes:seconds)

# Smaller jobs can run in the shared queue.
# Larger jobs that will occupy one or more whole nodes should use the multi queue.
#SBATCH -p shared

# Commands to be run.
# Note that mpirun will automatically launch the number of ranks specified above
# (OMP_NUM_THREADS is set automatically to match '#SBATCH -c', providing
# each rank's threads with the CPUs requested)
module load my_module
mpirun ./my_hybrid_program

If saved in a file called my_hybrid_job.sh, this can be submitted to the queue with the command sbatch my_hybrid_job.sh
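
To occupy a whole standard node under the one-rank-per-socket recommendation above, the requests scale to two ranks of 64 threads each (a sketch, assuming a standard node's 128 cores are split across two CPU sockets):

#SBATCH -n 2                    # two MPI ranks (one per socket)
#SBATCH -c 64                   # 64 threads per rank: 2 x 64 = 128 cores, a whole node
#SBATCH --ntasks-per-socket=1   # one rank per CPU socket
#SBATCH -N 1                    # one whole compute node
#SBATCH -p multi                # whole-node jobs should use the multi queue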

6) Job arrays

Sometimes it is necessary to run a large number of very similar jobs. To avoid having to write a separate job script for each of them, the batch queue system provides a technique called job arrays, which allows a single job script to be run many times. Each run is called a task.

This feature can be combined with any of the above examples, but here is a serial job example that runs the command ./my_program 32 times, with the arguments input_file_1.txt to input_file_32.txt:

#!/bin/bash

# Request resources (per task):
#SBATCH -c 1           # 1 CPU core
#SBATCH --mem=1G       # 1 GB RAM
#SBATCH --time=1:0:0   # 1 hour (hours:minutes:seconds)

# Run on the shared queue
#SBATCH -p shared

# Specify the tasks to run:
#SBATCH --array=1-32   # Create 32 tasks, numbers 1 to 32

# Each separate task can be identified based on the SLURM_ARRAY_TASK_ID
# environment variable:

echo "I am task number $SLURM_ARRAY_TASK_ID"

# Run program:
module load my_module
./my_program input_file_${SLURM_ARRAY_TASK_ID}.txt
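
If saved in a file called my_array_job.sh, this can be submitted to the queue with the command sbatch my_array_job.sh. Each task then appears in squeue with an id of the form <jobid>_<taskid>, and by default each task writes its own output file, slurm-<jobid>_<taskid>.out (the job id below is illustrative):

[foobar22@hamilton2 ~]$ sbatch my_array_job.sh
Submitted batch job 3141718

[foobar22@hamilton2 ~]$ ls slurm-3141718_*.out
slurm-3141718_1.out  slurm-3141718_2.out  slurm-3141718_3.out  ...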