Login nodes

Once you have logged into Hamilton, the Linux commands you type in at the prompt are run on one of the service's two login nodes. Although these are relatively powerful computers, they are a shared resource for all of Hamilton's users and should not be used for running demanding programs. Light interactive work, downloading and compiling software, and short test runs using a few CPU cores are all acceptable.

Care should be taken not to overload the login nodes: we reserve the right to stop programs that interfere with other people's use of the service.

Running intensive computations

The majority of the CPU cores and RAM on Hamilton are in its compute nodes, which are accessed via the queuing system, Slurm. Most work on Hamilton is done as non-interactive batch jobs that are scheduled by Slurm to run when space becomes available. However, interactive work is also possible through Slurm.

Batch jobs

A batch job is typically written on a login node and submitted to Slurm from there. It takes the form of a script, written with a text editor such as nano, that contains two things:

  • instructions to Slurm describing the resources (CPU, memory, time, etc) needed for the job and any other Slurm settings
  • the commands the job will run, in sequence.

The Example job scripts page has sample scripts for various types of jobs, and the Software pages have additional advice on configuring jobs for certain applications. All batch jobs are submitted using the command:

sbatch <job script name>

Once a job has been submitted to the queuing system, it will be scheduled and run as resources become free.

When a job script is submitted using the sbatch command, the system will provide you with a job number, or job id. This number is how the system identifies the job; it can be used to see if the job has completed running yet, to cancel it, etc. If you need to contact us about a problem with a job, please include this number as it is essential when diagnosing problems.

Using the example job script for a serial job (see Example job scripts), and a fictional user account foobar22:

[foobar22@login1 ~]$ sbatch my_serial_job.sh
Submitted batch job 3141717

[foobar22@login1 ~]$ squeue -u foobar22
             JOBID PARTITION     NAME       USER ST       TIME  NODES NODELIST(REASON)
           3141717    shared my_seria   foobar22 PD       0:00      1 (Resources)

The fifth column (ST) shows what state the job is in: R means the job is running and PD means it is pending, i.e. waiting for its turn in the queue. While a job is pending, the NODELIST(REASON) column shows why it is not running, for example:

  • (Resources) - normal. The job is waiting for nodes to become free so that it can run
  • (Priority) - normal. The job is waiting in the queue as there are higher-priority jobs ahead of it
  • (PartitionNodeLimit) - job will not run. The job submission script has asked for more resources than the queue allows

When the job has started running, a file called slurm-<jobid>.out will be created. This contains any output printed by the commands in your job script. If the batch scheduler has to kill your job, for example because it tried to use more time or memory than requested, this will be noted at the bottom of the file.
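For example, the output of the job submitted above (job id 3141717) can be read while the job runs or after it has finished using standard commands:

tail -f slurm-3141717.out   # follow the output as it is written
cat slurm-3141717.out       # print the whole file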

Once the job has finished running, it will no longer appear in the output of squeue. Details about a finished job can be obtained with the command sacct -j <jobid>.
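For example, using the job id from the example above:

scancel 3141717                                               # cancel the job
sacct -j 3141717 --format=JobID,JobName,State,Elapsed,MaxRSS  # details of the finished job

(The --format option selects which columns sacct displays; see the sacct man page for the full list of fields.)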

Interactive jobs

Interactive jobs are useful when, for example, work needs to be done interactively but is too intensive for a login node, or when testing software's behaviour in a Slurm environment. The Slurm command srun will start an interactive job. For example, to start an interactive shell on a compute node, use:

srun --pty bash

Jobs run through srun are subject to the same controls as batch jobs. If you need extra resources, such as CPU cores, memory or time, request them in the same way as with sbatch (see Queueing system). For example:

srun --pty --mem=2G -c 2 -p test bash

Instead of starting an interactive shell on a compute node, other commands can also be run through srun, e.g.:

srun --mem=2G -c 2 -p test <mycommand>

Queueing system

Useful commands

The core commands to interact with the Slurm scheduling system are:

  • sfree - show what resources are available
  • sinfo - summary of the system and status
  • sbatch <jobscript> - submit a job to the queue
  • squeue -u <username> - see the status of jobs in the queue
  • scancel <jobid> - remove jobs from the queue
  • sacct -j <jobid> - show details of a job that has finished
  • srun - start an interactive job

Available queues and job limits

Compute nodes are organised into queues (also known as partitions). Hamilton currently has 5 queues:

Queue    Description                                              Node type     Node quantity  Job time limit
shared   Default queue, intended for jobs that can share nodes    Standard      119(*)         3 days
multi    For jobs requiring one or more whole nodes                Standard      119(*)         3 days
long     For jobs requiring >3 days to run                         Standard      1(*)           7 days
bigmem   For jobs requiring >250GB memory                          High-memory   2              3 days
test     For short test jobs                                       Standard      1              15 minutes

(*) The shared, multi and long queues share a single pool of 119 nodes.

Types of compute node:

  • Standard - 128 CPU cores, 400GB temporary disk space, 250GB RAM
  • High-memory - 128 CPU cores, 400GB temporary disk space, 1.95TB RAM

Most work on Hamilton is done in the form of batch jobs, but it is also possible to run interactive jobs via the srun command. Both types of job can be submitted to any queue.
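For example, to send a job to a particular queue, add the -p option (my_long_job.sh here is just an illustrative script name):

sbatch -p long my_long_job.sh     # batch job in the long queue
srun --pty -p test bash           # interactive shell in the test queue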

Job resources and options

Unless you specify otherwise, jobs will be submitted to the shared queue and allocated the following resources:

  • 1 hour (15 minutes for the test queue)
  • 1 CPU core
  • 1GB memory
  • 1GB temporary disk space ($TMPDIR)

Further resources can be allocated using sbatch or srun options, which can be included either on the command line (e.g. sbatch -n 1 <job_script>) or by embedding them in your job script (e.g. adding the line #SBATCH -n 1). If both are done, the command line takes precedence. Useful options include:

Option Description
-p <QUEUE> Submit job to <QUEUE> (queues are also known as partitions)
-t <TIME> Run job for a maximum time of <TIME>, in the format dd-hh:mm:ss
-c <CORES> For multi-core jobs: allocate <CORES> CPU cores to the job
-n <CORES> For MPI jobs: allocate <CORES> CPU cores to the job
-N <NODES> Allocate <NODES> compute nodes to the job
--mem=<MEM> Allocate <MEM> RAM to the job, e.g. 1G
--gres=tmp:<TMPSPACE> Allocate <TMPSPACE> temporary disk space on the compute node(s)
--array=<START>-<END> Run job several times, from indexes <START> to <END>
--mail-user=<EMAIL> Send job notifications to email address <EMAIL> (batch jobs only; not needed if notifications should simply go to the submitter's Durham address, which is the default)
--mail-type=<TYPE> Types of job notifications to send, e.g. BEGIN, END, FAIL, ALL (recommended: END,FAIL).  For batch jobs only.
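As an illustrative sketch, several of these options can be combined at the top of a job script; the values below are arbitrary examples rather than recommendations:

#!/bin/bash
#SBATCH -p shared               # run in the shared queue
#SBATCH -c 4                    # 4 CPU cores
#SBATCH --mem=8G                # 8GB RAM
#SBATCH -t 0-12:00:00           # 12 hour time limit
#SBATCH --gres=tmp:10G          # 10GB temporary disk space on the compute node
#SBATCH --mail-type=END,FAIL    # email when the job finishes or fails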

Environment variables

Slurm sets a number of environment variables that can be helpful to, for example, match the behaviour of a job to its resource allocation. These are detailed on the sbatch and srun man pages.

The four additional environment variables below are set to match the value given in #SBATCH -c <number>, to help automate the behaviour of multi-threaded programs. This should be reasonable in most cases, but the values can be changed in job scripts if desired.

  • $OMP_NUM_THREADS
  • $OPENBLAS_NUM_THREADS
  • $MKL_NUM_THREADS
  • $BLIS_NUM_THREADS
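A minimal sketch of how this works in practice (my_threaded_program is a placeholder for your own multi-threaded program):

#!/bin/bash
#SBATCH -c 8           # 8 CPU cores; the four variables above will also be set to 8
#SBATCH -p shared

module load my_module
# The automatic values are usually appropriate, but they can be overridden, e.g.:
# export OMP_NUM_THREADS=4
./my_threaded_program  # an OpenMP program will start $OMP_NUM_THREADS threads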

Example jobs

1) Serial jobs (1 CPU core)

Programs that aren't parallel, which includes most programs, are known as serial or sequential programs. They only use one CPU core at a time, and so many of them can run at the same time on one of Hamilton's multi-core compute nodes.

An example job script to run a program called my_serial_program would be:

#!/bin/bash

# Request resources:
#SBATCH -c 1           # 1 CPU core
#SBATCH --mem=1G       # memory required, up to 250G on standard nodes.
#SBATCH --time=1:0:0   # time limit for job (format:  days-hours:minutes:seconds)
#SBATCH --gres=tmp:1G  # temporary disk space required on the compute node ($TMPDIR),
# up to 400G
# Run in the 'shared' queue (job may share node with other jobs)
#SBATCH -p shared

# Commands to be run:
module load my_module
./my_serial_program

If saved in a file called my_serial_job.sh, this can be submitted to the queue with the command sbatch my_serial_job.sh

2) Shared memory job (multiple CPU cores on one node)

Some programs can use more than one CPU core at a time, but are limited to a single compute node. These typically use programming techniques such as OpenMP or threading to achieve this. We call them shared memory programs, because the parallelisation requires that all CPU cores have access to the same RAM/memory.

An example job script to run a program called my_sharedmemory_program would be:

#!/bin/bash

# Request resources:
#SBATCH -c 2          # number of CPU cores, one per thread, up to 128
#SBATCH --mem=1G      # memory required, up to 250G on standard nodes
#SBATCH --time=1:0:0  # time limit for job (format:  days-hours:minutes:seconds)
#SBATCH --gres=tmp:1G # temporary disk space required on the compute node ($TMPDIR),
# up to 400G
# Run in the 'shared' queue (job may share node with other jobs)
#SBATCH -p shared

# Commands to be run:
module load my_module
./my_sharedmemory_program

If saved in a file called my_shared_job.sh, this can be submitted to the queue with the command sbatch my_shared_job.sh

3) High memory job

Jobs that require >250GB memory (per node) should run in the bigmem queue. The nodes in this queue each have 1.95TB memory. An example job script my_bigmem_job.sh might be:

#!/bin/bash

# Request resources:
#SBATCH -c 1            # number of CPU cores, up to 128 for shared-memory programs
#SBATCH --mem=260G      # memory required, up to 1.95T
#SBATCH --time=1:0:0   # time limit for job (format:  days-hours:minutes:seconds)
#SBATCH --gres=tmp:1G   # temporary disk space required on the compute node ($TMPDIR),
# up to 400G
# Run in the bigmem queue (job may share node with other jobs)
#SBATCH -p bigmem

# Commands to be run:
module load my_module
./my_bigmem_program

If saved in a file called my_bigmem_job.sh, this can be submitted to the queue with the command sbatch my_bigmem_job.sh

4) Distributed memory job (multiple CPUs across one or more nodes)

Programs can be written to take advantage of CPU cores and memory spread across multiple compute nodes. They typically use the low-level library called MPI (Message Passing Interface) to allow communication between many copies of the same program, each with access to its own CPU core and memory. We call this a distributed memory programming model.

An example job script to run an MPI program called my_mpi_program would be:

#!/bin/bash

# Request resources:
#SBATCH -n 1           # number of MPI ranks (1 per CPU core)
#SBATCH --mem=1G       # memory required per node, in units M, G or T
#SBATCH --time=1:0:0   # time limit for job (format:  days-hours:minutes:seconds)
#SBATCH --gres=tmp:1G  # temporary disk space required on each compute node ($TMPDIR)
#SBATCH -N 1           # number of compute nodes. 

# Smaller jobs can run in the shared queue.  
# Larger jobs that will occupy one or more whole nodes should use the multi queue.
#SBATCH -p shared 

# Commands to be run.  
# Note that mpirun will automatically launch the number of ranks specified above 
module load my_module
mpirun ./my_mpi_program

If saved in a file called my_dist_job.sh, this can be submitted to the queue with the command sbatch my_dist_job.sh

5) Hybrid distributed and shared memory job (multiple CPUs across one or more nodes)

Writers of distributed memory programs have found that a mixed MPI / OpenMP model has its benefits (for example, reducing the memory and computation dedicated to halo exchanges between different processes in grid-based codes).

For these codes, we recommend running one MPI rank per CPU socket (two MPI ranks per compute node on Hamilton). An example job script would be:

#!/bin/bash

# Request resources:
#SBATCH -n 1                    # number of MPI ranks
#SBATCH -c 1                    # number of threads per rank (one thread per CPU core)
#SBATCH --ntasks-per-socket=1   # number of MPI ranks per CPU socket
#SBATCH -N 1                    # number of compute nodes. 
#SBATCH --mem=1G                # memory required per node, in units M, G or T
#SBATCH --gres=tmp:1G           # temporary disk space on each compute node ($TMPDIR)
#SBATCH -t 1:0:0                # time limit for job (format: days-hours:minutes:seconds) 

# Smaller jobs can run in the shared queue. 
# Larger jobs that will occupy one or more whole nodes should use the multi queue.
#SBATCH -p shared 

# Commands to be run. 
# Note that mpirun will automatically launch the number of ranks specified above 
# (OMP_NUM_THREADS is set to provide the CPUs requested)
module load my_module
mpirun ./my_hybrid_program

If saved in a file called my_hybrid_job.sh, this can be submitted to the queue with the command sbatch my_hybrid_job.sh

6) Job arrays

Sometimes it is necessary to run a large number of very similar jobs. To avoid having to write a job script for each of them, the batch queue system provides a facility called job arrays, which allows a single job script to be run many times. Each run is called a task.

This feature can be combined with any of the above examples, but here is a serial job example that runs the command ./my_program 32 times, with the arguments input_file_1.txt to input_file_32.txt:

#!/bin/bash

# Request resources (per task):
#SBATCH -c 1           # 1 CPU core
#SBATCH --mem=1G       # 1 GB RAM
#SBATCH --time=1:0:0   # 1 hour (hours:minutes:seconds)

# Run on the shared queue
#SBATCH -p shared

# Specify the tasks to run:
#SBATCH --array=1-32   # Create 32 tasks, numbers 1 to 32

# Each separate task can be identified based on the SLURM_ARRAY_TASK_ID
# environment variable:

echo "I am task number $SLURM_ARRAY_TASK_ID"

# Run program:
module load my_module
./my_program input_file_${SLURM_ARRAY_TASK_ID}.txt
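
If saved in a file called my_array_job.sh (an illustrative name, following the pattern of the examples above), this can be submitted to the queue with the command sbatch my_array_job.sh. The standard Slurm % suffix on --array can also be used to limit how many tasks run at once, for example:

sbatch --array=1-32%8 my_array_job.sh   # run the 32 tasks, at most 8 at a time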