Durham University

COSMA

DINE

Get the right data to the right node at the right time.

The cost of data movement, in both runtime and energy, is predicted to be a major showstopper on the road to exascale. As the computers driving data centres, supercomputers and machine learning farms become faster, their interconnects, i.e. communication devices, grow into a limiting factor; even worse, they will also have to cope with pervasive unreliability. One way to improve them is to make them smart: to make them learn how to route data flows, how to meet security constraints, or even to deploy computations into the network. Smart network devices can take ownership of the data movement, bring data into the right format before it is delivered, take care of security and resiliency, and so forth.

The Durham Intelligent NIC Environment (DINE) supercomputer is a small cluster equipped with Mellanox Smart NICs. These Smart NICs will enable direct access to remote memory to improve the performance of massively parallel codes, in preparation for future exascale systems, and will provide researchers with a test-bed facility for the development of novel computing paradigms. A Mellanox smart switch will also allow investigations into in-network computing to be carried out, which, when combined with the smart NICs, will lead to a novel HPC computing environment.

As an experimental device, the system has been purchased by Durham's Department of Computer Science as part of a strategic research equipment purchase. It was installed in collaboration with DiRAC/COSMA. The PIs behind this project are Alastair Basden and Tobias Weinzierl.

“The DINE supercomputer will allow researchers to probe novel technologies in preparation for running advanced codes on exascale machines, enabling a step change in model resolution in fields such as weather forecasting, climate change and cosmology, with a huge scientific benefit.”

Alastair Basden, HPC Technical Manager (COSMA)

The system

The Durham Intelligent NIC Environment (DINE) supercomputing facility is hosted alongside COSMA, and is used by Computer Science researchers, DiRAC researchers and international collaborators.

A key feature of DINE is its Mellanox BlueField smart NICs, which provide a programmable network offload capability, allowing network functions to be accelerated and freeing up compute cores for other tasks.

DINE comprises 16 nodes, each containing:

  • Dual 16-core AMD EPYC 7302 (Rome) processors (3 GHz)
  • 128 GB RAM
  • BlueField-1 Smart NIC (25 Gbit/s)

The system also includes a Mellanox Spectrum Ethernet switch.


Access

Access is available free of charge to Durham scientists. High priority will be given to fundamental research (no production runs).

Students will also be given access and hence benefit from working with cutting-edge technologies. This will help them to design algorithms and investigate ideas which will be carried forward into future UK and international facilities.

We are also willing to give collaborators and external scientists access to the system, so that they can prototype novel algorithms and write new software using smart network devices.

To get access, please follow these instructions to apply for an account, signing up to project hpcicc, and then send a message to cosma-support@durham.ac.uk mentioning your interest in BlueField.

One node, b101, is available for direct login from a COSMA login node. Jobs for the other nodes should then be submitted through the SLURM workload manager, using the bluefield1 queue.
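
As a rough illustration of the submission step, the following Python sketch assembles a minimal SLURM batch script that targets the bluefield1 queue. Only the queue name and the project name hpcicc come from this page; the function names, the job name, the time limit and the use of the project name as a SLURM account are assumptions made for the example, so check them against the COSMA documentation before relying on them.

```python
import shutil
import subprocess


def build_batch_script(command, nodes=1, partition="bluefield1",
                       job_name="dine-test", time_limit="00:10:00",
                       account=None):
    """Return the text of a SLURM batch script that runs `command`.

    The partition default is the queue named on this page. The job name
    and time limit are placeholder values, not site recommendations.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={time_limit}",
    ]
    if account is not None:
        # Assumption: the project name may be needed as the SLURM account.
        lines.append(f"#SBATCH --account={account}")
    lines += ["", command]
    return "\n".join(lines) + "\n"


def submit(script_text):
    """Pipe the script to sbatch on standard input.

    Returns sbatch's output, or None when sbatch is not installed
    (for example when trying this sketch away from the cluster).
    """
    if shutil.which("sbatch") is None:
        return None
    result = subprocess.run(["sbatch"], input=script_text, text=True,
                            capture_output=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Print a two-node job script; call submit(script) on b101 to queue it.
    script = build_batch_script("srun hostname", nodes=2)
    print(script)
```

Keeping script generation separate from submission means the script text can be inspected, or saved and passed to sbatch by hand, before anything is queued.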

Associated Projects

ExaClaw - Clawpack-enabled ExaHyPE for heterogeneous hardware

http://www.peano-framework.org/index.php/projects/exaclaw-clawpack-enabled-exahype-for-heterogeneous-hardware/

Durham project funded by EPSRC under the ExCALIBUR programme.

ExaHyPE - an Exascale Hyperbolic PDE Engine

www.exahype.org

EU H2020 FET HPC project with partners from Munich (Technische Universität München and Ludwig-Maximilians-Universität), Trento and Frankfurt.

“In Peano and ExaHyPE, we have been suffering from a lack of MPI progress and, hence, algorithmic latency for quite a while and invested significant compute effort to decide how to place our tasks on the system. We hope that BlueField will help us to realise these two things way more efficiently. Actually, we started to write software that does this for us on the BlueField in a blackbox way.”

Tobias Weinzierl, Project PI

Acknowledgement

This work has used Durham University's DINE cluster. DINE has been purchased through Durham University’s Research Capital Equipment Fund 19_20 Allocation, led by the Department of Computer Science. It is installed in collaboration with, and as an addendum to, the DiRAC@Durham facility managed by the Institute for Computational Cosmology on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). DiRAC equipment was funded by BEIS capital funding via STFC capital grants ST/P002293/1, ST/R002371/1 and ST/S002502/1, Durham University and STFC operations grant ST/R000832/1. DiRAC is part of the National e-Infrastructure.