The Lustre file system
/cosma6, /cosma7, /cosma8 and /snap7, /snap8 are LUSTRE file systems. These are parallel file systems which can write different parts of a file to different physical hard drives, thereby speeding up read and write access.
By default, Lustre is configured to write files to a single logical hard drive. However, for large files (typically >100MB), improved performance can be achieved if the file is striped across multiple disks. This must be performed when the file is first created: It is not possible to re-stripe a Lustre file (though the file can be migrated).
/cosma7 uses a progressive file layout by default (which you can change): The first GB is written to a single stripe, up to 16GB are then written to 8 logical disk units, and the remainder of files larger than this are written to 16 disk units.
/cosma6 also uses a progressive file layout by default: The first 64kB are written to the metadata server (speeding up access for small files). Up to 128MB are then written to a single stripe, up to 1GB are striped across 8 disks, and the remainder of files larger than this are striped across all disks.
/cosma8 similarly uses a progressive file layout by default, similar to /cosma8.
Lustre files inherit their striping from the directory in which they are written. Therefore if you have a data directory containing only large files, the simplest method to stripe these files is to simply stripe the directory when it is created (i.e. before any files are written to it).
To stripe a directory:
- lfs setstripe -S 4M -c -1 /path/to/directory
which will stripe across all disks, in 4MB chunks.
To stripe a file before it is created (i.e. to touch it):
- lfs setstripe -S 4M -c -1 /path/to/new/file
This will create an empty file, which can then be written to.
For very large files, striping is essential if the file won’t fit on a single logical drive.
A C API also exists, so that striping can be performed programmatically. However, don't reinvent the wheel, and use e.g. parallel HDF5 instead (with correct settings).
The Lustre file systems also allow progressive striping (striping larger files more, automatically). This is easy to set up, but please refer to Lustre documentation for more details.
The /snap7 and /snap8 file systems have no built in redundancy, and so if a disk fails, data will be lost. However, it is possible to set up RAID-1 (mirroring) for data protection on a per-directory or a file-by-file basis. The lfs mirror command can be used to do this. For example, to mirror a directory to 2 disks:
lfs mirror create -N -p pool1 -N -p pool2 mirroredDirectory/
Other options can be added to this, to set striping etc:
lfs mirror create -N -p pool1 -c 2 -N -p pool2 -c 2 /mirroredStripedDirectory/
(the -c 2 means to strip across 2 disks for each mirror)
Note, it is important to use the -p pool1/2 option as otherwise it is possible for both mirrors of a file to be on the same disk (which obviously is not ideal for redundancy!)
More advanced options are available, for example multiple mirrors, different striping on different mirrors, etc.
Once you have written a file, you need to manually synchronise it:
lfs mirror resync file1
If you have many files, they can be done at the same time, e.g.:
lfs mirror resync *
To check flles, you can use lfs mirror verify file1.
If you are using the /snap file systems for restart/checkpoint files, please ensure that you either mirror, or alternatively, copy ocassionally to /cosma6, /cosma7 or /cosma8 (as appropriate). It will not usually be necessary to copy every set of restart files, but advisable to copy every few days to minimise wasted compute time in the event that files are lost. While copying these files, please be sure to maintain consistency, i.e. do not overwrite an old set until the new set has been copied across and verified.
If you are rsync-ing data from elsewhere onto the Lustre file system, and have files >1GB in size, the rsyncLustre module should be used. This is a patched version of rsync which will automatically stripe all files >1GB in size.