The Frontier supercomputer will have the world's fastest storage system, with read speeds up to 75 TB/s

Credit: OLCF

The Frontier supercomputer is set to go online at the end of 2021. It is a massive exascale system, with a target computation speed of ~1.5 exaFLOPS and 15 billion IOPS. It will be the USA's first exascale supercomputer and is being built at a cost of $600 million at Oak Ridge National Laboratory's Oak Ridge Leadership Computing Facility (OLCF). Frontier will far outpace its predecessors, Titan and Summit, which have peak performance of 27 petaFLOPS and 200 petaFLOPS respectively.
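To put those peak figures side by side, here is a minimal sketch in Python that works out how many times faster Frontier's target is than each predecessor. It uses only the numbers quoted above; the dictionary and the `speedup` helper are illustrative names, and 1.5 exaFLOPS is treated as exactly 1,500 petaFLOPS even though the article gives it as an approximate target.

```python
# Peak performance figures quoted in the article, in petaFLOPS.
# Frontier's value is a target (~1.5 exaFLOPS), not a measured result.
PEAK_PFLOPS = {
    "Titan": 27.0,
    "Summit": 200.0,
    "Frontier": 1500.0,  # 1.5 exaFLOPS = 1,500 petaFLOPS
}


def speedup(new: str, old: str) -> float:
    """Return how many times faster `new` is than `old` at peak."""
    return PEAK_PFLOPS[new] / PEAK_PFLOPS[old]


if __name__ == "__main__":
    for old in ("Titan", "Summit"):
        print(f"Frontier vs {old}: {speedup('Frontier', old):.1f}x")
```

On these figures, Frontier's target is 7.5 times Summit's peak and roughly 56 times Titan's.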

With this much compute power, Frontier needs a very fast storage subsystem: it will have 700 petabytes of storage with a throughput of 75 TB/s.

The storage system has serious I/O capabilities. It comprises 5,000 NVMe devices providing 11.5 PB of capacity, with read and write speeds of up to 10 TB/s and more than 2 million random-read IOPS. A further 480 NVMe devices will handle the mixed workload and file systems, adding another 10 PB of capacity. Those are just the NVMe devices, which are there for speed. The bulk storage will sit on 47,700 perpendicular magnetic recording hard disks, which will provide 679 PB of capacity at peak read speeds of 5.5 TB/s and peak write speeds of 4.6 TB/s, also with more than 2 million random-read IOPS.
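As a quick sanity check on those tier figures, the sketch below adds up the quoted capacities and derives rough per-device averages. It is back-of-the-envelope arithmetic, not a model of the real system: the `Tier` class and the tier labels are made up for illustration, the quoted rates are treated as terabytes per second, decimal units are assumed (1 PB = 1,000 TB), and the averages ignore RAID and file system overhead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str                     # illustrative label, not an official tier name
    devices: int                  # device count quoted in the article
    capacity_pb: float            # capacity in petabytes
    read_tbps: Optional[float]    # peak read rate in TB/s, if quoted


TIERS = [
    Tier("NVMe (speed)", 5_000, 11.5, 10.0),
    Tier("NVMe (480-device tier)", 480, 10.0, None),
    Tier("Hard disk (bulk)", 47_700, 679.0, 5.5),
]


def total_capacity_pb(tiers) -> float:
    """Sum the quoted capacity of every tier, in PB."""
    return sum(t.capacity_pb for t in tiers)


def avg_capacity_tb(tier: Tier) -> float:
    """Average capacity per device in TB (assumes 1 PB = 1,000 TB)."""
    return tier.capacity_pb * 1_000 / tier.devices


def avg_read_gbps(tier: Tier) -> Optional[float]:
    """Average peak read rate per device in GB/s, or None if not quoted."""
    if tier.read_tbps is None:
        return None
    return tier.read_tbps * 1_000 / tier.devices


if __name__ == "__main__":
    print(f"Total capacity: {total_capacity_pb(TIERS):.1f} PB")
    for t in TIERS:
        rate = avg_read_gbps(t)
        rate_text = f"{rate:.2f} GB/s" if rate is not None else "n/a"
        print(f"{t.name}: {avg_capacity_tb(t):.1f} TB/device, {rate_text} read/device")
```

The three capacities sum to 700.5 PB, which matches the roughly 700 petabytes quoted for the system as a whole.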

Frontier will use both PCIe 4.0 NVMe drives and hard disk drives, as it relies on the Cray ClusterStor multi-tier architecture. The PCIe 4.0 links provide peak read speeds of 75 TB/s, peak write speeds of 35 TB/s, and more than 15 billion random-read IOPS.

“Orion is pushing the envelope of what is possible technically due to its extreme scale and hard disk/NVMe hybrid nature,” said Dustin Leverman, leader of the OLCF’s High-Performance Computing Storage and Archive Group. “This is a complex system, but our experience and best practices will help us create a resource that allows our users to push science boundaries using Frontier.”

The ClusterStor machines use AMD EPYC processors. They can automatically align data files in a new file system with workloads and shift I/O operations between storage tiers. This new file system, named Orion, will be the largest and fastest single POSIX-namespace file system in the world.

According to the OLCF (Oak Ridge Leadership Computing Facility), Orion will have 40 Lustre metadata server nodes and 450 Lustre object storage service nodes. Lustre is an open-source parallel file system that supports HPC simulation environments. An extra 160 nodes will provide read-write speeds of 3.2 TB/s to all other OLCF resources and platforms.

Source: OLCF
