Smaller but architecturally identical to world’s fastest, El Capitan
Sandia’s new El Dorado supercomputer has been ranked 20th in the world on the latest TOP500 list, which was released at the 2024 Supercomputing Conference in Atlanta. The machine is smaller in scale but architecturally identical to Lawrence Livermore National Laboratory’s El Capitan supercomputer, which ranked as the fastest in the world in the same survey.
“We at Sandia have invested in preparing many of our engineering and science codes to run effectively on El Capitan and El Dorado,” supercomputing manager Andrew Younge said. “I am looking forward to taking advantage of this massive new capability with El Capitan and enabling a much higher level of fidelity in our simulations.”
According to Andrew, the El Dorado-El Capitan system represents the first leadership-class exascale system designed to support NNSA stockpile stewardship missions. He described El Dorado’s specific function as that of an application-readiness test system. “Basically, it is an extra-large on-ramp for Sandia computing codes to build, test, prepare, validate and update, all at Sandia, before running at exascale on El Capitan.”
“Because El Dorado is actually quite large compared to normal application readiness systems, we anticipate it will provide production cycles to Sandia as well,” he said. “As a limited side function, El Dorado is also likely to enable Sandia to do more experimental R&D on the HPC system itself, perhaps exploring new workflows or similar avenues in the future.”
“Part of the magic — I call it the ‘special sauce’ — of the system is the use of the Cray-developed proprietary High-Speed Network called Slingshot,” computing manager Kevin Stroup said. Another plus, he said, is that the compute nodes are direct liquid-cooled, meaning that coolant is piped through them and carries away the heat they generate by way of fluid-filled heat sinks. “Without this, it would be nearly impossible to operate the system and deal with the heat produced.”
The machine, designed and built by Hewlett Packard Enterprise, is an EX4000 model based on the company’s Cray product line. It consists of three cabinets of compute blades, with a total of 384 MI300A nodes. The MI300A is an accelerated processing unit from AMD — “sort of a CPU and GPU put together,” Kevin said.
The El Dorado supercomputer represents another major deliverable by the NNSA Advanced Simulation and Computing program to the nuclear security enterprise.