New method, eclipsing top supercomputer, is Gordon Bell prize finalist
In an era where supercomputers are setting the pace for scientific discovery, a collaborative team has not just broken but shattered a speed barrier in molecular dynamics simulations.
Their innovative simulation, powered by a novel wafer-scale engine, raced past the maximum speed achievable on the world’s fastest supercomputer by an unprecedented 457 times, where speed is measured in simulation timesteps-per-second. This achievement, now a finalist for the esteemed Gordon Bell Prize, could herald a new dawn in molecular dynamics and computational science.
The prize, sponsored annually by the Association for Computing Machinery, recognizes outstanding achievement in high-performance computing. The winner will be announced at SC24, the premier conference for supercomputing, in November in Atlanta.
Researchers from Sandia, Lawrence Livermore and Los Alamos National Laboratories — referred to as the tri-labs — worked together with Cerebras Systems as part of NNSA’s Advanced Memory Technology program. The simulation was performed on Cerebras’ wafer-scale engine technology.
The new paper, “Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System,” describes the coordinated effort that connected Cerebras’ ground-breaking capabilities to the molecular dynamics simulations already under development at the three NNSA laboratories.
“The ultra-rapid simulation speed provides views for milliseconds instead of nanoseconds, offering a more complete picture of how materials evolve and behave,” said Sandia researcher Siva Rajamanickam, who led the tri-labs’ collaboration with Cerebras. “The ability to perform calculations this rapidly on a general-purpose processor enables the science community to understand chemical processes and material behaviors at timescales previously unachievable in commercially available hardware.”
Potential applications include more detailed studies of the evolution of grain boundaries in metals to create more resilient materials, the design of extended-duration energy storage systems by renewable energy researchers and, with further enhancements, the observation of protein folding and drug-target interactions to accelerate the discovery of life-saving therapies.
An innovative approach
This new computing approach achieves its speed by distributing the simulated atoms across the 900,000 cores on a single Cerebras wafer-scale engine. Interactions between the simulated atoms result in communication between cores on the same wafer, rather than between many graphics processing units, as is done on Oak Ridge National Laboratory’s Frontier, the world’s fastest supercomputer.
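To make the idea of distributing atoms to cores concrete, here is a toy sketch in Python. It is not the team's algorithm: the binning rule, the box size, the 4-by-4 core grid and every function name are assumptions chosen for illustration. It only shows why atoms that are close in space can end up on the same or adjacent cores, so that their interactions turn into short-range core-to-core messages.

```python
from collections import defaultdict

# ASSUMPTION: a hypothetical 2D simulation box and a tiny 4x4 grid of cores.
# The real wafer has roughly 900,000 cores; the mapping used in the paper
# is not described in this article.
BOX = (10.0, 10.0)
GRID = (4, 4)


def core_for_atom(x, y, box=BOX, grid=GRID):
    """Return the (row, col) of the core that owns an atom at (x, y)."""
    col = min(int(x / box[0] * grid[0]), grid[0] - 1)
    row = min(int(y / box[1] * grid[1]), grid[1] - 1)
    return row, col


def assign_atoms(positions, box=BOX, grid=GRID):
    """Group atom indices by the core that owns them."""
    cores = defaultdict(list)
    for atom_id, (x, y) in enumerate(positions):
        cores[core_for_atom(x, y, box, grid)].append(atom_id)
    return dict(cores)


def core_distance(a, b):
    """Number of grid hops between two cores (Chebyshev distance)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


# Hypothetical atom positions, illustrative only.
positions = [(1.0, 1.0), (2.4, 1.2), (2.6, 1.2), (9.0, 9.0)]
owners = assign_atoms(positions)

# Atoms 1 and 2 are 0.2 length units apart but sit on neighboring cores,
# so their interaction costs a single hop on the grid.
hops = core_distance(core_for_atom(2.4, 1.2), core_for_atom(2.6, 1.2))
print(owners, hops)
```

In this toy layout, nearby atoms never need to talk to a distant core, which is the property the article credits for keeping communication on the wafer.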
Sandia molecular dynamics expert and paper author Stan Moore put it this way: “Latencies in current supercomputers — such as the time it takes to send a message across the network — limit the timescales that can be achieved. On the Cerebras system, these latencies are greatly reduced, allowing orders of magnitude speedup in achievable timescales.”
Aidan Thompson, a Sandia co-author of the paper and an expert in molecular dynamics who helped guide Cerebras on the algorithmic details of this project, praised the sheer speed of Cerebras technology.
“The maximum speed of a simulation had remained stubbornly fixed for at least the last decade at around 5 kilosteps per second,” Aidan said. “The Cerebras wafer-scale engine has smashed this barrier by achieving a speed of 699 kilosteps per second. Only highly specialized codes on specialized hardware run faster, and that advantage may not last long.”
“This bodes well for the future impacts of our program and its potential scientific advances,” said James Laros, a distinguished member of technical staff at Sandia and lead of the Advanced Memory Technology program. “The Advanced Memory Technology-based partnership between the NNSA laboratories and Cerebras Systems reached new heights when the speedup on molecular dynamics simulations exceeded the AMT program’s goal, a 40-times performance improvement, by more than 10 times.”
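The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers from this article, plus one loudly labeled assumption: a timestep of 2 femtoseconds, which the article does not state and which varies by material and model. Under that assumption it estimates how much simulated time one day of wall-clock time buys at each rate.

```python
# Figures quoted in the article.
WAFER_STEPS_PER_SECOND = 699_000     # "699 kilosteps per second"
PREVIOUS_STEPS_PER_SECOND = 5_000    # "around 5 kilosteps per second"
SPEEDUP_VS_FRONTIER = 457
AMT_GOAL = 40

# ASSUMPTION: illustrative timestep, not taken from the article or the paper.
ASSUMED_TIMESTEP_FS = 2.0

SECONDS_PER_DAY = 86_400


def simulated_microseconds_per_day(steps_per_second,
                                   timestep_fs=ASSUMED_TIMESTEP_FS):
    """Simulated time, in microseconds, covered by one wall-clock day."""
    steps = steps_per_second * SECONDS_PER_DAY
    return steps * timestep_fs * 1e-9  # 1 fs = 1e-9 microseconds


# How far past the 40x program goal the 457x result landed.
goal_margin = SPEEDUP_VS_FRONTIER / AMT_GOAL

wafer_us = simulated_microseconds_per_day(WAFER_STEPS_PER_SECOND)
previous_us = simulated_microseconds_per_day(PREVIOUS_STEPS_PER_SECOND)

print(f"goal exceeded by a factor of {goal_margin:.1f}")
print(f"simulated time per day at 699 kilosteps/s: {wafer_us:.1f} microseconds")
print(f"simulated time per day at 5 kilosteps/s:   {previous_us:.3f} microseconds")
```

With the assumed 2-femtosecond timestep, a day of computing covers roughly 121 microseconds of simulated time at 699 kilosteps per second, compared with under 1 microsecond at 5 kilosteps per second. A different timestep would scale both numbers equally.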
“We all had our doubts about achieving this goal within the short timeframe, but Cerebras’ technology and new methods from our team helped us exceed this goal by demonstrating unprecedented improvement on molecular dynamics simulations,” Siva said. “These results open opportunities for materials research and science discoveries beyond what we envisioned.”
Innovative architecture enabled the team to surpass the performance level previously achieved. “The tri-labs have been fantastic partners in our journey. It is wonderful to see scientists at the laboratories actively collaborating with our team and pushing our wafer-scale technology to new frontiers,” said Michael James, Cerebras co-founder and chief architect of advanced technologies. “The success in molecular dynamics simulations is hopefully one of many to follow. We are very excited to continue the partnership with NNSA.”
Thuc Hoang, director of NNSA’s Advanced Simulation and Computing program, reflects on the strong partnership between the labs, NNSA and Cerebras.
“The collaboration between Cerebras, Sandia and the tri-lab community illustrates the advantage of industry partnership with the NNSA in the innovative AI technology space,” Hoang said. “It is a great example of the breakthroughs in science that can be achieved together, that would otherwise not be possible by any party on their own. We look forward to seeing continued partnerships with Cerebras and others for both AI and our scientific modeling and simulation missions.”
A sense of perspective
Stan Moore compares the graphics processing units on Frontier to racehorses: There are thousands of them, and paradoxically, that creates difficulties for relatively small problems.
“Imagine a simulation where you are trying to pull a cart,” Stan explained. “If you hook up one horse, it can move the cart, but it is slow. If you hook up eight horses, it goes faster because they can share the weight, but ultimately a horse can only run so fast even if it is pulling little weight. If you hook up 500 horses to the same cart, they just get in each other’s way, so adding too many horses to one cart doesn’t help. In contrast, the Cerebras wafer is like a race car that can pull the cart so much faster.”
Many scientific calculations, while benefiting from the increased speed of Frontier, are small enough that they will benefit even more from Cerebras, Stan said.
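The cart analogy can be turned into a toy performance model. This is a sketch, not a measurement: the compute and latency values are invented, and the function names are hypothetical. It assumes each timestep costs a share of the compute work plus a fixed communication latency that does not shrink as workers are added. The model shows the plateau ("a horse can only run so fast"); it does not model the extra slowdown from workers getting in each other's way.

```python
# ASSUMPTION: made-up costs chosen only to illustrate the shape of the curve.
COMPUTE_SECONDS_PER_STEP = 1.0e-3   # total work in one timestep on one worker
LATENCY_SECONDS_PER_STEP = 1.0e-4   # fixed message latency paid every timestep


def steps_per_second(n_workers,
                     compute_seconds=COMPUTE_SECONDS_PER_STEP,
                     latency_seconds=LATENCY_SECONDS_PER_STEP):
    """Timesteps per second when work is split evenly across n_workers."""
    step_time = compute_seconds / n_workers + latency_seconds
    return 1.0 / step_time


for n in (1, 8, 500):
    print(f"{n:>3} workers: {steps_per_second(n):,.0f} steps/s")

# No matter how many workers are added, the rate cannot exceed 1 / latency.
ceiling = 1.0 / LATENCY_SECONDS_PER_STEP

# Cutting latency, rather than adding workers, is what lifts the ceiling.
low_latency_rate = steps_per_second(500, latency_seconds=1.0e-6)
print(f"ceiling: {ceiling:,.0f} steps/s, low-latency rate: {low_latency_rate:,.0f} steps/s")
```

In this model, going from one worker to eight helps a great deal, going from eight to 500 helps only a little, and reducing the per-step latency helps far more than either. That is the trade-off the race car comparison describes.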
Exceeding expectations
“Accomplishing such a result not only takes vision, but some sort of insane confidence in your ideas,” Siva said. “We thought it was a measured risk based on our experience and knowledge, but not everyone would think so. When we proposed it to our lab leadership and NNSA, they trusted us to run with it. That’s because we have an environment where we can take measured risks to solve these big problems that others will shy away from.”
Both James Laros and Siva attribute the Advanced Memory Technology program’s role in taking measured risks as part of the investigatory process to the Vanguard program, also led by James through the Advanced Simulation and Computing program. While Advanced Memory Technology works toward proof of concept in a technology space, Vanguard focuses on deploying advanced at-scale prototypes.
Vanguard addresses the gap between successful laboratory experiments and large-scale production by industry, filling that void with imaginative, mid-scale prototype platforms that reduce the odds of anything technically infeasible lurking on the road to full scale. Because it requires relatively small funding, a wider range of experimentation and risk is encouraged. If any fail, that’s provisionally good, because something imaginative was probably tried.
Work such as that exemplified in the Gordon Bell submission could help drive a next generation of prototype systems under existing DOE programs, such as Vanguard.
The work is funded by NNSA’s Advanced Simulation and Computing program. The NNSA is a semiautonomous DOE agency responsible for the management and security of the nation’s nuclear weapons, nuclear nonproliferation and naval reactor programs, as well as responding to nuclear and radiological emergencies in the U.S. and abroad.
A culture tolerating risk makes a big difference
While researcher Siva Rajamanickam praises Cerebras, he also emphasizes the hard work and calculated risks taken by Sandia researchers that made the project possible.
“The AMT program and this team responded when Congress asked, ‘Can you get us 40 times better performance for far less money?’”
The Congressional request, based on emerging possibilities, was to build a program capable of demonstrating significant speedup over the world’s leading exascale supercomputer.
“We accepted the challenge. In two years, we beat Frontier — using a single Cerebras wafer — by 457 times. We took measured risks, knowing it is OK to fail,” Siva said. “We were supported by both NNSA and lab leadership. That culture makes a big difference.”
Even the selection of Cerebras, clearly the right choice in hindsight, was carefully vetted.
“Cerebras has been making their hardware for artificial intelligence applications,” Siva said. “We wanted to use it for scientific computing. This is new to them, new to us, and the first time anyone has done anything like this — that we know of — in the world. We worked on demonstrating the approach together for two years, codesigning where computer scientists and material scientists came together to design a new algorithm. Sandia, Cerebras, LLNL and LANL all came together to develop the method, choose the right problem to solve and execute it perfectly.”