Computing Collaborations

Sandia deploys cutting-edge Cerebras CS-3 testbed for AI workloads

Technician positioning Cerebras wafer enclosure for installation in Kingfisher. Photo taken by Craig Fritz.

In a partnership now reaching two years, Sandia and Cerebras Systems have unveiled a cluster of four Cerebras CS-3 systems that will serve as a Sandia testbed and expand research into AI workloads for national security missions.

The first four Cerebras CS-3 nodes of a planned eight-node system, named Kingfisher, were recently deployed at Sandia, funded by and in support of the NNSA’s Advanced Simulation and Computing Artificial Intelligence for Nuclear Deterrence strategy.

“As part of our ASC AI4ND strategy, the Cerebras CS-3 system positions us to be able to develop large scale trusted AI models on secure internal Tri-lab (Sandia, Lawrence Livermore and Los Alamos Laboratories) data without many of the memory and power challenges that GPU systems face,” said Justin Newcomer, senior manager of the ASC program at Sandia. “Consistent with the Sandia led Advanced Architecture Prototype System program, Vanguard, we are excited to push the boundaries on what is possible with AI systems through this partnership with Cerebras.”

This third-generation wafer-scale engine architecture, WSE-3, will expand the capabilities of the NNSA Tri-labs and allow leading-edge investigation of future AI applications that augment the existing ASC mission. In addition, testing will investigate ways the architecture can be applied to Sandia’s traditional modeling and simulation workloads.

“We’re excited to expand our collaboration with Sandia with the deployment of this new Cerebras CS-3 cluster, which consists of four of our 3rd-generation wafer-scale systems,” said Andy Hock, senior vice president of product and strategy at Cerebras. “Building on Sandia’s and Cerebras’ history of record-setting AI and HPC performance and award-winning research, we look forward to seeing how this new, powerful cluster will enable Sandia researchers to uncover new breakthroughs across science, energy, national security and more.”

The system includes Cerebras’ Wafer Scale Engine, which has proven to be a novel alternative to the traditional accelerators used for AI. It is built with many of the industry-standard semiconductor manufacturing processes that are also used for general-purpose CPUs and accelerators.

Cerebras WSE-3 positioned to show its size relative to a dinner plate. Image provided by Cerebras.

Fabrication begins with a wafer, a dinner-plate-sized piece of silicon, which goes through a complex series of manufacturing steps. The process uses optical lithography and other methods to etch a large number of transistors, cores, and individual processor ‘dies’ onto the wafer. In a more traditional process, the wafer is cut into individual dies that are destined to become smaller packaged processors, like those in a laptop or cell phone. In the Cerebras process, however, the wafer remains intact.

The result is a wafer-scale engine, WSE-3, that contains 900,000 processors built for AI and HPC, tightly integrated with each other and located close to high-performance on-wafer SRAM memory. This approach delivers extremely high compute performance in a single chip, and even more in a cluster, with extremely fast communication and high memory bandwidth.

Sandia has initiated several generative AI projects to develop capabilities for science and engineering use cases in the national security domain. Siva Rajamanickam, PI of the new BANYAN Institute focused on generative AI, said, “We are excited to have this new system at Sandia. It allows us to evaluate training and finetuning of large multimodal models for our mission. We plan to explore the accuracy and scalability of model training, productivity of model development and understand the energy and power consumption of training workloads.”

This machine arrives as DOE continues to work on its Frontiers in Artificial Intelligence for Science, Security, and Technology initiative. By integrating this state-of-the-art architecture, Sandia not only enhances its current capabilities but also lays the groundwork for advances in AI that will support the DOE’s broader strategic objectives.

“The deployment of the Cerebras CS-3 system at Sandia is a significant milestone in our journey to lead in AI and machine learning innovation,” said Jen Gaudioso, director of the ASC Program at Sandia. “This advanced testbed aligns perfectly with the DOE’s FASST initiative, enabling us to explore and develop cutting-edge AI technologies that are crucial for future national security missions.”

Although there is growing excitement about the potential of AI to impact the NNSA mission, modeling and simulation is critical and will remain so.

“While CS-3 is designed for AI, Kingfisher will also be used to explore traditional modeling and simulation workloads. Sandia has led a Tri-lab effort under the Advanced Memory Technology program to explore the feasibility of using future versions of the Cerebras Wafer Scale Engine architecture for a combination of Mod-Sim and AI workloads,” said James H. Laros III, distinguished member of technical staff and AMT program lead at Sandia.

The machine was installed in October 2024 and is already in use for exploratory work, made possible by collaboration between the national laboratories and industry. Gaudioso stated, “This partnership with Cerebras Systems exemplifies our commitment to pushing the boundaries of what is possible in AI research and development.”

From left to right: Thuc Hoang, Ann Gentile, Andrew Younge, Si Hammond, James Laros, and Kevin Stroup standing beside Kingfisher, the new Cerebras CS-3 system installed at Sandia National Laboratories. Photo provided by Kevin Stroup.

About NNSA: Established by Congress in 2000, NNSA is a semi-autonomous agency within the U.S. Department of Energy responsible for enhancing national security through the military application of nuclear science. NNSA maintains and enhances the safety, security, and effectiveness of the U.S. nuclear weapons stockpile; works to reduce the global danger from weapons of mass destruction; provides the U.S. Navy with safe and militarily effective nuclear propulsion; and responds to nuclear and radiological emergencies in the United States and abroad.