The transformative national security impact of early parallel computers
How do you ensure the safety, security and reliability of nuclear weapons without testing?
That was the question the U.S. needed to answer quickly after the 1991 signing of the Strategic Arms Reduction Treaty and the subsequent testing moratorium that accompanied the beginning of negotiations for a Comprehensive Test Ban Treaty.
After the September 1992 Divider test, the U.S. ceased all nuclear explosive testing, ending decades of reliance on the combination of theory, complex experiments and live atmospheric, underground and underwater nuclear testing.
Entering this new era, the nation needed a way to maintain confidence in the reliability, performance and safety of its stockpile.
Without testing, the nation turned to computer simulation, a technology that was still emerging. This led the DOE to create the Accelerated Strategic Computing Initiative in 1995, now known as the NNSA Advanced Simulation and Computing program.
‘What had never been done before’
Victor Reis, then Assistant Secretary for Defense Programs, believed the DOE’s strength was in the defense program labs: Los Alamos, Lawrence Livermore and Sandia national laboratories. He pulled together the directors of the labs, their weapons leadership, DOD leaders and others inside and outside the federal government to plan a radically different approach to ensuring that confidence.
At the first ASCI workshop in 1994, Reis delivered a call to action.
“We have a 10-year window; if we do not have sufficient computer simulation capabilities by then, we will need to go back to testing and that will probably not be an option. We must succeed. The laboratories will need to change to being experiment- and computer-driven, rather than test-driven as in the past.”
Accomplishing everything the initiative set out to do required three key partners: the defense program laboratories, industry and academia would need to work together.
Given the laboratories' history of competition, it was imperative that Sandia, Lawrence Livermore and Los Alamos commit to ASCI as a true collaboration, which its executive committee called "One Program – Three Laboratories."
Accurate simulation of the physics at the core of nuclear weapons behavior required a deep understanding of both nuclear physics and more fundamental high-energy physics, fields that demanded expertise from the labs and academia.
By 1994, the laboratories had worked closely with industry to design and develop the most modern and powerful computers in existence, but broader contributions were needed to develop new technologies for highly specialized capabilities.
The success of ASCI relied on deliberate partnerships that eventually changed the world of high-performance computing.
“Because we were under so much pressure, we needed to bring the best ideas together; partners were key to an impactful start for ASCI,” said Thuc Hoang, current director of the NNSA ASC program, who was also one of the first federal staff managers of ASCI at its creation.
Sandia’s ‘scientific novelty’ becomes the foundation
Through an early collaboration with nCUBE Corporation in the 1980s, Sandia researchers demonstrated the value of massively parallel computers. Sandians Robert Benner, John Gustafson and Gary Montry won the first Gordon Bell Prize in 1987 running a simulation on the nCUBE 10, capable of 1.9 gigaflops, or 1.9 billion floating point operations per second.
Massively parallel computers use hundreds or thousands of processors working in parallel to solve large, complex problems rapidly.
“Ed Barsis, a director at Sandia during this time, took a huge risk on parallel computing. The technology was challenged, even internally, as not a lot of people had faith in it. It was new and unproven,” said Grant Heffelfinger, a retired Sandia director and computational researcher during this period.
Another early computational researcher, Climate Security Center Director Rob Leland, said, “It was an exciting time. Sandia had a major breakthrough in technology as the nation was looking for a eureka moment. This was truly a time of great innovation that was well timed to meet the needs of ASCI.”
First big ASCI success
In 1994, another Sandia-led team that included researchers from the University of New Mexico and Intel Corp. won a second Gordon Bell Prize for their work in parallel computing. Within a month of the award, Sandia and Intel set the world computational speed record at 280 gigaflops.
“Ed Barsis saw that by applying their new parallel computing technology, it was possible to solve signature problems LANL and LLNL were facing in designing nuclear explosive packages. This insight was used as a bargaining chip for Sandia’s inclusion in the ASCI program,” said Heffelfinger.
In 1996, Sandia's ASCI Red came online and became the first major success of the program. It was the first machine to exceed a trillion floating point operations per second, or 1,000 gigaflops, and the experience gained from ASCI Red contributed to more than a decade of U.S. leadership in supercomputing. The machine displayed remarkable scalability, applying thousands of processors with great efficiency, an elusive and highly prized goal in early supercomputing.
The machine's architecture and software environment were so successful that it operated for a decade, twice the typical lifetime for supercomputers. Sandia's lightweight operating system was a key contributor to that success. When ASCI Red reached the end of its operational life in 2006, it was replaced by Red Storm, designed by Sandia and built by Cray Inc. with a similar design philosophy.
Red Storm would go on to achieve many technical program milestones and produce 124 descendant systems before its retirement in 2012.
“Bill Camp, Sandia’s director of computing research in that era, had the vision for Red Storm. He worked closely with computer architects to produce a system that not only met the impossible mission need, but also became a dominant force in the field,” Rob said. “Red Storm was among the few fastest machines of its era, and it stood out for its performance on difficult engineering problems like those faced by Sandia. Its influence can be seen in supercomputers today.”
One of the machine's most notable public impacts came in February 2008, when it helped bring down a defective satellite. Burnt Frost, as the operation was known, required Red Storm to perform hundreds of impact simulations on the errant satellite.
President George W. Bush was briefed on options derived from months of Red Storm calculations. The mission successfully brought down the 5,000-pound satellite with a single missile shot.
Bill Camp was recognized with the Seymour Cray Computer Engineering Award in 2016 “for visionary leadership of the Red Storm project, and for decades of leadership of the HPC community.”
The legacy lives in today’s ASC program
The ASC program at Sandia continues to build on its strong legacy as the nation moves into a new era of high-performance computing in support of national security. In 2018, Sandia's Vanguard program successfully deployed Astra, the first petaflops computer using Arm-based microprocessors.
Astra's success has positioned Sandia to continue pursuing high-risk opportunities to develop emerging, leading-edge technologies and prove their viability for the ASC mission, a key component of NNSA's platform strategy.
“The legacy of the ASC program is a testament to the power of collaboration and innovation,” said Jen Gaudioso, Sandia’s current ASC program director. “Today, we continue to push the boundaries of computational science, ensuring that our national security remains robust and our technological leadership unchallenged.”
The collaboration between industry, academia and the three NNSA labs remains a foundation for the continued success of the ASC program. In addition to simulations that stand in for nuclear testing and design work for refurbished weapons, the computing, modeling and simulation tools developed through ASC enhance a wide variety of other national security programs, helping ensure that the U.S. remains at the forefront of technological innovation and nuclear safety.
A deeper dive into ASC
A more comprehensive history of the first 10 years of Advanced Simulation and Computing can be found on Lawrence Livermore's website, in the publication Delivering Insight. NNSA's ASC program celebrated its 25th anniversary in 2020.