Publications / Conference

Defining and measuring supercomputer Reliability, Availability, and Serviceability (RAS)

Stearley, Jon S.

The absence of agreed-upon definitions and metrics for supercomputer RAS obscures meaningful discussion of the issues involved and hinders their solution. This paper seeks to foster a common basis for communication about supercomputer RAS by proposing a system state model, definitions, and measurements. These are modeled after the SEMI-E10 specification, which is widely used in the semiconductor manufacturing industry.