Publications


Autonomy and Complexity at Sandia: Executive Summary of Academic Alliance Workshop on Autonomy and Complex Systems

Hayden, Nancy K.; Kleban, S.D.

Sandia has identified autonomy as a strategic initiative and an important area for providing national leadership. A key question is, “How might autonomy change how we think about the national security challenges we address and the kinds of solutions we deliver?” Three workshops at Sandia early in 2017 brought together internal stakeholders and potential academic partners in autonomy to address this question. The first focused on programmatic applications and needs. The second explored existing internal capabilities and research and development needs. This report summarizes the outcome of the third workshop, held March 3, 2017, in Albuquerque, NM, which engaged Academic Alliance partners in Sandia's autonomy efforts by discussing research needs and synergistic areas of interest within the complex systems and system modeling domains, and by identifying opportunities for partnering on laboratory-directed and other joint research.


Complex Systems Models and Their Applications: Towards a New Science of Verification, Validation & Uncertainty Quantification

Tsao, Jeffrey Y.; Trucano, Timothy G.; Kleban, S.D.; Naugle, Asmeret B.; Verzi, Stephen J.; Swiler, Laura P.; Johnson, Curtis M.; Smith, Mark A.; Flanagan, Tatiana P.; Vugrin, Eric D.; Gabert, Kasimir G.; Lave, Matthew S.; Chen, Wei C.; DeLaurentis, Daniel D.; Hubler, Alfred H.; Oberkampf, Bill O.

This report contains the written footprint of a Sandia-hosted workshop held in Albuquerque, New Mexico, June 22-23, 2016, on “Complex Systems Models and Their Applications: Towards a New Science of Verification, Validation and Uncertainty Quantification,” as well as of the pre-work that fed into the workshop. The workshop's intent was to explore and begin articulating research opportunities at the intersection of two important Sandia communities: the complex systems (CS) modeling community and the verification, validation, and uncertainty quantification (VVUQ) community. The overarching research opportunity (and challenge) that we ultimately hope to address is: how can we quantify the credibility of knowledge gained from complex systems models, knowledge that is often incomplete and interim, but that will nonetheless be used, sometimes in real time, by decision makers?


Simulating performance sensitivity of supercomputer job parameters

Kleban, S.D.

We report on the use of a supercomputer simulation to study performance sensitivity to systematic changes in the job parameters of run time, number of CPUs, and interarrival time. We also examine how changes in share allocation and service ratio for job prioritization under a Fair Share queuing algorithm affect facility figures of merit. We used log data from the ASCI supercomputer Blue Mountain and the ASCI simulator BIRMinator to perform this study. The key finding is that supercomputer performance is quite sensitive to all of the job parameters: with respect to utilization and rapid turnaround, the interarrival rate of the jobs is the most sensitive parameter, particularly at the highest rates, while increasing run times is the least sensitive. We also find that this facility is running near its maximum practical utilization. Finally, we show the importance of simulation in understanding the performance sensitivity of a supercomputer.
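
The BIRMinator simulator is not reproduced here, but the kind of sensitivity sweep the abstract describes can be illustrated with a toy model. The sketch below is a minimal stand-in, assuming a single first-come-first-served, space-shared queue and synthetic Poisson arrivals; all names, job mixes, and parameters are illustrative, not taken from the paper. It scales interarrival times and reports utilization and mean expansion factor.

    import heapq
    import random

    def simulate(jobs, total_cpus):
        """Toy FCFS space-shared scheduler. jobs: (arrival, run_time, cpus)
        tuples. Returns utilization and mean expansion factor
        ((wait + run) / run)."""
        running = []                       # min-heap of (finish_time, cpus)
        free, clock, busy, makespan = total_cpus, 0.0, 0.0, 0.0
        xfactors = []
        for arrival, run, cpus in sorted(jobs):
            clock = max(clock, arrival)
            while free < cpus:             # wait for enough CPUs to drain
                finish, c = heapq.heappop(running)
                clock = max(clock, finish)
                free += c
            heapq.heappush(running, (clock + run, cpus))
            free -= cpus
            busy += run * cpus
            makespan = max(makespan, clock + run)
            xfactors.append((clock + run - arrival) / run)
        return busy / (makespan * total_cpus), sum(xfactors) / len(xfactors)

    random.seed(0)
    base, t = [], 0.0
    for _ in range(2000):
        t += random.expovariate(1.0)       # unit-rate Poisson arrivals
        base.append((t, random.uniform(0.5, 8.0), random.choice([8, 16, 32, 64])))

    for scale in (0.5, 1.0, 2.0):          # systematically vary interarrival time
        jobs = [(a * scale, r, c) for a, r, c in base]
        util, xf = simulate(jobs, total_cpus=128)
        print(f"interarrival x{scale}: utilization {util:.2f}, expansion {xf:.1f}")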


Interstitial computing: Utilizing spare cycles on supercomputers

Proceedings - IEEE International Conference on Cluster Computing, ICCC

Kleban, S.D.; Clearwater, Scott H.

This paper presents an analysis of utilizing unused cycles on supercomputers through the use of many small jobs. What we call "interstitial computing" is important to supercomputer centers for both productivity and political reasons. Interstitial computing exploits the fact that small jobs are more or less fungible consumers of compute cycles and are more efficient for bin packing than the typical jobs on a supercomputer. An important feature of interstitial computing is that it does not significantly impact the makespan of native jobs on the machine. In addition, a facility can obtain higher utilizations that might otherwise be possible only with more complicated schemes or with very long wait times. The key contribution of this paper is that it provides theoretical and empirical guidelines for users and administrators on how currently unused supercomputer cycles may be exploited. We find that interstitial computing is a more effective means of increasing machine utilization than increasing native job run times or sizes.
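
The bin-packing argument can be made concrete with a toy greedy backfill. In the sketch below, the fill_gap helper and the job pool are hypothetical names, not from the paper; small jobs are admitted into an idle hole only if they finish before the hole closes, so native jobs are never delayed.

    def fill_gap(free_cpus, gap_seconds, candidates):
        """Greedy first-fit-decreasing packing of small jobs into an idle
        hole of free_cpus processors lasting gap_seconds.
        candidates: list of (cpus, run_time) small-job requests."""
        placed, used = [], 0
        for cpus, run in sorted(candidates, reverse=True):
            # a job fits only if it ends before the next native job starts
            if run <= gap_seconds and used + cpus <= free_cpus:
                placed.append((cpus, run))
                used += cpus
        return placed

    # Example: a 30-minute hole on 24 idle CPUs, offered a pool of small jobs.
    pool = [(1, 600), (2, 1200), (4, 900), (8, 2400), (1, 300), (2, 1800), (16, 1500)]
    print(fill_gap(24, 1800, pool))    # -> [(16, 1500), (4, 900), (2, 1800), (2, 1200)]

A real scheduler would also refill CPUs as interstitial jobs complete mid-hole; this single-shelf simplification understates the achievable gain.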


Fair share on high performance computing systems: What does fair really mean?

Proceedings - CCGrid 2003: 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid

Kleban, S.D.; Clearwater, Scott H.

We report on a performance evaluation of a Fair Share system on the ASCI Blue Mountain supercomputer cluster. We study the impact of share allocation under Fair Share on wait times and expansion factor. We also measure the service ratio, a typical figure of merit for Fair Share systems, with respect to a number of job parameters. We conclude that Fair Share does little to alter important performance metrics such as expansion factor, which raises the question of what Fair Share means on cluster machines. The essential difference between Fair Share on a uniprocessor and on a cluster is that the workload on a cluster is not fungible in space or time. We find that cluster machines must be highly utilized and must support checkpointing for Fair Share to function closer to the spirit in which it was originally developed.
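
The abstract does not give the exact priority formula used on Blue Mountain, but one common Fair Share formulation, sketched here with illustrative account names and an assumed one-week usage half-life, decays each account's historical usage and orders the queue by service ratio, the delivered fraction of the machine over the allocated share.

    from dataclasses import dataclass

    @dataclass
    class Account:
        name: str
        share: float    # allocated fraction of the machine; shares sum to 1.0
        usage: float    # exponentially decayed CPU-hours consumed

    def decay(acct, dt_hours, half_life=168.0):
        """Age historical usage so old consumption stops depressing priority."""
        acct.usage *= 0.5 ** (dt_hours / half_life)

    def service_ratio(acct, accounts):
        """Delivered fraction over allocated share: >1 over-served, <1 under-served."""
        total = sum(a.usage for a in accounts) or 1.0
        return (acct.usage / total) / acct.share

    accounts = [Account("group_a", 0.5, 900.0),
                Account("group_b", 0.3, 200.0),
                Account("group_c", 0.2, 100.0)]

    # Under-served accounts go first: order pending work by service ratio.
    for a in sorted(accounts, key=lambda a: service_ratio(a, accounts)):
        print(f"{a.name}: service ratio {service_ratio(a, accounts):.2f}")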


Quelling queue storms

Proceedings of the IEEE International Symposium on High Performance Distributed Computing

Kleban, S.D.; Clearwater, Scott H.

This paper characterizes "queue storms" in supercomputer systems and discusses methods for quelling them. Queue storms are anomalously large queue lengths that depend on the job size mix, the queuing system, the machine size, and correlations and dependencies between job submissions. We use synthetic data generated from actual job log data from the ASCI Blue Mountain supercomputer, combined with different long-range dependencies. We show the distribution of times to the first storm, which is in a sense the time when the machine becomes obsolete: it is the time when the machine first fails to provide satisfactory turnaround. Overcoming queue storms requires more resources, even if those resources appear superfluous most of the time. We present two methods, including a grid-based solution, for reducing these correlations and their resulting effect on the size and frequency of queue storms.
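
A toy version of this experiment, with assumed names and parameters throughout (bursty correlated arrivals standing in for the long-range dependencies added to the Blue Mountain trace, rather than the paper's actual method), tracks the backlog of an FCFS multi-server queue and reports the time of the first storm.

    import heapq
    import random
    from collections import deque

    def bursty_arrivals(n, rate, burst_prob=0.3, burst_size=5):
        """Poisson arrivals where a submission spawns a correlated burst
        of follow-on jobs with probability burst_prob."""
        t, out = 0.0, []
        while len(out) < n:
            t += random.expovariate(rate)
            out.append(t)
            if random.random() < burst_prob:
                out.extend(t + random.uniform(0.0, 1.0) for _ in range(burst_size))
        return sorted(out[:n])

    def time_to_first_storm(arrivals, mean_service, servers, threshold):
        """FCFS queue with identical servers; return the first time the
        number of waiting jobs exceeds threshold (the 'storm'), or None."""
        busy, waiting = [], deque()        # finish-time heap, queued service times
        for t in arrivals:
            while busy and busy[0] <= t:   # retire finished jobs, promote waiters
                done = heapq.heappop(busy)
                if waiting:
                    heapq.heappush(busy, done + waiting.popleft())
            s = random.expovariate(1.0 / mean_service)
            if len(busy) < servers:
                heapq.heappush(busy, t + s)
            else:
                waiting.append(s)
            if len(waiting) > threshold:
                return t
        return None

    random.seed(7)
    arrivals = bursty_arrivals(20000, rate=0.4)
    print(time_to_first_storm(arrivals, mean_service=3.5, servers=4, threshold=50))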


Collaborative evaluation of early design decisions and product manufacturability

Proceedings of the Hawaii International Conference on System Sciences

Kleban, S.D.; Stubblefield, W.A.; Mitchiner, K.W.; Mitchiner, John L.; Arms, M.

In manufacturing, the conceptual design and detailed design stages are typically regarded as sequential and distinct. Decisions made in conceptual design are often made with little information as to how they will affect detailed design or manufacturing process specification. Many possibilities and unknowns exist in conceptual design, where ideas about product shape and functionality change rapidly. Few, if any, tools exist to aid in this difficult, amorphous stage, in contrast to the many CAD and analysis tools for detailed design, where much more is known about the final product. The Materials Process Design Environment (MPDE) is a collaborative problem solving environment (CPSE) developed so that geographically dispersed designers in both the conceptual and detailed stages can work together and understand the impacts of their design decisions on functionality, cost, and manufacturability.
