Publications

7 Results

Exploring Explicit Uncertainty for Binary Analysis (EUBA)

Leger, Michelle A.; Darling, Michael C.; Jones, Stephen T.; Matzen, Laura E.; Stracuzzi, David J.; Wilson, Andrew T.; Bueno, Denis B.; Christentsen, Matthew C.; Ginaldi, Melissa J.; Hannasch, David A.; Heidbrink, Scott H.; Howell, Breannan C.; Leger, Chris; Reedy, Geoffrey E.; Rogers, Alisa N.; Williams, Jack A.

Reverse engineering (RE) analysts struggle to address critical questions about the safety of binary code accurately and promptly, and their supporting program analysis tools are simply wrong sometimes. The analysis tools have to approximate in order to provide any information at all, but this means that they introduce uncertainty into their results. And those uncertainties chain from analysis to analysis. We hypothesize that exposing sources, impacts, and control of uncertainty to human binary analysts will allow the analysts to approach their hardest problems with high-powered analytic techniques that they know when to trust. Combining expertise in binary analysis algorithms, human cognition, uncertainty quantification, verification and validation, and visualization, we pursue research that should benefit binary software analysis efforts across the board. We find a strong analogy between RE and exploratory data analysis (EDA); we begin to characterize sources and types of uncertainty found in practice in RE (both in the process and in supporting analyses); we explore a domain-specific focus on uncertainty in pointer analysis, showing that more precise models do help analysts answer small information flow questions faster and more accurately; and we test a general population with domain-general sudoku problems, showing that adding "knobs" to an analysis does not significantly slow down performance. This document describes our explorations in uncertainty in binary analysis.

RAMSeS: Rapid Analysis of Mission Software Systems

Ghormley, Douglas P.; Jones, Stephen T.; Bueno, Denis B.; Leger, Michelle A.; Loffredo, Timothy; Reedy, Geoffrey E.

Over the past few decades, software has become ubiquitous as it has been integrated into nearly every aspect of society, including household appliances, consumer electronics, industrial control systems, public utilities, government operations, and military systems. Consequently, many critical national security questions can no longer be answered convincingly without understanding software, including its purpose, its capabilities, its flaws, its communication, or how it processes and stores data. As software continues to become larger, more complex, and more widespread, our ability to answer important mission questions and reason about software in a timely way is falling behind. Today, to achieve such understanding of third-party software, we rely predominantly on the ability of reverse engineering experts to manually answer each particular mission question for every software system of interest. This approach often requires heroic human effort that nevertheless fails to meet current mission needs and will never scale to meet future needs. The result is an emerging crisis: a massive and expanding gap between the national security need to answer mission questions about software and our ability to do so. Sandia National Laboratories has established the Rapid Analysis of Mission Software Systems (RAMSeS) effort, a collaborative long-term effort aimed at dramatically improving our nation’s ability to answer mission questions about third-party software by growing an ecosystem of tools that augment the human reverse engineer through automation, interoperability, and reuse. Focusing on static analysis of binary programs, we are attempting to identify reusable software analysis components that advance our ability to reason about software, to automate useful aspects of the software analysis process, and to integrate new methodologies and capabilities into a working ecosystem of tools and experts. 
We aim to integrate existing tools where possible, adapt tools when modest modifications will enable them to interoperate, and implement missing capability when necessary. Although we do hope to automate a growing set of analysis tasks, we will approach this goal incrementally by assisting the human in an ever-widening range of tasks.

Creating a User-Centric Data Flow Visualization: A Case Study

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Butler, Karin B.; Leger, Michelle A.; Bueno, Denis B.; Cueller, Christopher R.; Haass, Michael J.; Loffredo, Timothy; Reedy, Geoffrey E.; Tuminaro, Julian T.

Vulnerability analysts protecting software lack adequate tools for understanding data flow in binaries. We present a case study in which we used human factors methods to develop a taxonomy for understanding data flow and the visual representations needed to support decision making for binary vulnerability analysis. Using an iterative process, we refined and evaluated the taxonomy by generating three different data flow visualizations for small binaries, trained an analyst to use these visualizations, and tested the utility of the visualizations for answering data flow questions. Throughout the process and with minimal training, analysts were able to use the visualizations to understand data flow related to security assessment. Our results indicate that the data flow taxonomy is promising as a mechanism for improving analyst understanding of data flow in binaries and for supporting efficient decision making during analysis.

Understanding Data Structures by Extracting Memory Access Graphs

Reedy, Geoffrey E.; Bertels, Alex R.; Sorensen, Asael H.

Understanding the data structures employed by a program is important for reverse engineering activities and can improve the results of automated software analysis techniques. In a compiled binary, accesses to data structure fields and array indices defined in the source program are replaced by raw pointer arithmetic. We present a representation for capturing the essential details of how a program accesses memory regions, which we call a Memory Access Graph (MAG), and a static analysis for automatically extracting this information from a program binary. The static analysis to extract the MAGs from the program is straightforward and does not require sophisticated integer or pointer analysis. The MAGs are readily understood by reverse engineers, who are generally able to perceive the data structure definition corresponding to a MAG. We briefly discuss automatic extraction of structure definitions, outlining some of the difficulties in doing so.
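
To make the idea of recovering structure from raw pointer arithmetic concrete, the following is a toy sketch only. It is not the MAG definition or the static analysis from the paper, which the abstract does not spell out. It assumes a hand-written list of `[base + offset]` accesses rather than anything extracted from a real binary, and every name in it (`Access`, `build_access_graph`, `field_layout`, the pointers `p` and `q`) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Access(NamedTuple):
    """One memory access of the form [base + offset] (hypothetical record)."""
    base: str                         # pointer being dereferenced
    offset: int                       # constant byte offset from the base
    size: int                         # number of bytes read or written
    loaded_ptr: Optional[str] = None  # set when the loaded value is itself used as a pointer


Graph = Dict[str, Dict[Tuple[int, int], Optional[str]]]


def build_access_graph(accesses: List[Access]) -> Graph:
    """Group accesses by base pointer; each (offset, size) pair is a candidate field."""
    graph: Graph = defaultdict(dict)
    for acc in accesses:
        key = (acc.offset, acc.size)
        # Keep an already-known pointer target if this access does not name one.
        graph[acc.base][key] = acc.loaded_ptr or graph[acc.base].get(key)
    return dict(graph)


def field_layout(graph: Graph, base: str) -> List[Tuple[int, int, Optional[str]]]:
    """Return candidate fields for one base pointer, ordered by offset."""
    return sorted(
        (off, size, target) for (off, size), target in graph.get(base, {}).items()
    )


if __name__ == "__main__":
    # Accesses one might see when walking a singly linked list whose nodes
    # hold a 4-byte value at offset 0 and an 8-byte next pointer at offset 8.
    accesses = [
        Access("p", 0, 4),        # value = *(int *)(p + 0)
        Access("p", 8, 8, "q"),   # q = *(void **)(p + 8)
        Access("q", 0, 4),        # value = *(int *)(q + 0)
    ]
    graph = build_access_graph(accesses)
    for base in graph:
        print(base, field_layout(graph, base))
```

In this sketch, the sorted offsets for `p` suggest a two-field record whose second field points to another region, which is the kind of shape a reverse engineer might read as a list node. A real extraction would also have to handle array indexing, aliasing, and overlapping accesses, none of which this sketch attempts.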
