Publications / SAND Report

Defending Against Adversarial Examples

Short, Austin S.; La Pay, Trevor L.; Gandhi, Apurva G.

Adversarial machine learning is an active field of research that investigates the security of machine learning methods against cyber attacks. An important branch of this field concerns adversarial examples: inputs that have been maliciously perturbed to trick machine learning models into misclassifying them. Because machine learning models are pervasive in areas as diverse as computer vision, health care, and national security, this vulnerability is a rapidly growing threat. With the increasing use of AI solutions, threats against AI must be considered before deploying systems in a contested space. Adversarial machine learning is strongly tied to software security: like other, more common software vulnerabilities, these attacks exploit weaknesses in software components, in this case the components of machine learning models.

During this project, we surveyed and replicated several adversarial machine learning techniques with the goal of developing capabilities for Sandia to advise on and defend against these threats. To accomplish this, we scanned state-of-the-art research for robust defenses against adversarial examples and applied them to a machine learning problem. We surveyed 12 peer-reviewed papers on adversarial machine learning, analyzed the results, and applied the most effective attacks and defenses to our own machine learning testbed. We trained several neural networks on common image recognition problems to a high level of accuracy, then attempted to degrade that accuracy by passing them data perturbed by an attack. We measured attack efficacy by how much an attack degraded the accuracy of a finely tuned model, and defense efficacy by how well a defense prevented that degradation on attack data. We also discuss the value of applying software security techniques, such as threat modeling, to machine learning development efforts in order to understand potential attacks against models in development and properly address them.

We were not able to find any reliable, robust defense against a wide range of adversarial example attacks on machine learning. Our recommendation is to mitigate such attacks by not deploying machine learning solutions in contested or adversarial spaces; if a model must be deployed to a contested space, adversarial tampering should be added as a valid misuse case, and the model should be armored using adversarial training to hamper attackers. We also advocate using the techniques demonstrated here to test a model against this sort of attack and determine the model's exposure. It is also important to apply general cyber security techniques in the development of a machine learning pipeline to make tampering with input data more difficult.

ACKNOWLEDGEMENTS

We would like to thank Kristin Adair for spurring this project and for the guidance and direction she provided throughout its course. We would also like to thank the 10774 organization for supporting the Synapse server at Sandia, which made this work possible.
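As an illustration of the attack-and-defense evaluation described in the abstract above, the following is a minimal sketch of how a Fast Gradient Sign Method (FGSM)-style perturbation can degrade a trained image classifier's accuracy, and of a simple adversarial training step of the kind the abstract refers to as "armoring." This is not the report's actual testbed; the framework (PyTorch), the model, data loader, and epsilon value are illustrative assumptions.

```python
# Sketch only: FGSM attack, accuracy-degradation measurement, and an
# adversarial training step. PyTorch, `model`, `loader`, and epsilon
# are assumptions, not details taken from the report.
import torch
import torch.nn.functional as F


def fgsm_perturb(model, images, labels, epsilon=0.03):
    """Craft FGSM adversarial examples by stepping along the sign of the
    loss gradient with respect to the input pixels."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()


def accuracy(model, loader, attack=None):
    """Top-1 accuracy on clean data, or on attacked data when `attack` is given."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        if attack is not None:
            images = attack(model, images, labels)  # needs gradients, so outside no_grad
        with torch.no_grad():
            predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.numel()
    return correct / total


def adversarial_training_step(model, optimizer, images, labels, epsilon=0.03):
    """One training step on a mix of clean and FGSM-perturbed inputs:
    the 'armoring' defense mentioned in the abstract."""
    model.train()
    adversarial = fgsm_perturb(model, images, labels, epsilon)
    optimizer.zero_grad()  # clear gradients accumulated while crafting the attack
    loss = 0.5 * (F.cross_entropy(model(images), labels)
                  + F.cross_entropy(model(adversarial), labels))
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, attack efficacy would be measured as the drop from `accuracy(model, test_loader)` to `accuracy(model, test_loader, attack=fgsm_perturb)`, and defense efficacy as how much of that drop is recovered after training with `adversarial_training_step`, mirroring the degradation-based metrics described in the abstract.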