Effects of Jacobian Matrix Regularization on the Detectability of Adversarial Samples
The well-known vulnerability of Deep Neural Networks to adversarial samples has led to a rapid cycle of increasingly sophisticated attack algorithms and proposed defenses. While most contemporary defenses have been shown to be vulnerable to carefully configured attacks, methods based on gradient regularization and out-of-distribution detection have recently attracted considerable interest by demonstrating higher resilience to a broad range of attack algorithms. However, no study has yet investigated the effect of combining these techniques. In this paper, we examine the effect of Jacobian matrix regularization on the detectability of adversarial samples using the CIFAR-10 image benchmark. We find that regularization has a significant effect on detectability, and in some cases can render detectable an attack that is undetectable against a baseline model. In addition, we give evidence that regularization may mitigate the known susceptibility of detectors to high-confidence adversarial samples. The defenses we consider here are highly generalizable, and we believe they will be useful for further investigations into transferring machine learning robustness to other data domains.
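For concreteness, the sketch below illustrates the general form of a Jacobian (Frobenius-norm) regularization penalty of the kind referred to above; it is not the paper's actual implementation. PyTorch is assumed, and the names `model`, `x`, `y`, `num_proj`, and the weighting coefficient `lambda_jr` are hypothetical placeholders. The Frobenius norm of the input-output Jacobian is estimated with random output-space projections so that only a few backward passes are needed per batch.

```python
import torch
import torch.nn.functional as F

def jacobian_frobenius_sq(model, x, num_proj=1):
    """Monte-Carlo estimate of ||J||_F^2, where J = d(logits)/d(x).

    Uses the identity E_v[||v^T J||^2] = ||J||_F^2 for v ~ N(0, I),
    so only `num_proj` vector-Jacobian products are computed per batch.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)                                  # shape: (batch, classes)
    frob_sq = x.new_zeros(())
    for _ in range(num_proj):
        v = torch.randn_like(logits)                   # random output-space direction
        (jv,) = torch.autograd.grad(
            logits, x, grad_outputs=v, create_graph=True
        )                                              # v^T J, same shape as x
        frob_sq = frob_sq + jv.pow(2).sum() / num_proj
    return frob_sq

# Hypothetical training objective: cross-entropy plus the Jacobian penalty.
def regularized_loss(model, x, y, lambda_jr=0.01):
    ce = F.cross_entropy(model(x), y)
    jr = jacobian_frobenius_sq(model, x) / x.size(0)   # average over the batch
    return ce + lambda_jr * jr
```

The penalty encourages the network's logits to change slowly with respect to small input perturbations, which is the property exploited when combining gradient regularization with adversarial-sample detection; the weight `lambda_jr` trades off clean accuracy against this smoothness and would be tuned per dataset.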