Towards Interpretable Defense Against Adversarial Attacks via Causal Inference

Deep learning-based models are vulnerable to adversarial attacks. Defense against such attacks is essential in sensitive and safety-critical scenarios, yet deep learning methods still lack effective and efficient defense mechanisms, and most existing methods are merely stopgaps for specific adversarial samples. This paper adopts causal inference to explore the working mechanism of adversarial samples. A causal model is established to describe the generation and performance of adversarial samples. The related work is published in the third issue of Machine Intelligence Research in 2022.


From Springer

 

Deep learning methods open a new era of artificial intelligence. In the field of computer vision, they have achieved great success in image classification, object detection, and image segmentation. Deep neural networks show a powerful capability to perform nonlinear mappings from raw data to high-level features. However, adversarial samples cast a shadow over this notable success. These “powerful” deep learning models are vulnerable to various adversarial attack algorithms: using well-crafted perturbations that humans cannot spot, attackers can undermine the predictions of state-of-the-art models. This problem prevents the application of deep learning methods in sensitive and safety-critical scenarios. Hence, defense against adversarial attacks is of considerable concern and has become an essential research topic.

 

Numerous studies on defense against adversarial attacks have been reported. However, it remains unclear how adversarial samples fool deep learning models, and their underlying working mechanism deserves further exploration and study. As a result, most existing methods are just stopgaps for specific adversarial samples. For example, adversarial training, which introduces adversarial samples into the training process, is a widely used defense method; however, its generalization capability is quite limited, especially against unseen attacks.

 

In order to defend against adversarial attacks, it is necessary to reveal the working mechanism of adversarial samples. In this paper, we adopt causal inference to explore this mechanism. Compared with purely statistical methodology, causal inference models the relationships between variables more naturally and intrinsically.

 

A causal model is established to describe the generation and performance of adversarial samples. The causal model enables us to estimate the causal effects between the outputs of the deep neural network and the subregions of adversarial samples, which cannot be achieved by data-driven/statistical methodology. Hence, tampered predictions can be attributed to specific subregions, which makes it possible to interpret adversarial samples and reveal their working mechanism.
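The paper defines its estimator on the causal model itself; purely as an illustration of the intervention idea, the hedged sketch below approximates the effect of each subregion on a classifier's prediction by intervening on one patch at a time (replacing it with a constant) and measuring the change in the model's confidence. The function name, patch size, and constant-fill intervention are assumptions made here for illustration, not the authors' procedure.

```python
import torch
import torch.nn.functional as F

def subregion_effects(model, image, label, patch=8, fill=0.0):
    """Illustrative intervention-style estimate of how much each patch-sized
    subregion contributes to the model's confidence in `label`.

    NOTE: this is a rough sketch of the general idea, not the estimator
    defined in the paper.
    """
    model.eval()
    with torch.no_grad():
        # Baseline confidence on the untouched image (image: tensor of shape C x H x W).
        base = F.softmax(model(image.unsqueeze(0)), dim=1)[0, label].item()

        _, h, w = image.shape
        effects = torch.zeros(h // patch, w // patch)
        for i in range(h // patch):
            for j in range(w // patch):
                x = image.clone()
                # Crude surrogate for an intervention: do(subregion = fill).
                x[:, i*patch:(i+1)*patch, j*patch:(j+1)*patch] = fill
                prob = F.softmax(model(x.unsqueeze(0)), dim=1)[0, label].item()
                # Positive values: the subregion supports the prediction;
                # negative values: the subregion opposes it.
                effects[i, j] = base - prob
    return effects
```

Masking a patch is only a coarse stand-in for a true causal intervention, but it conveys the key point: effects are attributed to subregions of the input rather than to the input as a whole.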

 

The main contributions of this paper can be summarized as follows:

 

1) We build a causal model to interpret the generation and performance of adversarial samples. The causal model enables us to estimate the causality between the outputs of the deep neural network and the subregions of the input samples.

 

2) Based on causal inference, the working mechanism of the adversarial samples is revealed. The causal effect of different subregions of an adversarial sample can be inconsistent, even opposing. Usually, only a small portion of the adversarial sample plays a decisive role in fooling the recognition model.

 

3) Based on these findings, we propose simple and effective strategies for defense against adversarial attacks. These strategies enable us to detect and recognize adversarial samples without additional models or training (an illustrative sketch follows this list).
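The concrete strategies are detailed in the full paper. As a loose illustration of how a detection rule could build on the finding that only a small portion of an adversarial sample is decisive, the sketch below flags an input when a few subregions dominate the effect map computed earlier. The statistic, threshold, and function name are hypothetical choices for illustration only, not the rules proposed by the authors.

```python
import torch

def detect_adversarial(effects, concentration_thresh=0.5, top_k=3):
    """Hypothetical detection rule on top of a subregion-effect map.

    Flags an input if the top-k subregions account for more than
    `concentration_thresh` of the total absolute effect, i.e., if the
    prediction hinges on a few dominant regions. Illustrative only.
    """
    flat = effects.flatten().abs()
    total = flat.sum()
    if total == 0:
        return False
    top = flat.topk(min(top_k, flat.numel())).values.sum()
    concentration = (top / total).item()  # share of effect held by top-k regions
    return concentration > concentration_thresh
```

A natural input would normally spread its causal effect over many object regions, whereas an adversarial sample concentrates it; the sketch simply thresholds that concentration.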

 

 

Download full text

Towards Interpretable Defense Against Adversarial Attacks via Causal Inference

Min Ren, Yun-Long Wang, Zhao-Feng He

https://www.mi-research.net/en/article/doi/10.1007/s11633-022-1330-7

https://link.springer.com/article/10.1007/s11633-022-1330-7

Release Date: 2022-06-09