Denoised Internal Models: A Brain-inspired Autoencoder Against Adversarial Attacks

Despite its great success, deep learning suffers from a severe lack of robustness; i.e., deep neural networks are highly vulnerable to adversarial attacks, even the simplest ones. Inspired by recent advances in brain science, we propose the denoised internal models (DIM), a novel generative autoencoder-based model to tackle this challenge. Simulating the pipeline in the human brain for visual signal processing, DIM adopts a two-stage approach. In the first stage, DIM uses a denoiser to reduce the noise and the dimensions of inputs, reflecting the information pre-processing in the thalamus. Inspired by the sparse coding of memory-related traces in the primary visual cortex, the second stage produces a set of internal models, one for each category. We evaluate DIM over 42 adversarial attacks, showing that DIM effectively defends against all the attacks and outperforms the SOTA in overall robustness on the MNIST (Modified National Institute of Standards and Technology) dataset.



The great advances in deep learning (DL) techniques have brought us a large number of sophisticated models that approach human-level performance in a broad spectrum of tasks, such as image classification, speech recognition, and natural language processing. Despite their success, deep neural network (DNN) models are vulnerable to adversarial attacks. Even perturbations too small for a human to notice can completely alter the predictions of the underlying network model. On the other hand, the human brain, treated as an information processing system, enjoys remarkably high robustness. A question naturally arises: can knowledge about the working mechanism of the human brain help us improve the adversarial robustness of DNN models?
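As a minimal illustration of how a small perturbation can flip a prediction, the sketch below uses a toy linear classifier rather than a DNN. The weights, the input, and the names predict and perturb are all hypothetical; the perturbation follows the sign of the weights, in the spirit of gradient-sign attacks, and is not one of the attacks evaluated in this work.

```python
from typing import List


def predict(weights: List[float], bias: float, x: List[float]) -> int:
    """Return class 1 if the linear score is positive, else class 0."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def perturb(weights: List[float], bias: float, x: List[float],
            epsilon: float) -> List[float]:
    """Move each feature by at most epsilon so the score moves toward the other class."""
    direction = -1 if predict(weights, bias, x) == 1 else 1
    return [v + direction * epsilon * sign(w) for w, v in zip(weights, x)]


# Hypothetical 100-dimensional toy problem (assumed values, not from the paper).
weights = [1.0, -1.0] * 50
bias = 0.0
x_clean = [0.51, 0.50] * 50

x_adv = perturb(weights, bias, x_clean, epsilon=0.01)
largest_change = max(abs(a - b) for a, b in zip(x_adv, x_clean))

print(predict(weights, bias, x_clean), predict(weights, bias, x_adv))
```

Each feature changes by at most 0.01, yet the many small shifts add up across 100 dimensions and push the score across the decision boundary.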


Biological systems remain an illuminating source of inspiration for human engineering design. Two famous relevant examples are the perceptron model and the rectified linear unit (ReLU) activation function. Further, the recurrent neural network (RNN) architecture also has its origin in the study of how to process time-series data, like natural language. In this work, we draw inspiration from the visual signal processing paradigm of the human brain and propose a novel model to address the robustness issue in the image classification task.


With the recent progress in neuroscience, we now better understand the information processing pipeline in the human brain's visual system. Two brain areas are involved in this pipeline: the thalamus and the primary visual cortex. Visual signals from the retina travel to the lateral geniculate nucleus (LGN) of the thalamus before reaching the primary visual cortex. The LGN is specialized in handling visual information, helping to process different kinds of stimuli. In addition, some vertebrates, like zebrafish, have no visual cortex but still have a neural structure similar to the thalamus to receive and process visual signals. This fact highlights the importance of such an information pre-processing module in the biological visual signal processing system. The primary visual cortex is one of the best-studied brain areas; it displays a complex 6-layer structure and provides excellent pattern recognition capacities. An important finding reveals that Layer 2/3 of the primary visual cortex contains the so-called engram cells, which activate only for specific stimuli and related ones, such as Jennifer Aniston's pictures. In other words, the concepts corresponding to those stimuli are encoded sparsely through the engram cells. Furthermore, artificial activation of the engram cells induces corresponding memory retrieval. These discoveries suggest that there could be internal generative models for different kinds of concepts in the human brain.


Simulating the pipeline mentioned above, we propose the denoised internal models (DIM) (see Fig. 1(b)), which consists of a global denoising network (a.k.a. the denoiser) and a set of generative autoencoders, one for each category. The denoiser pre-processes the input data, similar to what the LGN does. The autoencoders can be regarded as internal models for specific concepts, mimicking the function of engram cells in the primary visual cortex. To evaluate DIM's robustness comprehensively, we conduct our experiments on MNIST (Modified National Institute of Standards and Technology), testing DIM against 42 attacks in the foolbox v3.2.1 package and comparing its performance to SOTA models on MNIST. The results show that DIM outperforms those models in overall robustness and has the most stable performance across the 42 attacks.
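The two-stage structure described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the class and function names are hypothetical, the "denoiser" is a simple threshold that does not reduce dimensions, each "autoencoder" is a stand-in that always reconstructs a fixed class template, and the decision rule (predict the category whose internal model reconstructs the denoised input with the lowest error) is an assumption that this section does not state.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_squared_error(a: Vector, b: Vector) -> float:
    """Average squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class DenoisedInternalModelsSketch:
    """Hypothetical wrapper: one shared denoiser, one internal model per category."""

    def __init__(self, denoiser: Callable[[Vector], Vector],
                 internal_models: Sequence[Callable[[Vector], Vector]]) -> None:
        self.denoiser = denoiser
        self.internal_models = list(internal_models)

    def reconstruction_errors(self, x: Vector) -> List[float]:
        # Stage 1: pre-process the input with the global denoiser.
        z = self.denoiser(x)
        # Stage 2: let every category's internal model reconstruct the result.
        return [mean_squared_error(z, model(z)) for model in self.internal_models]

    def predict(self, x: Vector) -> int:
        # Assumed decision rule: the best-reconstructing category wins.
        errors = self.reconstruction_errors(x)
        return min(range(len(errors)), key=errors.__getitem__)


def threshold_denoiser(x: Vector) -> Vector:
    """Toy stand-in for a denoising network: snap each value to 0.0 or 1.0."""
    return [1.0 if v >= 0.5 else 0.0 for v in x]


def template_model(template: Vector) -> Callable[[Vector], Vector]:
    """Toy stand-in for a per-category autoencoder: always emit the class template."""
    return lambda z: list(template)


dim = DenoisedInternalModelsSketch(
    denoiser=threshold_denoiser,
    internal_models=[template_model([1.0, 1.0, 0.0, 0.0]),   # category 0
                     template_model([0.0, 0.0, 1.0, 1.0])],  # category 1
)

noisy_input = [0.9, 0.8, 0.1, 0.3]
print(dim.reconstruction_errors(noisy_input), dim.predict(noisy_input))
```

In a real system the threshold and templates would be replaced by trained networks; the sketch only shows how a shared pre-processing stage and per-category models fit together.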



Kai-Yuan Liu, Xing-Yu Li, Yu-Rui Lai, Hang Su, Jia-Chen Wang, Chun-Xu Guo, Hong Xie, Ji-Song Guan, Yi Zhou

Release Date: 2022-11-11