Vision plays a peculiar role in intelligence. Visual information, forming a large part of the sensory information, is fed into the human brain to formulate various types of cognition and behaviours that make humans become intelligent agents. Recent advances have led to the development of brain-inspired algorithms and models for machine vision. One of the key components of these methods is the utilization of the computational principles underlying biological neurons. Additionally, advanced experimental neuroscience techniques have generated different types of neural signals that carry essential visual information. Thus, there is a high demand for mapping out functional models for reading out visual information from neural signals. Here, we briefly review recent progress on this issue with a focus on how machine learning techniques can help in the development of models for contending various types of neural signals, from fine-scale neural spikes and single-cell calcium imaging to coarse-scale electroencephalography (EEG) and functional magnetic resonance imaging recordings of brain signals.
Nowadays, deep neural networks (DNNs) have been equipped with powerful representation capabilities. The deep convolutional neural networks (CNNs) that draw inspiration from the visual processing mechanism of the primate early visual cortex have outperformed humans on object categorization and have been found to possess many brain-like properties. Recently, vision transformers (ViTs) have been striking paradigms of DNNs and have achieved remarkable improvements on many vision tasks compared to CNNs. It is natural to ask how the brain-like properties of ViTs are. Beyond the model paradigm, we are also interested in the effects of factors, such as model size, multimodality, and temporality, on the ability of networks to model the human visual pathway, especially when considering that existing research has been limited to CNNs. In this paper, we systematically evaluate the brain-like properties of 30 kinds of computer vision models varying from CNNs and ViTs to their hybrids from the perspective of explaining brain activities of the human visual cortex triggered by dynamic stimuli. Experiments on two neural datasets demonstrate that neither CNN nor transformer is the optimal model paradigm for modelling the human visual pathway. ViTs reveal hierarchical correspondences to the visual pathway as CNNs do. Moreover, we find that multi-modal and temporal networks can better explain the neural activities of large parts of the visual cortex, whereas a larger model size is not a sufficient condition for bridging the gap between human vision and artificial networks. Our study sheds light on the design principles for more brain-like networks. The code is available at
Adversarial example has been well known as a serious threat to deep neural networks (DNNs). In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of one DNN model for both adversarial and benign examples follow the generalized Gaussian distribution (GGD) but with different parameters (i.e., shape factor, mean, and variance). GGD is a general distribution family that covers many popular distributions (e.g., Laplacian, Gaussian, or uniform). Therefore, it is more likely to approximate the intrinsic distributions of internal responses than any specific distribution. Besides, since the shape factor is more robust to different databases rather than the other two parameters, we propose to construct discriminative features via the shape factor for adversarial detection, employing the magnitude of Benford-Fourier (MBF) coefficients, which can be easily estimated using responses. Finally, a support vector machine is trained as an adversarial detector leveraging the MBF features. Extensive experiments in terms of image classification demonstrate that the proposed detector is much more effective and robust in detecting adversarial examples of different crafting methods and sources compared to state-of-the-art adversarial detection methods.