Qiongyi Zhou, Changde Du, Huiguang He. Exploring the Brain-like Properties of Deep Neural Networks: A Neural Encoding Perspective. Machine Intelligence Research. https://doi.org/10.1007/s11633-022-1348-x

Exploring the Brain-like Properties of Deep Neural Networks: A Neural Encoding Perspective

doi: 10.1007/s11633-022-1348-x
More Information
  • Author Bio:

    Qiongyi Zhou received the B.Sc. degree in automation from the School of Automation Science and Engineering, South China University of Technology, China in 2018. She is currently a Ph.D. candidate in pattern recognition and intelligent systems at the Institute of Automation, Chinese Academy of Sciences, China. Her research interests include deep learning, neural encoding and decoding, cross-modal generation, and brain-inspired intelligence. E-mail: zhouqiongyi2018@ia.ac.cn ORCID iD: 0000-0001-9498-9523

    Changde Du received the Ph.D. degree in technology of computer application from the Institute of Automation, Chinese Academy of Sciences, China in 2019. He is currently an assistant professor at the Institute of Automation, Chinese Academy of Sciences, China. He has published over 30 peer-reviewed research papers in prestigious conferences and journals, and has won the National Scholarship for Doctoral Students (2018) and the President Prize of the Chinese Academy of Sciences for Excellent Ph.D. Graduates (2019). His research interests include deep learning, computational neuroscience, brain-inspired intelligence, computer vision and brain-computer interfaces. E-mail: changde.du@ia.ac.cn

    Huiguang He received the B.Sc. degree in maritime traffic administration and the M.Sc. degree in maritime traffic engineering from Dalian Maritime University (DMU), China, in 1994 and 1997, respectively, and the Ph.D. degree (with honor) in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences (CASIA), China in 2002. He was an associate lecturer at DMU from 1997 to 1999, and a postdoctoral researcher at the University of Rochester, USA from 2003 to 2004. He was a visiting professor at the University of North Carolina at Chapel Hill, USA from 2014 to 2015. He is currently a full professor with CASIA. His research has been supported by several grants from the National Science Foundation of China, and he has published more than 180 peer-reviewed papers. He has won the following awards: Excellent Ph.D. Dissertation of the Chinese Academy of Sciences (2004), National Science & Technology Awards (2003, 2004), Beijing Science & Technology Awards (2002, 2003), K.C. Wong Education Prizes (2007, 2009), the Jia-Xi Lu Young Talent Prize (2009), and Excellent Member of the Youth Innovation Promotion Association, CAS (2016). He is a Senior Member of the IEEE. His research interests include pattern recognition, medical image processing, and brain-computer interfaces (BCI). E-mail: huiguang.he@ia.ac.cn (Corresponding author) ORCID iD: 0000-0002-0684-1711

  • Received Date: 2022-04-07
  • Accepted Date: 2022-06-10
  • Publish Online: 2022-07-19
  • Abstract: Deep neural networks (DNNs) are now equipped with powerful representation capabilities. Deep convolutional neural networks (CNNs), which draw inspiration from the visual processing mechanism of the primate early visual cortex, have outperformed humans on object categorization and have been found to possess many brain-like properties. Recently, vision transformers (ViTs) have emerged as a striking DNN paradigm and have achieved remarkable improvements over CNNs on many vision tasks. It is therefore natural to ask to what extent ViTs are brain-like. Beyond the model paradigm, we are also interested in how factors such as model size, multimodality, and temporality affect a network's ability to model the human visual pathway, especially since existing research has been limited to CNNs. In this paper, we systematically evaluate the brain-like properties of 30 computer vision models, ranging from CNNs and ViTs to their hybrids, from the perspective of explaining brain activities of the human visual cortex triggered by dynamic stimuli. Experiments on two neural datasets demonstrate that neither the CNN nor the transformer is the optimal model paradigm for modelling the human visual pathway. ViTs reveal hierarchical correspondences to the visual pathway as CNNs do. Moreover, we find that multimodal and temporal networks better explain the neural activities of large parts of the visual cortex, whereas a larger model size is not a sufficient condition for bridging the gap between human vision and artificial networks. Our study sheds light on design principles for more brain-like networks. The code is available at https://github.com/QYiZhou/LWNeuralEncoding.
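The evaluation the abstract describes follows the standard voxel-wise neural encoding recipe: extract a DNN layer's activations for each stimulus, reduce their dimensionality (e.g., with sparse random projections), fit a linear (ridge) regression per voxel, and score held-out predictions by Pearson correlation. The sketch below illustrates that recipe on synthetic stand-in data; it is not the authors' released code (see the linked GitHub repository for that), and all sizes and variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.random_projection import SparseRandomProjection

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 stimuli, 4096-d layer activations, 50 voxels.
n_stim, n_feat, n_vox = 200, 4096, 50
features = rng.standard_normal((n_stim, n_feat))        # "DNN layer activations"
weights = 0.01 * rng.standard_normal((n_feat, n_vox))   # hidden ground-truth mapping
responses = features @ weights + 0.1 * rng.standard_normal((n_stim, n_vox))

# 1) Reduce feature dimensionality with a sparse random projection.
proj = SparseRandomProjection(n_components=256, random_state=0)
feat_lo = proj.fit_transform(features)

# 2) Fit a voxel-wise linear (ridge) encoding model on a training split.
split = 150
model = Ridge(alpha=1.0)
model.fit(feat_lo[:split], responses[:split])

# 3) Score: Pearson correlation between predicted and held-out responses,
#    computed independently for each voxel.
pred = model.predict(feat_lo[split:])
true = responses[split:]
pred_z = (pred - pred.mean(0)) / pred.std(0)
true_z = (true - true.mean(0)) / true.std(0)
voxel_r = (pred_z * true_z).mean(0)                     # one correlation per voxel
print(f"mean voxel correlation: {voxel_r.mean():.3f}")
```

In practice the per-voxel correlations are then aggregated within anatomically or functionally defined regions of interest, which is what allows model paradigms (CNN vs. ViT vs. hybrid) to be compared along the visual hierarchy.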

     


    Figures (14) / Tables (2)
