Citation: Qiongyi Zhou, Changde Du, Huiguang He. Exploring the Brain-like Properties of Deep Neural Networks: A Neural Encoding Perspective. Machine Intelligence Research, vol. 19, no. 5, pp. 439–455, 2022. https://doi.org/10.1007/s11633-022-1348-x
[1] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, F. E. Alsaadi. A survey of deep neural network architectures and their applications. Neurocomputing, vol. 234, pp. 11–26, 2017. DOI: 10.1016/j.neucom.2016.12.038.
[2] A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of Advances in Neural Information Processing Systems, Nevada, USA, vol. 25, pp. 1097–1105, 2012.
[3] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 2021.
[4] Q.-S. Zhang, S.-C. Zhu. Visual interpretability for deep learning: A survey. Frontiers of Information Technology and Electronic Engineering, vol. 19, no. 1, pp. 27–39, 2018. DOI: 10.1631/FITEE.1700808.
[5] U. Güçlü, M. A. van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, vol. 35, no. 27, pp. 10005–10014, 2015. DOI: 10.1523/JNEUROSCI.5023-14.2015.
[6] C. Conwell, J. S. Prince, G. A. Alvarez, T. Konkle. What can 5.17 billion regression fits tell us about artificial models of the human visual system? In Proceedings of Neural Information Processing Systems, Montreal, Canada, 2021. [Online], Available: https://openreview.net/forum?id=i_xiyGq6FNT.
[7] S. Huang, W. Shao, M.-L. Wang, D.-Q. Zhang. fMRI-based decoding of visual information from human brain activity: A brief review. International Journal of Automation and Computing, vol. 18, no. 2, pp. 170–184, 2021. DOI: 10.1007/s11633-020-1263-y.
[8] M. Khosla, G. H. Ngo, K. Jamison, A. Kuceyeski, M. R. Sabuncu. Cortical response to naturalistic stimuli is largely predictable with deep neural networks. Science Advances, vol. 7, no. 22, Article number 7547, 2021. DOI: 10.1126/sciadv.abe7547.
[9] Y. Xu, M. Vaziri-Pashkam. Limits to visual representational correspondence between convolutional neural networks and the human brain. Nature Communications, vol. 12, no. 1, pp. 1–16, 2021.
[10] P. Bashivan, K. Kar, J. J. DiCarlo. Neural population control via deep image synthesis. Science, vol. 364, no. 6439, Article number 9436, 2019. DOI: 10.1126/science.aav9436.
[11] S. M. Khaligh-Razavi, N. Kriegeskorte. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, vol. 10, no. 11, Article number 1003915, 2014. DOI: 10.1371/journal.pcbi.1003915.
[12] X. Q. Zhang, R. H. Jiang, C. X. Fan, T. Y. Tong, T. Wang, P. C. Huang. Advances in deep learning methods for visual tracking: Literature review and fundamentals. International Journal of Automation and Computing, vol. 18, no. 3, pp. 311–333, 2021. DOI: 10.1007/s11633-020-1274-8.
[13] F. Sultana, A. Sufian, P. Dutta. Evolution of image segmentation using deep convolutional neural network: A survey. Knowledge-Based Systems, vol. 201–202, Article number 106062, 2020. DOI: 10.1016/j.knosys.2020.106062.
[14] N. Sharma, V. Jain, A. Mishra. An analysis of convolutional neural networks for image classification. Procedia Computer Science, vol. 132, pp. 377–384, 2018. DOI: 10.1016/j.procs.2018.05.198.
[15] K. Hermann, T. Chen, S. Kornblith. The origins and prevalence of texture bias in convolutional neural networks. In Proceedings of Advances in Neural Information Processing Systems, Vancouver, Canada, vol. 33, pp. 19000–19015, 2020.
[16] S. Tuli, I. Dasgupta, E. Grant, T. L. Griffiths. Are convolutional neural networks or transformers more like human vision? [Online], Available: https://arxiv.org/abs/2105.07197, 2021.
[17] D. Li, C. Du, H. He. Semi-supervised cross-modal image generation with generative adversarial networks. Pattern Recognition, vol. 100, Article number 107085, 2020. DOI: 10.1016/j.patcog.2019.107085.
[18] C. Du, C. Du, L. Huang, H. Wang, H. He. Structured neural decoding with multitask transfer learning of deep neural network representations. IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 2, pp. 600–614, 2022. DOI: 10.1109/TNNLS.2020.3028167.
[19] H. Wang, L. Huang, C. Du, D. Li, B. Wang, H. He. Neural encoding for human visual cortex with deep neural networks learning “what” and “where”. IEEE Transactions on Cognitive and Developmental Systems, vol. 13, no. 4, pp. 827–840, 2020.
[20] R. M. Cichy, A. Khosla, D. Pantazis, A. Torralba, A. Oliva. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, vol. 6, Article number 27755, 2016. DOI: 10.1038/srep27755.
[21] G. St-Yves, T. Naselaris. The feature-weighted receptive field: An interpretable encoding model for complex feature spaces. NeuroImage, vol. 180, pp. 188–202, 2018. DOI: 10.1016/j.neuroimage.2017.06.035.
[22] C. Du, C. Du, L. Huang, H. He. Reconstructing perceived images from human brain activities with Bayesian deep multiview learning. IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 8, pp. 2310–2323, 2018.
[23] R. Geirhos, K. Narayanappa, B. Mitzkus, T. Thieringer, M. Bethge, F. A. Wichmann, W. Brendel. Partial success in closing the gap between human and machine vision. In Proceedings of Advances in Neural Information Processing Systems, Montreal, Canada, vol. 34, pp. 23885–23899, 2021.
[24] K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, Z. Yang, Y. Zhang, D. Tao. A survey on vision transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence, to be published, 2022. DOI: 10.1109/TPAMI.2022.3152247.
[25] Y. Wang, X. Zhang, T. Yang, J. Sun. Anchor DETR: Query design for transformer-based detector. [Online], Available: https://arxiv.org/abs/2109.07107, 2021.
[26] H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, H. Y. Shum. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. [Online], Available: https://arxiv.org/abs/2203.03605, 2022.
[27] W. Wang, E. Xie, X. Li, D. P. Fan, K. Song, D. Liang, T. Lu, P. Luo, L. Shao. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 568–578, 2021.
[28] Y. H. Wu, Y. Liu, X. Zhan, M. M. Cheng. P2T: Pyramid pooling transformer for scene understanding. [Online], Available: https://arxiv.org/abs/2106.12011, 2021.
[29] B. Cheng, A. G. Schwing, A. Kirillov. Per-pixel classification is not all you need for semantic segmentation. In Proceedings of Advances in Neural Information Processing Systems, Montreal, Canada, vol. 34, pp. 17864–17875, 2021.
[30] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, P. Luo. SegFormer: Simple and efficient design for semantic segmentation with transformers. In Proceedings of Advances in Neural Information Processing Systems, Montreal, Canada, vol. 34, pp. 12077–12090, 2021.
[31] G. Bertasius, H. Wang, L. Torresani. Is space-time attention all you need for video understanding? In Proceedings of the International Conference on Machine Learning, 2021.
[32] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark. Learning transferable visual models from natural language supervision. In Proceedings of International Conference on Machine Learning, 2021.
[33] S. Xie, C. Sun, J. Huang, Z. Tu, K. Murphy. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In Proceedings of the European Conference on Computer Vision, Springer, Munich, Germany, pp. 305–321, 2018.
[34] B. Choksi, M. Mozafari, R. Vanrullen, L. Reddy. Multimodal neural networks better explain multivoxel patterns in the hippocampus. In Proceedings of Neural Information Processing Systems, Montreal, Canada, 2021. [Online], Available: https://openreview.net/forum?id=6dymbuga7nL.
[35] J. Shi, H. Wen, Y. Zhang, K. Han, Z. Liu. Deep recurrent neural network reveals a hierarchy of process memory during dynamic natural vision. Human Brain Mapping, vol. 39, no. 5, pp. 2269–2282, 2018. DOI: 10.1002/hbm.24006.
[36] C. Conwell, D. Mayo, A. Barbu, M. Buice, G. Alvarez, B. Katz. Neural regression, representational similarity, model zoology & neural taskonomy at scale in rodent visual cortex. In Proceedings of Advances in Neural Information Processing Systems, Montreal, Canada, vol. 34, pp. 5590–5607, 2021.
[37] N. Kriegeskorte, M. Mur, P. A. Bandettini. Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, vol. 2, Article number 4, 2008. DOI: 10.3389/neuro.06.004.2008.
[38] A. Steiner, A. Kolesnikov, X. Zhai, R. Wightman, J. Uszkoreit, L. Beyer. How to train your ViT? Data, augmentation, and regularization in vision transformers. [Online], Available: https://arxiv.org/abs/2106.10270, 2021.
[39] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, H. Jégou. Training data-efficient image transformers & distillation through attention. In Proceedings of International Conference on Machine Learning, vol. 139, pp. 10347–10357, 2021.
[40] S. d'Ascoli, H. Touvron, M. L. Leavitt, A. S. Morcos, G. Biroli, L. Sagun. ConViT: Improving vision transformers with soft convolutional inductive biases. In Proceedings of International Conference on Machine Learning, Virtual, vol. 139, pp. 2286–2296, 2021.
[41] R. M. Cichy, K. Dwivedi, B. Lahner, A. Lascelles, P. Iamshchinina, M. Graumann, A. Andonian, N. Murty, K. Kay, G. Roig, et al. The Algonauts Project 2021 challenge: How the human brain makes sense of a world in motion. [Online], Available: https://arxiv.org/abs/2104.13714, 2021.
[42] A. G. Huth, S. Nishimoto, A. T. Vu, J. L. Gallant. A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron, vol. 76, no. 6, pp. 1210–1224, 2012. DOI: 10.1016/j.neuron.2012.10.014.
[43] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of Advances in Neural Information Processing Systems, Vancouver, Canada, vol. 32, pp. 8024–8035, 2019.
[44] R. Wightman. PyTorch Image Models. GitHub, 2019. [Online], Available: https://doi.org/10.5281/zenodo.4414861.
[45] K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations, San Diego, USA, 2015.
[46] K. He, X. Zhang, S. Ren, J. Sun. Deep residual learning for image recognition. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 770–778, 2016.
[47] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 10012–10022, 2021.
[48] I. Radosavovic, R. P. Kosaraju, R. Girshick, K. He, P. Dollár. Designing network design spaces. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, pp. 10428–10436, 2020.
[49] P. Li, T. J. Hastie, K. W. Church. Very sparse random projections. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, ACM, Philadelphia, USA, pp. 287–296, 2006.
[50] J. A. Mazer, W. E. Vinje, J. McDermott, P. H. Schiller, J. L. Gallant. Spatial frequency and orientation tuning dynamics in area V1. Proceedings of the National Academy of Sciences, vol. 99, no. 3, pp. 1645–1650, 2002. DOI: 10.1073/pnas.022638499.
[51] I. Bramão, L. Faísca, C. Forkstam, A. Reis, K. M. Petersson. Cortical brain regions associated with color processing: An fMRI study. The Open Neuroimaging Journal, vol. 4, pp. 164–173, 2010. DOI: 10.2174/1874440001004010164.
[52] H. Tanigawa, H. D. Lu, A. W. Roe. Functional organization for color and orientation in macaque V4. Nature Neuroscience, vol. 13, no. 12, pp. 1542–1548, 2010. DOI: 10.1038/nn.2676.
[53] D. Y. Tsao, W. A. Freiwald, T. A. Knutsen, J. B. Mandeville, R. B. Tootell. Faces and objects in macaque cerebral cortex. Nature Neuroscience, vol. 6, no. 9, pp. 989–995, 2003. DOI: 10.1038/nn1111.
[54] P. E. Downing, Y. Jiang, M. Shuman, N. Kanwisher. A cortical area selective for visual processing of the human body. Science, vol. 293, no. 5539, pp. 2470–2473, 2001. DOI: 10.1126/science.1063414.
[55] S. Kornblith, X. Cheng, S. Ohayon, D. Y. Tsao. A network for scene processing in the macaque temporal lobe. Neuron, vol. 79, no. 4, pp. 766–781, 2013. DOI: 10.1016/j.neuron.2013.06.015.
[56] B. R. Conway. The organization and operation of inferior temporal cortex. Annual Review of Vision Science, vol. 4, pp. 381–402, 2018. DOI: 10.1146/annurev-vision-091517-034202.
[57] M. Manassi, B. Sayim, M. H. Herzog. When crowding of crowding leads to uncrowding. Journal of Vision, vol. 13, Article number 10, 2013.
[58] W. Luo, Y. Li, R. Urtasun, R. Zemel. Understanding the effective receptive field in deep convolutional neural networks. In Proceedings of Advances in Neural Information Processing Systems, Barcelona, Spain, vol. 29, pp. 4898–4906, 2016.
[59] M. M. Naseer, K. Ranasinghe, S. H. Khan, M. Hayat, F. S. Khan, M. H. Yang. Intriguing properties of vision transformers. In Proceedings of Advances in Neural Information Processing Systems, Montreal, Canada, vol. 34, pp. 23296–23308, 2021.
[60] S. Nonaka, K. Majima, S. C. Aoki, Y. Kamitani. Brain hierarchy score: Which deep neural networks are hierarchically brain-like? iScience, vol. 24, no. 9, Article number 103013, 2021. DOI: 10.1016/j.isci.2021.103013.
[61] M. M. Murray, A. Thelen, G. Thut, V. Romei, R. Martuzzi, P. J. Matusz. The multisensory function of the human primary visual cortex. Neuropsychologia, vol. 83, pp. 161–169, 2016. DOI: 10.1016/j.neuropsychologia.2015.08.011.
[62] T. Raij, J. Ahveninen, F. H. Lin, T. Witzel, I. P. Jääskeläinen, B. Letham, E. Israeli, C. Sahyoun, C. Vasios, S. Stufflebeam. Onset timing of cross-sensory activations and multisensory interactions in auditory and visual sensory cortices. European Journal of Neuroscience, vol. 31, no. 10, pp. 1772–1782, 2010. DOI: 10.1111/j.1460-9568.2010.07213.x.
[63] C. Cappe, G. Thut, V. Romei, M. M. Murray. Auditory–visual multisensory interactions in humans: Timing, topography, directionality, and sources. Journal of Neuroscience, vol. 30, no. 38, pp. 12572–12580, 2010. DOI: 10.1523/JNEUROSCI.1099-10.2010.
[64] T. Raij, K. Uutela, R. Hari. Audiovisual integration of letters in the human brain. Neuron, vol. 28, no. 2, pp. 617–625, 2000. DOI: 10.1016/S0896-6273(00)00138-0.
[65] T. C. Kietzmann, C. J. Spoerer, L. K. Sörensen, R. M. Cichy, O. Hauk, N. Kriegeskorte. Recurrence is required to capture the representational dynamics of the human visual system. Proceedings of the National Academy of Sciences, vol. 116, no. 43, pp. 21854–21863, 2019. DOI: 10.1073/pnas.1905544116.