Brain-inspired computer vision aims to learn from biological systems to develop advanced image processing techniques. However, its progress so far has not been impressive. A main obstacle is that the current paradigm for brain-inspired computer vision has not captured the fundamental nature of biological vision, i.e., that biological vision is geared towards processing spatio-temporal patterns. Recently, a new paradigm for developing brain-inspired computer vision has been emerging, which emphasizes the spatio-temporal nature of visual signals and brain-inspired models for processing this type of data. This paper reviews some recent primary works towards this new paradigm, including the development of spike cameras, which acquire spiking signals directly from visual scenes, and the development of computational models learned from neural systems that are specialized to process spatio-temporal patterns, including models for object detection, tracking, and recognition. It also discusses future directions for improving the paradigm. The review was published in the fifth issue of Machine Intelligence Research.
Computer vision, or machine vision, represented especially by deep convolutional neural networks (DCNNs), has achieved great success in many vision tasks. Compared to biological vision, however, computer vision still lags far behind in both performance and variety of capabilities. For instance, DCNNs, which mainly mimic the feedforward and hierarchical structure of the ventral pathway of biological vision, have achieved extremely high accuracy in image classification, but in other tasks, such as video analysis and image understanding, they are still far from satisfactory. Thus, learning from biological vision, the so-called brain-inspired computer vision, remains a promising and efficient way to speed up the development of computer vision.
Although the importance of developing brain-inspired computer vision has been widely recognized, the field has not yet achieved any real breakthrough comparable to what AlphaGo achieved in the game of Go or AlphaFold in protein structure prediction.
So, what is the obstacle to this development? An important issue in the current practice of brain-inspired computer vision is that it neglects a key property of biological vision, i.e., that biological vision is geared towards processing spatio-temporal patterns. These are fundamentally different from the static images that DCNNs are good at handling. In short, having both spatial and temporal structure is characteristic of neural signals in every part of the brain.
At the initial stage of acquiring visual information from the external world, the signals received by the retina are in the form of continuous optical flow. These signals are converted into spike trains by retinal ganglion cells and are subsequently transmitted layer by layer to the visual cortex, where the visual input is integrated with spikes from other cortical regions conveying prior knowledge or memory. Eventually, the visual information is extracted in the form of continuous neuronal responses (see illustration in Fig. 1). The whole process is very complicated, and many fine details remain unknown. Nevertheless, the fact that the visual system computes spatio-temporal patterns in the form of spike trains has been firmly validated by experiments.
In recognition of this difference between machine vision and biological vision, a new paradigm that captures it is emerging for developing brain-inspired computer vision. Specifically, in such a paradigm, the visual information in the external world is expressed in the form of spike trains from the very beginning and is subsequently processed by computational models inspired by the biological system. Notably, the spiking neural networks (SNNs) currently popular in the field do not fulfill this goal: they typically lack many features of biological systems that are crucial for processing spatio-temporal patterns efficiently, such as the differentiation of excitatory and inhibitory neurons, stochastic neuronal firing, recurrent and feedback interactions between neurons, and the short-term plasticity of synapses.
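To make the idea of expressing visual input as spike trains more concrete, the following is a minimal sketch that assumes a simple per-pixel integrate-and-fire scheme: incoming light is accumulated over time, and a spike is emitted whenever the accumulated value reaches a threshold. It is an illustration only and is not taken from the review; the function name `integrate_and_fire_encode`, the `threshold` parameter, and the toy intensity values are all hypothetical and do not describe the actual circuit of any spike camera.

```python
def integrate_and_fire_encode(intensities, threshold=1.0):
    """Convert one pixel's intensity sequence into a binary spike train.

    Hypothetical illustration: at each time step the intensity is added to
    an accumulator; when the accumulator reaches the threshold, a spike (1)
    is emitted and the threshold is subtracted from the accumulator.
    Assumes each intensity is non-negative and no larger than the threshold,
    so at most one spike is needed per time step.
    """
    accumulator = 0.0
    spikes = []
    for value in intensities:
        accumulator += value
        if accumulator >= threshold:
            spikes.append(1)
            accumulator -= threshold
        else:
            spikes.append(0)
    return spikes


# Toy input: a bright pixel and a dim pixel observed for 8 time steps.
bright_spikes = integrate_and_fire_encode([0.5] * 8)
dim_spikes = integrate_and_fire_encode([0.25] * 8)

print("bright:", bright_spikes)
print("dim:   ", dim_spikes)
```

In this sketch the brighter pixel fires more often than the dimmer one, so intensity is carried by spike timing and rate rather than by a stored pixel value, which is the sense in which the visual signal is spatio-temporal from the start.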
The paper is organized as follows. Section 2 reviews the recent development of spike cameras, in particular Vidar, which provides a way to represent visual inputs as spike trains. Section 3 reviews three brain-inspired computational models based on spike trains, which implement rapid signal detection, anticipative object tracking, and spatio-temporal pattern recognition, respectively. Finally, Section 4 discusses the future development of brain-inspired computer vision.
Download full text:
Towards a New Paradigm for Brain-inspired Computer Vision
Xiao-Long Zou, Tie-Jun Huang, Si Wu