Available online,
doi: 10.1007/s11633-022-1335-2
Abstract:
Vision plays a peculiar role in intelligence. Visual information, forming a large part of the sensory information, is fed into the human brain to formulate various types of cognition and behaviours that make humans intelligent agents. Recent advances have led to the development of brain-inspired algorithms and models for machine vision. One of the key components of these methods is the utilization of the computational principles underlying biological neurons. Additionally, advanced experimental neuroscience techniques have generated different types of neural signals that carry essential visual information. Thus, there is a high demand for mapping out functional models for reading out visual information from neural signals. Here, we briefly review recent progress on this issue with a focus on how machine learning techniques can help in the development of models for contending with various types of neural signals, from fine-scale neural spikes and single-cell calcium imaging to coarse-scale electroencephalography (EEG) and functional magnetic resonance imaging recordings of brain signals.
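As an illustration of the kind of readout model surveyed in this review, the sketch below decodes a stimulus category from simulated trial-by-neuron spike counts with a regularized linear classifier. The data, neuron count, and choice of decoder are assumptions made for demonstration only and are not taken from the paper.

```python
# Illustrative sketch (not from the review): decoding a visual stimulus
# category from trial-by-neuron spike counts with a linear readout.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_trials, n_neurons = 200, 50
labels = rng.integers(0, 2, size=n_trials)              # two hypothetical stimulus classes
tuning = rng.normal(size=n_neurons)                     # assumed per-neuron tuning difference
rates = 5.0 + labels[:, None] * tuning[None, :]         # mean firing rate per trial and neuron
spike_counts = rng.poisson(np.clip(rates, 0.1, None))   # Poisson spiking around those rates

# A regularized linear decoder is a common baseline readout model.
decoder = LogisticRegression(C=1.0, max_iter=1000)
scores = cross_val_score(decoder, spike_counts, labels, cv=5)
print("decoding accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```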
Available online,
doi: 10.1007/s11633-022-1348-x
Abstract:
Deep neural networks (DNNs) are now equipped with powerful representation capabilities. Deep convolutional neural networks (CNNs), which draw inspiration from the visual processing mechanism of the primate early visual cortex, have outperformed humans on object categorization and have been found to possess many brain-like properties. Recently, vision transformers (ViTs) have emerged as a striking DNN paradigm and have achieved remarkable improvements on many vision tasks compared to CNNs. It is natural to ask to what extent ViTs are brain-like. Beyond the model paradigm, we are also interested in the effects of factors such as model size, multimodality, and temporality on the ability of networks to model the human visual pathway, especially considering that existing research has been limited to CNNs. In this paper, we systematically evaluate the brain-like properties of 30 computer vision models, ranging from CNNs and ViTs to their hybrids, from the perspective of explaining brain activities of the human visual cortex triggered by dynamic stimuli. Experiments on two neural datasets demonstrate that neither the CNN nor the transformer is the optimal model paradigm for modelling the human visual pathway. ViTs reveal hierarchical correspondences to the visual pathway, as CNNs do. Moreover, we find that multi-modal and temporal networks can better explain the neural activities of large parts of the visual cortex, whereas a larger model size is not a sufficient condition for bridging the gap between human vision and artificial networks. Our study sheds light on the design principles for more brain-like networks. The code is available at https://github.com/QYiZhou/LWNeuralEncoding.
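For readers unfamiliar with how such brain-likeness is typically quantified, the sketch below shows a standard voxel-wise linear encoding analysis: network features are regressed onto measured responses and scored by prediction correlation. The arrays, dimensions, and RidgeCV settings are illustrative assumptions, not the paper's pipeline.

```python
# Minimal sketch of a voxel-wise linear encoding analysis, a standard way to
# test how well a network's features explain visual-cortex responses.
# Feature and response arrays are synthetic placeholders, not the paper's data.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_stimuli, n_features, n_voxels = 300, 128, 20

features = rng.normal(size=(n_stimuli, n_features))      # layer activations per stimulus
weights = rng.normal(size=(n_features, n_voxels))
responses = features @ weights + rng.normal(scale=5.0, size=(n_stimuli, n_voxels))

X_tr, X_te, y_tr, y_te = train_test_split(features, responses, test_size=0.25, random_state=0)
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Encoding score per voxel: correlation between predicted and measured responses.
scores = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(n_voxels)]
print("median voxel correlation: %.2f" % np.median(scores))
```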
Available online,
doi: 10.1007/s11633-022-1328-1
Abstract:
Adversarial examples are well known as a serious threat to deep neural networks (DNNs). In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of one DNN model for both adversarial and benign examples follow the generalized Gaussian distribution (GGD), but with different parameters (i.e., shape factor, mean, and variance). GGD is a general distribution family that covers many popular distributions (e.g., Laplacian, Gaussian, or uniform). Therefore, it is more likely to approximate the intrinsic distributions of internal responses than any specific distribution. Besides, since the shape factor is more robust across different databases than the other two parameters, we propose to construct discriminative features via the shape factor for adversarial detection, employing the magnitude of Benford-Fourier (MBF) coefficients, which can be easily estimated from responses. Finally, a support vector machine is trained as an adversarial detector leveraging the MBF features. Extensive experiments on image classification demonstrate that the proposed detector is much more effective and robust in detecting adversarial examples crafted by different methods and from different sources, compared to state-of-the-art adversarial detection methods.
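The rough sketch below illustrates the spirit of this pipeline: summarize a layer's responses with magnitude-of-Benford-Fourier-style statistics and feed them to an SVM. The exact MBF definition, the layers used, and all hyper-parameters in the paper may differ; the responses here are synthetic draws meant only to mimic different GGD shape factors.

```python
# Rough, hedged sketch of a shape-factor-based detector: MBF-style features
# plus an SVM. Not the authors' implementation; data and formulas are assumed.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def mbf_features(responses, orders=range(1, 6)):
    """One plausible MBF-style statistic: magnitudes of the empirical
    characteristic function of log10|response| at a few integer orders."""
    mantissa = np.log10(np.abs(responses) + 1e-12)
    return np.array([np.abs(np.mean(np.exp(-2j * np.pi * n * mantissa))) for n in orders])

rng = np.random.default_rng(2)
n_per_class, n_units = 200, 512

# Benign vs. adversarial responses mimicked by draws with different shape
# factors (Gaussian-like vs. heavier-tailed Laplacian-like).
benign = rng.normal(size=(n_per_class, n_units))
adversarial = rng.laplace(size=(n_per_class, n_units))

X = np.array([mbf_features(r) for r in np.vstack([benign, adversarial])])
y = np.array([0] * n_per_class + [1] * n_per_class)

detector = SVC(kernel="rbf", C=1.0)
print("detection accuracy: %.2f" % cross_val_score(detector, X, y, cv=5).mean())
```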
2022, vol. 19, no. 4,
pp. 257-287,
doi: 10.1007/s11633-022-1349-9
Abstract:
This paper aims to conduct a comprehensive study on facial-sketch synthesis (FSS). However, due to the high cost of obtaining hand-drawn sketch datasets, there is a lack of a complete benchmark for assessing the development of FSS algorithms over the last decade. We first introduce a high-quality dataset for FSS, named FS2K, which consists of 2 104 image-sketch pairs spanning three types of sketch styles, image backgrounds, lighting conditions, skin colors, and facial attributes. FS2K differs from previous FSS datasets in difficulty, diversity, and scalability and should thus facilitate the progress of FSS research. Second, we present the largest-scale FSS investigation by reviewing 89 classic methods, including 25 handcrafted feature-based facial-sketch synthesis approaches, 29 general translation methods, and 35 image-to-sketch approaches. In addition, we conduct comprehensive experiments on 19 existing cutting-edge models. Third, we present a simple baseline for FSS, named FSGAN. With only two straightforward components, i.e., facial-aware masking and style-vector expansion, our FSGAN surpasses the performance of all previous state-of-the-art models on the proposed FS2K dataset by a large margin. Finally, we conclude with lessons learned over the past years and point out several unsolved challenges. Our code is available at https://github.com/DengPingFan/FSGAN.
2022, vol. 19, no. 4,
pp. 288-306,
doi: 10.1007/s11633-022-1332-5
Abstract:
Robot-assisted microsurgery (RAMS) has many benefits compared to traditional microsurgery. Microsurgical platforms with advanced control strategies, high-quality micro-imaging modalities and micro-sensing systems are worth developing to further enhance the clinical outcomes of RAMS. Within only a few decades, microsurgical robotics has evolved into a rapidly developing research field that is attracting increasing attention all over the world. Despite the appreciated benefits, significant challenges remain to be solved. In this review paper, we present the emerging concepts and achievements of RAMS. We introduce the development trend of RAMS from teleoperation to autonomous systems, and we highlight upcoming research opportunities that require joint efforts from both clinicians and engineers to pursue further outcomes for RAMS in the years to come.
2022, vol. 19, no. 4,
pp. 307-318,
doi: 10.1007/s11633-022-1350-3
Abstract:
Seeing through dense occlusions and reconstructing scene images is an important but challenging task. Traditional frame-based image de-occlusion methods may lead to fatal errors when facing extremely dense occlusions due to the lack of valid information available from the limited input occluded frames. Event cameras are bio-inspired vision sensors that record the brightness changes at each pixel asynchronously with high temporal resolution. However, synthesizing images solely from event streams is ill-posed since only the brightness changes are recorded in the event stream, and the initial brightness is unknown. In this paper, we propose an event-enhanced multi-modal fusion hybrid network for image de-occlusion, which uses event streams to provide complete scene information and frames to provide color and texture information. An event stream encoder based on a spiking neural network (SNN) is proposed to encode and denoise the event stream efficiently. A comparison loss is proposed to generate clearer results. Experimental results on a large-scale event-based and frame-based image de-occlusion dataset demonstrate that our proposed method achieves state-of-the-art performance.
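The sketch below shows the basic mechanism an SNN-based event encoder builds on: a leaky integrate-and-fire (LIF) pass over a binned event stream. The actual encoder in the paper (its architecture, thresholds, and denoising scheme) is not reproduced here; the event data and parameters are illustrative assumptions.

```python
# Illustrative leaky integrate-and-fire (LIF) pass over a binned event stream.
# This only demonstrates the spiking mechanism, not the paper's encoder.
import numpy as np

def lif_encode(event_bins, decay=0.8, threshold=1.0):
    """event_bins: (T, H, W) signed event counts per time bin.
    Returns per-pixel spike trains produced by a single LIF layer."""
    membrane = np.zeros(event_bins.shape[1:])
    spikes = np.zeros_like(event_bins)
    for t, frame in enumerate(event_bins):
        membrane = decay * membrane + frame        # leaky integration of input events
        fired = membrane >= threshold
        spikes[t][fired] = 1.0
        membrane[fired] = 0.0                      # reset membrane after a spike
    return spikes

rng = np.random.default_rng(3)
# Synthetic signed event counts (positive minus negative polarity) per bin.
events = rng.poisson(0.3, size=(16, 64, 64)) - rng.poisson(0.3, size=(16, 64, 64))
out = lif_encode(events.astype(float))
print("spike rate: %.3f" % out.mean())
```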
2022, vol. 19, no. 4,
pp. 319-330,
doi: 10.1007/s11633-022-1338-z
Abstract:
Federated learning (FedL) is a machine learning (ML) technique used to train deep neural networks (DeepNNs) in a distributed way without the need to share data among the federated training clients. FedL was proposed for edge computing and Internet of things (IoT) tasks in which a centralized server was responsible for coordinating and governing the training process. To remove the design limitation implied by the centralized entity, this work proposes two different solutions to decentralize existing FedL algorithms, enabling the application of FedL on networks with arbitrary communication topologies and thus extending the domain of application of FedL to more complex scenarios and new tasks. Of the two proposed algorithms, one, called FedLCon, is developed based on results from discrete-time weighted average consensus theory and is able to recover the performance of the standard centralized FedL solutions, as also shown by the reported validation tests.
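The sketch below shows the discrete-time weighted average consensus step that this style of decentralization relies on: each client repeatedly mixes its parameters with its neighbours' using Metropolis weights over an arbitrary communication graph. It illustrates the consensus mechanism only; the topology, weights, and parameter shapes are assumptions, not the authors' implementation of FedLCon.

```python
# Minimal sketch of weighted average consensus over a communication graph:
# clients mix parameters with neighbours until they agree on the average.
import numpy as np

def metropolis_weights(adjacency):
    """Doubly stochastic mixing matrix for an undirected communication graph."""
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adjacency[i, j] and i != j:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

# 4 hypothetical clients on a ring topology, each holding locally trained parameters.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
W = metropolis_weights(adj)
params = np.random.default_rng(4).normal(size=(4, 10))   # one parameter vector per client

for _ in range(20):                    # repeated consensus rounds between local updates
    params = W @ params                # each row converges to the network-wide average
print("max deviation from mean:", np.abs(params - params.mean(axis=0)).max())
```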
2022, vol. 19, no. 4,
pp. 331-346,
doi: 10.1007/s11633-022-1336-1
Abstract:
Due to the popularity of group activities in social media, group recommendation has become increasingly significant. It aims to produce a list of preferred items for a target group. Most deep learning-based methods for group recommendation have focused on learning group representations from the single type of interaction between groups and users. However, these methods may suffer from the data sparsity problem. Besides the interaction between groups and users, there exist other interactions that may enrich group representations, such as the interaction between groups and items. Such interactions, which take place within a group, form a local view of that group. In addition to local information, groups with common interests may also show similar tastes for items. Therefore, group representations can also be constructed according to the similarity among groups, which forms a global view of a group. In this paper, we propose a novel global and local information fusion neural network (GLIF) model for group recommendation. In GLIF, an attentive neural network (ANN) activates rich interactions among groups, users and items to form a group's local representation. Moreover, our model also leverages an ANN to obtain a group's global representation based on the similarity among different groups. Then, it fuses the global and local representations with an attention mechanism to form a group's comprehensive representation. Finally, group recommendation is conducted under the neural collaborative filtering (NCF) framework. Extensive experiments on three public datasets demonstrate its superiority over state-of-the-art methods for group recommendation.
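The simplified sketch below illustrates the final fusion idea: score a group's local view and global view with a small attention module and combine them into one representation. The dimensions, weight matrices, and scoring form are illustrative assumptions; GLIF's actual architecture and its NCF prediction head are not reproduced here.

```python
# Simplified sketch of attention-based fusion of a group's local and global
# representations. All names and shapes are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(5)
d = 32
local_repr = rng.normal(size=d)      # stand-in for the group-user/group-item view
global_repr = rng.normal(size=d)     # stand-in for the similarity-to-other-groups view

W_att = rng.normal(scale=0.1, size=(d, d))   # hypothetical attention projection
v_att = rng.normal(scale=0.1, size=d)        # hypothetical scoring vector

def attention_score(x):
    return v_att @ np.tanh(W_att @ x)

scores = np.array([attention_score(local_repr), attention_score(global_repr)])
weights = np.exp(scores - scores.max())
weights /= weights.sum()                      # softmax over the two views

group_repr = weights[0] * local_repr + weights[1] * global_repr
print("fusion weights (local, global):", np.round(weights, 3))
```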