Available online
doi: 10.1007/s11633-022-1328-1
Abstract:
Adversarial examples are well known as a serious threat to deep neural networks (DNNs). In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of a DNN model for both adversarial and benign examples follow the generalized Gaussian distribution (GGD), but with different parameters (i.e., shape factor, mean, and variance). GGD is a general distribution family that covers many popular distributions (e.g., Laplacian, Gaussian, and uniform), and is therefore more likely to approximate the intrinsic distributions of internal responses than any specific distribution. Moreover, since the shape factor is more robust across databases than the other two parameters, we propose to construct discriminative features for adversarial detection from the shape factor, using the magnitude of Benford-Fourier (MBF) coefficients, which can be easily estimated from the responses. Finally, a support vector machine is trained on the MBF features to serve as an adversarial detector. Extensive experiments on image classification demonstrate that the proposed detector is more effective and robust than state-of-the-art adversarial detection methods at detecting adversarial examples of different crafting methods and sources.
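The core statistical idea can be illustrated in a few lines: fit a generalized Gaussian to response samples and feed the fitted shape factor to an SVM. This is only a sketch on synthetic data (the paper's actual features are MBF coefficients computed from a DNN's internal responses; the data and separation here are invented for illustration):

```python
# Illustrative sketch, NOT the paper's implementation: estimate the GGD shape
# factor of two synthetic "response" populations and train an SVM detector.
import numpy as np
from scipy.stats import gennorm
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def shape_factor(responses):
    # gennorm's beta is the GGD shape factor: beta=2 -> Gaussian, beta=1 -> Laplacian
    beta, loc, scale = gennorm.fit(responses)
    return beta

# Synthetic stand-ins for internal responses of benign vs. adversarial inputs
benign = [rng.normal(0, 1, 400) for _ in range(30)]        # roughly Gaussian (beta near 2)
adversarial = [rng.laplace(0, 1, 400) for _ in range(30)]  # roughly Laplacian (beta near 1)

X = np.array([[shape_factor(r)] for r in benign + adversarial])
y = np.array([0] * len(benign) + [1] * len(adversarial))

detector = SVC(kernel="rbf").fit(X, y)
print("training accuracy:", detector.score(X, y))
```

Because the two populations differ mainly in shape factor rather than mean or variance, even this one-dimensional feature separates them.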
Available online
doi: 10.1007/s11633-022-1327-2
Abstract:
Face detection has made tremendous strides thanks to convolutional neural networks. However, dense face detection remains an open challenge due to large face scale variation, tiny faces, and serious occlusion. This paper presents a robust dense face detector that uses global context and visual attention mechanisms to significantly improve detection accuracy. Specifically, a global context fusion module with top-down feedback is proposed to improve the ability to identify tiny faces, and a visual attention mechanism is employed to address occlusion. Experimental results on the public face datasets WIDER FACE and FDDB demonstrate the effectiveness of the proposed method.
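As a minimal sketch of the kind of gating a visual attention mechanism performs (shapes and weights below are invented; the paper's module is a learned network component, not this toy):

```python
# Numpy sketch of a channel-attention gate: globally pool each channel,
# pass through a small bottleneck, and reweight channels by a sigmoid gate.
import numpy as np

def channel_attention(features, w1, w2):
    """features: (C, H, W). Returns features reweighted per channel."""
    squeezed = features.mean(axis=(1, 2))            # global average pool -> (C,)
    hidden = np.maximum(0, w1 @ squeezed)            # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gates in (0, 1)
    return features * gates[:, None, None]

rng = np.random.default_rng(1)
C, H, W = 8, 4, 4
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // 2, C))   # hypothetical bottleneck weights
w2 = rng.normal(size=(C, C // 2))
out = channel_attention(x, w1, w2)
print(out.shape)
```

The gate can suppress channels dominated by occluders while preserving channels that respond to visible face parts.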
2022, vol. 19, no. 2,
pp. 89-114,
doi: 10.1007/s11633-022-1323-6
Abstract:
Knowledge mining is a widely active research area across disciplines such as natural language processing (NLP), data mining (DM), and machine learning (ML). The overall objective of extracting knowledge from a data source is to create a structured representation that allows researchers to better understand the data and build applications upon it. Each of these disciplines has produced an ample body of research, proposing different methods applicable to different data types. A significant number of surveys have summarized the research in each discipline. However, no survey has presented a cross-disciplinary review that exposes traits from the different fields to stimulate new research ideas and build bridges among them. In this work, we present such a survey.
2022, vol. 19, no. 2,
pp. 115-126,
doi: 10.1007/s11633-022-1324-5
Abstract:
The convolution operation possesses the characteristic of translation group equivariance. To achieve more group equivariances, rotation group equivariant convolutions (RGEC) have been proposed to acquire both translation and rotation group equivariances. However, previous work paid more attention to the number of parameters and usually ignored other resource costs. In this paper, we construct our networks without introducing extra resource costs. Specifically, a convolution kernel is rotated to different orientations to extract features for multiple channels, and far fewer kernels than in previous works are used so that the number of output channels does not increase. To further enhance the orthogonality of kernels in different orientations, we construct a non-maximum-suppression loss along the rotation dimension that suppresses all directions except the most activated one. Considering that low-level features benefit most from rotational symmetry, we only share weights in the shallow layers (SWSL) via RGEC. Extensive experiments on multiple datasets (i.e., ImageNet, CIFAR, and MNIST) demonstrate that SWSL effectively benefits from the higher degree of weight sharing and improves the performance of various networks, including plain and ResNet architectures. Meanwhile, far fewer convolutional kernels and parameters (e.g., 75% and 87.5% fewer) are needed in the shallow layers, and no extra computation costs are introduced.
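The weight-sharing idea can be sketched directly: one kernel rotated to four orientations produces four oriented feature maps while storing only one kernel's parameters, i.e., 75% fewer parameters than four independent kernels. This is an illustration of the principle, not the paper's RGEC layer:

```python
# One 3x3 kernel shared across four orientations via 90-degree rotations.
import numpy as np
from scipy.signal import correlate2d

kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])  # a single stored kernel (9 parameters)

# Four orientations share the same 9 parameters: 75% fewer than 4 free kernels
rotations = [np.rot90(kernel, k) for k in range(4)]

image = np.random.default_rng(2).normal(size=(8, 8))
maps = [correlate2d(image, k, mode="valid") for k in rotations]
print(len(maps), maps[0].shape)
```

Rotating the kernel rather than learning separate kernels is what keeps the output channel count fixed without adding parameters.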
2022, vol. 19, no. 2,
pp. 127-137,
doi: 10.1007/s11633-022-1326-3
Abstract:
This paper proposes a deep Q-network (DQN) controller for network selection and adaptive resource allocation in heterogeneous networks, built on a Markov decision process (MDP) model of the problem. Network selection is an enabling technology for multi-connectivity, one of the core functionalities of 5G. For this reason, the present work considers a realistic network model that takes into account path-loss models and intra-RAT (radio access technology) interference. Numerical simulations validate the proposed approach and show the improvements achieved in terms of connection acceptance, resource allocation, and load balancing. In particular, the DQN algorithm has been tested against a classic reinforcement learning algorithm and other baseline approaches.
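To make the MDP framing concrete, here is a toy tabular Q-learning loop for a hypothetical two-network selection problem (the states, reward, and transition model are invented; the paper replaces the table with a deep Q-network over a far richer state space):

```python
# Toy Q-learning for network selection: states are load levels, actions pick
# a network. Hypothetical reward: network 0 is better under low load,
# network 1 under high load.
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    reward = 1.0 if (action == 0) == (state < 2) else 0.0
    return rng.integers(n_states), reward  # next load level arrives randomly

state = 0
for _ in range(5000):
    action = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Standard temporal-difference update toward the bootstrapped target
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

policy = Q.argmax(axis=1)
print("learned policy per load level:", policy)
```

A DQN applies the same update rule but approximates Q with a neural network, which is what makes the realistic, high-dimensional network model tractable.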
2022, vol. 19, no. 2,
pp. 138-152,
doi: 10.1007/s11633-022-1314-7
Abstract:
Many isolation approaches, such as zoning search, have been proposed to preserve diversity in the decision space of multimodal multi-objective optimization (MMO). However, these approaches allocate the same computing resources to subspaces with different difficulties and evolution states. To address this issue, this paper proposes a dynamic resource allocation strategy (DRAS) with reinforcement learning for multimodal multi-objective optimization problems (MMOPs). In DRAS, relative contribution and improvement are utilized to define the aptitude of subspaces, which captures the potential of each subspace accurately. Moreover, a reinforcement learning method is used to dynamically allocate computing resources to each subspace. In addition, the proposed DRAS is applied to zoning search. Experimental results demonstrate that DRAS can effectively assist zoning search in finding more and better-distributed equivalent Pareto optimal solutions in the decision space.
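The allocation idea resembles a bandit problem: subspaces with higher observed aptitude should receive more evaluations. The sketch below uses an epsilon-greedy allocator with an invented "aptitude" signal; it stands in for, but is not, the paper's DRAS:

```python
# Epsilon-greedy allocator: send more evaluations to the subspace whose
# running estimate of improvement is highest (illustration only).
import numpy as np

rng = np.random.default_rng(4)
n_subspaces = 3
true_potential = np.array([0.2, 0.8, 0.5])  # hypothetical improvement rates
estimates = np.zeros(n_subspaces)
counts = np.zeros(n_subspaces)

for t in range(2000):
    if rng.random() < 0.1:                           # explore a random subspace
        s = rng.integers(n_subspaces)
    else:                                            # exploit best aptitude estimate
        s = int(np.argmax(estimates))
    improvement = rng.random() < true_potential[s]   # noisy observed progress
    counts[s] += 1
    estimates[s] += (improvement - estimates[s]) / counts[s]  # running mean

print("evaluations per subspace:", counts.astype(int))
```

Uniform allocation would give each subspace one third of the budget; the adaptive policy concentrates the budget on the most promising subspace while still sampling the others.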
2022, vol. 19, no. 2,
pp. 153-168,
doi: 10.1007/s11633-022-1321-8
Abstract:
Pedestrian attribute recognition in surveillance scenarios is still a challenging task due to inaccurate localization of specific attributes. In this paper, we propose a novel attention-based view-attribute localization method (VALA), which uses view information to guide the recognition process toward specific attributes and an attention mechanism to localize the corresponding attribute regions. Concretely, a view prediction branch leverages view information to generate four view weights that represent the confidences of attributes from different views. The view weights are then delivered back to compose specific view-attributes, which participate in and supervise deep feature extraction. To explore the spatial location of a view-attribute, regional attention is introduced to aggregate spatial information and encode inter-channel dependencies of the view feature. Subsequently, a fine attentive attribute-specific region is localized, and the regional attention yields regional weights for the view-attribute at different spatial locations. The final view-attribute recognition outcome is obtained by combining the view weights with the regional weights. Experiments on three widely used datasets (richly annotated pedestrian (RAP), richly annotated pedestrian v2 (RAPv2), and PA-100K) demonstrate the effectiveness of our approach compared with state-of-the-art methods.
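The final combination step described above can be sketched numerically: per-view confidences gate per-region attention weights over the attribute evidence. All shapes and values here are invented for illustration; VALA learns these quantities from data:

```python
# Combine view weights with regional weights to score one attribute.
import numpy as np

views = ["front", "back", "left", "right"]
view_weights = np.array([0.7, 0.1, 0.1, 0.1])    # softmax-like view confidences

rng = np.random.default_rng(5)
regional = rng.random((4, 6))                    # attention over 6 regions per view
regional /= regional.sum(axis=1, keepdims=True)  # normalize per view

evidence = rng.random(6)                         # per-region attribute responses

# Pool regional scores within each view, then weight views by confidence
score = float(view_weights @ (regional @ evidence))
print(f"attribute score: {score:.3f}")
```

Because both weight vectors are normalized, the result is a convex combination of the regional evidence, so a confident view with sharp regional attention dominates the score.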
Available online
Abstract:
Cataracts are the leading cause of visual impairment and blindness globally. Over the years, researchers have achieved significant progress in developing state-of-the-art machine learning techniques for automatic cataract classification and grading, aiming to detect cataracts early and improve clinicians' diagnostic efficiency. This paper provides a comprehensive survey of recent advances in machine learning techniques for cataract classification/grading based on ophthalmic images. We summarize the existing literature along two research directions: conventional machine learning methods and deep learning methods. The survey also provides insights into the merits and limitations of existing works. In addition, we discuss several challenges of automatic cataract classification/grading based on machine learning techniques and present possible solutions to these challenges for future research.
Knowledge Mining: A Cross-disciplinary Survey
Yong Rui, Vicente Ivan Sanchez Carmona, Mohsen Pourvali, Yun Xing, Wei-Wen Yi, Hui-Bin Ruan, Yu Zhang
Sharing Weights in Shallow Layers via Rotation Group Equivariant Convolutions
Zhiqiang Chen, Ting-Bing Xu, Jinpeng Li, Huiguang He