Volume 20, Number 1, 2023
2023, vol. 20, no. 1,
pp. 1-18, doi: 10.1007/s11633-022-1390-8
Abstract:
Traditional joint-link robots have been widely used in production lines because of their high precision for single tasks. With the development of the manufacturing and service industries, the requirements for the comprehensive performance of robots are growing. Numerous types of bio-inspired robots have been investigated to realize human-like motion control and manipulation. A study route from inner mechanisms to external structures is proposed to better imitate humans and animals. With this idea, a brain-inspired intelligent robotic system is constructed that contains visual cognition, decision-making, motion control, and musculoskeletal structures. This paper reviews cutting-edge research in brain-inspired visual cognition, decision-making, motion control, and musculoskeletal systems. Two software systems and a corresponding hardware system are established, aiming at the verification and application of next-generation brain-inspired musculoskeletal robots.
2023, vol. 20, no. 1,
pp. 19-37, doi: 10.1007/s11633-022-1343-2
Abstract:
In the past decades, artificial intelligence (AI) has achieved unprecedented success, and statistical models have become the central entity in AI. However, the centralized training and inference paradigm for building and using these models faces growing privacy and legal challenges. To bridge the gap between data privacy and the need for data fusion, federated learning (FL) has emerged as an AI paradigm for solving data silos and data privacy problems. Based on secure distributed AI, federated learning emphasizes data security throughout the lifecycle, which includes the following steps: data preprocessing, training, evaluation, and deployment. FL maintains data security by using methods such as secure multi-party computation (MPC), differential privacy, and hardware solutions to build and use distributed multi-party machine learning systems and statistical models over different data sources. Beyond data privacy concerns, we argue that the concept of “model” also matters: when federated models are developed and deployed, they are exposed to various risks, including plagiarism, illegal copying, and misuse. To address these issues, we introduce FedIPR, a novel ownership verification scheme that embeds watermarks into FL models to verify their ownership and protect model intellectual property rights (IPR, or IP-right for short). While security is at the core of FL, many articles still refer to distributed machine learning with no security guarantees as “federated learning”, which does not satisfy the intended definition of FL. To this end, in this paper, we reiterate the concept of federated learning and propose secure federated learning (SFL), whose ultimate goal is to build trustworthy and safe AI with strong privacy preservation and IP-right preservation. We provide a comprehensive overview of existing works, including threats, attacks, and defenses in each phase of SFL from the lifecycle perspective.
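The idea of watermark-based ownership verification can be illustrated with a small sketch. The snippet below embeds a binary signature into a model's weights through an extra regularization term and later checks the signs of the projected weights against that signature; the sign-based scheme, the linear stand-in model, and all names are illustrative assumptions rather than the exact FedIPR construction.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a sign-based white-box watermark, not the exact FedIPR scheme.
torch.manual_seed(0)

model = nn.Linear(64, 10)                        # stand-in for a federated model
signature = torch.randint(0, 2, (64,)) * 2 - 1   # owner's secret +/-1 signature
proj = torch.randn(64, model.weight.numel())     # secret random projection matrix

def watermark_loss(model, signature, proj, strength=0.1):
    """Hinge-style penalty pushing projected weights to match the signature's signs."""
    projected = proj @ model.weight.flatten()
    return strength * torch.relu(1.0 - signature * projected).mean()

def verify(model, signature, proj):
    """Bit accuracy between the embedded signs and the owner's signature."""
    projected = proj @ model.weight.flatten()
    return (torch.sign(projected) == signature).float().mean().item()

opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
for _ in range(200):                             # local training with the watermark term
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y) + watermark_loss(model, signature, proj)
    loss.backward()
    opt.step()

print(f"watermark bit accuracy: {verify(model, signature, proj):.2f}")
```

A verification accuracy close to 1.0 indicates the owner can demonstrate that the trained model carries the embedded signature without degrading the task loss.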
2023, vol. 20, no. 1,
pp. 38-56, doi: 10.1007/s11633-022-1369-5
Abstract:
In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) to a new era. Substantial works have shown that they are beneficial for downstream uni-modal tasks and avoid training a new model from scratch. So can such pre-trained models be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances in five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. Then, we summarize the specific VLP models in detail. Finally, we discuss the new frontiers in VLP. To the best of our knowledge, this is the first survey focused on VLP. We hope that this survey can shed light on future research in the VLP field.
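As one concrete example of the pre-training objectives surveyed here, the sketch below computes a symmetric image-text contrastive (InfoNCE-style) loss over a batch of paired embeddings; the linear encoders, feature dimensions, and names are illustrative assumptions standing in for real visual and language backbones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch of an image-text contrastive objective (one common VLP objective).
torch.manual_seed(0)
batch, dim = 8, 128

image_encoder = nn.Linear(2048, dim)    # stand-in for a visual backbone
text_encoder = nn.Linear(768, dim)      # stand-in for a language backbone

image_feats = torch.randn(batch, 2048)  # pretend CNN/ViT features
text_feats = torch.randn(batch, 768)    # pretend token/sentence features

def contrastive_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE: matched image-text pairs are positives, the rest negatives."""
    img = F.normalize(image_encoder(img), dim=-1)
    txt = F.normalize(text_encoder(txt), dim=-1)
    logits = img @ txt.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(img.size(0))           # i-th image matches i-th caption
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

print(f"contrastive loss: {contrastive_loss(image_feats, text_feats).item():.3f}")
```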
2023, vol. 20, no. 1,
pp. 57-78, doi: 10.1007/s11633-022-1361-0
Abstract:
In the past decade, multimodal neuroimaging and genomic techniques have been increasingly developed. As an interdisciplinary topic, brain imaging genomics is devoted to evaluating and characterizing genetic variants in individuals that influence phenotypic measures derived from structural and functional brain imaging. This technique is capable of revealing, through macroscopic intermediates, the complex mechanisms that link the genetic level to cognition and psychiatric disorders in humans. Machine learning is a powerful tool in data-driven association studies, as it can fully utilize prior knowledge (intercorrelated structural information among imaging and genetic data) for association modelling. In addition, association studies can find associations between risk genes and brain structure or function, providing a better mechanistic understanding of behaviors or disordered brain functions. In this paper, the related background and fundamental work in imaging genomics are first reviewed. Then, we present univariate learning approaches for association analysis, summarize the main ideas and modelling in genetic-imaging association studies based on multivariate machine learning, and present methods for joint association analysis and outcome prediction. Finally, this paper discusses some prospects for future work.
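To make the univariate side of such association studies concrete, the sketch below regresses a single imaging phenotype on each SNP in turn and applies a Bonferroni correction; the synthetic genotypes, phenotype, and sample sizes are illustrative assumptions, not data from the surveyed studies.

```python
import numpy as np
from scipy import stats

# Illustrative sketch of a univariate imaging-genetics association scan:
# each SNP (coded as 0/1/2 minor-allele counts) is tested against one imaging phenotype.
rng = np.random.default_rng(0)
n_subjects, n_snps = 200, 50

genotypes = rng.integers(0, 3, size=(n_subjects, n_snps)).astype(float)
phenotype = 0.4 * genotypes[:, 0] + rng.normal(size=n_subjects)  # e.g., a regional brain volume

results = []
for j in range(n_snps):
    slope, intercept, r, p, se = stats.linregress(genotypes[:, j], phenotype)
    results.append((j, slope, p))

# Bonferroni correction for the number of tests; the multivariate methods in the survey
# instead model all SNPs and imaging measures jointly.
threshold = 0.05 / n_snps
significant = [(j, slope, p) for j, slope, p in results if p < threshold]
print("SNPs passing Bonferroni:", significant)
```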
2023, vol. 20, no. 1,
pp. 79-91, doi: 10.1007/s11633-022-1360-1
Abstract:
Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks. Although promising progress has been made, existing concept learners are still vulnerable to attribute perturbations and out-of-distribution compositions during inference. We ascribe the bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts, e.g., {red, blue, ···} ∈ the “color” subspace yet cube ∈ the “shape” subspace. In this paper, we propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces (i.e., visual superordinates). With only natural visual question answering data, our model first acquires the semantic hierarchy from a linguistic view and then explores mutually exclusive visual superordinates under the guidance of the linguistic hierarchy. In addition, quasi-center visual concept clustering and superordinate shortcut learning schemes are proposed to enhance the discrimination and independence of concepts within each visual superordinate. Experiments demonstrate the superiority of the proposed framework under diverse settings, improving the overall answering accuracy relatively by 7.5% for reasoning with perturbations and 15.6% for compositional generalization tests.
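The notion of attribute-specific visual subspaces can be illustrated with a small sketch: an object feature is projected into a learned “color” subspace and assigned to the nearest concept center. The projection matrix, quasi-centers, and shapes are illustrative assumptions, not the paper's actual model.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch: nearest-center concept assignment inside one visual superordinate
# (here, a "color" subspace). Not the paper's actual architecture.
torch.manual_seed(0)
feat_dim, sub_dim = 256, 32
color_concepts = ["red", "blue", "green"]

color_proj = torch.randn(feat_dim, sub_dim)                # projection into the color subspace
color_centers = torch.randn(len(color_concepts), sub_dim)  # one quasi-center per concept

def classify_color(object_feature):
    """Project an object feature into the color subspace and pick the closest center."""
    z = F.normalize(object_feature @ color_proj, dim=-1)
    centers = F.normalize(color_centers, dim=-1)
    sims = z @ centers.t()                                  # cosine similarity to each quasi-center
    return color_concepts[sims.argmax().item()]

obj = torch.randn(feat_dim)                                 # pretend region feature from a detector
print("predicted color concept:", classify_color(obj))
```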
2023, vol. 20, no. 1,
pp. 92-108, doi: 10.1007/s11633-022-1365-9
Abstract:
This paper introduces the deep gradient network (DGNet), a novel deep framework that exploits object gradient supervision for camouflaged object detection (COD). It decouples the task into two connected branches, i.e., a context encoder and a texture encoder. The essential connection is the gradient-induced transition, representing a soft grouping between context and texture features. Benefiting from the simple but efficient framework, DGNet outperforms existing state-of-the-art COD models by a large margin. Notably, our efficient version, DGNet-S, runs in real time (80 fps) and achieves results comparable to the cutting-edge model JCSOD-CVPR21 with only 6.82% of its parameters. The application results also show that the proposed DGNet performs well in polyp segmentation, defect detection, and transparent object segmentation tasks. The code will be made available at https://github.com/GewelsJI/DGNet.
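To give a sense of what “object gradient supervision” can look like, the sketch below derives a gradient map from a ground-truth object mask with Sobel filters, the kind of auxiliary target a texture branch could be trained against; the exact gradient supervision used by DGNet may differ, and the toy mask is an assumption.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch: deriving a gradient map from a ground-truth mask with Sobel filters,
# as an example of an auxiliary texture/gradient target. Not necessarily DGNet's exact recipe.
def object_gradient_map(mask):
    """mask: (B, 1, H, W) binary ground truth; returns the per-pixel gradient magnitude."""
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(mask, sobel_x, padding=1)
    gy = F.conv2d(mask, sobel_y, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

mask = torch.zeros(1, 1, 64, 64)
mask[:, :, 20:44, 20:44] = 1.0                   # a toy square "camouflaged object"
grad_target = object_gradient_map(mask)
print("boundary response:", float(grad_target[:, :, 19:22, 20:44].mean()),
      "vs. overall mean:", float(grad_target.mean()))
```

The response concentrates on the object boundary, which is why a gradient map is a natural supervision signal for a texture-oriented branch.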
2023, vol. 20, no. 1,
pp. 109-120, doi: 10.1007/s11633-022-1353-0
Abstract:
Structural neural network pruning aims to remove redundant channels in deep convolutional neural networks (CNNs) by pruning the filters of less importance to the final output accuracy. To reduce the degradation of performance after pruning, many methods utilize a loss with sparse regularization to produce structured sparsity. In this paper, we analyze these sparsity-training-based methods and find that the regularization of unpruned channels is unnecessary. Moreover, it restricts the network's capacity, which leads to under-fitting. To solve this problem, we propose a novel pruning method, named MaskSparsity, with pruning-aware sparse regularization. MaskSparsity imposes fine-grained sparse regularization on the specific filters selected by a pruning mask, rather than on all the filters of the model. Before the fine-grained sparse regularization of MaskSparsity, many methods can be used to obtain the pruning mask, such as running global sparse regularization. MaskSparsity achieves a 63.03% reduction in floating-point operations (FLOPs) on ResNet-110 by removing 60.34% of the parameters, with no top-1 accuracy loss on CIFAR-10. On ILSVRC-2012, MaskSparsity reduces FLOPs by more than 51.07% on ResNet-50, with only a 0.76% loss in top-1 accuracy. The code of this paper is released at https://github.com/CASIA-IVA-Lab/MaskSparsity. We have also integrated the code into a self-developed PyTorch pruning toolkit, named EasyPruner, at https://gitee.com/casia_iva_engineer/easypruner.
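A minimal sketch of pruning-aware sparse regularization is shown below, assuming the sparsity penalty is an L1 norm on the batch-normalization scale factors of only the channels marked for pruning; the layer, mask criterion, and penalty strength are illustrative assumptions rather than the released MaskSparsity code.

```python
import torch
import torch.nn as nn

# Minimal sketch: L1 sparsity applied only to the BN scale factors of channels selected
# for pruning, leaving the kept channels unregularized. Illustrative assumptions throughout.
torch.manual_seed(0)

bn = nn.BatchNorm2d(16)
nn.init.uniform_(bn.weight, 0.1, 1.0)            # give the scale factors distinct magnitudes

# Pruning mask from any criterion (e.g., a prior global-sparsity run): True = prune this channel.
prune_mask = torch.zeros(16, dtype=torch.bool)
prune_mask[torch.topk(bn.weight.detach().abs(), k=8, largest=False).indices] = True

def masked_sparsity_loss(bn_layer, mask, strength=1e-2):
    """L1 penalty on the BN scale factors of the channels marked for pruning only."""
    return strength * bn_layer.weight[mask].abs().sum()

x = torch.randn(4, 16, 8, 8)
task_loss = bn(x).pow(2).mean()                  # stand-in for the real task loss
total_loss = task_loss + masked_sparsity_loss(bn, prune_mask)
total_loss.backward()

# Only the masked channels receive a sparsity gradient; unpruned channels keep full capacity.
print("sparsity-driven gradient on pruned channels:",
      bn.weight.grad[prune_mask].abs().mean().item())
```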
2023, vol. 20, no. 1,
pp. 121-144, doi: 10.1007/s11633-022-1367-7
Abstract:
Swarm intelligence has become an active research field within artificial intelligence. Considering its importance for the future development of artificial intelligence, we discuss and analyze swarm intelligence from a broader and deeper perspective. In a broader sense, we consider not only bio-inspired swarm intelligence but also human-machine hybrid swarm intelligence. In a deeper sense, we discuss the research using a three-layer hierarchy: in the first layer, we divide the research of swarm intelligence into bio-inspired swarm intelligence and human-machine hybrid swarm intelligence; in the second layer, bio-inspired swarm intelligence is divided into single-population swarm intelligence and multi-population swarm intelligence; and in the third layer, we review single-population, multi-population, and human-machine hybrid models from different perspectives. Single-population swarm intelligence is inspired by biological intelligence. To further solve complex optimization problems, researchers have made preliminary explorations in multi-population swarm intelligence. However, it is difficult for bio-inspired swarm intelligence to realize the dynamic cognitive intelligent behavior that meets the needs of human cognition. Researchers have therefore introduced human intelligence into computing systems and proposed human-machine hybrid swarm intelligence. In addition to single-population swarm intelligence, we thoroughly review multi-population and human-machine hybrid swarm intelligence in this paper. We also discuss the applications of swarm intelligence in optimization, big data analysis, unmanned systems, and other fields. Finally, we discuss future research directions and key issues to be studied in swarm intelligence.
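As a concrete instance of single-population swarm intelligence applied to optimization, the sketch below runs a basic particle swarm optimization (PSO) loop on the sphere function; the coefficients are typical textbook defaults and the objective is an illustrative assumption.

```python
import numpy as np

# Illustrative sketch of single-population swarm intelligence: basic PSO minimizing the
# sphere function. Parameter values are typical defaults, not tuned for any application.
rng = np.random.default_rng(0)
n_particles, dim, iters = 30, 5, 200
w, c1, c2 = 0.7, 1.5, 1.5                       # inertia, cognitive and social coefficients

def sphere(x):
    return np.sum(x ** 2, axis=-1)

pos = rng.uniform(-5, 5, size=(n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), sphere(pos)
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    val = sphere(pos)
    improved = val < pbest_val                  # update personal bests
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[pbest_val.argmin()].copy()    # update the global best

print(f"best value found: {sphere(gbest):.6f}")
```

Multi-population and human-machine hybrid variants extend this basic loop with multiple interacting swarms or with human feedback injected into the best-solution update.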