Available online
doi: 10.1007/s11633-022-1364-x
Abstract:
Lung cancer is the leading cause of cancer-related deaths worldwide. Medical imaging technologies such as computed tomography (CT) and positron emission tomography (PET) are routinely used for non-invasive lung cancer diagnosis. In clinical practice, physicians investigate tumor characteristics such as size, shape and location in CT and PET images to make decisions. Recently, scientists have proposed various computational image features that can capture more information than is directly perceivable by the human eye, which has promoted the rise of radiomics. Radiomics is a research field concerned with converting medical images into high-dimensional features with data-driven methods to support subsequent data mining for better clinical decision support. Radiomic analysis has four major steps: image preprocessing, tumor segmentation, feature extraction and clinical prediction. Machine learning, including the high-profile deep learning, facilitates the development and application of radiomic methods. Various radiomic methods have been proposed recently, such as the construction of radiomic signatures, tumor habitat analysis, cluster pattern characterization and end-to-end prediction of tumor properties. These methods have been applied in many studies on lung cancer diagnosis, treatment and monitoring, shedding light on future non-invasive evaluations of nodule malignancy, histological subtypes, genomic properties and treatment responses. In this review, we summarize and categorize studies on the general workflow, methods for clinical prediction and clinical applications of machine learning in lung cancer radiomics, introduce some commonly used software tools, and discuss the limitations of current methods and possible future directions.
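As a concrete illustration of the "feature extraction → clinical prediction" steps, the sketch below builds a radiomic signature with scikit-learn: LASSO-style feature selection over a hypothetical feature matrix followed by logistic regression. All data and dimensions are synthetic assumptions for illustration only.

```python
# Minimal radiomic-signature sketch on synthetic data (hypothetical features).
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 400))    # 120 patients x 400 radiomic features (synthetic)
y = rng.integers(0, 2, size=120)   # benign/malignant labels (synthetic)

signature = make_pipeline(
    StandardScaler(),
    # L1-penalized model selects a sparse subset of features (the "signature")
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),
    LogisticRegression(max_iter=1000),
)
print(cross_val_score(signature, X, y, cv=5, scoring="roc_auc").mean())
```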
Available online
doi: 10.1007/s11633-023-1466-0
Abstract:
While recent years have witnessed a dramatic upsurge in exploiting deep neural networks for image denoising, existing methods mostly rely on simple noise assumptions, such as additive white Gaussian noise (AWGN), JPEG compression noise and camera sensor noise, and a general-purpose blind denoising method for real images remains unsolved. In this paper, we attempt to solve this problem from the perspectives of network architecture design and training data synthesis. Specifically, for the network architecture design, we propose a swin-conv block that incorporates the local modeling ability of the residual convolutional layer and the non-local modeling ability of the swin transformer block, and plug it as the main building block into the widely used image-to-image translation UNet architecture. For the training data synthesis, we design a practical noise degradation model that takes into consideration different kinds of noise (including Gaussian, Poisson, speckle, JPEG compression and processed camera sensor noises) and resizing, and also involves a random shuffle strategy and a double degradation strategy. Extensive experiments on AWGN removal and real image denoising demonstrate that the new network architecture design achieves state-of-the-art performance and the new degradation model can significantly improve practicability. We believe our work can provide useful insights into current denoising research. The source code is available at https://github.com/cszn/SCUNet.
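The random-shuffle degradation idea can be sketched as follows. This is an illustrative approximation, not the authors' exact pipeline: the noise ranges, application probabilities and the omission of resizing and sensor noise are all assumptions.

```python
# Illustrative randomly shuffled degradation model for training-data synthesis.
import random

import cv2
import numpy as np

rng = np.random.default_rng(0)
rnd = random.Random(0)

def gaussian(img):
    return img + rng.normal(0.0, rnd.uniform(0.01, 0.1), img.shape)

def poisson(img):
    return rng.poisson(np.clip(img, 0, 1) * 255.0) / 255.0

def speckle(img):
    return img * (1.0 + rng.normal(0.0, rnd.uniform(0.05, 0.2), img.shape))

def jpeg(img):
    quality = rnd.randint(30, 95)
    ok, buf = cv2.imencode(".jpg", (np.clip(img, 0, 1) * 255).astype(np.uint8),
                           [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_UNCHANGED).astype(np.float64) / 255.0

def degrade(img):
    ops = [gaussian, poisson, speckle, jpeg]
    rnd.shuffle(ops)                     # random shuffle strategy
    for op in ops:
        if rnd.random() < 0.5:           # each degradation applied with prob. 0.5 (assumed)
            img = np.clip(op(img), 0, 1)
    return img

clean = rng.random((64, 64))             # synthetic grayscale image in [0, 1]
noisy = degrade(clean)
```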
Available online
doi: 10.1007/s11633-023-1458-0
Abstract:
This paper addresses the problem of supervised monocular depth estimation. We start with a meticulous pilot study demonstrating that long-range correlation is essential for accurate depth estimation. Moreover, the Transformer and convolution are good at long-range and close-range depth estimation, respectively. Therefore, we propose a parallel encoder architecture consisting of a Transformer branch and a convolution branch. The former models global context with the effective attention mechanism, while the latter preserves local information, as the Transformer lacks the spatial inductive bias needed to model such contents. However, independent branches lead to a shortage of connections between features. To bridge this gap, we design a hierarchical aggregation and heterogeneous interaction module to enhance the Transformer features and model the affinity between the heterogeneous features in a set-to-set translation manner. Due to the prohibitive memory cost introduced by global attention on high-resolution feature maps, we adopt a deformable scheme to reduce the complexity. Extensive experiments on the KITTI, NYU and SUN RGB-D datasets demonstrate that our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods by prominent margins. The effectiveness of each proposed module is evaluated through meticulous and intensive ablation studies.
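A minimal sketch of a parallel Transformer-plus-convolution encoder of this kind. Layer sizes and the concatenation-based fusion are assumptions; the paper's hierarchical aggregation and deformable attention modules are not reproduced.

```python
# Parallel encoder sketch: conv branch for local detail, Transformer for global context.
import torch
import torch.nn as nn

class ParallelEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.embed = nn.Conv2d(3, dim, 16, stride=16)   # patch embedding (assumed size)
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.fuse = nn.Conv2d(2 * dim, dim, 1)          # naive fusion stand-in

    def forward(self, x):
        local_feat = self.conv_branch(x)                # B x C x H x W
        tokens = self.embed(x)                          # B x C x H/16 x W/16
        b, c, h, w = tokens.shape
        glob = self.transformer(tokens.flatten(2).transpose(1, 2))
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        glob = nn.functional.interpolate(glob, size=local_feat.shape[2:])
        return self.fuse(torch.cat([local_feat, glob], dim=1))

feat = ParallelEncoder()(torch.randn(1, 3, 64, 64))     # -> 1 x 64 x 64 x 64
```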
Available online
doi: 10.1007/s11633-023-1432-x
Abstract:
Cancelable biometrics comprises a group of techniques that intentionally transform an input biometric into an irreversible feature using a transformation function and, usually, a key, in order to provide security and privacy in biometric recognition systems. The transformation is repeatable, enabling subsequent biometric comparisons. This paper introduces a new idea to be exploited as a transformation function for cancelable biometrics, aimed at protecting templates against iterative optimization attacks. Our proposed scheme is based on time-varying keys (random biometrics in our case) and morphing transformations. An experimental implementation of the proposed scheme is given for face biometrics. The results confirm that the proposed approach is able to withstand leakage attacks while improving recognition performance.
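A conceptual sketch of a morphing-style cancelable transform with a time-varying random key. The convex combination and the parameter alpha below are assumptions, not the paper's exact transformation function.

```python
# Cancelable template sketch: morph a genuine embedding with a seeded random key.
import numpy as np

def cancelable_template(embedding, seed, alpha=0.5):
    rng = np.random.default_rng(seed)            # time-varying key (seed changes over time)
    key = rng.normal(size=embedding.shape)       # synthetic random "biometric" key
    key /= np.linalg.norm(key)
    morphed = alpha * embedding + (1 - alpha) * key   # morphing (assumed convex combination)
    return morphed / np.linalg.norm(morphed)

face = np.random.default_rng(1).normal(size=512)      # hypothetical face embedding
face /= np.linalg.norm(face)
t1 = cancelable_template(face, seed=2024)             # enrolled template
t2 = cancelable_template(face, seed=2024)             # same key -> comparable
t3 = cancelable_template(face, seed=2025)             # new key -> revoked/reissued template
```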
Available online
doi: 10.1007/s11633-023-1430-z
Abstract:
3D shape recognition has drawn much attention in recent years, and view-based approaches perform best of all. However, current multi-view methods are almost all fully supervised, and their pretraining models are almost all based on ImageNet. Although ImageNet pretraining results are quite impressive, there is still a significant discrepancy between multi-view datasets and ImageNet: multi-view datasets naturally retain rich 3D information. In addition, large-scale datasets such as ImageNet require considerable cleaning and annotation work, so it is difficult to build a second such dataset. In contrast, unsupervised learning methods can learn general feature representations without any extra annotation. To this end, we propose a three-stage unsupervised joint pretraining model. Specifically, we decouple the final representations into three fine-grained representations: data augmentation is utilized to obtain pixel-level representations within each view, spatially invariant features are boosted at the view level, and global information is exploited at the shape level through a novel extract-and-swap module. Experimental results demonstrate that the proposed method gains significantly in 3D object classification and retrieval tasks, and generalizes to cross-dataset tasks.
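Unsupervised pretraining of this kind typically optimizes a contrastive objective over augmented views. As a hedged reference point, here is the standard NT-Xent loss; the paper's pixel-, view- and shape-level objectives are more elaborate and are not reproduced.

```python
# Standard NT-Xent contrastive loss over two augmented views of the same batch.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.2):
    """z1, z2: (N, D) embeddings of two augmentations of the same N samples."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)     # 2N x D
    sim = z @ z.t() / tau                           # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))               # exclude self-pairs
    n = z1.shape[0]
    # positive for row i is row i+n, and vice versa
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))
```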
Available online
doi: 10.1007/s11633-022-1403-7
Abstract:
Score-based multimodal biometric fusion has been shown to be successful in addressing unimodal techniques' vulnerability to attack and poor performance on low-quality data. However, difficulties remain in unifying the meaning of heterogeneous scores more effectively. Aside from the matching scores themselves, the importance of the ranking information they carry has been undervalued in previous studies. This study concentrates on matching scores and their ranking information and proposes the ranking partition collision (RPC) theory from the standpoint of the worth of scores. To meet both forensic and judicial needs, this paper proposes a method that employs a neural network to fuse biometrics at the score level. In addition, this paper constructs a virtual homologous dataset and conducts experiments on it. Experimental results demonstrate that the proposed method achieves an accuracy of 100% in both mAP and Rank1. To show the efficiency of the proposed method in practical applications, this work carries out further experiments with real-world data. The results show that the proposed approach maintains a Rank1 accuracy of 99.2% on a million-scale database. It offers a novel approach to fusion at the score level.
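A minimal sketch of score-level fusion with a neural network, feeding each modality's matching score together with its normalized rank to a small MLP. The architecture and inputs are assumptions, not the paper's RPC-based design.

```python
# Score-level fusion sketch: raw scores + rank positions -> fused similarity.
import torch
import torch.nn as nn

class ScoreFusionNet(nn.Module):
    def __init__(self, n_modalities=3):
        super().__init__()
        # input per probe: one score and one normalized rank per modality
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_modalities, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, scores, ranks):
        return self.mlp(torch.cat([scores, ranks], dim=1)).squeeze(1)

net = ScoreFusionNet()
scores = torch.rand(4, 3)      # matching scores for 4 probes, 3 modalities (synthetic)
ranks = torch.rand(4, 3)       # normalized rank of each score in its gallery (synthetic)
fused = net(scores, ranks)     # fused similarity per probe
```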
2023, vol. 20, no. 5, pp. 605-613
doi: 10.1007/s11633-023-1469-x
Abstract:
Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned by text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, underwater and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting a significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released at https://github.com/htqin/GoogleBard-VisUnderstand.
2023, vol. 20, no. 5, pp. 614-639
doi: 10.1007/s11633-022-1395-3
Abstract:
With the application of mobile communication technology in the automotive industry, intelligent connected vehicles equipped with communication and sensing devices have been promoted rapidly. The road and traffic information perceived by intelligent vehicles has important potential application value, especially for improving the energy saving and safe driving of vehicles as well as the efficient operation of traffic. Therefore, a type of vehicle control technology called predictive cruise control (PCC) has become a hot research topic. It fully exploits perceived or predicted environmental information to carry out predictive cruise control of vehicles and improves the comprehensive performance of the vehicle-road system. Most existing reviews focus on the economical driving of vehicles, but few scholars have conducted a comprehensive survey of PCC from theory to the state of the art. In this paper, the methods and advances of PCC technologies are reviewed comprehensively by investigating the global literature, and typical applications under a cloud control system (CCS) are proposed. First, the methodology of PCC is introduced in general. Then, PCC-related research is surveyed in depth according to typical scenarios, including freeway and urban traffic scenarios involving traditional vehicles, new energy vehicles, intelligent vehicles and multi-vehicle platoons. Finally, the general architecture and three typical applications of the CCS for PCC are briefly introduced, and the prospects and future trends of PCC are discussed.
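As a toy illustration of the receding-horizon idea at the core of PCC (not any specific method from the surveyed literature), the sketch below plans speeds over a preview horizon, tracking a reference while penalizing acceleration as a crude energy proxy. Predicted road grade or traffic information would enter the cost in the same way; all numbers are assumptions.

```python
# Toy receding-horizon speed planning: apply the first planned step, then re-plan.
import numpy as np
from scipy.optimize import minimize

dt, horizon, v_ref = 1.0, 10, 20.0           # step [s], preview steps, target speed [m/s]

def cost(v, v0):
    a = np.diff(np.concatenate([[v0], v])) / dt     # accelerations over the horizon
    return np.sum((v - v_ref) ** 2) + 5.0 * np.sum(a ** 2)   # tracking + energy proxy

v0 = 15.0                                            # current speed
res = minimize(cost, x0=np.full(horizon, v0), args=(v0,),
               bounds=[(0.0, 30.0)] * horizon)       # speed limits
next_speed = res.x[0]                                # first planned speed is applied
```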
2023, vol. 20, no. 5, pp. 640-655
doi: 10.1007/s11633-022-1396-2
Abstract:
This paper reviews the recent literature on solving the Boolean satisfiability problem (SAT), an archetypal NP-complete problem, with the aid of machine learning (ML) techniques. Over the last decade, the machine learning community has advanced rapidly and surpassed human performance on several tasks. This trend has also inspired a number of works that apply machine learning methods to SAT solving. In this survey, we examine the evolving ML SAT solvers, from naive classifiers with handcrafted features to emerging end-to-end SAT solvers, as well as recent progress on combining existing conflict-driven clause learning (CDCL) and local search solvers with machine learning methods. Overall, solving SAT with machine learning is a promising yet challenging research topic. We conclude with the limitations of current works and suggest possible future directions. The collected paper list is available at https://github.com/Thinklab-SJTU/awesome-ml4co.
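For concreteness, here is the classic WalkSAT local search; ML-augmented solvers of the kind surveyed typically replace its hand-coded variable-selection heuristic with a learned policy.

```python
# WalkSAT: stochastic local search for SAT in DIMACS-style clause lists.
import random

def walksat(clauses, n_vars, max_flips=10000, p=0.5, seed=0):
    """clauses: list of lists of non-zero ints; positive = variable, negative = negation."""
    rnd = random.Random(seed)
    assign = [rnd.choice([False, True]) for _ in range(n_vars + 1)]  # 1-indexed
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign[1:]                  # satisfying assignment found
        clause = rnd.choice(unsat)
        if rnd.random() < p:                   # random-walk step
            var = abs(rnd.choice(clause))
        else:                                  # greedy step: fewest newly broken clauses
            def broken(v):
                assign[v] = not assign[v]
                b = sum(not any(sat(l) for l in c) for c in clauses)
                assign[v] = not assign[v]
                return b
            var = min((abs(l) for l in clause), key=broken)
        assign[var] = not assign[var]
    return None                                # gave up within the flip budget

print(walksat([[1, 2], [-1, 2], [1, -2]], n_vars=2))   # e.g., [True, True]
```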
2023, vol. 20, no. 5, pp. 656-665
doi: 10.1007/s11633-023-1423-y
Abstract:
Great progress has been made toward accurate face detection in recent years. However, heavy models and expensive computation costs make it difficult to deploy many detectors on mobile and embedded devices, where model size and latency are highly constrained. In this paper, we present a millisecond-level anchor-free face detector, YuNet, which is specifically designed for edge devices. There are several key contributions to improving the efficiency-accuracy trade-off. First, we analyse the influential state-of-the-art face detectors of recent years and summarize rules for reducing model size. Then, a lightweight face detector, YuNet, is introduced. Our detector contains a tiny and efficient feature extraction backbone and a simplified pyramid feature fusion neck. To the best of our knowledge, YuNet achieves the best trade-off between accuracy and speed. It has only 75 856 parameters, less than 1/5 of other small-size detectors. In addition, a training strategy is presented for the tiny face detector, which can effectively train models with the same distribution as the training set. The proposed YuNet achieves 81.1% mAP (single-scale) on the WIDER FACE validation hard track with high inference efficiency (Intel i7-12700K: 1.6 ms per frame at 320×320). Because of its unique advantages, the repository for YuNet and its predecessors has been popular on GitHub, gaining more than 11 K stars at https://github.com/ShiqiYu/libfacedetection.
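YuNet is also exposed through OpenCV (4.5.4+) as cv2.FaceDetectorYN; a usage sketch follows. The ONNX model filename and the input image path are assumptions (the model file itself comes from the OpenCV model zoo).

```python
# Usage sketch: run YuNet via OpenCV's FaceDetectorYN API.
import cv2

img = cv2.imread("group_photo.jpg")                  # hypothetical input image
h, w = img.shape[:2]
detector = cv2.FaceDetectorYN.create(
    "face_detection_yunet_2023mar.onnx", "", (w, h)  # model path (assumed), config, input size
)
_, faces = detector.detect(img)                      # faces: N x 15 (box, landmarks, score)
for f in (faces if faces is not None else []):
    x, y, bw, bh = f[:4].astype(int)
    cv2.rectangle(img, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
```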
2023, vol. 20, no. 5, pp. 666-682
doi: 10.1007/s11633-022-1328-1
Abstract:
Adversarial examples are well known as a serious threat to deep neural networks (DNNs). In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of a DNN model for both adversarial and benign examples follow the generalized Gaussian distribution (GGD), but with different parameters (i.e., shape factor, mean, and variance). GGD is a general distribution family that covers many popular distributions (e.g., Laplacian, Gaussian, or uniform), so it is more likely to approximate the intrinsic distributions of internal responses than any specific distribution. Besides, since the shape factor is more robust across different databases than the other two parameters, we propose to construct discriminative features from the shape factor for adversarial detection, employing the magnitude of Benford-Fourier (MBF) coefficients, which can be easily estimated from responses. Finally, a support vector machine is trained as an adversarial detector on the MBF features. Extensive experiments on image classification demonstrate that the proposed detector is much more effective and robust in detecting adversarial examples from different crafting methods and sources than state-of-the-art adversarial detection methods.
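A simplified stand-in for this pipeline (the exact MBF coefficient estimation is not reproduced): fit the GGD shape factor of each layer's responses with scipy's generalized-normal fit and train an SVM on the per-layer shape factors. All responses below are synthetic stand-ins.

```python
# Shape-factor-based adversarial detection sketch: GGD fit per layer + SVM.
import numpy as np
from scipy.stats import gennorm
from sklearn.svm import SVC

def shape_features(layer_responses):
    """layer_responses: list of 1-D arrays, one per DNN layer."""
    return [gennorm.fit(r)[0] for r in layer_responses]   # beta = GGD shape factor

rng = np.random.default_rng(0)
# synthetic stand-ins: benign ~ Gaussian (beta ~ 2), adversarial ~ Laplacian (beta ~ 1)
X = [shape_features([rng.normal(size=500) for _ in range(4)]) for _ in range(50)] + \
    [shape_features([rng.laplace(size=500) for _ in range(4)]) for _ in range(50)]
y = [0] * 50 + [1] * 50
detector = SVC().fit(X, y)   # adversarial-vs-benign classifier on shape factors
```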
2023, vol. 20, no. 5, pp. 683-696
doi: 10.1007/s11633-022-1341-4
Abstract:
Most finger vein authentication systems suffer from the problem of small sample size. Data augmentation can alleviate this problem to a certain extent but does not fundamentally solve the problem of category diversity. Researchers therefore resort to pre-training or multi-source joint training methods, but these methods can lead to user privacy leakage. In view of these issues, this paper proposes a federated learning-based finger vein authentication framework (FedFV) to solve the problems of small sample size and category diversity while protecting user privacy. Through training under FedFV, each client can share the knowledge learned from its users' finger vein data with the other federated clients without causing template leaks. In addition, we propose an efficient personalized federated aggregation algorithm, named federated weighted proportion reduction (FedWPR), to tackle the problem of non-independent and identically distributed (non-IID) data caused by client diversity, thus achieving the best performance for each client. To thoroughly evaluate the effectiveness of FedFV, comprehensive experiments are conducted on nine publicly available finger vein datasets. Experimental results show that FedFV can improve the performance of a finger vein authentication system without directly using other clients' data. To the best of our knowledge, FedFV is the first personalized federated finger vein authentication framework, and it has reference value for subsequent biometric privacy protection research.
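A generic sketch of weighted federated aggregation in the FedAvg style; FedWPR's proportion-reduction weighting is not reproduced, so the sample-count weights below are an assumption.

```python
# FedAvg-style aggregation: weight each client's parameters by its data share.
import numpy as np

def aggregate(client_weights, client_sizes):
    """client_weights: list of dicts name -> ndarray; client_sizes: samples per client."""
    total = float(sum(client_sizes))
    agg = {}
    for name in client_weights[0]:
        agg[name] = sum(w[name] * (n / total)
                        for w, n in zip(client_weights, client_sizes))
    return agg

clients = [{"fc": np.ones((2, 2)) * k} for k in (1.0, 2.0, 3.0)]   # toy local models
global_model = aggregate(clients, client_sizes=[100, 200, 700])    # {"fc": ...}
```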
2023, vol. 20, no. 5, pp. 697-706
doi: 10.1007/s11633-022-1345-0
Abstract:
Electrocardiogram (ECG) biometric recognition has gained considerable attention, and various methods have been proposed to facilitate its development. However, one limitation is that the diversity of ECG signals affects the recognition performance. To address this issue, in this paper, we propose a novel ECG biometrics framework based on enhanced correlation and semantic-rich embedding. Firstly, we construct an enhanced correlation between the base feature and latent representation by using only one projection. Secondly, to fully exploit the semantic information, we take both the label and pairwise similarity into consideration to reduce the influence of ECG sample diversity. Furthermore, to solve the objective function, we propose an effective and efficient algorithm for optimization. Finally, extensive experiments are conducted on two benchmark datasets, and the experimental results show the effectiveness of our framework.
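A heavily simplified sketch of the idea (the paper's objective and optimizer are not reproduced): learn a single projection W by gradient descent so that projected features fit one-hot labels while their Gram matrix matches a pairwise similarity matrix. Data, dimensions and weights are synthetic assumptions.

```python
# Single-projection embedding sketch: label fit + pairwise-similarity matching.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))                  # base ECG features (synthetic)
labels = rng.integers(0, 5, size=100)
Y = np.eye(5)[labels]                           # one-hot label matrix
S = (labels[:, None] == labels[None, :]).astype(float)   # pairwise similarity
n = len(X)

W = rng.normal(size=(32, 5)) * 0.01             # the single projection
lr, lam = 0.05, 0.01                            # assumed hyperparameters
for _ in range(500):
    Z = X @ W
    grad_label = 2.0 / n * X.T @ (Z - Y)                   # d/dW ||XW - Y||^2 / n
    grad_pair = 4.0 / n**2 * X.T @ ((Z @ Z.T - S) @ Z)     # d/dW ||ZZ^T - S||^2 / n^2
    W -= lr * (grad_label + lam * grad_pair)
```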
2023, vol. 20, no. 5, pp. 707-715
doi: 10.1007/s11633-022-1366-8
Abstract:
Photoplethysmography (PPG) biometrics have received considerable attention. Although deep learning has achieved good performance for PPG biometrics, several challenges remain open: 1) How to effectively extract a fused feature representation from time-domain and frequency-domain PPG signals. 2) How to effectively capture a series of PPG signal transitions. 3) How to extract time-varying information from one-dimensional time-frequency sequential data. To address these challenges, we propose a dual-domain and multiscale fusion deep neural network (DMFDNN) for PPG biometric recognition. The DMFDNN is mainly composed of a two-branch deep learning framework that learns time-varying and multiscale discriminative features from the time and frequency domains. Meanwhile, we design a multiscale extraction module to capture transition information, which consists of multiple convolution layers with different receptive fields. In addition, a dual-domain attention module is proposed to strengthen whichever of the time and frequency domains contributes more to PPG biometrics. Experiments on four datasets demonstrate that DMFDNN outperforms state-of-the-art methods for PPG biometrics.
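A slimmed-down sketch of a dual-domain two-branch network with a learned domain-attention weight. Layer sizes, the FFT-based frequency input and the softmax weighting are assumptions rather than the paper's exact modules.

```python
# Dual-domain sketch: time branch + spectrum branch, fused by learned attention.
import torch
import torch.nn as nn

class DualDomainNet(nn.Module):
    def __init__(self, n_subjects=35, dim=64):
        super().__init__()
        branch = lambda: nn.Sequential(
            nn.Conv1d(1, dim, 7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.time_branch, self.freq_branch = branch(), branch()
        self.attn = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=1))
        self.head = nn.Linear(dim, n_subjects)

    def forward(self, x):                          # x: B x 1 x T raw PPG segment
        spec = torch.fft.rfft(x, dim=-1).abs()     # frequency-domain input
        t, f = self.time_branch(x), self.freq_branch(spec)
        w = self.attn(torch.cat([t, f], dim=1))    # B x 2 domain weights
        fused = w[:, :1] * t + w[:, 1:] * f        # weight the stronger domain
        return self.head(fused)

logits = DualDomainNet()(torch.randn(4, 1, 512))   # 4 PPG segments -> subject logits
```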
2023, vol. 20, no. 5, pp. 716-728
doi: 10.1007/s11633-022-1379-3
Abstract:
Instance segmentation has drawn mounting attention due to its significant utility. However, high computational costs have been widely acknowledged in this domain, as the instance mask is generally achieved by pixel-level labeling. In this paper, we present a conceptually efficient contour regression network based on the you only look once (YOLO) architecture, named YOLO-CORE, for instance segmentation. The mask of the instance is efficiently acquired by explicit and direct contour regression using our designed multi-order constraint, which consists of a polar distance loss and a sector loss. Our proposed YOLO-CORE yields impressive segmentation performance in terms of both accuracy and speed. It achieves 57.9% AP@0.5 at 47 FPS (frames per second) on the semantic boundaries dataset (SBD) and 51.1% AP@0.5 at 46 FPS on the COCO dataset. The superior performance achieved by our method with explicit contour regression suggests a new technical route in the YOLO-based image understanding field. Moreover, our instance segmentation design can be flexibly integrated into existing deep detectors with negligible computation cost (65.86 BFLOPs (billion floating-point operations) to 66.15 BFLOPs with the YOLOv3 detector).
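A sketch of the polar contour representation and a distance loss over radii, inferred from the general idea of contour regression; the paper's exact polar distance and sector losses are not reproduced.

```python
# Polar contour sketch: a contour is K radii at fixed angles around the center.
import math

import torch

def polar_distance_loss(pred_radii, gt_radii):
    """pred_radii, gt_radii: B x K radii sampled at K fixed polar angles."""
    return (pred_radii - gt_radii).abs().mean()    # assumed L1 form

def contour_points(center, radii):
    """Decode K radii at uniform angles into 2-D contour points."""
    k = radii.shape[-1]
    angles = torch.arange(k) * (2 * math.pi / k)
    return torch.stack([center[0] + radii * torch.cos(angles),
                        center[1] + radii * torch.sin(angles)], dim=-1)

loss = polar_distance_loss(torch.rand(2, 36) * 50, torch.rand(2, 36) * 50)
pts = contour_points(torch.tensor([64.0, 64.0]), torch.rand(36) * 50)   # 36 x 2
```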
2023, vol. 20, no. 5, pp. 729-740
doi: 10.1007/s11633-022-1355-y
Abstract:
Insulators are important components of power transmission lines. Once a failure occurs, it may cause a large-scale blackout and other hidden dangers. Due to the large image size and complex background, detecting small defect objects is a challenge. We make improvements based on the two-stage network Faster R-CNN (region-based convolutional neural network). First, we use a hierarchical Swin Transformer with shifted windows as the feature extraction network, instead of ResNet, to extract more discriminative features, and then design a deformable receptive field block to encode global and local context information, which is utilized to capture key clues for detecting objects in complex backgrounds. Finally, a filling data augmentation method is proposed for the problem of insufficient defect samples, and more images of insulator defects under different backgrounds are added to the training set to improve the robustness of the model. As a result, recall increases from 89.5% to 92.1%, and average precision increases from 81.0% to 87.1%. To further prove the superiority of the proposed algorithm, we also test the model on the public Pascal visual object classes (VOC) dataset, where it also yields outstanding results.
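An illustrative sketch of a filling-style (copy-paste) augmentation for rare defects; the paste logic and uniform placement below are assumptions about the general technique, not the paper's exact method.

```python
# Copy-paste augmentation sketch: paste a defect patch into a target image.
import numpy as np

def fill_defect(target, defect_patch, rng):
    th, tw = target.shape[:2]
    ph, pw = defect_patch.shape[:2]
    y = int(rng.integers(0, th - ph))          # random placement (assumed uniform)
    x = int(rng.integers(0, tw - pw))
    out = target.copy()
    out[y:y + ph, x:x + pw] = defect_patch     # paste the defect
    return out, (x, y, x + pw, y + ph)         # augmented image and new box label

rng = np.random.default_rng(0)
background = rng.integers(0, 255, (600, 800, 3), dtype=np.uint8)   # synthetic scene
patch = rng.integers(0, 255, (40, 60, 3), dtype=np.uint8)          # synthetic defect crop
aug_img, box = fill_defect(background, patch, rng)
```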
2023, vol. 20, no. 5, pp. 741-752
doi: 10.1007/s11633-022-1342-3
Abstract:
Inspired by the eagle eye, the structure and information processing characteristics of the eagle's visual system are used for the target capture task of an unmanned aerial vehicle (UAV) with a mechanical arm. In this paper, a novel eagle-eye inspired multi-camera sensor and a saliency detection method are proposed. A combined camera system is built by simulating the double-fovea structure of the eagle retina. A salient target detection method based on the eagle midbrain inhibition mechanism is proposed, which measures static saliency information and dynamic features. Thus, salient targets can be accurately detected through collaborative work between the different cameras of the proposed multi-camera sensor. Experimental results show that the eagle-eye inspired visual system is able to continuously detect targets in outdoor scenes and that the proposed algorithm has a strong inhibitory effect on moving background interference.
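A compact sketch combining a static cue (spectral-residual saliency) with a dynamic cue (frame differencing), loosely analogous to the static/dynamic measurement described above. The weighting and the choice of spectral residual are assumptions; the paper's midbrain-inhibition model is not reproduced.

```python
# Static + dynamic saliency sketch: spectral residual plus frame differencing.
import numpy as np

def spectral_residual(img):
    f = np.fft.fft2(img)
    log_amp, phase = np.log(np.abs(f) + 1e-8), np.angle(f)
    # residual = log amplitude minus its local 3x3 average (box filter via rolls)
    avg = sum(np.roll(np.roll(log_amp, i, 0), j, 1)
              for i in (-1, 0, 1) for j in (-1, 0, 1)) / 9.0
    sal = np.abs(np.fft.ifft2(np.exp(log_amp - avg + 1j * phase))) ** 2
    return sal / sal.max()

def saliency(frame, prev_frame, w_motion=0.5):
    static = spectral_residual(frame)              # static cue
    motion = np.abs(frame - prev_frame)            # dynamic cue
    motion = motion / (motion.max() + 1e-8)
    return (1 - w_motion) * static + w_motion * motion

rng = np.random.default_rng(0)
prev, cur = rng.random((128, 128)), rng.random((128, 128))   # synthetic frames
sal_map = saliency(cur, prev)
```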