Nan-Fei Jiang, Xu Zhao, Chao-Yang Zhao, Yong-Qi An, Ming Tang, Jin-Qiao Wang. Pruning-aware Sparse Regularization for Network Pruning. Machine Intelligence Research, vol. 20, no. 1, pp. 109–120, 2023. https://doi.org/10.1007/s11633-022-1353-0

Pruning-aware Sparse Regularization for Network Pruning

doi: 10.1007/s11633-022-1353-0
  • Author Bio:

    Nan-Fei Jiang received the B. Eng. degree in telecommunications engineering from Qingdao University, China in 2017. He is currently a master's student in computer technology at the School of Artificial Intelligence, University of Chinese Academy of Sciences, China. His research interests include network pruning, model compression, and image and video processing. E-mail: nanfei.jiang@nlpr.ia.ac.cn. ORCID: 0000-0001-8919-884X

    Xu Zhao received the B. Eng. degree in software engineering from Dalian University of Technology, China in 2014, and the Ph. D. degree in pattern recognition and intelligence systems from National Laboratory of Pattern Recognition, Chinese Academy of Sciences, China in 2019. He is currently an assistant professor with National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China. His research interests include object detection, scene text detection, image and video processing, and intelligent video surveillance. E-mail: xu.zhao@nlpr.ia.ac.cn (Corresponding author). ORCID: 0000-0001-5888-8872

    Chao-Yang Zhao received the B. Eng. degree in electronic information engineering and the M. Sc. degree in circuit and system discipline from University of Electronic Science and Technology of China in 2009 and 2012, respectively, and the Ph. D. degree in pattern recognition and intelligence systems from National Laboratory of Pattern Recognition, Chinese Academy of Sciences, China in 2016. He is currently an assistant professor with National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China. His research interests include object detection, image and video processing, and intelligent video surveillance. E-mail: chaoyang.zhao@nlpr.ia.ac.cn

    Yong-Qi An received the B. Eng. degree in automation from Beijing Institute of Technology, China in 2021. He is currently a master's student in pattern recognition and intelligence systems at the School of Artificial Intelligence, University of Chinese Academy of Sciences, China. His research interests include network pruning, knowledge distillation, and model compression. E-mail: yongqi.an@nlpr.ia.ac.cn

    Ming Tang received the B. Eng. degree in computer science and engineering and the M. Sc. degree in artificial intelligence from Zhejiang University, China in 1984 and 1987, respectively, and the Ph. D. degree in pattern recognition and intelligent system from Chinese Academy of Sciences, China in 2002. He is currently a professor with National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China. His research interests include computer vision and machine learning. E-mail: tangm@nlpr.ia.ac.cn

    Jin-Qiao Wang received the B. Eng. degree from Hebei University of Technology, China in 2001, the M. Sc. degree from Tianjin University, China in 2004, and the Ph. D. degree in pattern recognition and intelligence systems from the National Laboratory of Pattern Recognition, Chinese Academy of Sciences, China in 2008. He is currently a professor with Chinese Academy of Sciences, China. His research interests include pattern recognition and machine learning, image and video processing, mobile multimedia, and intelligent video surveillance. E-mail: jqwang@nlpr.ia.ac.cn

  • Received Date: 2022-03-30
  • Accepted Date: 2022-06-20
  • Publish Date: 2023-02-01
  • Structural neural network pruning aims to remove redundant channels from deep convolutional neural networks (CNNs) by pruning the filters that matter least to the final output accuracy. To reduce the performance degradation after pruning, many methods train with a sparse-regularized loss to produce structured sparsity. In this paper, we analyze these sparsity-training-based methods and find that regularizing the unpruned channels is unnecessary. Moreover, it restricts the network's capacity, which leads to under-fitting. To solve this problem, we propose a novel pruning method, named MaskSparsity, with pruning-aware sparse regularization. MaskSparsity imposes fine-grained sparse regularization on the specific filters selected by a pruning mask, rather than on all the filters of the model. The pruning mask can be obtained in many ways before the fine-grained sparse regularization of MaskSparsity is applied, for example by running global sparse regularization. MaskSparsity achieves a 63.03% reduction in floating-point operations (FLOPs) on ResNet-110 by removing 60.34% of the parameters, with no top-1 accuracy loss on CIFAR-10. On ILSVRC-2012, MaskSparsity reduces FLOPs by more than 51.07% on ResNet-50, with a loss of only 0.76% in top-1 accuracy. The code of this paper is released at https://github.com/CASIA-IVA-Lab/MaskSparsity. We have also integrated the code into a self-developed PyTorch pruning toolkit, named EasyPruner, at https://gitee.com/casia_iva_engineer/easypruner.
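
    The difference between global and pruning-aware regularization can be illustrated with a small sketch. It is not the released MaskSparsity code: it assumes the regularizer is an L1 penalty on one scale factor per channel (the abstract does not state the form), and the names masked_l1_penalty, masked_l1_subgradient, and mask_from_threshold, as well as the threshold rule used to build the mask, are hypothetical and chosen only for illustration.

```python
def mask_from_threshold(scales, threshold):
    """Hypothetical mask rule: flag channels whose |scale| is below a threshold."""
    return [abs(s) < threshold for s in scales]


def masked_l1_penalty(scales, prune_mask, lam):
    """L1 penalty summed only over the channels flagged by the pruning mask."""
    return lam * sum(abs(s) for s, m in zip(scales, prune_mask) if m)


def masked_l1_subgradient(scales, prune_mask, lam):
    """Subgradient of the penalty: lam * sign(s) on flagged channels, 0 elsewhere."""
    def sign(x):
        return (x > 0) - (x < 0)
    return [lam * sign(s) if m else 0.0 for s, m in zip(scales, prune_mask)]


# Assumed per-channel scale factors of one layer (illustrative values only).
scales = [0.90, 0.05, -0.02, 0.60]

# Channels 1 and 2 are flagged for pruning; channels 0 and 3 are kept.
mask = mask_from_threshold(scales, threshold=0.1)

# Only flagged channels contribute to the penalty and receive a shrinking gradient.
penalty = masked_l1_penalty(scales, mask, lam=0.5)
grad = masked_l1_subgradient(scales, mask, lam=0.5)
```

    Under these assumptions the kept channels receive a zero regularization gradient, so only the task loss updates them, while the flagged channels are still pushed toward zero. A global regularizer would correspond to a mask that is true for every channel.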

     

  • 1 https://github.com/weiaicunzai/pytorch-cifar100
    2 https://github.com/facebookresearch/pycls
    3 https://github.com/ultralytics/yolov5
    These authors contribute equally to this work