Citation: Wei-Chen Chen, Xin-Yi Yu, Lin-Lin Ou. Pedestrian Attribute Recognition in Video Surveillance Scenarios Based on View-attribute Attention Localization. Machine Intelligence Research, vol. 19, no. 2, pp. 153-168, 2022. https://doi.org/10.1007/s11633-022-1321-8