Citation: | Dong Wu, Man-Wen Liao, Wei-Tian Zhang, Xing-Gang Wang, Xiang Bai, Wen-Qing Cheng, Wen-Yu Liu. YOLOP: You Only Look Once for Panoptic Driving Perception. Machine Intelligence Research, vol. 19, no. 6, pp. 550–562, 2022. https://doi.org/10.1007/s11633-022-1339-y |
[1] |
S. Q. Ren, K. M. He, R. Girshick, J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, Canada, vol. 1, pp. 91–99, 2015.
|
[2] |
A. Bochkovskiy, C. Y. Wang, H. Y. M. Liao. YOLOv4: Optimal speed and accuracy of object detection. DOI: 10.48550/arxiv.2004.10934, 2020.
|
[3] |
A. Paszke, A. Chaurasia, S. Kim, E. Culurciello. ENet: A deep neural network architecture for real-time semantic segmentation. DOI: 10.48550/arxiv.1606.02147, 2016.
|
[4] |
H. S. Zhao, J. P. Shi, X. J. Qi, X. G. Wang, J. Y. Jia. Pyramid scene parsing network. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Honolulu, USA, pp. 6230–6239, 2017. DOI: 10.1109/CVPR.2017.660.
|
[5] |
X. G. Pan, J. P. Shi, P. Luo, X. G. Wang, X. O. Tang. Spatial as deep: Spatial CNN for traffic scene understanding. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, pp. 7276–7283, 2018. DOI: 10.1609/aaai.v32i1.12301.
|
[6] |
Y. N. Hou, Z. Ma, C. X. Liu, C. C. Loy. Learning lightweight lane detection CNNs by self attention distillation. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Korea, pp. 1013–1021, 2019. DOI: 10.1109/ICCV.2019.00110.
|
[7] |
C. Y. Wang, A. Bochkovskiy, H. Y. M. Liao. Scaled-YOLOv4: Scaling cross stage partial network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Nashville, USA, pp. 13024–13033, 2021. DOI: 10.1109/CVPR46437.2021.01283.
|
[8] |
K. M. He, G. Gkioxari, P. Dollár, R. Girshick. Mask R-CNN. In Proceedings of IEEE International Conference on Computer Vision, IEEE, Venice, Italy, pp. 2980–2988, 2017. DOI: 10.1109/ICCV.2017.322.
|
[9] |
F. Yu, W. Q. Xian, Y. Y. Chen, F. C. Liu, M. K. Liao, V. Madhavan, T. Darrell. BDD100K: A diverse driving video database with scalable annotation tooling. DOI: 10.48550/arxiv.1805.04687, 2018.
|
[10] |
R. Girshick, J. Donahue, T. Darrell, J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Columbus, USA, pp. 580–587, 2014. DOI: 10.1109/CVPR.2014.81.
|
[11] |
R. Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, IEEE, Santiago, Chile, pp. 1440–1448, 2015. DOI: 10.1109/ICCV.2015.169.
|
[12] |
J. F. Dai, Y. Li, K. M. He, J. Sun. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, pp. 379–387, 2016.
|
[13] |
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, A. C. Berg. SSD: Single shot MultiBox detector. In Proceedings of the 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 21–37, 2016. DOI: 10.1007/978-3-319-46448-0_2.
|
[14] |
J. Redmon, S. Divvala, R. Girshick, A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 779–788, 2016. DOI: 10.1109/CVPR.2016.91.
|
[15] |
J. Redmon, A. Farhadi. YOLO9000: Better, faster, stronger. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Honolulu, USA, pp. 6517–6525, 2017. DOI: 10.1109/CVPR.2017.690.
|
[16] |
J. Redmon, A. Farhadi. YOLOv3: An incremental improvement. [Online], Available: https://arxiv.org/abs/1804.02767, 2018.
|
[17] |
J. Long, E. Shelhamer, T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Boston, USA, pp. 3431–3440, 2015. DOI: 10.1109/CVPR.2015.7298965.
|
[18] |
H. Y. Han, Y. C. Chen, P. Y. Hsiao, L. C. Fu. Using channel-wise attention for deep CNN based real-time semantic segmentation with class-aware edge information. IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 1041–1051, 2021. DOI: 10.1109/TITS.2019.2962094.
|
[19] |
D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, L. Van Gool. Towards end-to-end lane detection: An instance segmentation approach. In Proceedings of IEEE Intelligent Vehicles Symposium, IEEE, Changshu, China, pp. 286–291, 2018. DOI: 10.1109/IVS.2018.8500547.
|
[20] |
K. W. Duan, L. X. Xie, H. G. Qi, S. Bai, Q. M. Huang, Q. Tian. Location-sensitive visual recognition with cross-IOU loss. [Online], Available: https://arxiv.org/abs/2104.04899v1, 2021.
|
[21] |
M. Teichmann, M. Weber, M. Zöllner, R. Cipolla, R. Urtasun. MultiNet: Real-time joint semantic reasoning for autonomous driving. In Proceedings of IEEE Intelligent Vehicles Symposium, IEEE, Changshu, China, pp. 1013–1020, 2018. DOI: 10.1109/IVS.2018.8500504.
|
[22] |
Y. Q. Qian, J. M. Dolan, M. Yang. DLT-Net: Joint detection of drivable areas, lane lines, and traffic objects. IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 11, pp. 4670–4679, 2020. DOI: 10.1109/TITS.2019.2943777.
|
[23] |
J. Zhang, Y. Xu, B. B. Ni, Z. Y. Duan. Geometric constrained joint lane segmentation and lane boundary detection. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 502–518, 2018. DOI: 10.1007/978-3-030-01246-5_30.
|
[24] |
Z. L. Kang, K. Grauman, F. Sha. Learning with whom to share in multi-task feature learning. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, USA, pp. 521–528, 2011.
|
[25] |
C. Y. Wang, H. Y. M. Liao, Y. H. Wu, P. Y. Chen, J. W. Hsieh, I. H. Yeh. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, IEEE, Seattle, USA, pp. 1571–1580, 2020. DOI: 10.1109/CVPRW50498.2020.00203.
|
[26] |
K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015. DOI: 10.1109/TPAMI.2015.2389824.
|
[27] |
T. Y. Lin, P. Dollár, R. Girshick, K. M. He, B. Hariharan, S. Belongie. Feature pyramid networks for object detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Honolulu, USA, pp. 936–944, 2017. DOI: 10.1109/CVPR.2017.106.
|
[28] |
S. Liu, L. Qi, H. F. Qin, J. P. Shi, J. Y. Jia. Path aggregation network for instance segmentation. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, pp. 8759–8768, 2018. DOI: 10.1109/CVPR.2018.00913.
|
[29] |
T. Y. Lin, P. Goyal, R. Girshick, K. M. He, P. Dollár. Focal loss for dense object detection. In Proceedings of IEEE International Conference on Computer Vision, IEEE, Venice, Italy, pp. 2999–3007, 2017. DOI: 10.1109/ICCV.2017.324.
|
[30] |
Z. H. Zheng, P. Wang, W. Liu, J. Z. Li, R. G. Ye, D. W. Ren. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, pp. 12993–13000, 2020. DOI: 10.1609/aaai.v34i07.6999.
|
[31] |
I. Loshchilov, F. Hutter. SGDR: Stochastic gradient descent with warm restarts. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 2017.
|
[32] |
K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun. Deep residual learning for image recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 770–778, 2016. DOI: 10.1109/CVPR.2016.90.
|