Citation: Yuchi Liu, Zhongdao Wang, Xiangxin Zhou, Liang Zheng. A Study of Using Synthetic Data for Effective Association Knowledge Learning. Machine Intelligence Research, vol. 20, no. 2, pp. 194-206, 2023. https://doi.org/10.1007/s11633-022-1380-x

A Study of Using Synthetic Data for Effective Association Knowledge Learning

doi: 10.1007/s11633-022-1380-x
More Information
  • Author Bio:

    Yuchi Liu received the B.Eng. degree in software engineering from the Australian National University, Australia in 2018. He is currently a Ph.D. candidate in computer science at the Australian National University, Australia. His research interests include video object tracking, learning from synthetic data, and weakly supervised learning. E-mail: yuchi.liu@anu.edu.au ORCID iD: 0000-0001-9061-6180

    Zhongdao Wang received the B.Sc. degree in physics from the Department of Physics, Tsinghua University, China in 2017. He is currently a Ph.D. candidate in electronic engineering at the Department of Electronic Engineering, Tsinghua University, China. His research interests include perception algorithms for autonomous driving, including but not limited to 3D object detection and tracking, network architectures, learning algorithms and pre-training for multi-modal fusion, and 4D auto-labeling. E-mail: wcd17@mails.tsinghua.edu.cn ORCID iD: 0000-0002-4483-8783

    Xiangxin Zhou received the B.Sc. degree in electronic engineering from the Department of Electronic Engineering, Tsinghua University, China in 2021. He is currently a Ph.D. candidate in artificial intelligence at the School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), China, and the Institute of Automation, Chinese Academy of Sciences (CASIA), China. His research interests include geometric deep learning, graph neural networks for drug design, causal inference, and multimodal machine learning. E-mail: xx-zhou16@mails.tsinghua.edu.cn ORCID iD: 0000-0002-1526-0548

    Liang Zheng received the B.Eng. degree in life science from Tsinghua University, China in 2010, and the Ph.D. degree in electronic engineering from Tsinghua University, China in 2015. He is a lecturer and a Computer Science Futures Fellow at the School of Computer Science, Australian National University, Australia. His research interests include computer vision, machine learning, object re-identification, and dataset-centered vision. E-mail: liang.zheng@anu.edu.au (Corresponding author) ORCID iD: 0000-0002-1464-9500

  • Received Date: 2022-07-19
  • Accepted Date: 2022-10-13
  • Publish Online: 2023-03-08
  • Publish Date: 2023-04-01
  • Abstract: Association, which aims to link bounding boxes of the same identity across a video sequence, is a central component in multi-object tracking (MOT). Association modules, e.g., parametric networks, are usually trained on real video data. However, annotating person tracks in consecutive video frames is expensive, and real data, due to their inflexibility, offer limited opportunities to evaluate system performance under changing tracking scenarios. In this paper, we study whether 3D synthetic data can replace real-world videos for association training. Specifically, we introduce a large-scale synthetic data engine named MOTX, in which the motion characteristics of cameras and objects are manually configured to be similar to those of real-world datasets. We show that, compared with real data, association knowledge obtained from synthetic data achieves very similar performance on real-world test sets without domain adaptation techniques. This intriguing observation is credited to two factors. First and foremost, 3D engines can faithfully simulate motion factors such as camera movement, camera view, and object movement, so the simulated videos provide association modules with effective motion features. Second, our experiments show that the appearance domain gap hardly harms the learning of association knowledge. In addition, the strong customization ability of MOTX allows us to quantitatively assess the impact of motion factors on MOT, which brings new insights to the community.
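
    For readers new to the association task, the following minimal sketch shows what a motion-only association step looks like. It is a generic baseline in the spirit of classic trackers such as SORT, not the parametric association networks studied in the paper; the function names and the constant-velocity update are illustrative assumptions.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def iou(a, b):
            # Intersection-over-union of two boxes in (x1, y1, x2, y2) format.
            x1, y1 = max(a[0], b[0]), max(a[1], b[1])
            x2, y2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            union = ((a[2] - a[0]) * (a[3] - a[1])
                     + (b[2] - b[0]) * (b[3] - b[1]) - inter)
            return inter / (union + 1e-9)

        def associate(tracks, velocities, detections, iou_thresh=0.3):
            # tracks, detections: (N, 4) and (M, 4) arrays of boxes;
            # velocities: (N, 4) per-track box displacement from the last step.
            if len(tracks) == 0 or len(detections) == 0:
                return []
            predicted = tracks + velocities  # constant-velocity motion prediction
            cost = np.array([[1.0 - iou(p, d) for d in detections]
                             for p in predicted])
            rows, cols = linear_sum_assignment(cost)  # Hungarian matching
            # Keep only pairs whose predicted-box overlap is strong enough.
            return [(i, j) for i, j in zip(rows, cols)
                    if cost[i, j] < 1.0 - iou_thresh]

    The sketch makes the paper's point concrete: what the matcher consumes is motion information (here, the predicted boxes), which a 3D engine can simulate faithfully even when its rendered appearance is far from photo-realistic.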


  • 1 Our manual parameter tuning is very efficient: a rough estimate of the motion parameters is enough (see the sketch after this list).
    2 This engine is publicly released at https://github.com/liuyvchi/MOTX.
    3 http://ai.bu.edu/visda-2020/
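
    To make footnote 1 concrete, the snippet below shows the kind of rough motion configuration it refers to. Every parameter name here is hypothetical, invented for illustration; consult the released MOTX engine for its actual interface.

        # Hypothetical motion-parameter configuration for a synthetic MOT engine.
        # All keys are illustrative assumptions, not MOTX's real interface; the
        # point is that rough, eyeballed values matched against a real dataset
        # (e.g., MOT17) are sufficient for learning association knowledge.
        motion_config = {
            "camera": {
                "height_m": 5.0,     # elevated, surveillance-style viewpoint
                "tilt_deg": 30.0,    # downward tilt toward the crowd
                "speed_mps": 0.0,    # 0 for a static camera, > 0 for a moving one
            },
            "pedestrians": {
                "count": 30,              # crowd density per scene
                "speed_mps": (0.5, 1.5),  # range of walking speeds
            },
            "sequence": {"fps": 30, "num_frames": 1000},
        }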
  • [1]
    G. Brasó, L. Leal-Taixé. Learning a neural solver for multiple object tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 6246–6256, 2020. DOI: 10.1109/CVPR42600.2020.00628.
    [2]
    Y. H. Xu, A. Ŝep, Y. T. Ban, R. Horaud, L. Leal-Taixé, X. Alameda-Pineda. How to train your deep multi-object tracker. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 6786–6795, 2020. DOI: 10.1109/CVPR42600.2020.00682.
    [3]
    L. Leal-Taixé, A. Milan, I. Reid, S. Roth, K. Schindler. MOTChallenge 2015: Towards a benchmark for multi-target tracking. [Online], Available: https://arxiv.org/abs/1504.01942, 2015.
    [4]
    A. Milan, L. Leal-Taixé, I. Reid, S. Roth, K. Schindler. MOT16: A benchmark for multi-object tracking. [Online], Available: https://arxiv.org/abs/1603.00831, 2016.
    [5]
    S. Bąk, P. Carr, J. F. Lalonde. Domain adaptation through synthesis for unsupervised person re-identification. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 193–209, 2018. DOI: 10.1007/978-3-030-01261-8_12.
    [6]
    H. Z. Dou, W. H. Zhang, P. Z. Zhang, Y. H. Zhao, S. Y. Li, Z. Q. Qin, F. Wu, L. Dong, X. Li. VersatileGait: A large-scale synthetic gait dataset with fine-grained attributes and complicated scenarios. [Online], Available: https://arxiv.org/abs/2101.01394, 2021.
    [7]
    Z. F. Xue, W. J. Mao, L. Zheng. Learning to simulate complex scenes. [Online], Available: https://arxiv.org/abs/2006.14611, 2020.
    [8]
    Y. Yao, L. Zheng, X. D. Yang, M. Naphade, T. Gedeon. Simulating content consistent vehicle datasets with attribute descent. In Proceedings of the 16th European Conference on Computer Vision, Springer, Glasgow, UK, pp. 775–791, 2020. DOI: 10.1007/978-3-030-58539-6_46.
    [9]
    J. H. Li, X. Gao, T. T. Jiang. Graph networks for multiple object tracking. In Proceedings of IEEE Winter Conference on Applications of Computer Vision, Snowmass, USA, pp. 708–717, 2020. DOI: 10.1109/WACV45572.2020.9093347.
    [10]
    Z. D. Wang, L. Zheng, Y. X. Liu, Y. L. Li, S. J. Wang. Towards real-time multi-object tracking. In Proceedings of the 16th European Conference on Computer Vision, Springer, Glasgow, UK, pp. 107–122, 2020. DOI: 10.1007/978-3-030-58621-8_7.
    [11]
    N. Wojke, A. Bewley, D. Paulus. Simple online and realtime tracking with a deep association metric. In Proceedings of IEEE International Conference on Image Processing, Beijing, China, pp. 3645–3649, 2017. DOI: 10.1109/ICIP.2017.8296962.
    [12]
    Y. F. Zhan, C. Y. Wang, X. G. Wang, W. J. Zeng, W. Y. Liu. A simple baseline for multi-object tracking. [Online], Available: https://arxiv.org/abs/2004.01888v1, 2020.
    [13]
    Z. W. Zhou, J. L. Xing, M. D. Zhang, W. M. Hu. Online multi-target tracking with tensor-based high-order graph matching. In Proceedings of the 24th International Conference on Pattern Recognition, IEEE, Beijing, China, pp. 1809–1814, 2018. DOI: 10.1109/ICPR.2018.8545450.
    [14]
    J. Zhu, H. Yang, N. Liu, M. Kim, W. J. Zhang, M. H. Yang. Online multi-object tracking with dual matching attention networks. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 379–396, 2018. DOI: 10.1007/978-3-030-01228-1_23.
    [15]
    Q. C. Wang, Y. H. Gong, C. H. Yang, C. H. Li. Robust object tracking under appearance change conditions. International Journal of Automation and Computing, vol. 7, no. 1, pp. 31–38, 2010. DOI: 10.1007/s11633-010-0031-9.
    [16]
    H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, vol. 2, no. 1–2, pp. 83–97, 1955. DOI: 10.1002/nav.3800020109.
    [17]
    G. Welch, G. Bishop. An Introduction to the Kalman Filter. University of North Carolina at Chapel Hill, Chapel Hill, USA, 1995.
    [18]
    I. Papakis, A. Sarkar, A. Karpatne. GCNNMatch: Graph convolutional neural networks for multi-object tracking via Sinkhorn normalization. [Online], Available: https://arxiv.org/abs/2010.00067, 2020.
    [19]
    X. C. Peng, B. Usman, N. Kaushik, J. Hoffman, D. Q. Wang, K. Saenko. VisDA: The visual domain adaptation challenge. [Online], Available: https://arxiv.org/abs/1710.06924, 2017.
    [20]
    X. C. Peng, B. Usman, N. Kaushik, D. Q. Wang, J. Hoffman, K. Saenko. VisDA: A synthetic-to-real benchmark for visual domain adaptation. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, IEEE, Salt Lake City, USA, pp. 2021–2026, 2018. DOI: 10.1109/CVPRW.2018.00271.
    [21]
    Y. Cabon, N. Murray, M. Humenberger. Virtual KITTI 2. [Online], Available: https://arxiv.org/abs/2001.10773, 2020.
    [22]
    A. Gaidon, Q. Wang, Y. Cabon, E. Vig. Virtual Worlds as proxy for multi-object tracking analysis. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 4340–4349, 2016. DOI: 10.1109/CVPR.2016.470.
    [23]
    Y. Z. Hou, L. Zheng, S. Gould. Multiview detection with feature perspective transformation. In Proceedings of the 16th European Conference on Computer Vision, Springer, Glasgow, UK, pp. 1–18, 2020. DOI: 10.1007/978-3-030-58571-6_1.
    [24]
    M. Fabbri, F. Lanzi, S. Calderara, A. Palazzi, R. Vezzani, R. Cucchiara. Learning to detect and track visible and occluded body joints in a virtual world. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 450–456, 2018. DOI: 10.1007/978-3-030-01225-0_27.
    [25]
    S. Sankaranarayanan, Y. Balaji, A. Jain, S. Nam Lim, R. Chellappa. Learning from synthetic data: Addressing domain shift for semantic segmentation. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, pp. 3752–3761, 2018. DOI: 10.1109/CVPR.2018.00395.
    [26]
    C. Doersch, A. Zisserman. Sim2real transfer learning for 3D human pose estimation: Motion to the rescue. In Proceedings of the 33rd Conference on Neural Information Processing Systems, Vancouver, Canada, 2019.
    [27]
    E. Kolve, R. Mottaghi, W. Han, E. VanderBilt, L. Weihs, A. Herrasti, M. Deitke, K. Ehsani, D. Gordon, Y. K. Zhu, A. Kembhavi, A. Gupta, A. Farhadi. AI2-THOR: An interactive 3D environment for visual AI. [Online], Available: https://arxiv.org/abs/1712.05474, 2017.
    [28]
    A. Kar, A. Prakash, M. Y. Liu, E. Cameracci, J. Yuan, M. Rusiniak, D. Acuna, A. Torralba, S. Fidler. Meta-Sim: Learning to generate synthetic datasets. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Republic of Korea, pp. 4550–4559, 2019. DOI: 10.1109/ICCV.2019.00465.
    [29]
    A. Juliani, V. P. Berges, E. Teng, A. Cohen, J. Harper, C. Elion, C. Goy, Y. Gao, H. Henry, M. Mattar, D. Lange. Unity: A general platform for intelligent agents. [Online], Available: https://arxiv.org/abs/1809.02627, 2018.
    [30]
    X. X. Sun, L. Zheng. Dissecting person re-identification from the viewpoint of viewpoint. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 608–617, 2019. DOI: 10.1109/CVPR.2019.00070.
    [31]
    F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2009. DOI: 10.1109/TNN.2008.2005605.
    [32]
    A. Bewley, Z. Y. Ge, L. Ott, F. Ramos, B. Upcroft. Simple online and realtime tracking. In Proceedings of IEEE International Conference on Image Processing, Phoenix, USA, pp. 3464–3468, 2016. DOI: 10.1109/ICIP.2016.7533003.
    [33]
    Y. H. Du, Y. Song, B. Yang, Y. Y. Zhao. StrongSORT: Make deepSORT great again. [Online], Available: https://arxiv.org/abs/2202.13514, 2022.
    [34]
    K. Bernardin, R. Stiefelhagen. Evaluating multiple object tracking performance: The clear mot metrics. EURASIP Journal on Image and Video Processing, vol. 2008, Article number 246309, 2008.
    [35]
    P. Dendorfer, H. Rezatofighi, A. Milan, J. Shi, D. Cremers, I. Reid, S. Roth, K. Schindler, L. Leal-Taixé. MOT20: A benchmark for multi object tracking in crowded scenes. [Online], Available: https://arxiv.org/abs/2003.09003. 2020.
    [36]
    W. J. Deng, L. Zheng, Q. X. Ye, G. L. Kang, Y. Yang, J. B. Jiao. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, pp. 994–1003, 2018. DOI: 10.1109/CVPR.2018.00110.
