Luequan Wang, Hongbin Xu, Wenxiong Kang. MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition[J]. Machine Intelligence Research, 2023, 20(6): 872-883. DOI: 10.1007/s11633-023-1430-z

MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition

Abstract: 3D shape recognition has drawn much attention in recent years, and view-based approaches achieve the best performance among existing methods. However, current multi-view methods are almost all fully supervised, with pretrained models based almost exclusively on ImageNet. Although ImageNet pretraining yields impressive results, a significant domain gap remains between multi-view datasets and ImageNet, since multi-view datasets naturally preserve rich 3D information. Moreover, large-scale datasets such as ImageNet require considerable cleaning and annotation effort, making it difficult to build a comparable dataset from scratch. In contrast, unsupervised learning methods can learn general feature representations without any extra annotation. To this end, we propose a three-stage unsupervised joint pretraining model. Specifically, we decouple the final representation into three fine-grained representations: data augmentation is used to obtain pixel-level representations within each view; spatially invariant features are then strengthened at the view level; and finally, global information is exploited at the shape level through a novel extract-and-swap module. Experimental results demonstrate that the proposed method achieves significant gains on 3D object classification and retrieval tasks and generalizes well to cross-dataset settings.