MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition

Luequan Wang; Hongbin Xu; Wenxiong Kang

doi:10.1007/s11633-023-1430-z

Citation: Luequan Wang, Hongbin Xu, Wenxiong Kang. MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition[J]. Machine Intelligence Research, 2023, 20(6): 872-883. DOI: 10.1007/s11633-023-1430-z

Abstract

3D shape recognition has drawn much attention in recent years, and view-based approaches currently achieve the best performance. However, existing multi-view methods are almost all fully supervised, and their pretrained models are almost all based on ImageNet. Although ImageNet pretraining yields impressive results, there is still a significant discrepancy between multi-view datasets and ImageNet: multi-view datasets naturally retain rich 3D information. In addition, large-scale datasets such as ImageNet require considerable cleaning and annotation work, so building a second dataset of that kind is difficult. In contrast, unsupervised learning methods can learn general feature representations without any extra annotation. To this end, we propose a three-stage unsupervised joint pretraining model. Specifically, we decouple the final representation into three fine-grained representations. Within each view, data augmentation is used to obtain pixel-level representations. At the view level, we strengthen spatially invariant features. Finally, at the shape level, we exploit global information through a novel extract-and-swap module. Experimental results demonstrate that the proposed method yields significant gains in 3D object classification and retrieval tasks and generalizes to cross-dataset tasks.
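
The abstract does not state the training objective. As the name MVContrast suggests a contrastive formulation, the following is a minimal sketch of a generic InfoNCE loss that treats two rendered views of the same object as a positive pair. It is an illustration under that assumption, not the authors' implementation: the function name `info_nce`, the `temperature` value, and the random embeddings are all hypothetical, and the extract-and-swap module is not modeled.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """Generic InfoNCE loss (assumed objective, not taken from the paper).

    z_a, z_b: arrays of shape (N, D). Row i of z_a and row i of z_b are
    embeddings of two views of the same object (a positive pair); every
    other row of z_b serves as a negative for row i of z_a.
    """
    # L2-normalize so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature: shape (N, N).
    logits = z_a @ z_b.T / temperature

    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 16))                   # hypothetical embeddings
    view_b = view_a + 0.01 * rng.normal(size=(8, 16))   # slightly perturbed second view
    print("matched views:   ", info_nce(view_a, view_b))
    print("mismatched views:", info_nce(view_a, np.roll(view_b, 1, axis=0)))
```

When the rows of the two inputs correspond to the same objects, the loss is low; shifting one input so that rows no longer correspond raises it. This is the behavior a view-level invariance objective of this kind relies on.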
