Citation: Luequan Wang, Hongbin Xu, Wenxiong Kang. MVContrast: Unsupervised Pretraining for Multi-view 3D Object Recognition. Machine Intelligence Research, vol. 20, no. 6, pp. 872–883, 2023. https://doi.org/10.1007/s11633-023-1430-z