Citation: Qi Zheng, Chao-Yue Wang, Dadong Wang, Da-Cheng Tao. Visual Superordinate Abstraction for Robust Concept Learning. Machine Intelligence Research, vol. 20, no. 1, pp. 79-91, 2023. https://doi.org/10.1007/s11633-022-1360-1