Yuliang Liang, Enneng Yang, Guibing Guo, Wei Cai, Linying Jiang, Jianzhe Zhao, Xingwei Wang. Answer Semantics-enhanced Medical Visual Question Answering[J]. Machine Intelligence Research, 2025, 22(6): 1127-1137. DOI: 10.1007/s11633-025-1564-2

Answer Semantics-enhanced Medical Visual Question Answering

  • Medical visual question answering (Med-VQA) is a task that aims to answer clinical questions about a given medical image. Existing literature generally treats it as a classic classification task based on the interaction features of the image and the question. However, such a paradigm ignores the valuable semantics of candidate answers as well as the relations among them. From real-world datasets, we observe that: 1) The text of candidate answers has a strong intrinsic correlation with medical images; 2) Subtle differences among multiple candidate answers are crucial for identifying the correct one. Therefore, we propose an answer semantics-enhanced (ASE) method to integrate the semantics of answers and capture their subtle differences. Specifically, we enhance the semantic correlation of image-question-answer triplets by aligning images with question-answer tuples within the feature fusion module. Then, we devise a contrastive learning loss to highlight the semantic differences between the correct answer and the other candidates. Finally, extensive experiments demonstrate the effectiveness of our method.
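The abstract does not give implementation details, so the following is only a minimal sketch of the kind of answer-level contrastive objective it describes: the fused image-question representation is pulled toward the embedding of the correct answer and pushed away from the other candidate answers, in an InfoNCE style. The function name, tensor shapes, and the temperature value are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def answer_contrastive_loss(fused_feats, answer_embs, target_idx, temperature=0.07):
    """Illustrative InfoNCE-style loss over candidate answers (not the paper's exact formulation).

    fused_feats : (B, D) fused image-question representations
    answer_embs : (A, D) embeddings of the A candidate answers
    target_idx  : (B,)   index of the correct answer for each sample
    """
    # Cosine similarities between each fused feature and every candidate answer.
    fused = F.normalize(fused_feats, dim=-1)
    answers = F.normalize(answer_embs, dim=-1)
    logits = fused @ answers.t() / temperature  # (B, A)

    # Cross-entropy pulls the fused feature toward the correct answer's embedding
    # and pushes it away from the remaining candidates.
    return F.cross_entropy(logits, target_idx)

if __name__ == "__main__":
    B, A, D = 4, 10, 256  # batch size, number of candidate answers, feature dim (assumed)
    loss = answer_contrastive_loss(
        torch.randn(B, D), torch.randn(A, D), torch.randint(0, A, (B,))
    )
    print(loss.item())
```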
