Riel Castro-Zunti, Eun Hae Park, Amol Satsangi, Younhee Choi, Gong Yong Jin, Hee Suk Chae, Seok-Bum Ko. A Novel Vision Transformer + InceptionV3 Hybrid Network for Accurate Diagnosis of Ankylosing Spondylitis from Computed Tomography Scans[J]. Machine Intelligence Research, 2025, 22(6): 1061-1087. DOI: 10.1007/s11633-024-1539-8

A Novel Vision Transformer + InceptionV3 Hybrid Network for Accurate Diagnosis of Ankylosing Spondylitis from Computed Tomography Scans

  • Rationale and objectives: Ankylosing spondylitis (AS) is a lifelong form of arthritis that inflames the sacroiliac joints (SIJs) and spine. Compared to magnetic resonance imaging (MRI), computed tomography (CT) better images bony erosions within SIJs, which are subtle early-stage AS indicators, and CT is considered the “gold standard” AS diagnostic modality. While clinical AS diagnoses via CT yield impressive specificity, sensitivity often lags.
  • Materials and methods: Using more than 2000 SIJs from the pelvic CT scans of 35 AS patients and 65 control patients, a 3-stage computer vision diagnostic pipeline was developed: (1) SIJ localization/extraction using YOLOv5; (2) classification of SIJs as AS (across various progression levels) or control (across young and old age groups) via a custom vision transformer (ViT) + InceptionV3 hybrid architecture; and (3) patient-level AS prediction from aggregated/normalized SIJ classification results using a traditional machine learning model.
  • Results: For the 73 patients with >8 SIJs (a restriction that limits the impact of individual SIJ misclassifications), the proposed pipeline achieved 98.63% accuracy, 96.43% sensitivity, 100% specificity, and 96.51% ROC AUC. The pipeline is shown to be robust and outperforms retrained models from related work. Compared to vanilla ViT for SIJ classification, the hybrid model pipeline is 16% more accurate with 14% higher ROC AUC; compared to vanilla InceptionV3, it is 7% more accurate with 4% higher ROC AUC.
  • Conclusion: AS can be diagnosed with high sensitivity and specificity via the proposed pipeline. ViT + InceptionV3 outperforms both of its baseline architectures while unifying their respective strengths. Because it captures both global and local details, ViT + InceptionV3 is applicable to tasks within computer-assisted diagnostics and elsewhere.
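
The third pipeline stage (patient-level prediction from aggregated/normalized SIJ results) and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the single "fraction of SIJs predicted AS" feature, and the fixed 0.5 threshold are all hypothetical, and the threshold merely stands in for the unspecified traditional machine learning model. Only the ">8 SIJs" restriction and the metric definitions come from the abstract.

```python
from collections import defaultdict

MIN_SIJS = 9  # the abstract restricts evaluation to patients with >8 SIJs


def aggregate_by_patient(sij_predictions):
    """Map patient id -> (number of SIJs, fraction of SIJs predicted AS).

    `sij_predictions` is an iterable of (patient_id, predicted_as) pairs,
    one per extracted SIJ image (hypothetical input format).
    """
    counts = defaultdict(lambda: [0, 0])  # [total SIJs, SIJs predicted AS]
    for patient_id, predicted_as in sij_predictions:
        counts[patient_id][0] += 1
        counts[patient_id][1] += int(predicted_as)
    return {pid: (total, n_as / total) for pid, (total, n_as) in counts.items()}


def predict_patients(features, threshold=0.5, min_sijs=MIN_SIJS):
    """Hypothetical stand-in for the patient-level model: a fixed threshold.

    Patients with fewer than `min_sijs` SIJs are skipped, mirroring the
    restriction described in the abstract.
    """
    return {
        pid: frac >= threshold
        for pid, (total, frac) in features.items()
        if total >= min_sijs
    }


def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

As a consistency check on the reported figures, `diagnostic_metrics(27, 1, 45, 0)` gives 72/73 ≈ 98.63% accuracy, 27/28 ≈ 96.43% sensitivity, and 100% specificity. These counts (28 AS patients with one false negative, 45 controls) are an inference from the percentages and the 73-patient total; the abstract does not state them.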