Hierarchical Open-vocabulary Part-object Segmentation with Knowledge-guided SAM
Graphical Abstract
Abstract
Open-vocabulary semantic segmentation (OV-Seg) aims to segment novel categories without dense annotations, enabling scalable perception in open-world scenarios. While recent advances in foundation models, such as CLIP for semantic grounding and the Segment Anything Model (SAM) for spatial localization, have boosted OV-Seg performance, most existing methods remain limited to object-level segmentation and neglect the compositional and hierarchical structure of real-world entities. In this paper, we propose a hierarchical open-vocabulary segmentation framework that integrates CLIP and SAM with structured external knowledge in the form of object–part hierarchies. Specifically, we construct a category-level knowledge graph that encodes part–whole relationships and guides the generation of enriched prompts, thereby aligning visual features with both object-level and part-level semantics. Extensive experiments on PartImageNet and PASCAL-Part demonstrate that our method consistently outperforms state-of-the-art baselines, particularly in part-level segmentation and novel-category generalization. These results confirm the effectiveness of incorporating structured priors to enhance compositional and fine-grained visual understanding in open-vocabulary settings.
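To make the prompt-enrichment idea concrete, the following is a minimal, hypothetical sketch of how a category-level object–part graph could be expanded into object-level and part-level text prompts before CLIP text encoding; the graph contents, the `PART_GRAPH` dictionary, and the `build_prompts` helper are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: a category-level knowledge graph (here, a plain dict)
# encoding object–part relations is traversed to build text prompts for both
# object-level and part-level queries. Names and templates are assumptions.

PART_GRAPH = {
    # object category -> its constituent parts (toy entries for illustration)
    "dog": ["head", "torso", "leg", "tail"],
    "car": ["wheel", "door", "window", "hood"],
}

def build_prompts(category: str) -> list[str]:
    """Expand one object category into object- and part-level text prompts."""
    prompts = [f"a photo of a {category}"]  # object-level prompt
    for part in PART_GRAPH.get(category, []):
        # part-level prompt conditioned on the parent object
        prompts.append(f"a photo of the {part} of a {category}")
    return prompts

if __name__ == "__main__":
    for p in build_prompts("dog"):
        print(p)  # these prompts would then be fed to CLIP's text encoder
```

In such a scheme, the resulting prompt embeddings would be matched against SAM-generated masks at both the object and part levels, which is the alignment the abstract describes.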