Hierarchical Open-vocabulary Part-object Segmentation with Knowledge-guided SAM
Graphical Abstract
Abstract
Open-vocabulary semantic segmentation (OV-Seg) aims to segment novel categories without dense annotations, enabling scalable perception in open-world scenarios. While recent advances in foundation models, such as CLIP for semantic grounding and the Segment Anything Model (SAM) for spatial localization, have boosted OV-Seg performance, most existing methods remain limited to object-level segmentation and neglect the compositional and hierarchical structure of real-world entities. In this paper, we propose a hierarchical open-vocabulary segmentation framework that integrates CLIP and SAM with structured external knowledge in the form of object–part hierarchies. Specifically, we construct a category-level knowledge graph that encodes part–whole relationships and guides the generation of enriched prompts, thereby aligning visual features with both object-level and part-level semantics. Extensive experiments on PartImageNet and PASCAL-Part demonstrate that our method consistently outperforms state-of-the-art baselines, particularly in part-level segmentation and novel-category generalization. These results confirm the effectiveness of incorporating structured priors to enhance compositional and fine-grained visual understanding in open-vocabulary settings.
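To make the prompt-enrichment idea concrete, the following is a minimal, hypothetical sketch of how a category-level object–part graph could be expanded into object-level and part-level text prompts before CLIP text encoding; the graph contents, the `PART_GRAPH` dictionary, and the `build_prompts` helper are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: a category-level knowledge graph (here, a plain dict)
# encoding object–part relations is traversed to build text prompts for both
# object-level and part-level queries. Names and templates are assumptions.

PART_GRAPH = {
    # object category -> its constituent parts (toy entries for illustration)
    "dog": ["head", "torso", "leg", "tail"],
    "car": ["wheel", "door", "window", "hood"],
}

def build_prompts(category: str) -> list[str]:
    """Expand one object category into object- and part-level text prompts."""
    prompts = [f"a photo of a {category}"]  # object-level prompt
    for part in PART_GRAPH.get(category, []):
        # part-level prompt conditioned on the parent object
        prompts.append(f"a photo of the {part} of a {category}")
    return prompts

if __name__ == "__main__":
    for p in build_prompts("dog"):
        print(p)  # these prompts would then be fed to CLIP's text encoder
```

In such a scheme, the resulting prompt embeddings would be matched against SAM-generated masks at both the object and part levels, which is the alignment the abstract describes.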