A Novel Divide and Conquer Solution for Long-term Video Salient Object Detection
Abstract
Recently, a new research trend in the video salient object detection (VSOD) community has focused on enhancing detection results via model self-fine-tuning on sparsely mined high-quality keyframes from the given sequence. Although this learning scheme is generally effective, it has a critical limitation: a model learned on sparse frames alone possesses weak generalization ability. The situation worsens on ``long'' videos, which tend to contain intensive scene variations. Moreover, in such videos, keyframe information from a distant time span is less relevant to the current frame, which can cause learning conflicts and degrade model performance. Consequently, this learning scheme is usually incapable of modeling complex patterns. To solve this problem, we propose a divide-and-conquer framework that converts a complex problem domain into multiple simple ones. First, we devise a novel background consistency analysis (BCA) that effectively divides the mined frames into disjoint groups. Then, for each group, we assign an individual deep model to capture the group's key attribute during the fine-tuning phase. During the testing phase, we design a model-matching strategy that dynamically selects the best-matched fine-tuned model to handle each testing frame. Comprehensive experiments show that our method adapts to severe background appearance variations coupled with object movement and achieves robust saliency detection compared with the previous scheme and state-of-the-art methods.
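As a rough illustration of the divide-and-conquer idea summarized above, the grouping and model-matching steps can be sketched as greedy clustering by cosine similarity over per-frame background features. This is only a minimal sketch under assumed inputs: the feature vectors, the similarity threshold, and the function names are hypothetical stand-ins, not the paper's actual BCA or matching procedure.

```python
import numpy as np

def group_frames(feats, thresh=0.9):
    """Greedily assign frames to disjoint groups by background similarity.

    A frame joins an existing group when its cosine similarity to that
    group's running centroid exceeds `thresh`; otherwise it starts a new
    group. Returns (groups, centroids): frame indices per group and the
    mean feature per group. (Hypothetical stand-in for the paper's BCA.)
    """
    groups, centroids = [], []
    for i, f in enumerate(feats):
        f = f / np.linalg.norm(f)
        best, best_sim = None, thresh
        for g, c in enumerate(centroids):
            sim = float(f @ (c / np.linalg.norm(c)))
            if sim > best_sim:
                best, best_sim = g, sim
        if best is None:
            groups.append([i])
            centroids.append(f.copy())
        else:
            groups[best].append(i)
            n = len(groups[best])
            centroids[best] = (centroids[best] * (n - 1) + f) / n
    return groups, centroids

def match_model(frame_feat, centroids):
    """At test time, return the index of the best-matched group, i.e.
    the group whose fine-tuned model should handle this frame."""
    f = frame_feat / np.linalg.norm(frame_feat)
    sims = [float(f @ (c / np.linalg.norm(c))) for c in centroids]
    return int(np.argmax(sims))
```

In this sketch, one model would be fine-tuned per returned group, and `match_model` plays the role of the test-time model-matching strategy by routing each incoming frame to the group it most resembles.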