This paper aims to conduct a comprehensive study on facial-sketch synthesis (FSS). However, due to the high cost of obtaining hand-drawn sketch datasets, there has been no complete benchmark for assessing the development of FSS algorithms over the last decade. Researchers from ETH Zurich first introduce a high-quality FSS dataset, named FS2K, which consists of 2104 image-sketch pairs spanning three sketch styles and a variety of image backgrounds, lighting conditions, skin colors, and facial attributes. Second, they present the largest-scale FSS investigation to date, reviewing 89 classic methods. In addition, they conduct comprehensive experiments on 19 cutting-edge models. Third, they present a simple FSS baseline, named FSGAN. Finally, they conclude with lessons learned over the past years and point out several unsolved challenges. The work was published in the fourth issue of Machine Intelligence Research.
Facial-sketch synthesis (FSS) aims to generate grayscale sketches from RGB images of human faces (image-to-sketch, I2S) or the other way around (sketch-to-image, S2I). FSS is commonly used in law enforcement and surveillance to assist face recognition and retrieval based on a sketch drawn from an eyewitness's description. It is also an attractive topic in digital entertainment, appearing in mobile apps such as TikTok and Facebook. Research into FSS has achieved significant progress over the past decade.
Different from other face-related datasets, such as those for face recognition, face detection, face keypoint detection, face alignment, and face synthesis, which can be manually labelled by annotators with limited training, face sketch datasets are much more difficult to obtain because only professional artists can produce high-quality references. Due to the high cost of obtaining professional sketches, existing image-sketch datasets are relatively small and of limited diversity. This dataset shortage has limited the development of FSS, especially for data-hungry deep learning models.
In addition, how to evaluate FSS remains an open question. Structural similarity (SSIM) is one of the most widely used metrics for evaluating image quality, so it is also typically used to assess the performance of S2I models. Nevertheless, the characteristics of facial sketches are very different from RGB-based facial images, which makes it challenging to apply the current evaluation metrics to I2S tasks. Therefore, a new objective and quantitative metric, which is also highly consistent with human assessment, is needed for benchmarking the FSS task.
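To make the discussion of SSIM concrete, the following is a minimal, single-window sketch of the metric in NumPy. Note this global form is only for intuition: standard implementations (e.g., scikit-image's `structural_similarity`) compute SSIM over sliding windows and average the resulting map, and the constants below use the conventional stabilizers C1 = (0.01L)^2 and C2 = (0.03L)^2 for dynamic range L.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """Simplified single-window SSIM between two grayscale images.

    Standard SSIM averages this quantity over local sliding windows;
    here the whole image is treated as one window, for illustration only.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

rng = np.random.default_rng(0)
sketch = rng.integers(0, 256, size=(64, 64))
print(global_ssim(sketch, sketch))        # identical images score 1.0
print(global_ssim(sketch, 255 - sketch))  # an inverted image scores far lower
```

Because SSIM compares luminance, contrast, and local structure of pixel intensities, it behaves sensibly for natural RGB images (and hence S2I outputs), but it does not capture the stroke-level and style properties that make a sketch look hand-drawn, which is why the paper argues for a sketch-oriented metric.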
Moreover, due to the lack of high-quality datasets and proper evaluation metrics, different FSS models are usually built and tested on diverse training datasets and with different evaluation methods. Hence, it is not easy to provide fair and comprehensive comparisons. Furthermore, many cutting-edge translation models (e.g., CycleGAN, UNIT, Pix2pixHD, SPADE, DSMAP, NICE-GAN, and DRIT++) designed for related image-to-image transfer tasks could potentially be employed in FSS tasks. However, as mentioned above, these models lack performance evaluations for the FSS task because of the shortage of datasets and evaluation metrics. Therefore, thorough comparisons and assessments of FSS-related models on a standard FSS dataset with unified evaluation metrics are long overdue. To this end, the authors have introduced and maintain an online paper list (https://github.com/DengPingFan/FaceSketch-Awesome-List) to track the progress of this fast-developing field.
The main contributions of this paper are as follows:
FSS dataset. We build a new high-quality FSS dataset, termed FS2K. It is the largest (see Table 1) publicly released FSS dataset, consisting of 2104 image-sketch pairs with a wide range of image backgrounds, skin colors, sketch styles, and lighting conditions. In addition, we also provide extra attributes, such as gender, smile, and hairstyle, to enable deep learning models to learn more details.
FSS review and benchmark. We conduct the largest-scale FSS study, reviewing 89 representative approaches, including 25 methods using handcrafted features, 29 models for the general transfer task, and 35 I2S transfer algorithms. Based on our FS2K, we adopt the SCOOT metric and conduct a rigorous evaluation of 19 state-of-the-art models from the perspective of content and style.
FSS baseline. We design an efficient GAN-based baseline, termed FSGAN, which consists of two simple core components, i.e., facial-aware masking and style-vector expansion. The former is utilized to restore details of the facial components, while the latter is adopted to learn different face styles. FSGAN serves as a unified baseline model for both I2S and S2I tasks (Fig. 1) on our newly built FS2K dataset. Our project is available at https://github.com/DengPingFan/FSGAN.
Discussions and future directions. In addition to an overall performance assessment, we also conduct an attribute-level evaluation, present detailed discussions, and explore some promising future directions.
Download full text:
Facial-sketch Synthesis: A New Challenge
Deng-Ping Fan, Zi-Ling Huang, Peng Zheng, Hong Liu, Xue-Bin Qin, Luc Van Gool