1. Introduction
Computer-aided technologies for the diagnostic analysis of medical images have received significant attention from the research community. They are designed for tasks such as the segmentation and classification of the region of interest (ROI) [1], which in this context means cancerous regions. Effective treatment of cancer depends on the early detection and delineation of lesion boundaries, particularly in its nascent stages, because cancer typically exhibits a delayed clinical onset [2]. Every year, nearly 17 million people are affected by cancer, and about 9.6 million die due to delayed diagnosis and treatment [3], making cancer one of the leading causes of death worldwide [4]. Skin cancer, which originates in the epidermal tissue, is one of the most prevalent forms of the disease in both adults and children [5]. Various computer-aided techniques have been proposed for detecting cancer boundaries in dermoscopic images [3].
Among the different types of skin cancer, melanoma is not only the most dangerous and aggressive, owing to its high metastasis rate, but also the most prevalent [4]. Melanoma is a malignant type of skin cancer that develops through the irregular growth of pigmented skin cells called melanocytes [5]. It can develop anywhere on the epidermal layer of the skin, may also affect the chest and back, and can propagate from the primary site of the cancer [6]. Its incidence has risen by 4–6% annually, and it has the highest mortality rate among all types of skin cancer [4]. Early diagnosis is crucial, as it increases the five-year survival rate to as high as 98% [7,8].
Given the incidence and mortality rates associated with melanoma, timely diagnosis is essential for providing effective treatment to those affected. For the detection and segmentation of lesion boundaries, two streams of methodologies exist: traditional methods, which usually rely on visual inspection by the clinician, and semi-automated or automated methods, which mostly involve point-based pixel intensity operations [9,10], pixel clustering [11,12,13,14], level set methods [15], deformable models [16], deep-learning-based methods [17,18,19], et cetera.
However, most of the traditional and semi-automated methods in use today are prone to errors for two reasons: the inherent limitations of the methods themselves [20], and the varying character of dermoscopic images caused by fluorescence and brightness inhomogeneities [10]. For this reason, the field has shifted toward more sophisticated approaches, notably convolutional neural networks (CNNs) [21].
In this paper, we exploit CNN-based model architectures for skin lesion boundary delimitation and segmentation. In addition, we introduce a novelty within the available techniques that greatly increases segmentation accuracy: image inpainting. Image inpainting, together with other preprocessing techniques such as morphological operations, is used to remove the hair structures contained in dermoscopic images, which otherwise handicap the architecture because of the complexity they add to the images.
This research examines the accuracy of the proposed technique together with the proposed preprocessing method. We also benchmark our scheme against other available methods using network accuracy, the Jaccard index, the Dice score, and other performance metrics that aid comparison.
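To make the two headline overlap metrics concrete, the following is a minimal pure-Python sketch (binary masks flattened to 0/1 sequences; the function names are illustrative, not taken from our implementation):

```python
def jaccard_index(pred, truth):
    """Intersection over union of two binary masks (flattened 0/1 sequences)."""
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    union = sum(1 for p, t in zip(pred, truth) if p == 1 or t == 1)
    return inter / union if union else 1.0

def dice_score(pred, truth):
    """Dice = 2|A∩B| / (|A| + |B|); related to Jaccard by D = 2J / (1 + J)."""
    inter = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

pred  = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(jaccard_index(pred, truth))  # 2/4 = 0.5
print(dice_score(pred, truth))     # 4/6 ≈ 0.667
```

Note that the two metrics are monotonically related, which is why papers often report either one; reporting both, as we do, simply eases comparison across the literature.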
1.1. Literature Review
This section reviews the relevant work on the segmentation of skin lesions, with particular emphasis on recent studies that have employed deep-learning methods for this purpose.
At the outset, we note that accurate segmentation and delimitation of skin lesion boundaries can assist the clinician in detection and diagnosis, and may later also help in classifying the lesion type. A gamut of studies has addressed the segmentation and classification of skin lesions; for a general survey, the reader may refer to the papers by Oliveira et al. [3] and Rafael et al. [22].
We review the literature with respect to two aspects: preprocessing and segmentation techniques. Both directly affect the outcome of the prediction, and both are therefore catered for in the broader methodology presented in this paper. Additionally, since dermoscopic images vary in complexity and contain textural, intensity, and feature inhomogeneities, prior preprocessing is necessary so that inhomogeneous sections can be smoothed out.
1.1.1. Preprocessing Techniques
Researchers encounter complications while segmenting skin lesions due to low brightness and noise in the images; these artifacts degrade segmentation accuracy. For better results, Celebi et al. [23] proposed a technique that enhances image contrast by searching for ideal weights for converting RGB images to grayscale, maximizing Otsu's histogram bimodality measure. The optimization yielded a better adaptive ability to distinguish tumor from skin and allowed accurate resolution of the regions. Beuren et al. [24] described morphological operations for contrast enhancement: the lesion is highlighted through a color morphological filter and then segmented by simple binarization. Lee et al. [25] proposed a method, based on morphological operations, to remove hair-like artifacts from dermoscopic images; removing hair, which acts as noise, has a noteworthy effect on segmentation results. The median filter, a nonlinear filter, was found to be effective for smoothing noisy images [26]. Celebi et al. [27] established that the size of the filter should be proportional to the size of the image for effective smoothing.
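Since Otsu's criterion recurs in several of the works above, a minimal pure-Python sketch of the threshold search may help (illustrative only; real implementations operate on 256-bin histograms rather than raw pixel lists):

```python
def otsu_threshold(pixels):
    """Return the threshold t maximizing the between-class variance.
    Pixels <= t form the background class; pixels > t the foreground."""
    best_t, best_var = None, -1.0
    for t in sorted(set(pixels))[:-1]:  # every candidate split point
        bg = [p for p in pixels if p <= t]
        fg = [p for p in pixels if p > t]
        w_bg, w_fg = len(bg) / len(pixels), len(fg) / len(pixels)
        mu_bg = sum(bg) / len(bg)
        mu_fg = sum(fg) / len(fg)
        var_between = w_bg * w_fg * (mu_bg - mu_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Two well-separated intensity clusters: the threshold falls between them.
print(otsu_threshold([10, 12, 11, 200, 210, 205]))  # 12
```

Maximizing the between-class variance is equivalent to maximizing histogram bimodality, which is the property the weight search in [23] exploits.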
Image inpainting is a preprocessing technique used both for removing parts of an image and for restoration, so that missing or damaged information is recovered. It is of vital importance in medical imaging: through its application, unnecessary structures or artifacts (i.e., hair artifacts in skin lesion images) can be removed [28,29,30].
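To make the idea concrete, the following is a toy sketch of mask-based inpainting by iterative neighbor averaging, a crude stand-in for the diffusion-based schemes cited above; the grid, mask, and iteration count are illustrative assumptions:

```python
def inpaint(image, mask, iters=50):
    """Fill masked pixels (mask == 1) by repeatedly averaging their
    4-neighbors, diffusion-style; unmasked pixels are left untouched."""
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])
    for _ in range(iters):
        nxt = [row[:] for row in img]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    nbrs = [img[y + dy][x + dx]
                            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                            if 0 <= y + dy < h and 0 <= x + dx < w]
                    nxt[y][x] = sum(nbrs) / len(nbrs)
        img = nxt
    return img

# A flat patch of skin (value 100) with a one-pixel "hair" artifact (value 0):
image = [[100, 100, 100],
         [100,   0, 100],
         [100, 100, 100]]
mask  = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
print(inpaint(image, mask)[1][1])  # 100.0
```

In practice, the mask is produced by a hair-detection step (e.g., morphological filtering), and the inpainting propagates surrounding skin and lesion texture into the hair pixels.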
1.1.2. Segmentation Techniques
Most image segmentation tasks use traditional machine learning processes for feature extraction. The literature explains several important techniques for accurate segmentation. Jaisakthi et al. [31] summarized a semi-supervised method for segmenting skin lesions in which Grab-cut and K-means clustering are employed conjunctively: the former segments the melanoma through graph cuts, and the latter fine-tunes the lesion boundaries. Preprocessing such as image normalization and noise removal is applied to the input images before they are fed to the pixel classifier. Aljanabi et al. [32] proposed an artificial bee colony (ABC) method to segment skin lesions. Using few parameters, the model is a swarm-based scheme that preprocesses the digital images and then determines the optimal threshold of the melanoma, as in Otsu thresholding, through which the lesion is segmented. The algorithm achieves high specificity and a high Jaccard index.
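The K-means step of [31] can be sketched for one-dimensional pixel intensities as follows (a toy pure-Python version; the Grab-cut stage and the preprocessing steps are omitted, and the initial centers are illustrative):

```python
def kmeans_1d(values, centers, iters=10):
    """Lloyd's algorithm on scalar intensities: assign each value to its
    nearest center, then recompute each center as the mean of its cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Lesion pixels (dark) vs. skin pixels (bright):
intensities = [20, 25, 30, 200, 210, 190]
print(kmeans_1d(intensities, [0, 255]))  # [25.0, 200.0]
```

With two clusters, the midpoint between the final centers acts as an intensity threshold separating lesion from skin, which is the boundary the Grab-cut output is fine-tuned against.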
Pennisi et al. [33] introduced a technique that segments images using the Delaunay triangulation method (DTM). The approach runs two segmentation processes in parallel and merges their outputs to obtain the final lesion mask: after artifact removal, one process filters the skin out of the image to produce a binary mask of the lesion, while the other produces a mask via Delaunay triangulation; the two are then combined to extract the lesion. The DTM technique is automated and requires no training, which makes it faster than other methods. Celebi et al. [34] provide a brief overview of border detection techniques (i.e., edge-based, region-based, histogram thresholding, active contours, clustering, etc.), paying particular attention to evaluation aspects and computational issues. Bi et al. [35] suggested a new automated method that performs segmentation using image-wise supervised learning (ISL) and multiscale superpixel-based cellular automata (MSCA): probabilistic mapping provides automatic seed selection, removing the need for user-defined seeds, after which the MSCA model segments the lesion. Kumar et al. [36] introduced a fully convolutional network (FCN) based method for segmenting dermoscopic images; image features learned from the embedded multi-stages of the FCN improved segmentation accuracy over previous works without any preprocessing (i.e., hair removal, contrast improvement, etc.). Yuan et al. [37] proposed a convolutional-deconvolutional neural network (CDNN) to automate skin lesion segmentation, focusing on training strategies that make the model more efficient rather than on pre- and post-processing; the model generates probability maps whose elements correspond to the probability that a pixel belongs to the melanoma. Berseth et al. [38] developed a U-Net architecture for segmenting skin lesions based on a probability map over the image dimensions, with ten-fold cross-validation used for training. Mishra [17] presented a deep learning technique for extracting the lesion region from dermoscopic images, combining Otsu's thresholding with a CNN for better results; a U-Net-based architecture was used to extract more complex features. Qian et al. [39] proposed an encoder-decoder segmentation architecture inspired by DeepLab [40], with ResNet-101 adapted for feature extraction. Guth et al. [41] introduced a U-Net 34 architecture that merged insights from U-Net and ResNet; an optimized learning rate was used for fine-tuning the network, employing the slanted triangular learning rate (STLR) strategy.
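Several of the networks above ([37,38]) output a per-pixel probability map rather than a mask directly; converting such a map into a binary lesion mask is a simple thresholding step (toy sketch; the 0.5 cutoff is an assumption, and in practice the threshold is often tuned on validation data):

```python
def prob_map_to_mask(prob_map, thresh=0.5):
    """Binarize a per-pixel probability map into a 0/1 lesion mask."""
    return [[1 if p >= thresh else 0 for p in row] for row in prob_map]

prob_map = [[0.1, 0.7],
            [0.9, 0.2]]
print(prob_map_to_mask(prob_map))  # [[0, 1], [1, 0]]
```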
5. Conclusions
Skin lesion segmentation is a vital step in developing a computer-aided diagnosis system for skin cancer. In this paper, we developed a skin lesion segmentation algorithm using a CNN together with an advanced hair-removal algorithm that effectively removes hair structures from dermoscopic images, improving accuracy considerably. We tested our model architecture on the ISIC-2017 and PH2 datasets, obtaining Jaccard indices of 0.772 and 0.854, respectively. These results compare favorably, in terms of the Jaccard index, with state-of-the-art techniques and with the existing methods in the literature. Empirical results show that the combination of U-Net and ResNet yields impressive results.
The limited training data available required extensive augmentation to prevent the model from overfitting; a larger dataset is therefore needed for better accuracy and generalization. Furthermore, achieving state-of-the-art results required a complex model, which takes more time to train than the conventional U-Net.
Our future work includes using a larger dataset to reduce overfitting and tuning the hyperparameters for more effective training. Additionally, a conditional random field (CRF) can be applied to refine the model output.