Article

Identification of Breast Malignancy by Marker-Controlled Watershed Transformation and Hybrid Feature Set for Healthcare

Tariq Sadad, Ayyaz Hussain, Asim Munir, Muhammad Habib, Sajid Ali Khan, Shariq Hussain, Shunkun Yang and Mohammed Alawairdhi

1 Department of Computer Science, University of Central Punjab, Sargodha Campus 40100, Pakistan
2 Department of Computer Sciences, Quaid-i-Azam University, Islamabad 44000, Pakistan
3 Department of Computer Science, International Islamic University, Islamabad 44000, Pakistan
4 Department of Software Engineering, Foundation University Islamabad, Islamabad 44000, Pakistan
5 School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China
6 College of Computing and Informatics, Saudi Electronic University, Riyadh 11673, Saudi Arabia
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(6), 1900; https://doi.org/10.3390/app10061900
Submission received: 12 February 2020 / Revised: 4 March 2020 / Accepted: 5 March 2020 / Published: 11 March 2020
(This article belongs to the Special Issue Recent Developments in Smart Healthcare)

Abstract
Breast cancer is a highly prevalent disease in females that may lead to mortality in severe cases. Mortality can be reduced if breast cancer is diagnosed at an early stage. The focus of this study is to detect breast malignancy through computer-aided diagnosis (CADx). In the first phase of this work, the Hilbert transform is employed to reconstruct B-mode images from the raw data, followed by marker-controlled watershed transformation to segment the lesion. Methods based only on texture analysis are quite sensitive to speckle noise and other artifacts. Therefore, a hybrid feature set is developed from the shape-based and texture features extracted from the breast lesion. Decision tree, k-nearest neighbor (KNN), and an ensemble decision tree model via random under-sampling with boosting (RUSBoost) are utilized to segregate the cancerous lesions from the benign ones. The proposed technique is tested on OASBUD (Open Access Series of Breast Ultrasonic Data) and breast ultrasound (BUS) images collected at Baheya Hospital Egypt (BHE). The OASBUD dataset contains raw ultrasound data obtained from 100 patients, comprising 52 malignant and 48 benign lesions. The dataset collected at BHE contains 210 malignant and 437 benign images. The proposed system achieved a promising accuracy of 97% with a confidence interval (CI) of 91.48% to 99.38% for OASBUD, and 96.6% accuracy with a CI of 94.90% to 97.86% for the BHE dataset, using the ensemble method.

1. Introduction

Breast cancer (BC) is one of the leading causes of death among women worldwide [1]. It is reported that one out of twelve females may be affected by BC. The World Health Organization (WHO) documented that approximately 508,000 women died of this disease in 2011 [2]. Nevertheless, it is possible to defeat breast cancer if it is diagnosed at the initial stage. Mammography is considered an appropriate method for breast cancer diagnosis. However, BUS is employed as a supportive tool to assess mammographic findings, profound masses, and indecisive mammograms, and to guide biopsies [3]. BUS is also the main procedure prescribed to assess breast ailments in females, especially below the age of 30, owing to its low cost and easy access, and it can differentiate such lesions more accurately than other imaging procedures. However, proper diagnosis by BUS requires trained and experienced radiologists to identify dense and cystic breast lesions. The radiologist's report is based on visual inspection of BUS images alone, so it is difficult to diagnose a lesion as malignant or benign [4,5]. Furthermore, BUS is an operator-dependent modality [5] that suffers from artifacts (e.g., speckle noise) that degrade image quality. CADx systems using machine learning approaches can be developed for the diagnosis of breast anomalies and the classification of tumors. A CADx system assists radiologists in the evaluation of breast abnormalities with reliability and accuracy [6,7]. Furthermore, CADx may reduce the effort of examination and the diagnostic errors that physicians make due to fatigue and workload.
A CADx framework usually comprises four important steps: preprocessing, segmentation, feature extraction, and feature classification [8,9]. The efficacy of a CADx framework depends on the employed features, which are mostly designed using expert knowledge.
Several researchers have proposed CADx systems to detect breast abnormalities using BUS images. Image segmentation is an essential phase in the CADx framework. Different classical approaches are used for BUS segmentation, such as thresholding, region growing, and watershed [10]. Thresholding is a simple and fast segmentation method. In global thresholding, the threshold value is constant over the whole image, whereas in local thresholding it varies with local features. Global thresholding does not perform well on noisy and low-contrast images [10]. Shan et al. [11] employed an automatic seed-point selection technique for the region-growing method. In this method, thresholding was utilized to create a group of candidate regions, and the tumor region was then determined based on local features, region size, and location. Watershed is a robust technique that yields more accurate segmentation than region growing and thresholding. However, the main issue faced in watershed is the selection of markers; therefore, Gómez et al. [12] employed internal and external markers using the Beucher gradient [13] to prevent over-segmentation.
The efficacy of a CADx framework largely depends upon an appropriate feature set. A variety of features are extracted from BUS images and classified through machine learning. The available and most employed features typically fall into two major groups: texture-based and shape-based [6]. Numerous texture features are utilized for classifying ultrasound breast tumors. For example, Yang et al. [14] performed texture analysis of BUS images using gray-scale-invariant features via the ranklet transform and applied a support vector machine (SVM) to isolate malignant lesions from benign ones. Shi et al. [15] employed fractal features, textural features from spatial gray-level dependence (SGLD) matrices, and histogram-based features; stepwise regression was applied to choose an optimal subset of features, and fuzzy SVM (FSVM), artificial neural network (ANN), and SVM classifiers were compared, with FSVM claimed to produce the best results. Lo et al. [16] recommended a system that extracts grey level co-occurrence matrix (GLCM) features from ranklet-transformed BUS images and illustrated that the result is significant for clinical use, but GLCM is sensitive to speckle noise and other artifacts in BUS images. Cai et al. [17] suggested the phase congruency-based binary pattern (PCBP), which merges the phase congruency (PC) method with the local binary pattern (LBP); the features were classified through SVM with a radial basis function (RBF) kernel, demonstrating the robustness of the method.
Shape-based features have differentiated various breast lesions very effectively in several studies [18,19], which showed that these features are well suited to breast tumor detection. The objective of shape-based features is to measure the contour and shape characteristics of breast lesions. Typically, ill-defined and irregular boundaries are observed in malignant tumors [9], so the goal of shape-based features is to evaluate the lesion margin and shape. However, the efficacy of shape-based features relies on the US scanner, the particular view of the lesion, the preprocessing technique, and the segmentation algorithm [20].
Deep learning is applied in image processing to solve challenging tasks, but it still faces bottlenecks, and for some of them conventional methods with hybrid features have demonstrated the capability of a better solution [21]. Recently, deep learning and convolutional neural network (CNN)-based methodologies have been employed for benign or malignant lesion recognition [22,23], but the computational complexity and cost of these methods are major barriers in clinical applications [24,25]. To overcome this barrier, researchers have been considering various methods to lessen the time and cost associated with deep learning applications. Conventional methods can thus complement deep learning to overcome the various challenges it brings (e.g., time, accuracy, computing power, and the quantity and characteristics of inputs) [21].
In CADx systems, the main focus is the automatic detection and classification of breast lesions. Both texture features [14,15,16] and shape-based features [18,19,20] have been employed to identify breast cancer. However, it remains difficult to choose appropriate features for finding cancer at its early stage [19]. The goal of our research is to enhance classification accuracy, and specifically:
  • To develop an automated CADx system to detect breast cancer accurately.
  • To introduce the marker-controlled watershed transformation for efficient segmentation.
  • To extract a hybrid feature set incorporating both shape-based and texture features to describe lesions in detail and to overcome the limitations of texture-based methods for BUS images.
To give an understanding of the proposed methodology, the rest of this article is structured as follows. Section 2 illustrates the framework of the proposed method and its implementation, including a description of the employed datasets, the segmentation technique, and the feature extraction method. Section 3 reports the experimental results, and their discussion is presented in Section 4. Section 5 concludes the work with future directions.

2. Proposed Methods

Figure 1 illustrates the proposed CADx system for quantitative ultrasound breast images. The proposed technique comprises four phases. The first is preprocessing, which builds ultrasound B-mode images from the raw ultrasonic data of the OASBUD dataset using the Hilbert transform. The second phase is segmentation, which extracts the lesion from the background tissue; this is performed through a robust segmentation technique called marker-controlled watershed transformation. The third phase is feature extraction, which is significant for lesion categorization. Many features have been employed successfully; however, some are commonly invalid for BUS images. For example, tumors are mostly dark and difficult to find in BUS images, and compact detail is required for detection; therefore, a hybrid approach combining shape and texture features is proposed. Finally, the proposed system is validated through different classifiers, such as ensemble, decision tree, and KNN, to differentiate between benign and malignant tumors.

2.1. Image Database

In this research work, a database called OASBUD [26] and BUS images collected at BHE [27] are used. The OASBUD dataset includes raw ultrasound data obtained from 100 patients, with 52 malignant breast lesions and 48 benign lesions. For every breast lesion, two perpendicular scans (transverse and longitudinal) were performed. A region of interest (ROI) was outlined by an expert radiologist to specify the lesion part of each scan. The OASBUD dataset comprises the fields shown in Table 1 [26].
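For readers wishing to reproduce this setup, the sketch below shows how such a dataset could be loaded. It is a minimal illustration assuming the OASBUD archive is distributed as a MATLAB .mat file with the fields of Table 1; the file name and exact field layout here are assumptions, not specifications from [26].

```python
from scipy.io import loadmat

# Hypothetical file name and layout; the actual OASBUD distribution may differ.
mat = loadmat("OASBUD.mat", squeeze_me=True, struct_as_record=False)
data = mat["data"]                    # assumed struct array, one entry per lesion

lesion = data[0]                      # first of the 100 lesions
rf_long = lesion.rf1                  # raw RF matrix of the first (longitudinal) scan
roi_long = lesion.roi1                # binary ROI mask outlined by the radiologist
label = getattr(lesion, "class")      # 1 = malignant, 0 = benign ("class" is a Python keyword)
```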

2.2. Preprocessing

As the OASBUD dataset comprises raw data, we have used the method proposed earlier [26] to rebuild B-mode images from the raw data. First, the envelope of the ultrasonic echoes is computed through the Hilbert transform and then log-compressed to a 50 dB dynamic range to reconstruct the B-mode (brightness) image, a two-dimensional image composed of bright dots signifying the ultrasound echoes. Figure 2 displays an image of a tumor rebuilt from RF (radio-frequency) data.
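As a concrete illustration of this step, the following sketch reconstructs a B-mode image from an RF matrix using the analytic-signal envelope and 50 dB log compression. The function name and the samples-by-lines data layout are our assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def rf_to_bmode(rf, dynamic_range_db=50.0):
    """Reconstruct a B-mode image from raw RF data (rows = samples, columns = RF lines)."""
    envelope = np.abs(hilbert(rf, axis=0))         # envelope of each RF line via the Hilbert transform
    envelope /= envelope.max()                     # normalize before log compression
    bmode_db = 20.0 * np.log10(envelope + 1e-12)   # convert to decibels
    return np.clip(bmode_db, -dynamic_range_db, 0.0)  # keep a 50 dB dynamic range for display
```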

2.3. Segmentation

Image segmentation is a crucial stage and the key to obtaining effective outcomes in medical imaging. The segmentation step can remove the subjectivity of manually drawn boundaries [28]. Basically, segmentation divides an image into homogeneous regions to precisely detect their contours (e.g., breast lesions). An assortment of techniques has been proposed for image segmentation, but none of them produces results of high quality for all image types (for example, mammography, positron emission tomography (PET), computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound) [29]. Thus, there is no single segmentation method that is universally accepted for all kinds of procedures. In the proposed method, marker-controlled watershed transformation [30] is recommended for tumor segmentation because it achieves reliable performance on quantitative BUS images having speckle noise, low contrast, and weakly defined boundaries [12].
Watershed Transformation
Watershed transformation is a region-based method [31]. The idea of watershed is to visualize an image as a topographic surface. Figure 3 illustrates the topographic view of an ultrasound image, in which the bright portions occupy "high" altitudes and the dark portions "low" altitudes. Every local minimum of the image contains a hole through which water is supplied into the various catchment basins. The catchment basins fill up with water starting from the bottom (i.e., the minima of lowest intensity) and continue rising until the water level reaches the highest peak in the topography. Dams are built to prevent the merging of water originating from two or more local minima. Consequently, the topography is divided into regions, called catchment basins, separated by dams known as watershed lines or watersheds. The number of objects produced by the segmentation depends on the number of local minima in the image. Thus, the existence of several local minima within an object creates the problem of over-segmentation.
Over-segmentation is the main drawback of watershed transformation: it creates many regions owing to the existence of spurious minima. To avoid this dilemma, automatic markers are introduced beforehand and placed inside the required regions. This restricts the flow of water to the basins associated with each marker. This method is referred to as marker-controlled watershed transformation, and it is a powerful technique for breast lesion segmentation [12,31].
The marker image is a binary image, also named the marker function, comprising one or more marker regions of logically true pixels. Each marker specifies a particular position inside the segmentation function [12] and imposes that region as a global minimum of the topography through the minima imposition method [33]. This technique therefore eliminates all inappropriate minima that are not associated with the marked areas. Minima imposition is a morphological operator [33]; more detail about it can be found in [12]. Finally, the potential lesion margins are acquired by computing the watershed transformation of the minima-imposed image, as shown in Figure 4.
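The sketch below illustrates the overall idea with scikit-image's watershed. Note that it uses simple percentile-based internal/external markers and a Sobel gradient as stand-ins for the Beucher gradient and minima-imposition procedure of [12,33], so it approximates the described pipeline rather than reproducing the authors' exact implementation.

```python
import numpy as np
from skimage import filters, morphology, segmentation

def segment_lesion(bmode):
    """Marker-controlled watershed sketch for a grayscale B-mode image scaled to [0, 1]."""
    smoothed = filters.gaussian(bmode, sigma=2.0)        # attenuate speckle noise
    gradient = filters.sobel(smoothed)                   # segmentation function (image gradient)
    # Internal marker: the dark lesion core; external marker: bright background tissue.
    internal = morphology.remove_small_objects(
        smoothed < np.percentile(smoothed, 10), min_size=64)
    external = smoothed > np.percentile(smoothed, 60)
    markers = np.zeros(bmode.shape, dtype=np.int32)
    markers[external] = 1
    markers[internal] = 2
    labels = segmentation.watershed(gradient, markers)   # flooding starts only from the markers
    return labels == 2                                   # binary lesion mask
```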

2.4. Feature Extraction

After the segmentation of the breast lesion from the background, attributes or characteristics are exploited to recognize a lesion as benign or malignant. Feature extraction is a basic step to obtain the lesion properties that differentiate one lesion from others. An efficient feature extraction method can properly extract the features from the segmented image to facilitate and simplify the task of the classifiers, so that more precise results can be achieved. Good features should have uniqueness, integrity, agility, abstractness, and invariance under geometric transformation [34]. It is essential to choose appropriate features and perform their correct assessment for the detection of malignancy [35]. Two types of image features are frequently used in feature extraction techniques: texture features and shape features. In this research work, GLCM and shape-based features are extracted.
Grey Level Co-Occurrence Matrix (GLCM) Features
GLCM is a popular technique for representing textural features [36] because it enhances the image details and provides a good interpretation. GLCM is based on second-order statistics, counting the joint occurrence of two brightness values: it is a tabulation of how often various combinations of gray levels occur in an image. GLCM extracts features in two steps: formation of the co-occurrence matrix and calculation of the texture features. GLCM computes the association between two neighboring values through a displacement d (the distance to the next neighbor, usually equal to one) and angles θ = 0°, 45°, 90°, and 135°, representing the horizontal, diagonal, vertical, and antidiagonal orientations, respectively, as shown in Figure 5. Different statistical features are then extracted from the co-occurrence matrix, including energy (f1), contrast (f2), correlation (f3), variance (f4), sum average (f5), inverse difference moment (f6), entropy (f7), sum variance (f8), sum entropy (f9), difference entropy (f10), difference variance (f11), information measure of correlation 1 (f12) and information measure of correlation 2 (f13) [36], dissimilarity (f14), autocorrelation (f15), maximum probability (f16), cluster shade (f17), cluster prominence (f18) [37], inverse difference moment normalized (f19), and inverse difference normalized (f20) [38], as illustrated in Table 2.
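As a sketch of this computation, scikit-image's co-occurrence routines cover a subset of the twenty features listed above (energy, contrast, correlation, dissimilarity, and the homogeneity variant of the inverse difference moment); the remaining statistics, such as sum entropy or cluster shade, would have to be computed directly from the matrix.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(roi_u8):
    """Subset of GLCM texture features from an 8-bit grayscale lesion ROI."""
    glcm = graycomatrix(
        roi_u8, distances=[1],                            # d = 1 (nearest neighbor)
        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],  # theta = 0, 45, 90, 135 degrees
        levels=256, symmetric=True, normed=True)
    props = ["energy", "contrast", "correlation", "dissimilarity", "homogeneity"]
    # Average each statistic over the four orientations.
    return {p: float(graycoprops(glcm, p).mean()) for p in props}
```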
Shape-Based Features
Visual features of lesions are known as shape attributes or features, such as a triangular or circular shape, the diameter of the boundary, or the perimeter of the object's border. Shape attributes can be partitioned into two groups: one based on the object boundary and the other based on region properties. Various shape features can be computed, including the area, which is the number of pixels of the breast lesion, and the perimeter of the tumor, obtained by counting the boundary pixels around the lesion. Usually, lesion features such as area, perimeter, minor axis, and major axis cannot be employed independently for lesion classification, as such features are affected by the size of the lesion. Apart from these, scale-invariant features such as elongation, compactness, rectangularity, solidity, roundness, eccentricity, and convexity can be obtained and employed. The formulas used to extract the shape features are documented in Table 3 [39].
An illustration of some features such as area, major axis, and minor axis is shown in Figure 6 for clarity. GLCM uses the arrangement of grey levels and their positions; its statistical characteristics allow faster identification of several varieties of lesions, whereas shape-based features capture all significant properties of an object in a compact descriptor. Employing only one feature type limits the descriptive power in terms of classification performance, while their combination constructs a distinct feature descriptor presenting an optimal representation of the lesion. Thus, the texture features are merged with the shape-based features documented in Table 4 to develop a hybrid feature set. The hybrid feature set is the concatenation of the GLCM and shape-based features, providing a total of 27 features.
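A corresponding sketch for the shape descriptors of Table 3 using scikit-image's region properties is given below (assuming a single lesion per binary mask); concatenating its values with the twenty GLCM statistics yields the 27-dimensional hybrid vector.

```python
import numpy as np
from skimage.measure import label, perimeter, regionprops

def shape_features(mask):
    """Scale-invariant shape descriptors (Table 3) from a binary lesion mask."""
    props = regionprops(label(mask))[0]            # assume a single lesion region
    convex_perim = perimeter(props.convex_image)   # perimeter of the convex hull
    return {
        "elongation":     1 - props.minor_axis_length / props.major_axis_length,
        "roundness":      4 * np.pi * props.area / convex_perim ** 2,
        "solidity":       props.solidity,          # area / convex area
        "rectangularity": props.area / (props.major_axis_length * props.minor_axis_length),
        "compactness":    4 * np.pi * props.area / props.perimeter ** 2,
        "convexity":      convex_perim / props.perimeter,
        "eccentricity":   props.eccentricity,      # sqrt(major^2 - minor^2) / major
    }
```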

2.5. Classification

After extraction of the hybrid feature set, classification is performed to segregate the benign lesions from the malignant ones. Well-known classifiers, such as KNN, decision tree (DT), and an ensemble classifier, were applied with different parameters to attain promising accuracy, as analyzed below. We analyzed longitudinal and transverse views separately and applied 10-fold cross-validation [40] for each view. In the 10-fold cross-validation technique, the classifier is trained on the union of nine folds and tested on the remaining fold; this process is repeated until every fold has served as the test set, and the overall classification accuracy is then computed. Binary classification generates four possible outcomes: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). A TP occurs when a malignant lesion is recognized by the model as malignant, whereas a TN arises when a benign tumor is recognized as benign. Conversely, an FP occurs when a benign tumor is recognized as malignant, and an FN arises when a malignant tumor is recognized as benign, as presented in Figure 7.
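A sketch of this classification stage is given below, using the RUSBoostClassifier from the imbalanced-learn package as a stand-in for the RUSBoosted-trees preset reported in Tables 5 to 7 (the estimator parameter name varies across imbalanced-learn versions). Synthetic features replace the real hybrid vectors, and max_leaf_nodes=21 approximates the 20-split limit.

```python
import numpy as np
from imblearn.ensemble import RUSBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 27))      # stand-in for the 27-dim hybrid feature matrix
y = rng.integers(0, 2, size=100)    # stand-in labels: 1 = malignant, 0 = benign

clf = RUSBoostClassifier(
    estimator=DecisionTreeClassifier(max_leaf_nodes=21),  # ~20 splits per weak learner
    n_estimators=30, learning_rate=0.1, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")  # 10-fold cross-validation
print(f"mean CV accuracy: {scores.mean():.3f}")
```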
The performance of the proposed model can be calculated through standard metrics, including accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
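These metrics follow the standard definitions computed from the confusion-matrix counts:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)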

3. Results

In this study, we have used the OASBUD dataset [26]. This dataset involves 52 malignant and 48 benign lesions, each with two scans, one longitudinal and one transverse. A second dataset of BUS images collected at BHE [27], including 210 malignant and 437 benign lesions, is also employed. The segmented part of the breast lesion exposes the cancer-affected portion, and this malignant region must be extracted from the background tissues. The watershed transformation algorithm based on the marker function and minima imposition is used for the segmentation of the BUS images. The hybrid feature set of GLCM and shape-based features was used to assess our model for correct classification of breast lesions. The outcomes are concisely reviewed based on the classification performance, and the corresponding results for the longitudinal and transverse scans of the OASBUD dataset are given in Table 5 and Table 6, respectively, while the results for the BUS images collected at BHE are reported in Table 7.
As shown in Table 5, Table 6 and Table 7, all diagnostic statistics are estimates and are presented with confidence intervals [41]. As presented in Table 5, 97% accuracy with a CI of 91.48% to 99.38% is achieved by the proposed system using hybrid features with the ensemble classifier on the longitudinal scan of the OASBUD dataset, illustrating the impact of hybrid features for correct classification of lesions. Furthermore, the high values of specificity (97.87% with a CI of 88.71% to 99.95%) and sensitivity (96.23% with a CI of 87.02% to 99.54%) indicate that the hybridization of GLCM and shape-based features has successfully produced a more distinct judgment between the two classes, benign and malignant lesions. In Table 6, an accuracy of 95% with a CI of 88.72% to 98.36% is observed for the transverse scan.
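The reported intervals are consistent with exact (Clopper-Pearson) binomial confidence intervals; a minimal check for the 97/100 longitudinal result, assuming the statsmodels package is available:

```python
from statsmodels.stats.proportion import proportion_confint

# 97 correct classifications out of 100 lesions (longitudinal OASBUD scan).
low, high = proportion_confint(count=97, nobs=100, alpha=0.05, method="beta")  # Clopper-Pearson
print(f"95% CI: {low:.2%} to {high:.2%}")  # approximately 91.48% to 99.38%, as in Table 5
```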
Similarly, Table 7 shows the highest accuracy (96.6% with a CI of 94.90% to 97.86%) for the BUS images collected at BHE. Furthermore, high values of specificity (97.70% with a CI of 95.81% to 98.89%) and NPV (97.25% with a CI of 95.34% to 98.40%) are observed.
As shown in Table 5, Table 6 and Table 7, the ensemble classifier offered superior results compared to the others. The prime reason is that an ensemble algorithm combines multiple learners to attain greater discriminative power than any single classifier [42]. The ensemble decision tree classifier with RUSBoost performs well owing to the limited generalization error obtained by growing a large number of trees. RUSBoost employs a mix of random under-sampling and boosting to improve performance [43]. Ensemble classifiers are more accurate; however, they consume more time than decision tree and KNN.
It should be noted that the results also depend upon the nature of the domain, the dataset size, and the selection of features; certain classifiers perform better on some types of applications or data than others. The overall classification accuracies of the employed classifiers, that is, ensemble, KNN, and decision tree, were observed to be 97%, 94%, and 88% for the longitudinal scan and 95%, 93%, and 85% for the transverse scan of the OASBUD dataset, respectively, as shown in Figure 8. Similarly, the observed classification accuracies on the BUS images collected at BHE using ensemble, decision tree, and KNN are 96.6%, 95.83%, and 95.36%, respectively. The results prove that the ensemble decision tree model with RUSBoost offers the best accuracy among all the employed classifiers.

4. Discussion

BUS images contain speckle noise and other artifacts. Artifacts in BUS images are created by the characteristics of ultrasound itself. They are categorized into four major classes: degraded images, missing structures, mis-registered locations, and falsely perceived objects [44]. Such artifacts may prompt unnecessary clinical intervention. These factors also reduce the efficacy of CADx frameworks that employ texture-based feature extraction: some texture extraction techniques, such as LBP and GLCM, are very sensitive to the abovementioned factors, so texture analysis alone does not accurately portray the lesions in BUS images. Shape-based features, in contrast, describe the properties of a lesion in a compact manner. In this paper, a hybrid feature set is employed to produce more robust features that accurately characterize breast lesions and enhance the performance of the CADx system.
Furthermore, ensemble classifiers combine individual classifiers, for example decision trees, to boost predictive capability. Decision trees are particularly suitable for the ensemble method since they are quick. In an ensemble, multiple learners are employed, weighted, and then merged to obtain a better result than any individual classifier. This method adopts the 'wisdom of the crowd' idea, in which numerous estimations are gathered and evaluated before formulating a main decision. The most widely adopted ensemble strategy is boosting, a common technique for enhancing the performance of a weak learner such as a decision tree [43].
In this paper, we concentrated on enhancing the efficacy of the CADx system. In the research community, several techniques and methodologies have been proposed to accurately isolate benign lesions from malignant ones. A comparative analysis is provided in Table 8, presenting a performance benchmark for the proposed system. Nugroho et al. [45] proposed active contours without edges for segmentation, performed texture and geometry analysis for feature extraction, and achieved 91.3% accuracy using SVM. Moon et al. [46] employed fuzzy c-means clustering for segmentation, performed feature analysis using echogenicity and morphology, and accomplished 92.50% sensitivity for malignant lesions using the binary logistic regression technique. B. Singh et al. [47] performed shape-based analysis and achieved 84.6% accuracy using ANN. The proposed CADx system delivers better performance in terms of accuracy and sensitivity than [45,46,47] owing to better segmentation, hybrid feature extraction capturing the shape and size of lesions, and the ensemble method of classification.

5. Conclusions and Future Directions

For an effective diagnosis of breast lesions using ultrasonography images, expert radiologists mostly search for distinguishing attributes in the lesion, which can be characterized as notable discrepancies from the normal patterns. This kind of expertise is difficult to transfer to an automatic system due to inconsistencies in breast tissue, i.e., the presence of speckle noise and weakly defined edges. To deal with such scenarios, a hybrid feature vector of texture and shape-based features is used to capture all significant and optimal properties of a lesion. Furthermore, the segmentation of the lesion is performed through marker-controlled watershed transformation to avoid over-segmentation. The significance of the proposed model is evaluated on the OASBUD dataset and the BUS images collected at BHE. The research work has revealed that the ensemble model with multiple learners achieved 97% accuracy, improving the performance of the CADx system.
A limitation of this study is that no feature selection process was employed, in view of the risk that it might compromise the accuracy of the critical task of lesion detection. Our future research will concentrate on the categorization of ultrasonography images according to multiclass BI-RADS levels.

Author Contributions

T.S. and A.H. proposed the idea and conceptualization; T.S., A.M., and M.H. performed data analysis, experimentation, and scientific discussions, and prepared the original draft; A.H., S.A.K., and M.A. supervised the work and validated the findings; S.H., S.Y., and M.A. helped in the revision and organization of the paper. Further, S.H. and S.Y. also supported funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

The work reported in this paper was supported by the National Natural Science Foundation of China under grant number 61672080.

Conflicts of Interest

The authors declare no conflict of interest.

Data Sharing and Availability

The authors confirm that the data generated or analyzed and supporting the findings of this study are available within the article.

References

  1. Jemal, A.; Bray, F.; Center, M.M.; Ferlay, J.; Ward, E.; Forman, D. Global cancer statistics. Cancer J. Clin. 2011, 61, 69–90.
  2. Laderian, B.; Fojo, T. CDK4/6 Inhibition as a therapeutic strategy in breast cancer: Palbociclib, ribociclib, and abemaciclib. Semin. Oncol. 2017, 44, 395–403.
  3. Kelly, K.M.; Dean, J.; Comulada, W.S.; Lee, S.J. Breast cancer detection using automated whole breast ultrasound and mammography in radiographically dense breasts. Eur. Radiol. 2010, 20, 734–742.
  4. Timmers, J.M.H.; van Doorne-Nagtegaal, H.J.; Verbeek, A.L.M.; Den Heeten, G.J.; Broeders, M.J.M. A dedicated BI-RADS training programme: Effect on the inter-observer variation among screening radiologists. Eur. J. Radiol. 2012, 81, 2184–2188.
  5. Calas, M.J.G.; Almeida, R.M.V.R.; Gutfilen, B.; Pereira, W.C.A. Intraobserver interpretation of breast ultrasonography following the BI-RADS classification. Eur. J. Radiol. 2010, 74, 525–528.
  6. Cheng, H.D.; Shan, J.; Ju, W.; Guo, Y.; Zhang, L. Automated breast cancer detection and classification using ultrasound images: A survey. Pattern Recognit. 2010, 43, 299–317.
  7. Flores, W.G.; de Albuquerque Pereira, W.C.; Infantosi, A.F.C. Improving classification performance of breast lesions on ultrasonography. Pattern Recognit. 2015, 48, 1125–1136.
  8. Sadad, T.; Munir, A.; Saba, T.; Hussain, A. Fuzzy C-means and region growing based classification of tumor from mammograms using hybrid texture feature. J. Comput. Sci. 2018, 29, 34–45.
  9. Jalalian, A.; Mashohor, S.B.; Mahmud, H.R.; Saripan, M.I.B.; Ramli, A.R.B.; Karasfi, B. Computer-aided detection/diagnosis of breast cancer in mammography and ultrasound: A review. Clin. Imaging 2013, 37, 420–426.
  10. Xian, M.; Zhang, Y.; Cheng, H.D.; Xu, F.; Zhang, B.; Ding, J. Automatic breast ultrasound image segmentation: A survey. Pattern Recognit. 2018, 79, 340–355.
  11. Shan, J.; Cheng, H.D.; Wang, Y. A novel automatic seed point selection algorithm for breast ultrasound images. In Proceedings of the International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008.
  12. Gomez, W.; Leija, L.; Alvarenga, A.V.; Infantosi, A.F.C.; Pereira, W.C.A. Computerized lesion segmentation of breast ultrasound based on marker-controlled watershed transformation. Med. Phys. 2010, 37, 82–95.
  13. Rivest, J.-F.; Soille, P.; Beucher, S. Morphological gradients. In Nonlinear Image Processing III; SPIE: Bellingham, WA, USA, 1992; pp. 139–150.
  14. Yang, M.C.; Moon, W.K.; Wang, Y.C.F.; Bae, M.S.; Huang, C.S.; Chen, J.H.; Chang, R.F. Robust texture analysis using multi-resolution gray-scale invariant features for breast sonographic tumor diagnosis. IEEE Trans. Med. Imaging 2013, 32, 2262–2273.
  15. Shi, X.; Cheng, H.D.; Hu, L.; Ju, W.; Tian, J. Detection and classification of masses in breast ultrasound images. Digit. Signal Process. 2010, 20, 824–836.
  16. Lo, C.M.; Chang, R.F.; Huang, C.S.; Moon, W.K. Computer-aided diagnosis of breast tumors using textures from intensity transformed sonographic images. In IFMBE Proceedings; Springer: Basel, Switzerland, 2015; Volume 47, pp. 124–127.
  17. Cai, L.; Wang, X.; Wang, Y.; Guo, Y.; Yu, J.; Wang, Y. Robust phase-based texture descriptor for classification of breast ultrasound images. Biomed. Eng. Online 2015, 14, 26.
  18. Nemat, H.; Fehri, H.; Ahmadinejad, N.; Frangi, A.F.; Gooya, A. Classification of breast lesions in ultrasonography using sparse logistic regression and morphology-based texture features. Med. Phys. 2018, 45, 4112–4124.
  19. Cheng, J.Z.; Chou, Y.H.; Huang, C.S.; Chang, Y.C.; Tiu, C.M.; Chen, K.W.; Chen, C.M. Computer-aided US diagnosis of breast lesions by using cell-based contour grouping. Radiology 2010, 255, 746–754.
  20. Rodriguez-Cristerna, A.; Guerrero-Cedillo, C.P.; Donati-Olvera, G.A.; Gómez-Flores, W.; Pereira, W.C.A. Study of the impact of image preprocessing approaches on the segmentation and classification of breast lesions on ultrasound. In Proceedings of the 2017 14th International Conference on Electrical Engineering, Computing Science and Automatic Control, CCE 2017, Mexico City, Mexico, 20–22 October 2017.
  21. O'Mahony, N.; Campbell, S.; Carvalho, A.; Harapanahalli, S.; Hernandez, G.V.; Krpalkova, L. Deep Learning vs. Traditional Computer Vision. In Advances in Intelligent Systems and Computing (CVC 2019); Springer: Las Vegas, NV, USA, 2020; pp. 128–144.
  22. Byra, M. Discriminant analysis of neural style representations for breast lesion classification in ultrasound. Biocybern. Biomed. Eng. 2018, 38, 684–690.
  23. Byra, M.; Galperin, M.; Ojeda-Fournier, H.; Olson, L.; O'Boyle, M.; Comstock, C.; Andre, M. Breast mass classification in sonography with transfer learning using a deep convolutional neural network and color conversion. Med. Phys. 2019, 46, 746–755.
  24. Wu, S.; Gao, Z.; Liu, Z.; Luo, J.; Zhang, H.; Li, S. Direct reconstruction of ultrasound elastography using an end-to-end deep neural network. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2018.
  25. Kong, B.; Zhan, Y.; Shin, M.; Denny, T.; Zhang, S. Recognizing end-diastole and end-systole frames via deep temporal regression network. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2016; pp. 264–272.
  26. Piotrzkowska-Wróblewska, H.; Dobruch-Sobczak, K.; Byra, M.; Nowicki, A. Open access database of raw ultrasonic signals acquired from malignant and benign breast lesions. Med. Phys. 2017, 44, 6105–6109.
  27. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Br. 2020, in press.
  28. Giger, M.L.; Al-Hallaq, H.; Huo, Z.; Moran, C.; Wolverton, D.E.; Chan, C.W.; Zhong, W. Computerized analysis of lesions in US images of the breast. Acad. Radiol. 1999, 6, 665–674.
  29. Sharma, N.; Aggarwal, L.M. Automated medical image segmentation techniques. J. Med. Phys. 2010, 35, 3–14.
  30. Roerdink, J.; Meijster, A. The Watershed Transform: Definitions, Algorithms and Parallelization Strategies. Fundam. Inform. 2000, 41, 187–228.
  31. Gonzalez, R.; Woods, R. Digital Image Processing; Prentice Hall: Upper Saddle River, NJ, USA, 2002.
  32. Chen, W.B.; Zhang, X. A new watershed algorithm for cellular image segmentation based on mathematical morphology. In Proceedings of the 2010 International Conference on Machine Vision and Human-Machine Interface, Kaifeng, China, 24–25 April 2010; pp. 653–656.
  33. Soille, P. Morphological Image Analysis: Principles and Applications, 2nd ed.; Springer: Berlin, Germany, 2004.
  34. Liu, J.; Shi, Y. Image Feature Extraction Method Based on Shape Characteristics and Its Application in Medical Image Analysis. Appl. Inform. Commun. 2011, 224, 172–178.
  35. Behnam, H.; Zakeri, F.S.; Ahmadinejad, N. Breast mass classification on sonographic images on the basis of shape analysis. J. Med. Ultrason. 2010, 37, 181–186.
  36. Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural Features for Image Classification. IEEE Trans. Syst. Man. Cybern. 1973, SMC-3, 610–621.
  37. Soh, L.K.; Tsatsoulis, C. Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Trans. Geosci. Remote Sens. 1999, 37, 780–795.
  38. Clausi, D.A. An analysis of co-occurrence texture statistics as a function of grey level quantization. Can. J. Remote Sens. 2002, 28, 45–62.
  39. Zinovev, D.; Raicu, D.; Furst, J.; Armato, S.G., III. Predicting radiological panel opinions using a panel of machine learning classifiers. Algorithms 2009, 2, 1473–1502.
  40. Refaeilzadeh, P.; Tang, L.; Liu, H. Cross Validation, Encyclopedia of Database Systems (EDBS); Arizona State University, Springer: Berlin/Heidelberg, Germany, 2009; pp. 532–538.
  41. Fischer, J.E.; Bachmann, L.M.; Jaeschke, R. A readers' guide to the interpretation of diagnostic test properties: Clinical example of sepsis. Intensive Care Med. 2003, 29, 1043–1051.
  42. Rokach, L. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39.
  43. Mounce, S.R.; Ellis, K.; Edwards, J.M.; Speight, V.L.; Jakomis, N.; Boxall, J.B. Ensemble Decision Tree Models Using RUSBoost for Estimating Risk of Iron Failure in Drinking Water Distribution Systems. Water Resour. Manag. 2017, 31, 1575–1589.
  44. Jankovic, D.; Peng, P.; Sehmbi, H.; Perlas, A. Basics of Ultrasound Imaging. In Regional Nerve Blocks in Anesthesia and Pain Therapy; Springer: Cham, Switzerland, 2015; pp. 27–56.
  45. Nugroho, H.A.; Triyani, Y.; Rahmawaty, M.; Ardiyanto, I. Computer Aided Diagnosis using Margin and Posterior Acoustic Features for Breast Ultrasound Images. Telkomnika 2017, 4, 1776–1784.
  46. Moon, W.K.; Lo, C.M.; Chen, R.T.; Shen, Y.W.; Chang, J.M.; Huang, C.S. Tumor detection in automated breast ultrasound images using quantitative tissue clustering. Med. Phys. 2014, 41, 042901.
  47. Singh, B.K.; Verma, K.; Thoke, A.S. Adaptive gradient descent backpropagation for classification of breast tumors in ultrasound imaging. Procedia Comput. Sci. 2015, 46, 1601–1609.
Figure 1. Proposed model.
Figure 2. B-mode (brightness) image of the lesion. The image was reconstructed from the set of radio-frequency (RF) signals (510 RF lines, 1824 samples in every line, and penetration depth = 35.76 mm); white arrows indicate the lesion [26].
Figure 3. Topographic view of the watershed transformation [32].
Figure 4. (a) Radiologist outlining; (b) computerized segmentation.
Figure 5. Illustration of grey level co-occurrence matrix (GLCM).
Figure 6. (a) Area of breast lesion; (b) minor axis and major axis.
Figure 7. Confusion matrix [8].
Figure 8. Classification accuracy graph of OASBUD dataset.
Table 1. Structure of Open Access Series of Breast Ultrasonic Data (OASBUD) dataset.

| Field | Description | Field | Description |
|---|---|---|---|
| Id | Patient ID | Rf1 | First scan |
| Rf2 | Second scan | | |
| Roi1 | ROI for the first scan | Roi2 | ROI for the second scan |
| birads | Category of BI-RADS | Class | Malignant (1), Benign (0) |
Table 2. Formulas of GLCM features. Here $G_{d,\theta}(u,v)$ denotes the normalized co-occurrence matrix, $P_g$ the number of distinct intensity levels, $G_x(u)=\sum_{v=0}^{P_g-1} G_{d,\theta}(u,v)$ and $G_y(v)=\sum_{u=0}^{P_g-1} G_{d,\theta}(u,v)$ the marginal distributions with means $\alpha_x,\alpha_y$ and standard deviations $\beta_x,\beta_y$, and $G_{x+y}$, $G_{x-y}$ the distributions of $u+v$ and $|u-v|$.

$f_1$ (energy) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} G_{d,\theta}(u,v)^2$
$f_2$ (contrast) $= \sum_{n=0}^{P_g-1} n^2 \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} G_{d,\theta}(u,v)$, with $n=|u-v|$
$f_3$ (correlation) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} \frac{(u-\alpha_x)(v-\alpha_y)}{\beta_x\beta_y}\, G_{d,\theta}(u,v)$
$f_4$ (variance) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} (u-\alpha)^2\, G_{d,\theta}(u,v)$
$f_5$ (sum average) $= \sum_{u=0}^{2(P_g-1)} u\, G_{x+y}(u)$
$f_6$ (inverse difference moment) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} \frac{G_{d,\theta}(u,v)}{1+(u-v)^2}$
$f_7$ (entropy) $= -\sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} G_{d,\theta}(u,v)\,\log G_{d,\theta}(u,v)$
$f_8$ (sum variance) $= \sum_{u=0}^{2(P_g-1)} (u-f_9)^2\, G_{x+y}(u)$
$f_9$ (sum entropy) $= -\sum_{u=0}^{2(P_g-1)} G_{x+y}(u)\,\log G_{x+y}(u)$
$f_{10}$ (difference entropy) $= -\sum_{u=0}^{P_g-1} G_{x-y}(u)\,\log G_{x-y}(u)$
$f_{11}$ (difference variance) $= \sum_{u=0}^{P_g-1} (u-\mu_{x-y})^2\, G_{x-y}(u)$, with $\mu_{x-y}=\sum_{u=0}^{P_g-1} u\, G_{x-y}(u)$
$f_{12}$ (information measure of correlation 1) $= \frac{HUV - HUV1}{\max(HU, HV)}$
$f_{13}$ (information measure of correlation 2) $= \left(1-\exp[-2(HUV2-HUV)]\right)^{1/2}$
$f_{14}$ (dissimilarity) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} |u-v|\, G_{d,\theta}(u,v)$
$f_{15}$ (autocorrelation) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} u\,v\, G_{d,\theta}(u,v)$
$f_{16}$ (maximum probability) $= \max_{u,v} G_{d,\theta}(u,v)$
$f_{17}$ (cluster shade) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} (u+v-\alpha_x-\alpha_y)^3\, G_{d,\theta}(u,v)$
$f_{18}$ (cluster prominence) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} (u+v-\alpha_x-\alpha_y)^4\, G_{d,\theta}(u,v)$
$f_{19}$ (inverse difference moment normalized) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} \frac{G_{d,\theta}(u,v)}{1+(u-v)^2/(P_g-1)^2}$
$f_{20}$ (inverse difference normalized) $= \sum_{u=0}^{P_g-1}\sum_{v=0}^{P_g-1} \frac{G_{d,\theta}(u,v)}{1+|u-v|/(P_g-1)}$

with $HU=-\sum_{u} G_x(u)\log G_x(u)$, $HV=-\sum_{v} G_y(v)\log G_y(v)$, $HUV=-\sum_{u}\sum_{v} G_{d,\theta}(u,v)\log G_{d,\theta}(u,v)$, $HUV1=-\sum_{u}\sum_{v} G_{d,\theta}(u,v)\log\left(G_x(u)G_y(v)\right)$, and $HUV2=-\sum_{u}\sum_{v} G_x(u)G_y(v)\log\left(G_x(u)G_y(v)\right)$.
Table 3. Formulas of shape-based features.

| Feature | Formula | Description |
|---|---|---|
| Elongation | $1-\frac{\mathrm{Minor\ Axis}}{\mathrm{Major\ Axis}}$ | Measures how stretched the lesion is. |
| Roundness | $\frac{4\pi \cdot \mathrm{Area}}{(\mathrm{convex\ perimeter})^2}$ | Indicates how closely the lesion approaches a circle. |
| Solidity | $\frac{\mathrm{Area}}{\mathrm{convex\ area}}$ | Measures the density of the lesion. |
| Rectangularity | $\frac{\mathrm{Area}}{\mathrm{Major\ Axis}\cdot\mathrm{Minor\ Axis}}$ | Expresses the resemblance of the lesion shape to a rectangle. |
| Compactness | $\frac{4\pi \cdot \mathrm{Area}}{(\mathrm{perimeter})^2}$ | Ratio of the lesion area to that of a circle with the same perimeter. |
| Convexity | $\frac{\mathrm{Convex\ perimeter}}{\mathrm{Perimeter}}$ | Perimeter ratio between the convex hull of the lesion and the lesion itself. |
| Eccentricity | $\frac{\sqrt{\mathrm{Major\ Axis}^2-\mathrm{Minor\ Axis}^2}}{\mathrm{Major\ Axis}}$ | Ratio of the focal distance of the fitted ellipse to its major axis. |
Table 4. Description of employed features.

| Feature Set | Size | Description |
|---|---|---|
| GLCM | 1 × 20 | Texture features based on second-order statistics of the co-occurrence matrix |
| Shape-based | 1 × 7 | Elongation, compactness, rectangularity, solidity, roundness, eccentricity, and convexity |
Table 5. Performance based on hybrid feature vector using longitudinal scan of OASBUD dataset.

Ensemble (preset: RUSBoosted trees; ensemble method: RUSBoost; learner type: decision tree; maximum number of splits: 20; number of learners: 30; learning rate: 0.1)

| Scan | Statistic | Value | 95% Confidence Interval |
|---|---|---|---|
| Longitudinal | Accuracy | 97.00% | 91.48% to 99.38% |
| Longitudinal | Sensitivity | 96.23% | 87.02% to 99.54% |
| Longitudinal | Specificity | 97.87% | 88.71% to 99.95% |
| Longitudinal | PPV | 98.08% | 88.00% to 99.72% |
| Longitudinal | NPV | 95.83% | 85.51% to 98.90% |

KNN (preset: weighted KNN; number of neighbors: 10; distance metric: Euclidean; distance weight: squared inverse)

| Scan | Statistic | Value | 95% Confidence Interval |
|---|---|---|---|
| Longitudinal | Accuracy | 94.00% | 87.40% to 97.77% |
| Longitudinal | Sensitivity | 92.59% | 82.11% to 97.94% |
| Longitudinal | Specificity | 95.65% | 85.16% to 99.47% |
| Longitudinal | PPV | 96.15% | 86.54% to 98.98% |
| Longitudinal | NPV | 91.67% | 81.04% to 96.59% |

Decision tree (preset: complex tree; maximum number of splits: 100; split criterion: Gini's diversity index)

| Scan | Statistic | Value | 95% Confidence Interval |
|---|---|---|---|
| Longitudinal | Accuracy | 88.00% | 79.98% to 93.64% |
| Longitudinal | Sensitivity | 90.00% | 78.19% to 96.67% |
| Longitudinal | Specificity | 86.00% | 73.26% to 94.18% |
| Longitudinal | PPV | 86.54% | 76.27% to 92.78% |
| Longitudinal | NPV | 89.58% | 78.80% to 95.22% |
Table 6. Performance based on hybrid feature vector using transverse scan of OASBUD dataset.

Ensemble (preset: RUSBoosted trees; ensemble method: RUSBoost; learner type: decision tree; number of learners: 30; maximum number of splits: 20; learning rate: 0.1)

| Scan | Statistic | Value | 95% Confidence Interval |
|---|---|---|---|
| Transverse | Accuracy | 95.00% | 88.72% to 98.36% |
| Transverse | Sensitivity | 92.73% | 82.41% to 97.98% |
| Transverse | Specificity | 97.78% | 88.23% to 99.94% |
| Transverse | PPV | 98.08% | 88.00% to 99.72% |
| Transverse | NPV | 91.67% | 81.05% to 96.59% |

KNN (preset: weighted KNN; distance metric: Euclidean; number of neighbors: 10; distance weight: squared inverse)

| Scan | Statistic | Value | 95% Confidence Interval |
|---|---|---|---|
| Transverse | Accuracy | 93.00% | 86.11% to 97.14% |
| Transverse | Sensitivity | 90.91% | 80.05% to 96.98% |
| Transverse | Specificity | 95.56% | 84.85% to 99.46% |
| Transverse | PPV | 96.15% | 86.55% to 98.98% |
| Transverse | NPV | 89.58% | 78.81% to 95.21% |

Decision tree (preset: complex tree; split criterion: Gini's diversity index; maximum number of splits: 100)

| Scan | Statistic | Value | 95% Confidence Interval |
|---|---|---|---|
| Transverse | Accuracy | 85.00% | 76.47% to 91.35% |
| Transverse | Sensitivity | 87.76% | 75.23% to 95.37% |
| Transverse | Specificity | 82.35% | 69.13% to 91.60% |
| Transverse | PPV | 82.69% | 72.35% to 89.72% |
| Transverse | NPV | 87.50% | 76.60% to 93.74% |
Table 7. Performance based on hybrid feature vector using breast ultrasound (BUS) images collected at the Baheya Hospital Egypt (BHE).

Ensemble (preset: RUSBoosted trees; ensemble method: RUSBoost; learner type: decision tree; number of learners: 30; maximum number of splits: 20; learning rate: 0.1)

| Statistic | Value | 95% Confidence Interval |
|---|---|---|
| Accuracy | 96.60% | 94.90% to 97.86% |
| Sensitivity | 94.34% | 90.32% to 97.04% |
| Specificity | 97.70% | 95.81% to 98.89% |
| PPV | 95.24% | 91.55% to 97.36% |
| NPV | 97.25% | 95.34% to 98.40% |

Decision tree (preset: complex tree; split criterion: Gini's diversity index; maximum number of splits: 100)

| Statistic | Value | 95% Confidence Interval |
|---|---|---|
| Accuracy | 95.83% | 93.99% to 97.23% |
| Sensitivity | 92.96% | 88.65% to 96.01% |
| Specificity | 97.24% | 95.22% to 98.56% |
| PPV | 94.29% | 90.42% to 96.65% |
| NPV | 96.57% | 94.53% to 97.87% |

KNN (preset: weighted KNN; number of neighbors: 10; distance weight: squared inverse; distance metric: Euclidean)

| Statistic | Value | 95% Confidence Interval |
|---|---|---|
| Accuracy | 95.36% | 93.45% to 96.85% |
| Sensitivity | 92.86% | 88.49% to 95.95% |
| Specificity | 96.57% | 94.40% to 98.07% |
| PPV | 92.86% | 88.76% to 95.54% |
| NPV | 96.57% | 94.53% to 97.86% |
Table 8. Comparison with other methods.

| Method | Feature Set | Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|
| Proposed technique | Hybrid features | Ensemble | 97.00 | 96.23 | 97.87 |
| Nugroho et al. 2017 [45] | Texture and geometry analysis | SVM | 91.30 | 92.00 | 89.60 |
| Moon et al. 2014 [46] | Echogenicity and morphology | Logistic regression | - | 92.50 | - |
| B. Singh et al. 2015 [47] | Texture and shape features | ANN | 84.60 | - | - |

