Detection and Classification of Overlapping Cell Nuclei in Cytology Effusion Images Using a Double-Strategy Random Forest
Abstract
1. Introduction
2. Image Acquisition and Dataset Description
3. Methodology
3.1. Preprocessing
3.2. Nuclei Segmentation
3.3. Post-Processing
3.4. Feature Extraction
3.5. Classification
3.6. Performance Assessment
- TruePositive denotes the number of overlapping nuclei correctly detected as overlapping nuclei.
- TrueNegative represents the number of single nuclei correctly classified as single nuclei.
- FalsePositive is the number of single nuclei wrongly classified as overlapping nuclei.
- FalseNegative is the number of overlapping nuclei missed by the method, that is, wrongly classified as single nuclei.
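The four counts defined above are enough to compute the six performance measures reported in the results tables. The following is a minimal sketch using the standard definitions, with G mean taken as the geometric mean of sensitivity and specificity. The function name `classification_metrics` is hypothetical, and the example counts are not reported in the source: they are back-computed on the assumption that the 117 overlapping and 683 single test nuclei yield the RF row with 96.58% sensitivity and 98.68% specificity.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute the six performance measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of overlapping nuclei that were detected
    specificity = tn / (tn + fp)   # share of single nuclei that were kept as single
    precision = tp / (tp + fp)     # share of "overlapping" predictions that are correct
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "accuracy": accuracy,
        "g_mean": g_mean,
    }


# Assumed counts (back-computed for illustration, not taken from the source):
# 117 overlapping test nuclei -> TP + FN, 683 single test nuclei -> TN + FP.
metrics = classification_metrics(tp=113, tn=674, fp=9, fn=4)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because the test set is imbalanced (far more single than overlapping nuclei), accuracy alone can look high even when many overlapping nuclei are missed, which is presumably why F score and G mean are reported alongside it.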
4. Experiment Results and Discussions
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Pleural Effusion. Available online: https://en.wikipedia.org/wiki/Pleural_effusion (accessed on 4 September 2018).
- Lee, Y.C.; Light, R.W. Management of malignant pleural effusions. Respirology 2004, 9, 148–156.
- Heffner, J.E.; Klein, J.S. Recent advances in the diagnosis and management of malignant pleural effusions. Mayo Clin. Proc. 2008, 83, 235–250.
- Kushwaha, R.; Shashikala, P.; Hiremath, S.; Basavaraj, H.G. Cells in pleural fluid and their value in differential diagnosis. J. Cytol. 2008, 25, 138–143.
- Cytology Exam of Pleural Fluid. Available online: https://www.ucsfbenioffchildrens.org/tests/003866.html (accessed on 5 July 2018).
- Irshad, H.; Veillard, A.; Roux, L.; Racoceanu, D. Methods for nuclei detection, segmentation, and classification in digital histopathology: A review—Current status and future potential. IEEE Rev. Biomed. Eng. 2014, 7, 97–114.
- Doi, K. Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Comput. Med. Imaging Graph. 2007, 31, 198–211.
- Malpica, N.; de Solorzano, C.O.; Vaquero, J.J.; Santos, A.; Vallcorba, I.; García-Sagredo, J.M.; Del Pozo, F. Applying watershed algorithms to the segmentation of clustered nuclei. Cytometry 1997, 28, 289–297.
- Yang, X.; Li, H.; Zhou, X. Nuclei segmentation using marker-controlled watershed, tracking using mean-shift, and Kalman filter in time-lapse microscopy. IEEE Trans. Circuits Syst. I Regul. Pap. 2006, 53, 2405–2414.
- Yeo, T.T.E.; Jin, X.C.; Ong, S.H.; Sinniah, R. Clump splitting through concavity analysis. Pattern Recognit. Lett. 1994, 15, 1013–1018.
- Bai, X.; Sun, C.; Zhou, F. Splitting touching cells based on concave points and ellipse fitting. Pattern Recognit. 2009, 42, 2434–2446.
- Kumar, S.; Ong, S.H.; Ranganath, S.; Ong, T.C.; Chew, F.T. A rule-based approach for robust clump splitting. Pattern Recognit. 2006, 39, 1088–1098.
- Wang, H.; Zhang, H.; Ray, N. Clump splitting via bottleneck detection and shape classification. Pattern Recognit. 2012, 45, 2780–2787.
- Tafavogh, S.; Catchpoole, D.R.; Kennedy, P.J. Non-parametric and integrated framework for segmenting and counting neuroblastic cells within neuroblastoma tumor images. Med. Biol. Eng. Comput. 2013, 51, 645–655.
- Tafavogh, S.; Catchpoole, D.R.; Kennedy, P.J. Cellular quantitative analysis of neuroblastoma tumor and splitting overlapping cells. BMC Bioinform. 2014, 15, 272.
- Abbas, N.; Abdullah, A.H.; Mohamad, Z.; Altameem, A. Clustered red blood cell splitting via boundary analysis in microscopic thin blood smear digital images. Int. J. Technol. 2015, 3, 306–317.
- Guven, M.; Cengizler, C. Data cluster analysis-based classification of overlapping nuclei in Pap smear samples. Biomed. Eng. Online 2014, 13, 159.
- Guerra, L.; McGarry, L.M.; Robles, V.; Bielza, C.; Larranaga, P.; Yuste, R. Comparison between supervised and unsupervised classifications of neuronal cell types: A case study. Dev. Neurobiol. 2011, 71, 71–82.
- Alparslan, E.; Fuatince, M. Image enhancement by local histogram stretching. IEEE Trans. Syst. Man Cybern. 1981, 11, 376–385.
- Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; Haar Romeny, B.; Zimmerman, J.B.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vis. Graph. Image Process. 1987, 39, 355–368.
- Zuiderveld, K. Contrast limited adaptive histogram equalization. In Graphics Gems IV; Academic Press Professional, Inc.: San Diego, CA, USA, 1994; pp. 474–485.
- Sreng, S.; Maneerat, N.; Isarakorn, D.; Pasaya, B.; Takada, J.I.; Panjaphongse, R.; Varakulsiripunth, R. Automatic exudate extraction for early detection of Diabetic Retinopathy. In Proceedings of the 2013 International Conference on Information Technology and Electrical Engineering (ICITEE), Yogyakarta, Indonesia, 7–8 October 2013; pp. 31–35.
- Choi, W.J.; Choi, T.S. Automated pulmonary nodule detection system in computed tomography images: A hierarchical block classification approach. Entropy 2013, 15, 507–523.
- Oswal, V.; Belle, A.; Diegelmann, R.; Najarian, K. An entropy-based automated cell nuclei segmentation and quantification: Application in analysis of wound healing process. Comput. Math. Methods Med. 2013, 2013, 592790.
- Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949.
- Wong, A.K.; Sahoo, P.K. A gray-level threshold selection method based on maximum entropy principle. IEEE Trans. Syst. Man Cybern. 1989, 19, 866–871.
- Soille, P. Morphological Image Analysis: Principles and Applications; Springer Science & Business Media: Berlin, Germany, 2013.
- Srinivasan, G.N.; Shobha, G. Statistical texture analysis. World Acad. Sci. Eng. Technol. 2008, 36, 1264–1269.
- Kam, H.T. Random decision forest. In Proceedings of the Third International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; pp. 14–18.
- Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Kursa, M.B. Robustness of Random Forest-based gene selection methods. BMC Bioinform. 2014, 15, 8.
- Genuer, R.; Poggi, J.M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. 2010, 31, 2225–2236.
- Díaz-Uriarte, R.; De Andres, S.A. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006, 7, 3.
- Suna, G.; Lia, S.; Caoa, Y.; Lang, F. Cervical cancer diagnosis based on random forest. Int. J. Performabil. Eng. 2017, 13, 446–457.
- Krishnaiah, V.; Narsimha, D.G.; Chandra, D.N.S. Diagnosis of lung cancer prediction system using data mining classification techniques. Int. J. Comput. Sci. Inf. Technol. 2013, 4, 39–45.
- Zhu, W.; Zeng, N.; Wang, N. Sensitivity, specificity, accuracy, associated confidence interval and ROC analysis with practical SAS implementations. In Proceedings of the NESUG Proceedings: Health Care and Life Sciences, Baltimore, MD, USA, 14–17 November 2010; p. 67.
- Loong, T.W. Understanding sensitivity and specificity with the right side of the brain. BMJ 2003, 327, 716–719.
- Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
- Zhang, H. The optimality of naive Bayes. In Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference, Miami Beach, FL, USA, 12–14 May 2004.
- Shmilovici, A. Support vector machines. In Data Mining and Knowledge Discovery Handbook; Springer: Boston, MA, USA, 2009; pp. 231–247.
- Sutton, O. Introduction to k Nearest Neighbor Classification and Condensed Nearest Neighbour Data Reduction; University Lectures; University of Leicester: Leicester, UK, 2012.
- Rokach, L.; Maimon, O.Z. Data Mining with Decision Trees: Theory and Applications; World Scientific: Singapore, 2008; Volume 69.
No. | Feature Name | Description |
---|---|---|
1. | Area (A) | It is represented as the actual number of pixels inside the nucleus region. |
2. | Perimeter (P) | This is measured by computing the total number of pixels on the nucleus edge. |
3. | Roundness | This measures the similarity between the nucleus region and a circle. It varies between 0 and 1, and a circle’s roundness is equal to 1. |
4. | Solidity | This specifies the proportion of the pixels in the convex hull that are also in the nucleus region. It is formulated as $\text{Solidity} = A / \text{ConvexArea}$. |
5. | Equivalent Circular Diameter (ECD) | This is defined as the diameter of a circle with the same area as the nucleus region. It is computed as $\text{ECD} = \sqrt{4A/\pi}$. |
6. | Compactness | This specifies the ratio of the area to the square of the perimeter. It is computed as $A / P^{2}$. |
7. | Eccentricity | This represents the eccentricity of the ellipse that has the same second moments as the nucleus region. Its value is between 0 and 1. An ellipse whose eccentricity is 0 is a circle, while one whose eccentricity is 1 is a line segment. |
8. | Local minima | This represents the number of local minimum points in the nucleus region. |
9. | Aspect ratio of the nucleus | This is the ratio of nucleus width to nucleus height, $\text{Width} / \text{Height}$. |
10. | Major Axis | This represents the length (in pixels) of the major axis of the ellipse that has the same normalized second central moments as the nucleus region. |
11. | Minor Axis | This specifies the length (in pixels) of the minor axis of the ellipse that has the same normalized second central moments as the nucleus region. |
12. | Elongation | This is the ratio between the major and minor axes, $\text{MajorAxis} / \text{MinorAxis}$. |
13. | Actual Diameter (AD) | This is the diameter of the circle circumscribing the nucleus region. |
14. | ECD to AD | This is defined as the ratio $\text{ECD} / \text{AD}$. |
15. | Convex Area | This represents the number of pixels in the convex hull of the nucleus region. |
16. | Number of local minima | This is measured by counting the number of local minima in the nucleus region. |
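Several of the geometric features above are simple ratios of the basic measurements (area, perimeter, convex area, axis lengths). The sketch below shows one way to derive them once those measurements are available. The function name `derived_shape_features` and its parameter names are hypothetical, and the roundness formula $4\pi A / P^{2}$ is an assumed, commonly used definition that matches the stated range (0 to 1, with 1 for a circle); the source does not confirm it.

```python
import math


def derived_shape_features(area, perimeter, convex_area, major_axis,
                           minor_axis, width, height, actual_diameter):
    """Derive ratio-based shape features from basic region measurements."""
    ecd = math.sqrt(4.0 * area / math.pi)  # diameter of the circle with the same area
    return {
        # Assumed definition: equals 1 for a perfect circle, smaller otherwise.
        "roundness": 4.0 * math.pi * area / perimeter ** 2,
        "solidity": area / convex_area,
        "ecd": ecd,
        "compactness": area / perimeter ** 2,
        "aspect_ratio": width / height,
        "elongation": major_axis / minor_axis,
        "ecd_to_ad": ecd / actual_diameter,
    }


# Sanity check on an ideal circle of radius 10 (continuous, not pixelated).
r = 10.0
circle = derived_shape_features(
    area=math.pi * r ** 2, perimeter=2 * math.pi * r,
    convex_area=math.pi * r ** 2, major_axis=2 * r, minor_axis=2 * r,
    width=2 * r, height=2 * r, actual_diameter=2 * r,
)
print(circle)
```

For an ideal circle every ratio except compactness evaluates to 1, which is a convenient check. Two touching nuclei typically form a concave, elongated blob, so values such as solidity and the ECD-to-AD ratio would be expected to drop below 1.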
No. | Feature Name | Description |
---|---|---|
1. | Mean | This represents the mean gray value of the nucleus region. |
2. | Standard deviation | This specifies the standard deviation of the gray values of the nucleus region. |
3. | Smoothness | This specifies the local variation in radius lengths of the nucleus region. |
4. | Variance | This is represented using the variance value of the gray values inside the nucleus region. |
5. | Skewness | This defines the skewness of gray values of the nucleus region. |
6. | Kurtosis | This specifies the kurtosis of gray values of the nucleus region. |
7. | Energy | This is represented by the energy of gray values of the nucleus region. |
8. | Entropy | This specifies the entropy of gray values of the nucleus region. |
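9. | Entropy of entropy-filtered image | Entropy of the nucleus region after local entropy filtering. |
10. | Entropy of standard-deviation-filtered image | Entropy of the nucleus region after local standard deviation filtering. |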
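The first-order textural features in the table above (mean, standard deviation, variance, skewness, kurtosis, energy, and entropy) can all be computed from the gray values inside one nucleus region. The sketch below uses the usual histogram-based definitions, which are an assumption since the source gives no formulas: energy is the sum of squared gray-level probabilities and entropy is the Shannon entropy in bits. The function name `first_order_texture` is hypothetical; smoothness and the two filtered-image entropies are not covered.

```python
import math
from collections import Counter


def first_order_texture(gray_values):
    """First-order statistics of the gray values inside one nucleus region."""
    n = len(gray_values)
    mean = sum(gray_values) / n
    variance = sum((g - mean) ** 2 for g in gray_values) / n
    std = math.sqrt(variance)
    if std > 0:
        # Standardized third and fourth central moments.
        skewness = sum((g - mean) ** 3 for g in gray_values) / n / std ** 3
        kurtosis = sum((g - mean) ** 4 for g in gray_values) / n / std ** 4
    else:
        # A perfectly flat region has no defined shape statistics; use 0.
        skewness = kurtosis = 0.0
    # Normalized histogram: probability of each gray level that occurs.
    probs = [count / n for count in Counter(gray_values).values()]
    energy = sum(p * p for p in probs)
    entropy = -sum(p * math.log2(p) for p in probs)
    return {
        "mean": mean, "std": std, "variance": variance,
        "skewness": skewness, "kurtosis": kurtosis,
        "energy": energy, "entropy": entropy,
    }


# Toy region: half the pixels are black (0), half are white (255).
features = first_order_texture([0, 0, 255, 255])
print(features)
```

With two equally likely gray levels the toy region has energy 0.5 and entropy 1 bit; a uniformly stained region would have energy 1 and entropy 0.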
Double-Strategy RF Algorithm Steps |
---|
1. Split the data into training and testing datasets (80%/20%). |
2. Train an RF classifier on the training dataset using all features. |
3. Select the most important features. |
4. Create a new ‘selected-feature’ dataset containing only those features. |
5. Train a second RF classifier on this new dataset. |
6. Test the second RF classifier on the testing data. |
7. Compare the accuracy of the ‘full-feature’ classifier with the accuracy of the ‘selected-feature’ classifier. |
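The seven steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic (26 features, standing in for the 16 geometric and 10 textural features), the split is stratified, importance is measured with the forest's impurity-based `feature_importances_`, and the top `k = 8` features are kept to mirror the eight selected features listed in the document. The source does not state which importance measure or cutoff was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the nuclei feature matrix (assumption).
X, y = make_classification(n_samples=1000, n_features=26, n_informative=8,
                           weights=[0.84, 0.16], random_state=0)

# Step 1: 80/20 split (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Step 2: first RF trained on all features.
rf_full = RandomForestClassifier(n_estimators=100, random_state=0)
rf_full.fit(X_train, y_train)

# Step 3: keep the k features with the highest impurity-based importance.
k = 8
selected = np.argsort(rf_full.feature_importances_)[::-1][:k]

# Steps 4-5: second RF trained only on the selected features.
rf_selected = RandomForestClassifier(n_estimators=100, random_state=0)
rf_selected.fit(X_train[:, selected], y_train)

# Steps 6-7: evaluate both classifiers on the held-out data and compare.
acc_full = rf_full.score(X_test, y_test)
acc_selected = rf_selected.score(X_test[:, selected], y_test)
print(f"full-feature accuracy:     {acc_full:.4f}")
print(f"selected-feature accuracy: {acc_selected:.4f}")
```

Selecting features with the first forest and retraining on the reduced set removes uninformative inputs before the final classifier is built; whether accuracy improves depends on the data, so the comparison in step 7 is the actual check.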
Observational Data | Training | Testing | Total |
---|---|---|---|
Single Nuclei | 2692 | 683 | 3375 |
Overlapped Nuclei | 508 | 117 | 625 |
Total | 3200 | 800 | 4000 |
No. | Feature Name | Category |
---|---|---|
1. | Energy | Textural Feature |
2. | Variance | Textural Feature |
3. | Equivalent Circular Diameter to actual diameter | Geometric Feature |
4. | Eccentricity | Geometric Feature |
5. | Ratio between area and perimeter | Geometric Feature |
6. | Entropy of Local standard deviation filtered Image | Textural Feature |
7. | Actual Diameter | Geometric Feature |
8. | Entropy | Textural Feature |
Classifiers | Sensitivity | Specificity | Precision | F Score | Accuracy | G Mean |
---|---|---|---|---|---|---|
NB | 62.07% | 98.68% | 88.89% | 73.10% | 93.38% | 78.26% |
SVM | 78.45% | 97.51% | 84.26% | 81.25% | 94.75% | 87.46% |
KNN | 79.31% | 97.66% | 85.19% | 82.14% | 95.00% | 88.01% |
DT | 66.67% | 97.07% | 79.59% | 72.56% | 92.63% | 80.45% |
RF | 84.48% | 97.51% | 85.22% | 84.85% | 95.63% | 90.77% |
Classifiers | Sensitivity | Specificity | Precision | F Score | Accuracy | G Mean |
---|---|---|---|---|---|---|
NB | 52.14% | 97.51% | 78.21% | 62.56% | 90.88% | 71.30% |
SVM | 93.16% | 97.22% | 85.16% | 88.98% | 96.63% | 95.17% |
KNN | 90.60% | 98.24% | 89.83% | 90.21% | 97.13% | 94.34% |
DT | 65.52% | 98.68% | 89.41% | 75.62% | 93.88% | 80.41% |
RF | 96.58% | 98.68% | 92.62% | 94.56% | 98.38% | 97.63% |
Methodology | Observational Data | Features/Classifiers | Quantitative Results |
---|---|---|---|
Shape classifier using SVM [13] | 4000 nuclei from CPE images | Five size and shape features; support vector machine | F1 score 84.12%; accuracy 95.38%; G mean 90.31% |
Data clustering-based identification [17] | 4000 nuclei from CPE images | Three shape and two local-minima-based features; fuzzy C-means clustering | F1 score 62.15%; accuracy 88.13%; G mean 78.23% |
Proposed method | 4000 nuclei from CPE images | Four shape and four textural features; double-strategy random forest | F1 score 94.56%; accuracy 98.38%; G mean 97.63% |
Algorithm Steps | Computation Time (Seconds) |
---|---|
Nuclei segmentation using maximum entropy thresholding | 2.07 s |
Geometric and textural features extraction | 2.02 s |
Classification using double-strategy RF | 1.07 s |
Entire Algorithm | 5.17 s |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Win, K.Y.; Choomchuay, S.; Hamamoto, K.; Raveesunthornkiat, M. Detection and Classification of Overlapping Cell Nuclei in Cytology Effusion Images Using a Double-Strategy Random Forest. Appl. Sci. 2018, 8, 1608. https://doi.org/10.3390/app8091608