
Cancer Cell Profiling Using Image Moments and Neural Networks with Model Agnostic Explainability: A Case Study of Breast Cancer Histopathological (BreakHis) Database

Dmitry Kaplun, Alexander Krasichkov, Petr Chetyrbok, Nikolay Oleinikov, Anupam Garg and Husanbir Singh Pannu
1 Department of Automation and Control Processes, Saint Petersburg Electrotechnical University “LETI”, 197376 Saint Petersburg, Russia
2 Radio Engineering Systems Department, Saint Petersburg Electrotechnical University “LETI”, 197376 Saint Petersburg, Russia
3 Department of Mathematics and Informatics, V. I. Vernadsky Crimean Federal University, 298635 Yalta, Russia
4 Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala 147004, India
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(20), 2616; https://doi.org/10.3390/math9202616
Submission received: 27 August 2021 / Revised: 1 October 2021 / Accepted: 14 October 2021 / Published: 17 October 2021
(This article belongs to the Special Issue Application of Mathematical Methods in Artificial Intelligence)

Abstract: With the evolution of modern digital pathology, examining cancer cell tissues has paved the way to quantifying subtle symptoms, for example, by means of image staining procedures using eosin and hematoxylin. Cancer tissues, as in breast and lung cancer, are quite challenging to examine manually, even for expert histopathologists. Relying merely on the characteristics observable by histopathologists for cell profiling may limit the scale and diagnostic quality, since the work involves tedious repetition under constant concentration. Thus, automatic analysis of cancer cells has been proposed with algorithmic and soft-computing techniques to leverage speed and reliability. The paper’s novelty lies in the use of Zernike image moments to extract complex features from cancer cell images and of simple neural networks for classification, followed by explainability of the test results using the Local Interpretable Model-Agnostic Explanations (LIME) technique of Explainable Artificial Intelligence (XAI). The general workflow of the proposed high-throughput strategy involves acquiring the public BreakHis dataset of microscopic images, followed by the application of image processing and machine learning techniques. The recommended technique has been mathematically substantiated and compared with the state of the art to justify the empirical basis of our algorithmic findings. The proposed system is able to classify malignant and benign cancer cell images of 40× magnification with a 100% recognition rate. XAI interprets and reasons about the test results obtained from the machine learning model, making it reliable and transparent for analysis and parameter tuning.

1. Introduction

Cancer is a generic word used to describe the diseases caused by abnormal growth of cells in any part of the body. The transformation into cancer cells is a multi-phase process that progresses from pre-cancerous lesions to malignant tumors. The factors that contribute to cancer incidence are the use of alcohol and tobacco, physical inactivity, age, pollution, and some other diseases such as hepatitis C, hepatitis B, and HIV. According to the World Health Organization (WHO), there were an estimated 19.3 million new cancer cases and 10 million cancer deaths across the world in the year 2020 [1]. GLOBOCAN estimates that half of the total cases and 58.3% of cancer mortality occur in Asia, followed by the European and American regions [2]. The most common cancer is breast cancer, followed by lung, colorectal, prostate, and stomach cancer. Lung cancer tops the list for causing deaths, followed by colorectal, liver, stomach and breast cancer. According to the International Agency for Research on Cancer (IARC), the incidence of cancer is higher for males than for females. The statistics on new cancer cases for females during the year 2020 [3] are shown in Figure 1.
Breast cancer is one of the most commonly occurring cancers in females, leading to the formation of a lump in the breast, discharge of blood from the nipples, and changes in the shape or texture of the nipples or breast. Abnormal cells can spread from the breast to the lymph nodes or even to other adjoining parts of the body. Treatment for breast cancer depends on the stage of the cancer. To diagnose breast cancer, a biopsy is one of the most reliable methods compared to others such as X-ray, MRI, and ultrasound. A biopsy is the process of collecting a sample of the suspicious tissue using fine-needle aspiration, surgical incision or core needles; the sample is later sent to the laboratory for analysis. The collected tissue is embedded in a block of paraffin, from which tissue sections of 3–5 μm are cut and placed on a glass slide for observation under a microscope. To make the nucleus and cytoplasm visible, the slide is stained with H&E (hematoxylin and eosin) or IHC (immunohistochemical) stains, producing the cancer cell images. On analyzing these images, experts classify the tumors as benign or malignant, which requires years of medical education. Therefore, in order to make this tedious job easier for the experts, we have taken a small progressive step toward the automatic classification of tumors. In this paper, cancer cell images are analyzed using image moments and neural nets to classify benign and malignant tumors, with an explainability feature added to justify the decisions on the test data. The contributions of the paper are summarized below:
  • Breast cancer histopathology (BreakHis) images for automatic cancer prediction are studied, which is a challenging task even for trained pathologists and trainees. The proposed system will assist in reducing manual repetitive tasks and increase accuracy using the power of machine learning.
  • Zernike moments are used for feature extraction and vectorization; they are rotation invariant and can be made scale and translation invariant. They are an ideal choice for shape detection where color does not matter, as with H&E (hematoxylin and eosin) staining.
  • Artificial Neural Networks (ANN) are used for binary classification, and Explainable Artificial Intelligence (XAI) Local Interpretable Model-Agnostic Explanations (LIME) is used to justify test results visually.
This paper is organized into six sections. Section 1 is the introduction, Section 2 reviews the related work, and Section 3 presents the background on feature extraction, classifiers and XAI. Section 4 documents the methodology used to classify the cancer images. In Section 5, the experiments and performance analysis are discussed, and finally, Section 6 presents the conclusions.

2. Literature Review

Medical imaging is one of the emerging fields that needs critical analysis of images, owing to the complex structure of the tissues of different body parts, which has attracted researchers to work in this field. The first step in medical imaging is the availability of a dataset on which proposed approaches can be implemented, which is the toughest and most crucial step. Spanhol et al. [4] provided a publicly available dataset of 7909 breast cancer histopathological images obtained from 82 patients, named BreaKHis, which contains both benign and malignant images. An accuracy of 80 to 85% was achieved on applying traditional classification techniques. Spanhol et al. [5] also implemented a convolutional neural network on the textural features extracted from BreaKHis dataset images, which outperforms other machine learning approaches. However, the Convolutional Neural Network (CNN) has the drawbacks of longer training time, the expertise required to fine-tune the CNN, and increased complexity of system development. These drawbacks led Spanhol et al. [6] to develop DeCAF features, which are given as input to a previously trained CNN model. The proposed model provides fast development along with high accuracy, yielding better results than traditional textural features.
In [7], a breast cancer histopathology image classification by assembling multiple compact CNNs is proposed. Compared to reported breast cancer recognition algorithms that are evaluated on the publicly available BreaKHis dataset, the proposed hybrid model achieves comparable or better performance, indicating the potential of combining both local model and global model branches.
Komura and Ishikawa [8] have discussed the challenges associated with histopathological images and analyzed the different machine learning techniques applicable to these images, specifying the solution with the use of deep learning for the problems specific to this analysis. Koelzer et al. [9] illustrate the application of machine learning and artificial intelligence in the field of immune-oncology. Robertson et al. [10] has discussed in detail the image processing techniques, machine learning approaches and deep learning architectures available for the classification of malignant tumors present in the breast histopathology images with the advancement in technology. Li et al. [11] discussed the process of image analysis of cervical histopathology images using various machine vision approaches.
Bychkov et al. [12] combined convolutional and recurrent neural networks trained to classify colorectal cancer from the tumor images of 420 patients. The proposed approach directly provides the outcome without any intermediate tissue classification. Couture et al. [13] also used deep learning for the classification of 571 breast cancer tumor images for tumor grade, estrogen receptor (ER) status, histologic subtype and risk of recurrence. The accuracy achieved for tumor grade is 82%, for ER status 84%, and for the risk of recurrence score (ROR-PT) 75%. Klein et al. [14] detected ovarian cancer from the MALDI images of 20 patients using machine learning techniques, with the highest accuracy of 85% achieved by a CNN classifier. Table 1 shows the comparative analysis of the state of the art in the literature review.

3. Background

The following subsections explain the background information, including feature extraction using Zernike moments, classification techniques and explainable artificial intelligence.

3.1. Feature Extraction Techniques

Features are used to describe the image characteristics that distinguish a specific object in the image from others. A number of feature extraction techniques have been defined in the literature and used by other researchers; some of these approaches are discussed in this section, with a small sketch of the LBP idea after this paragraph.
Local Binary Patterns (LBP) computes binary distribution patterns in the neighborhood of each pixel, with radius R and P neighborhood pixels. The fundamental idea behind LBP is to assign the value one to a neighborhood pixel if it is greater than or equal to the center pixel, and zero otherwise, which yields a binary pattern for each pixel from its corresponding neighborhood. Completed LBP (CLBP) is an advanced version of LBP based on the sign, center pixel, and magnitude parameters obtained from the local regions. After global thresholding, with the threshold set to the average gray value of the whole image, the binary code of the center pixel is coded; the sign and magnitude parameters are computed and, through a specific operator, coded to binary format so that a CLBP histogram can be formed. The gray-level co-occurrence matrix (GLCM) statistically characterizes the texture features of an image by counting the occurrences of specific pixel-value pairs.
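To make the neighborhood comparison concrete, the following is a minimal MATLAB sketch of the basic LBP computation described above with R = 1 and P = 8; the function name and loop structure are illustrative assumptions, not taken from any of the cited implementations.

function codes = lbpBasic(img)
% Minimal LBP sketch: each interior pixel receives an 8-bit code, one bit
% per neighbor, set when that neighbor is >= the center pixel.
    img = double(img);
    [H, W] = size(img);
    dr = [-1 -1 -1  0  1  1  1  0];   % row offsets of the 8 neighbors
    dc = [-1  0  1  1  1  0 -1 -1];   % column offsets (clockwise)
    codes = zeros(H - 2, W - 2);
    for r = 2:H-1
        for c = 2:W-1
            code = 0;
            for k = 1:8
                % set bit k-1 if the k-th neighbor >= center pixel
                code = code + double(img(r + dr(k), c + dc(k)) >= img(r, c)) * 2^(k - 1);
            end
            codes(r - 1, c - 1) = code;
        end
    end
end

A histogram of the resulting codes then serves as the texture descriptor.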
Oriented Features from Accelerated Segment Test (FAST) and Rotated Binary Robust Independent Elementary Features (BRIEF), together known as ORB, is an efficient substitute for the Scale Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF). Developed in OpenCV, it uses FAST as a key-point detector to detect a large number of key points and computes BRIEF descriptors in the image; from these key points, the Harris corner detector selects the good features. In comparison to SIFT, its computational cost is very low, but it extracts fewer features; it is also less sensitive to noise. Local Phase Quantization (LPQ) uses the information extracted from the local phase of the 2-D discrete Fourier transform, calculated over a rectangular neighborhood. Using binary coding, the coefficients are represented as integer values in the range 0 to 255. For the classification of cell phenotype images, a fast and simple morphological measure, Threshold Adjacency Statistics (TAS), was introduced. Parameter-Free TAS (PFTAS) is the parameter-free version of TAS, based on calculating histogram bins and pixels with reference to the presence of white pixel neighbors for multiple thresholded versions of the images.
There are various feature extraction methods to extract the relevant information for data analysis [15]. Zernike Moments (ZMs) [16] of an image are similar to Discrete Cosine Transform (DCT) coefficients in their derivation and properties. Essentially, ZMs are projections of an image function along the real and imaginary axes (the x- and y-axes), convolved with an orthogonal function. Thus, they represent an image in various frequency components, referred to as orders (along the radial direction) and repetitions (along the angular direction). Thus, Z0,0 represents the average intensity, Z1,1 represents the first-order moment, Z2,0 is similar to the variance, and so on. Zernike polynomials are orthogonal functions that generate an orthogonal set over the unit circle in the complex plane. In general, image moments [17] are weighted averages of the intensity values of image pixels (or a similar image function) that yield scalar quantities for image interpretation. Moments of different orders yield varying information about the image, such as the area, the center of mass, and the orientation. Zernike moments are rotation invariant and can be made translation and scale invariant by small modifications of the formulas [18]; they are thus a very powerful image vectorization technique for shape detection on grayscale images.
Zernike polynomials are defined by $V_{nm}(x, y) = R_{nm}(\rho)\, e^{jm\theta}$, where n and m are integers such that n − |m| is even and |m| ≤ n, and (ρ, θ) are the radius and angle of the pixel from the origin, i.e., the polar coordinates of the pixel location (x, y). The radial polynomial $R_{n,m}(\rho)$ is defined as follows:

$$R_{n,m}(\rho) = \sum_{k=0}^{(n-|m|)/2} \frac{(-1)^k\, (n-k)!}{k!\, \left(\frac{n+|m|}{2}-k\right)!\, \left(\frac{n-|m|}{2}-k\right)!}\; \rho^{\,n-2k}$$

The Zernike moment of order n with repetition m for an N × N image f(x, y) is then

$$Z_{n,m} = \frac{n+1}{\pi} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y)\, R_{nm}(\rho)\, e^{-jm\theta}$$

Here $Z^{e}_{n,m}(\rho, \theta) = R_{n,m}(\rho)\cos(m\theta)$ are known as the even Zernike polynomials and $Z^{o}_{n,m}(\rho, \theta) = R_{n,m}(\rho)\sin(m\theta)$ as the odd Zernike polynomials, where ρ is the radial distance and its value lies in [0, 1]. Zernike polynomials also have the property that $|Z_{n,m}(\rho, \theta)| \le 1$.
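As an illustrative sketch (not the authors' exact implementation), the moment $Z_{n,m}$ from the equations above can be computed in MATLAB as follows; the function name and the unit-disk pixel mapping are assumptions.

function Z = zernikeMoment(img, n, m)
% Sketch: Zernike moment Z_{n,m} of a square grayscale image, following
% the radial polynomial and projection equations above.
    img = double(img);
    N = size(img, 1);                          % assumes a square N-by-N image
    [X, Y] = meshgrid(1:N, 1:N);
    xs = (2*X - N - 1) / N;                    % map pixel grid onto [-1, 1]
    ys = (2*Y - N - 1) / N;
    rho = sqrt(xs.^2 + ys.^2);
    theta = atan2(ys, xs);
    mask = rho <= 1;                           % ZMs are defined on the unit disk
    R = zeros(N);                              % radial polynomial R_{n,m}(rho)
    for k = 0:(n - abs(m))/2
        c = (-1)^k * factorial(n - k) / (factorial(k) * ...
            factorial((n + abs(m))/2 - k) * factorial((n - abs(m))/2 - k));
        R = R + c * rho.^(n - 2*k);
    end
    V = R .* exp(-1i * m * theta);             % conjugate Zernike polynomial
    Z = (n + 1) / pi * sum(img(mask) .* V(mask));
end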

3.2. Classification Techniques

For classification, different classifiers are used. The Support Vector Machine (SVM) is one of the most commonly used classifiers. It is a supervised learning technique used for regression and classification, mainly for 2-class classification problems. The main goal of SVM is to find a hyperplane in an N-dimensional space that categorizes the data points distinctly. k-Nearest Neighbor (k-NN) classifies test data based on similarity measures such as the Euclidean distance and the Hamming distance. Quadratic Discriminant Analysis (QDA) is a model based on the assumption that each class follows a Gaussian distribution; it is a statistical classifier that separates the data points by a quadratic decision surface. Random Forest (RF) is an ensemble classifier that constructs multiple decision trees based on different conditions at training time; classification follows the output decision taken by the majority of the trees.
In an ANN [19,20], the atomic unit, the neuron, is the fundamental information-processing component of the network, as shown in Figure 2. Let $x_i$ be the input signals, $w_{ij}$ the respective weights from $x_i$ to $x_j$, f(·) the activation function, and $I_j$ the input signal to a neuron. For example, the signal from $x_j$ in the j-th layer to the k-th neuron ($x_k$) has weight $w_{kj}$. The second component is the adder, a linear combiner using a sigma function:

$$\sigma(h, j) = \sum_{k=1}^{m} w^{h}_{jk}\, x_k, \qquad I_j = f(\sigma(h, j)),$$

where f is an activation function.
Figure 3 shows a simple numerical example using three inputs for the sake of simplicity, as our experiments have used 12 ZMs for input features into the ANN.
The third component is the activation function, also known as the limiting function, which restricts the output to a particular amplitude range, e.g., [−1, 1]. The hidden-layer weights at layer h are expressed as a matrix $W^{h}_{nm}$. The generic form of the hidden layer at layer h can be expressed as:

$$\sigma^{h} = [\sigma^{h}_{i}]_{i=1}^{n} = W^{h}_{nm}\, x, \qquad I = f(\sigma^{h}) = [I_k]_{k=1}^{n}$$

Similarly, the output layer can be expressed as:

$$O = f(\sigma^{o}) = [O_k]_{k=1}^{n}$$
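A minimal numerical sketch of this forward pass (a sigmoid hidden layer and a softmax output layer, matching the network in Figure 2) is given below; the random weights are placeholders for illustration only.

x  = rand(12, 1);                        % 12 ZM features for one image
Wh = randn(10, 12); bh = randn(10, 1);   % hidden layer: 10 neurons
Wo = randn(2, 10);  bo = randn(2, 1);    % output layer: 2 classes

sigmoid = @(z) 1 ./ (1 + exp(-z));
softmax = @(z) exp(z - max(z)) ./ sum(exp(z - max(z)));

I = sigmoid(Wh * x + bh);                % hidden activations, I = f(sigma^h)
O = softmax(Wo * I + bo);                % class probabilities, O = f(sigma^o)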

3.3. Explainable Artificial Intelligence (XAI) LIME Model

The LIME model provides explanations that are interpretable by humans and differ in representation from the actual intrinsic features used by the model. As an example, in the case of text classification, XAI uses a binary vector containing the probabilities of the important words responsible for the underlying decision made by the classifier. This probability-based representation is easier for humans to analyze than the actual complex features, such as word embeddings, which are relevant only at the machine level and incomprehensible to humans. In the case of images, we obtain the pixels x(i,j) ∈ R, where R is the region enclosed by the yellow closed contours drawn by the XAI LIME model.
Let m ∈ M be the set of models which can provide interpretations, where m can be represented through a domain highlighted with yellow contours. There are two model interpretations for decision explanations: (a) local interpretation works at a single point to explain the decision, and (b) global interpretation works over the whole dataset in its entirety. LIME calculates the contribution of features to individual test predictions and renders them in a faithful and easy-to-understand manner. The general equation of LIME and other general explainability techniques [21,22] is defined as follows:
$$E(x) = \operatorname*{arg\,min}_{m \in M} \; \mathrm{Loss}(p, m, P_x) + \Omega(m)$$
where x is the primitive feature vector, p is the original predictor, m is the explanation model (linear for LIME), $P_x$ is the proximity function covering the local region around the original feature vector x, and Ω(m) measures the complexity of the explanation model m. Further, the loss function is defined as:
$$\mathrm{Loss}(p, m, P_x) = \sum_{y \in Y} \big(p(y) - m(y)\big)^2\, P_x(y)$$
where Y covers the locality of x, i.e., its neighborhood. Let B(x) be a biased classifier and C(x) be an unbiased classifier which makes decisions based upon the underlying sensitive attributes and correlated features. The classifier A(x), which is agnostic and adversarial, is then defined as follows: the deciding factor is whether x belongs to X, where X is the distribution from which the predictions of B(x) are sampled.
$$A(x) = \begin{cases} B(x), & \text{if } x \in X \\ C(x), & \text{otherwise} \end{cases}$$
Algorithm 1: The XAI LIME algorithm is summarized below [23].
Input: data feature vector x, classifier f, number of features m to highlight, and number of perturbed super-pixel samples n for granularity
Output: coefficients of the explainable linear model
STEP 1: y := f(x), i.e., the prediction by f(·) on x
STEP 2: for i := 1 to n, repeat Steps 3–5
STEP 3: p_i := randomPickSuperpixel(x) % permute/perturb the super-pixels of x
STEP 4: observation_i := f(p_i), i.e., predict p_i on f
STEP 5: distance_i := abs(y − observation_i)
STEP 6: SimilarityScore := ComputeSimScore(distance)
STEP 7: x′ := xPick(SimilarityScore, m, p) % keep the m most relevant features
STEP 8: L := LinearModelFitting(SimilarityScore, m, p)
STEP 9: Return the weights obtained by L
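The following is a simplified, runnable MATLAB sketch of Algorithm 1; it reduces super-pixel perturbation to randomly masking input features, uses an exponential kernel for the similarity score, and fits the local linear model by weighted least squares. The function names and these simplifications are assumptions for illustration, not the exact LIME implementation.

function w = limeExplain(x, predictFcn, nSamples, kernelWidth)
% LIME loop sketch: perturb x, weight the perturbations by similarity to
% x, and fit a local linear model whose coefficients explain the prediction.
    d = numel(x);
    y0 = predictFcn(x(:)');                  % STEP 1: prediction on x
    Z = double(rand(nSamples, d) > 0.5);     % random on/off masks
    preds = zeros(nSamples, 1);
    for i = 1:nSamples                       % STEPS 2-5: perturb and predict
        xi = x(:)' .* Z(i, :);               % "switch off" masked features
        preds(i) = predictFcn(xi);
    end
    dist = abs(preds - y0);                  % distance of each perturbation
    sim = exp(-(dist.^2) / kernelWidth^2);   % STEP 6: similarity kernel
    % STEPS 7-9: weighted least-squares fit of the local linear model
    Wd = diag(sqrt(sim));
    w = (Wd * [ones(nSamples, 1), Z]) \ (Wd * preds);  % intercept + d weights
end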

4. Methodology

The general flow of the methodology includes image acquisition through the public BreakHis dataset for cancer cells [4]. The dataset contains four resolution options for the cancer cell images, 40×, 100×, 200×, 400×, and four types for each of the malignant and benign classes. For simplicity of illustration, we only considered 40× images from both classes. Afterwards, image moments were extracted from all the images using Zernike moments.
Zernike moments are applied to the given set of images and fed into (i) PCA (Principal Component Analysis) for visualization, as explained in Section 5.3, and (ii) neural networks for classification and performance analysis, as shown in Figure 4. Zernike moments help to generate vectors from the given set of images, which are then input into the neural network classification system for training, validation and testing.
The XAI LIME as discussed in Algorithm 1 is applied on the classified images to highlight the appropriate regions which are contributing to the classification decision.

5. Results and Discussion

5.1. Dataset

The cancer cell image samples were obtained from breast tissue via a surgical open biopsy (SOB) procedure and stained using eosin and hematoxylin dyes. They belong to the BreakHis [4] dataset. Image acquisition was performed with an Olympus BX-50 microscope system at various magnifications (40×, 100×, 200×, 400×), as depicted in Figure 5, but for simplification of the study, we considered only 40× images from the benign and malignant categories for binary classification. The images are RGB in PNG format. The nomenclature of the image file names is well organized into the following format: <BIOPSY_PROCEDURE> <TUMOR_CLASS> <TUMOR_TYPE> <YEAR> <SLIDE_ID>-<MAGNIFICATION>-<SEQ>. The ratio of benign:malignant images for the experiments is 625:1370, and the training:validation:testing ratio is 70:15:15. Standard parameters of the neural networks in MATLAB R2020b were used.

5.2. Feature Extraction Using Zernike Moments

Setting the order parameter to 5, a total of 12 ZMs were calculated for each of the 625 + 1370 = 1995 images. Zernike moments are shape extractors and work only on grayscale images; thus, if a given image is colored, it is automatically converted to grayscale prior to the moments’ calculation. Zernike moments capture an effective set of feature vectors, which can be observed through the PCA application and visualization shown in Figure 7.
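For illustration, order n ≤ 5 with |m| ≤ n and n − |m| even gives exactly 12 (n, m) pairs. A sketch of building the 12-dimensional feature vector per image follows, assuming the zernikeMoment helper sketched in Section 3.1, the Image Processing Toolbox for rgb2gray and imresize, and a hypothetical file name.

img = rgb2gray(imread('sample_40x.png'));    % hypothetical 40x image file
img = imresize(img, [256 256]);              % square image, as the ZM sketch assumes
feat = zeros(1, 12); idx = 0;
for n = 0:5
    for m = 0:n
        if mod(n - m, 2) == 0                % n - |m| must be even
            idx = idx + 1;
            % magnitudes give rotation-invariant shape descriptors
            feat(idx) = abs(zernikeMoment(img, n, m));
        end
    end
end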

5.3. Data Visualization Using PCA

Principal components (PCs) are orthogonal unit vectors such that the i-th vector is perpendicular to all preceding i − 1 vectors and the variance of the projected data distribution is maximal. Figure 6 depicts orthogonal principal components in the directions of maximum variance.
Let the data matrix X consist of n observation vectors, and let l be the number of principal components; the component weight vectors are defined as

$$W = (w_1, w_2, \ldots, w_l)$$

PCA maps each original vector $x_i$ to a vector of principal component scores $t_i = (t_1, \ldots, t_l)_i$, where

$$t_k^{(i)} = x_i \cdot w_k, \qquad i = 1{:}n, \; k = 1{:}l$$

The first weight vector is computed to maximize the variance of the projection:

$$w_1 = \operatorname*{arg\,max}_{w} \frac{w^{\top} X^{\top} X\, w}{w^{\top} w}$$

The k-th component is found after subtracting the first k − 1 components from X; further details are available in [24]:

$$\hat{X}_k = X - \sum_{j=1}^{k-1} X\, w_j\, w_j^{\top}$$
Machine learning depends upon the underlying data distribution. Therefore, variance-based feature transformation with Principal Component Analysis (PCA) [25] was applied to the 1995 × 12 feature matrix to reduce it to two dimensions for plotting, as shown in Figure 7. It is evident from the figure that the data is non-overlapping and thus separable by non-linear classifiers. Moreover, Figure 8 renders the relative information gain through the latent parameters of PCA, i.e., the relative importance of each of the sorted, newly formed PCA features. We can observe that the first two PCs alone contain 0.82 + 0.15 = 0.97 of the total information defining the data distribution. Thus, Zernike moments prove to be a quite effective feature extractor in the underlying case. The relative information gain upon application of PCA to the 1995 × 12 data obtained through 12 ZMs for each of the 1995 images is shown in Figure 8.
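A short sketch of this step, assuming the pca function from the Statistics and Machine Learning Toolbox, with ZMs as the 1995 × 12 feature matrix and Y the 0/1 class labels, could look like:

[coeff, score, latent] = pca(ZMs);        % principal components of the ZM features
relGain = latent / sum(latent);           % relative information gain per PC (Figure 8)
scatter(score(:, 1), score(:, 2), 10, Y); % 2-D projection of the data (Figure 7)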

5.4. Classification

To classify the malignant and benign classes of the given BreakHis [4] public dataset, we applied the Neural Network Pattern Recognition app in MATLAB R2020b with standard settings. The architecture is the 2-layer feed-forward neural network shown in Figure 2. The data is split randomly into three mutually exclusive parts with a training:validation:testing ratio of 70:15:15: the model is trained only on the 70% partition; the 15% validation partition, unseen during training, is used to check that the model generalizes well; and finally, a fresh, untouched 15% partition is used for testing, so the training and testing data are never the same.
The total dataset size used was 625:1370 for benign:malignant, for 40× images only. For training, 10 hidden neurons and scaled conjugate gradient backpropagation were used. Hidden layer selection (one hidden layer) is completed automatically by the Pattern Recognition toolbox in MATLAB [26,27].
The weight and bias values of the trained neural network model, automatically generated by the MATLAB toolbox, are available in Appendix A along with the code. Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13 show the ROC curves, confusion matrix, error histogram, cross-entropy, gradient and validation checks for training, validation, testing and overall performance, which is perfect under the given experimental conditions. It is observed that the potential of the technique overpowers the simple data distribution and thus yields 100% accuracy, defined as the ratio of correct predictions to the total number of predictions (the total amount of test data).
The ROC curves graphically represent the performance of the classification model at different threshold values in terms of the true positive and false positive rates. Figure 9 shows the ROC curves for the training, validation, test and complete image sets of the BreakHis dataset. All curves show 100% sensitivity and 100% specificity, corresponding to an ideal classification model.
The confusion matrix is the representation of the correctly and falsely classified data by the classification model. Figure 10 represents the confusion matrix for training, validation, testing and overall test data, showing no false positive and false negative classification in any case by the classification model, which indicates an ideal performance.
After training the neural network, an error histogram is plotted, giving a graphical representation of the differences between the predicted values and the target values. Figure 11 shows that the error value for the proposed model is −7.4 × 10−8, illustrating that the output values are marginally higher than the target values. In classification problems, it is possible to reach 100% accuracy when the classifier is able to separate the two classes (benign and malignant). However, the error term measures how far the decision boundary is from all (or most) of the points, for example, using the Euclidean distance norm. The error is actually zero but shows a very small negative number, consistently −7.4 × 10−8 for all 1995 samples (errors = targets − outputs), due to digital representation error in the calculations, similar to the IEEE 754 standard for representing floating-point numbers in digital memory.
Figure 12 represents the number of iterations for the best validation performance of the proposed model on the basis of the cross-entropy parameter. For the proposed neural network model, the best performance is achieved at 55 epochs, illustrated by a green circle with the minimum cross entropy value. It can be seen from the graph that for training, validation and testing data, as the epochs increase, there is a decrease in the cross entropy.
Figure 13 represents the gradient and validation checks as the training state parameters against the number of iterations. The gradient value achieved at 55 epochs is 9.1623 × 10−7. It has been observed from the graph that there is negligible validation failure during the span of 55 epochs.

5.5. Discussion

A summary of the results comparison is rendered in Figure 14, in which the proposed setup outperforms the techniques studied in [4] by the creators of the BreakHis dataset. This does not mean that the proposed setup is perfect; it only implies that Zernike moments are precise and concise shape feature extractors, followed by a simple yet powerful neural network pattern recognition system. The complete training, validation and testing of the system takes no more than a few seconds, which suggests that hand-crafted features and image moments should be explored before employing deep learning techniques, which are computationally expensive and need an intensive GPU hardware setup.
Figure 14 states the recognition rates of various machine learning techniques built from features such as LBP, CLBP, GLCM, ORB, LPQ and PFTAS and classifiers SVM, 1-Nearest Neighbor (1-NN), QDA and RF [4]. PFTAS + QDA [4] has a recognition rate of 83.3 ± 4.1, and in [7] the best recognition rate on the BreakHis dataset for the technique based on assembling multiple compact CNNs is 85.7 ± 1.9. The proposed technique, a combination of Zernike moments and an Artificial Neural Network, is superior with 100% recognition on this dataset.
Machine learning models work like black boxes. Understanding the reasons behind a test result is a basic requirement to rely on a model, perform analysis, and retrain or even deploy modifications to it. XAI provides this trust in a machine learning model, and LIME [21] is one of the techniques that is easy to understand and simple to implement. Model agnostic means that LIME can be applied to any model to obtain explanations by perturbing the input data (images, in our case). To explain the image classification, the LIME model draws a mask of yellow pixels to highlight the image segments that the model focuses on the most to make the decision, as shown in Figure 15. These yellow regions contribute the most to the semantic visual analysis for the users and to performance analysis. In Figure 15, the number of input features has been set to 150 for the LIME model to pick the most significant features; increasing it beyond 150 makes the yellow-marked explanations more granular or sandy. This value (150) was found through repeated empirical analysis.

6. Conclusions

Cancer tissues, in the case of breast and lung cancer datasets, are quite challenging to examine by manual expert analysis. The proposed automatic breast cancer cell image analysis system has been studied on the public BreakHis dataset [4]. It has been observed that manual examinations performed by histopathologists for cell profiling are time-consuming and require continuous concentration by experts, which is expensive and requires years of study. In this paper, automatic analysis of cancer cell images was proposed with algorithmic and soft-computing techniques to leverage speed and reliability. Only the 40× image category from the BreakHis dataset with binary classes (malignant and benign) was considered, and it was found that by using Zernike moments and neural network classification, the data distribution is perfectly separable. The underlying assumptions involve clear and preprocessed labeled data with a clear distinction between malignant and benign cell images, as in the BreakHis dataset. Moreover, the total data size was moderate, with 625:1370 for the benign:malignant classes, making the classification problem simpler. The model-agnostic technique (XAI-LIME) has been used to justify the test results for input images by highlighting the major regions from which the machine learning algorithm derived its test decision.
Our future plan is to consider original datasets of larger size, more image moment techniques and other classification algorithms to explore the problem dimensions. We also plan to pair images with expert linguistic cues as an additional modality to the image features, toward a multimodal learning system with image and text features.

Author Contributions

Conceptualization, H.S.P., A.G., D.K. and A.K.; Methodology, P.C., N.O., A.K., A.G. and H.S.P.; Software, D.K. and H.S.P.; Validation, A.G. and H.S.P.; Formal analysis, D.K., A.K. and H.S.P.; Investigation, A.G., H.S.P., P.C. and N.O.; Resources, A.G. and H.S.P.; Data curation, P.C., N.O., H.S.P. and A.G.; Writing—original draft preparation, H.S.P. and A.G.; Writing—review and editing, D.K., A.K. and H.S.P.; Visualization, P.C., N.O. and A.G.; Supervision, D.K.; Project administration, P.C. and H.S.P.; Funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Science and Higher Education of the Russian Federation by the Agreement № 075-15-2020-933 dated 13.11.2020 on the provision of a grant in the form of subsidies from the federal budget for the implementation of state support for the establishment and development of the world-class scientific center «Pavlov center «Integrative physiology for medicine, high-tech healthcare, and stress-resilience technologies».

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We used the open BreakHis dataset. https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/ (accessed on 26 August 2021). The only condition for using it is citing [4] (in our reference list).

Acknowledgments

Authors are thankful for the BreakHis labelled dataset collected by Fabio A. Spanhol, Luiz S. Oliveira, Caroline Petitjean, and Laurent Heutte.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Neural Network Weights and Bias Values

  • weight and bias values:
    •     IW: {2x1 cell} containing 1 input weight matrix
    •     LW: {2x2 cell} containing 1 layer weight matrix
    •      b: {2x1 cell} containing 2 bias vectors
Figure A1. The automatically generated weight and bias values of the trained neural network model.
  • MATLAB code:

% Solve a Pattern Recognition Problem with a Neural Network
% Script generated by Neural Pattern Recognition app
% Created 20 September 2021 15:40:17
%
% This script assumes these variables are defined:
%
%   ZMs - input data (Zernike moments: 625x12 benign and 1370x12 malignant vectors)
%   Y   - target data (625x1 zeros for benign, 1370x1 ones for malignant)

x = ZMs';
t = Y';

% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainscg'; % Scaled conjugate gradient backpropagation.

% Create a Pattern Recognition Network
hiddenLayerSize = 10;
net = patternnet(hiddenLayerSize, trainFcn);

% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.input.processFcns = {'removeconstantrows','mapminmax'};

% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivision
net.divideFcn = 'dividerand'; % Divide data randomly
net.divideMode = 'sample';    % Divide up every sample
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;

% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'crossentropy'; % Cross-Entropy

% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
    'plotconfusion', 'plotroc'};

% Train the Network
[net,tr] = train(net,x,t);

% Test the Network
y = net(x);
e = gsubtract(t,y);
performance = perform(net,t,y)
tind = vec2ind(t);
yind = vec2ind(y);
percentErrors = sum(tind ~= yind)/numel(tind);

% Recalculate Training, Validation and Test Performance
trainTargets = t .* tr.trainMask{1};
valTargets = t .* tr.valMask{1};
testTargets = t .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,y)
valPerformance = perform(net,valTargets,y)
testPerformance = perform(net,testTargets,y)

% View the Network
view(net)

% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotconfusion(t,y)
%figure, plotroc(t,y)

% Deployment
% Change the (false) values to (true) to enable the following code blocks.
% See the help for each generation function for more information.
if (false)
    % Generate MATLAB function for neural network for application
    % deployment in MATLAB scripts or with MATLAB Compiler and Builder
    % tools, or simply to examine the calculations your trained neural
    % network performs.
    genFunction(net,'myNeuralNetworkFunction');
    y = myNeuralNetworkFunction(x);
end
if (false)
    % Generate a matrix-only MATLAB function for neural network code
    % generation with MATLAB Coder tools.
    genFunction(net,'myNeuralNetworkFunction','MatrixOnly','yes');
    y = myNeuralNetworkFunction(x);
end
if (false)
    % Generate a Simulink diagram for simulation or deployment with
    % Simulink Coder tools.
    gensim(net);
end

References

  1. Cancer, WHO. Available online: https://www.who.int/news-room/fact-sheets/detail/cancer (accessed on 7 May 2021).
  2. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
  3. GLOBOCAN 2020: New Global Cancer Data. Available online: https://www.uicc.org/news/globocan-2020-new-global-cancer-data (accessed on 7 May 2021).
  4. Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. A Dataset for Breast Cancer Histopathological Image Classification. IEEE Trans. Biomed. Eng. 2016, 63, 1455–1462.
  5. Spanhol, F.A.; Oliveira, L.S.; Petitjean, C.; Heutte, L. Breast cancer histopathological image classification using Convolutional Neural Networks. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 2560–2567.
  6. Spanhol, F.A.; Oliveira, L.S.; Cavalin, P.R.; Petitjean, C.; Heutte, L. Deep features for breast cancer histopathological image classification. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 1868–1873.
  7. Zhu, C.; Song, F.; Wang, Y.; Dong, H.; Guo, Y.; Liu, J. Breast cancer histopathology image classification through assembling multiple compact CNNs. BMC Med. Inform. Decis. Mak. 2019, 19, 198.
  8. Komura, D.; Ishikawa, S. Machine learning methods for histopathological image analysis. Comput. Struct. Biotechnol. J. 2018, 16, 34–42.
  9. Koelzer, V.H.; Sirinukunwattana, K.; Rittscher, J.; Mertz, K.D. Precision immunoprofiling by image analysis and artificial intelligence. Virchows Arch. 2019, 474, 511–522.
  10. Robertson, S.; Azizpour, H.; Smith, K.; Hartman, J. Digital image analysis in breast pathology—From image processing techniques to artificial intelligence. Transl. Res. 2018, 194, 19–35.
  11. Li, C.; Chen, H.; Li, X.; Xu, N.; Sun, H. A review for cervical histopathology image analysis using machine vision approaches. Artif. Intell. Rev. 2020, 53, 4821–4862.
  12. Bychkov, D.; Linder, N.; Turkki, R.; Nordling, S.; Kovanen, P.E.; Verrill, C.; Walliander, M.; Lundin, M.; Haglund, C.; Lundin, J. Deep learning based tissue analysis predicts outcome in colorectal cancer. Sci. Rep. 2018, 8, 1–11.
  13. Couture, H.D.; Williams, L.A.; Geradts, J.; Nyante, S.J.; Butler, E.N.; Marron, J.S.; Perou, C.M.; Troester, M.A.; Niethammer, M. Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype. NPJ Breast Cancer 2018, 4, 1–8.
  14. Klein, O.; Kanter, F.; Kulbe, H.; Jank, P.; Denkert, C.; Nebrich, G.; Schmitt, W.D.; Wu, Z.; Kunze, C.A.; Sehouli, J.; et al. MALDI-imaging for classification of epithelial ovarian cancer histotypes from a tissue microarray using machine learning methods. Proteomics Clin. Appl. 2019, 13, 1700181.
  15. Brezočnik, L.; Fister, I.; Podgorelec, V. Swarm intelligence algorithms for feature selection: A review. Appl. Sci. 2018, 8, 1521.
  16. Khotanzad, A.; Hong, Y.H. Invariant image recognition by Zernike moments. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 489–497.
  17. Teague, M.R. Image analysis via the general theory of moments. J. Opt. Soc. Am. 1980, 70, 920–930.
  18. Singh, C.; Upneja, R. Error analysis in the computation of orthogonal rotation invariant moments. J. Math. Imaging Vis. 2014, 49, 251–271.
  19. Du, K.L.; Swamy, M.N.S. Neural Networks and Statistical Learning; Springer: London, UK, 2019; p. 988. ISBN 978-1-4471-7452-3.
  20. Malik, H.; Singh, M. Comparative study of different neural networks for 1-year ahead load forecasting. In Applications of Artificial Intelligence Techniques in Engineering; Springer: Singapore, 2019; pp. 31–42.
  21. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016.
  22. Siddharth, S.; Omare, N.; Shukla, K.K. An Approach to identify Captioning Keywords in an Image using LIME. In Proceedings of the 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India, 19–20 February 2021.
  23. Das, A.; Rad, P. Opportunities and challenges in explainable artificial intelligence (XAI): A survey. arXiv 2020, arXiv:2006.11371.
  24. Ringnér, M. What is principal component analysis? Nat. Biotechnol. 2008, 26, 303–304.
  25. Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52.
  26. Baptista, F.D.; Rodrigues, S.; Morgado-Dias, F. Performance comparison of ANN training algorithms for classification. In Proceedings of the 2013 IEEE 8th International Symposium on Intelligent Signal Processing, Funchal, Portugal, 16–18 September 2013.
  27. Dudzik, M. Towards characterization of indoor environment in smart buildings: Modelling PMV index using neural network with one hidden layer. Sustainability 2020, 12, 6749.
Figure 1. Cancer statistics of new cases for females in year 2020.
Figure 2. Architecture of the 2-layer feed forward neural network having sigmoid hidden layer and softmax output neurons as output layer applied for the proposed approach.
Figure 3. Mathematical model of a simple ANN example with 3 inputs to demonstrate the implementation (our technique has 12 inputs from ZMs but only 3 inputs have been rendered for the sake of simplicity of representation).
Figure 4. Graphical representation of the framework of proposed technique for cancer cell segmentation.
Figure 5. Sample images of the public BreakHis dataset at different available levels of magnification (40×, 100×, 200×, 400×).
Figure 6. The arrows signify eigenvectors (of covariance matrix) scaled by the square root of eigenvalues and their origin is at the mean.
Figure 7. First two principal components of 12 Zernike moments (image features). The data clusters are quite separable (nonlinearly) which explains the good accuracy reached by 2-layer feed forward ANN on the given data distribution.
Figure 8. Relative information gain for principal components. Since 5–12 are quite small, they are not shown in the figure. Values of PCs are 0.82, 0.15, 0.03, 0.01 and approximately 0, respectively.
Figure 9. ROC curves of the training, validation, test and all data images in the BreakHis dataset using 40× images for malignant and benign classes.
Figure 10. Confusion matrix for training, validation, testing and overall test data.
Figure 11. Error histogram for training, validation and testing showing no error on the given data distribution using proposed technique.
Figure 12. Cross-entropy versus epochs for validation performance analysis.
Figure 13. Graphs for gradient and validation failure versus epochs. Most of the time there was no validation failure during all 55 epoch trials.
Figure 14. Comparison of cancer recognition rates among the state-of-art and the proposed technique.
Figure 15. Input malignant image in 40× zoom and its explanation through XAI LIME model for the proposed model. The yellow highlighted segments in the image (on the right) mark the most relevant regions or features which are responsible for the decision made by machine learning model.
Table 1. Comparative analysis table for the state-of-the-art in the literature review. The proposed technique (ZM + ANN) is relatively more accurate, uses a larger dataset, is faster in computation and simple in implementation.

Ref. | Year | Dataset/Type of Images | Contribution | Pros/Cons
[4] | 2016 | Breast cancer histopathological images (BreaKHis dataset) | BreaKHis dataset collection from 7909 breast cancer histopathology images acquired on 82 patients | Max accuracy is 85% using basic image features and traditional machine learning methods.
[5] | 2016 | BreaKHis dataset | Applied CNN on the collected BreaKHis dataset | Accuracy improved slightly over [4] but still room for enhancement
[6] | 2017 | BreaKHis dataset | In-between solution of [4,5], i.e., using deep features as input for the classifiers | Better accuracy than [4] and in some cases [5]
[7] | 2019 | BreaKHis dataset | Assembling multiple CNNs and embedding a Squeeze-Excitation-Pruning (SEP) block to remove redundant channels and reduce overfitting | Time-consuming compared to traditional supervised ML models
[8] | 2018 | Histopathological images | Review article about digital pathology with machine learning | Limitations are color variation, artefacts, intensity variations, and multiple magnification levels to select from
[9] | 2019 | Colorectal cancer | Precision immunoprofiling by image analysis and artificial intelligence | Advanced image analysis and AI techniques should be explored
[10] | 2018 | Breast cancer | AI and deep learning techniques used for diagnostic breast pathology | Identifies patterns not visible to the eye of a pathologist, so-called 'imaging biomarkers', using deep learning
[11] | 2020 | Colorectal adenocarcinoma tissue | A review of 1988–2020 for cervical histopathology image analysis using machine vision | New AI and image processing algorithms should be explored. Model performance is based upon the underlying data distribution.
[12] | 2018 | Colorectal cancer | Tissue analysis in colorectal cancer using deep learning | Directly predicts patient outcome with AUC = 0.69, without any intermediate tissue classification; samples from 420 patients
[13] | 2018 | Breast cancer | Breast cancer grade prediction using image analysis and deep learning | Ductal vs. lobular (94% accuracy); limitation is that the dataset size was small
[14] | 2019 | Ovarian cancer from MALDI images | Tissue microarray (TMA) for ovarian cancer histotypes using ML | CNN and NN are suitable for epithelial ovarian cancer (EOC); sensitivity 69–100%, specificity 90–99%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
