1. Introduction
Breast cancer is the most common female disease in the word, diagnosed primarily in over-fifties women [
1,
2,
3].
Mammography (MX), introduced in the 1960s, plays a pivotal role in cancer screening, detection, and follow-up, despite the availability of various other breast imaging modalities, such as ultrasound (US) and breast Magnetic Resonance Imaging (MRI), due to its properties and qualities [
4,
5].
Contrast-enhanced mammography (CEM) is a diagnostic technique that combines the advantages of standard full-field digital mammography (FFDM) with the intravenous administration of an iodinated contrast medium, highlighting neovascularity associated with actively growing malignancy. The use of contrast agents in cancer detection is based on the phenomenon that neoplasms induce angiogenesis for further tumor growth. Contrast medium can pass through the walls of new rapidly formed vessels into the (tumor) interstitium, causing enhancement.
In addition, dual-energy CEM is a quick and well-tolerated examination that eliminates breast density as a limiting factor when interpreting two-dimensional (2D) mammograms by utilizing a dual-energy acquisition system, which then generates a subtracted image to outline areas of enhancement.
Previous studies have evaluated the sensitivity of CEM compared to conventional digital MX, US, and MRI [
6,
7,
8]. CEM sensitivity has been reported in the range 90–100% [
9], much higher in comparison to MX and US alone [
8]. CEM allows to identify additional occult cancers at mammography, to assess more accurately the disease extent, and to guide surgical and treatment planning [
8,
9,
10,
11].
However, several studies have reported that CEM sensitivity increases compared to conventional MX at the expense of specificity that ranges in the interval of 58–70% [
12]. This issue could be resolved using quantitative feature extracted by CEM images.
Recent significant advancements within the field of medical image analysis relies on the application of Artificial Intelligence (AI) methods for the processing of large quantities of data from different imaging modalities [
13,
14].
Radiomics is the process of extracting quantitative properties, named features, from images. This feature extraction activity is typically realized by means of pattern recognition algorithms and provides, as a result, a set of numbers, each one representing a quantitative description of a specific either geometrical or physical property of the image portion under consideration. In oncological applications, examples of features are tumor size, shape, intensity, and texture, collectively providing a comprehensive tumor characterization, called the radiomics signature of the tumor [
15,
16,
17].
The texture features analysis characterize intensity histogram and the relationships between different intensity levels within lesion; moreover, it describe spatial and spectral frequency patterns [
18,
19,
20].
Several authors proposed radiomics analysis for breast cancer detection and classification, based on lesion heterogeneity information (textural features) and opportune classifiers [
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32].
This work has the objective to estimate the diagnostic accuracy of textural features extracted by dual-energy CEM images by carrying out univariate and multivariate statistical analyses, using artificial intelligence approaches in the classification of benign and malignant breast lesions.
2. Methods
2.1. Patient Selection
From October 2017 to April 2018, 80 consecutive patients with known unifocal and or multifocal/multicentric breast lesions (mean age ± standard deviation of 50.7 ± 11.6 years (range 26–78)) were enrolled in this prospective study according to regulations issued by the local Institutional Review Board. All women gave their written informed consent for research purposes. Inclusion criteria: patient with known breast lesions, histologically proven, and patient underwent dual-energy CEM examination in both craniocaudal (CC) and mediolateral oblique (MLO) projections.
Exclusion criteria were breast implants, pregnancy or possible pregnancy, a contraindication to the intravenous use of iodinated or gadolinium-chelated contrast agent (e.g., allergy to either agent or severely impaired renal function), inability to keep upright immobility during the examination, neoadjuvant chemotherapy treatment, hormone treatment (the most common forms of hormone therapy for breast cancer work by blocking estrogen hormones from attaching to receptors on cancer cells or by decreasing the body’s production of hormones) or radiation therapy at the time of imaging.
Overall, 104 suspected breast lesions of 80 enrolled patients were analyzed. A total of 19 patients were excluded: 6 for contraindication to the intravenous use of iodinated or gadolinium-chelated contrast agent; 4 for neoadjuvant chemotherapy treatment; 4 for breast implants; 2 for inability to keep upright immobility during the examination; and 3 for hormone treatment or radiation therapy.
2.2. Imaging Protocol
All CEM examinations were performed with a dual-energy mammography system (Hologic’s Selenia® Dimensions® Unit, Bedford, MA, USA). Approximately two minutes after an intravenous injection of 1.5 mL/kg body weight of iodinated contrast medium (Visipaque 320; GE Healthcare, Inc, Princeton, NJ, USA) at a rate of 2–3 mL/s, each woman was properly positioned in a CC view.
Two images were obtained per each breast through a low-energy exposure (26–31 kVp, according to breast thickness and density) and a high-energy exposure (45–49 kVp), straddling the k-edge of iodine. The low-energy image served as the equivalent of a two-dimensional FFDM examination, which was previously demonstrated [
33,
34,
35]. The high-energy image is not diagnostic and is used for post-processing purposes. The total X-ray dose delivered to the patient for a pair of low- and high-energy images was estimated to be between 0.7 and 3.6 mGy, depending on breast thickness (30 to 80 mm) and tissue composition (0 to 100% glandular tissue); it is approximately 1.2-times the dose delivered in a standard single-view digital mammography [
33,
34,
35].
The two previously acquired images are then digitally subtracted of each other to produce a resultant recombined image that highlights contrast enhancement uptake area and gives functional information [
10], as it can be seen in
Figure 1. Four and eight minutes after contrast agent administration, each breast were compressed in the MLO view: early-MLO and late-MLO view images were respectively acquired (
Figure 1).
2.3. Histopathological Analysis
The reference standard was histopathologic examination. Overall, 104 samples of 80 patients were analyzed histopathologically: 65 surgical specimens for malignant lesions and 39 samples (surgical specimen or fine needle aspiration cytology (FNAC), core or Tru-Cut needle biopsy, and vacuum assisted breast biopsy (VABB)) for benign lesions. Histopathological examination was performed before CEM examination. This procedure did not change the shape of the lesion and therefore of the segmentation of the lesions.
Tumor stages were classified according to the system implemented by the American Joint Committee on Cancer staging. Ductal carcinoma in situ and invasive cancer tumors were counted as malignant lesions including, also, the papillary breast cancer that is a very rare type of invasive ductal breast cancer that accounts for fewer than 1% of all breast cancers.
All other results, including fibroadenoma, ductal hyperplasia, dysplasia, cysts, fibrosis and adenosis, were considered non-malignant lesions (
Table 1).
2.4. Image Processing
Two expert radiologists, with 22 and 15 years of experience respectively, performed segmentation by drawing manually the regions of interest (ROIs) first separately and then together and in accordance with each other.
The margins of the breast lesions were defined on dual-energy subtracted (DES) images, where contrast uptake was emphasized, both in CC and in early and late MLO view. An example is shown in
Figure 2.
DES images of each woman were processed in MATLAB (The MathWorks, Inc., Natik, MA, USA [
36]) and for each ROI radiomics textural features were extracted. They provide information in the spatial distribution of intensity levels in a neighborhood.
The Texture Toolbox of MATLAB
® realized by Vallières et al. [
37], which includes 48 parameters, calculated according to the Image Biomarker Standardization Initiative [
38], was considered. The package implements wavelet band-pass filtering, isotropic resampling, discretization length corrections and different quantization tools. The toolbox can be downloaded at
https://it.mathworks.com/matlabcentral/fileexchange/51948-radiomics (accessed on 15 April 2019).
The textural features (
Table 2) include both first-order features and second-order ones; an extra detailed description for each one has been provided in the
Appendix A.
2.5. Statistical Analysis
The statistical analysis included univariate and multivariate approaches and were performed with the RStudio software [
39].
2.5.1. Univariate Analysis
The intra-class correlation coefficient (ICC) was calculated to assess the robustness of manual segmentation for the radiomics features obtained considering the two volumes of interest segmented separately by two expert radiologists.
For two-groups comparisons, we used the non-parametric Wilcoxon–Mann–Whitney test for continuous variables. Receiver operating characteristic (ROC) analysis was performed.
To individuate the optimal cut-off value for each feature the Youden index was calculated. Area under ROC curve (AUC), sensitivity (SENS), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACC) were obtained considering the optimal cut-off values identified maximizing the Youden index [
40].
A p-value < 0.05 was considered as significant. However, false discovery rate (FDR) adjustment according to Benjamini and Hochberg for multiple testing was considered.
2.5.2. Multivariate Analysis
Multivariate analysis was carried out following the pattern recognition model and using linear classifier (linear discriminant analysis—LDA), decision tree (TREE),
k-nearest neighbours (KNN), artificial neural network (NNET), and support vector machine (SVM) to assess the diagnostic accuracy using all extracted metrics of textural parameters. In recent years, several authors have deepened the subject in their work [
18,
41,
42,
43,
44]. Moreover, a brief description of each classifier has already been discussed in a previous article [
14].
Configuration settings for each classifier are provided in
Table 3.
The classification analysis has been cross-validated using 10-fold cross validation (10-fold CV) and the Leave-one-out cross validation (LOOCV) approaches and median values of AUC, accuracy, sensitivity, specificity, PPV e NPV were obtained.
Each classifier received the same set or subset of robust features, identified by a feature selection with the least absolute shrinkage and selection operator (LASSO) method [
45]. In the LASSO method, 10-fold cross-validation was used to select the optimal regularization parameter alpha, as the average of mean square error of each patient was the smallest. With the optimal alpha, features with a non-zero coefficient in LASSO were reserved. Note that the shrinkage requires the selection of a tuning parameter (lambda) that determines the amount of penalization. Feature selection was carried out considering both the λ value with the minimum Mean Squared Error (minMSE) and the largest λ value within one standard error of it (1SE) [
46].
In addition, the presence of less-represented classes has led the use of two techniques to synthesize data and, therefore, to help balance the classes (malignant and benign) giving the possibility to better train the considered classifiers: the self-adaptive synthetic over-sampling (SASYNO) approach and the adaptive synthetic sampling (ADASYN) approach. This allowed to boost the performance both overall and in terms of confusion matrix [
47,
48,
49,
50,
51].
A brief informal description of LASSO method parameters and of SASYNO and ADASYN functions has already been discussed previously [
52].
The best model was chosen considering the highest area under the ROC curve and highest accuracy.
3. Results
The ICC for radiomics textural features was excellent (median value 0.92, range 0.87–0.98), for all extracted features by DCE-MRI.
Table 4 reports the diagnostic accuracy of textural parameters for CC and early and late MLO view in terms of AUC and
p-value.
Figure 3,
Figure 4 and
Figure 5 show ROC curves trend of significant textural features: MAD, IQR, VARIANCE, COARSNESS, BUSYNESS, and STRENGHT for CC projection, VARIANCE, BUSYNESS and STRENGHT for early-MLO projection and GLV_RLRLM, GLV_GLSZM, COARSNESS, BUSYNESS, and STRENGHT for late-MLO projection.
Considering these results, the univariate analysis showed low accuracy and area under the curve for all considered features in the differentiation of benign and malignant lesions.
Figure 6,
Figure 7 and
Figure 8 show the boxplots, related to above-mentioned parameters, to separate benign from malignant lesions.
As regard multivariate analysis, only the most useful results for the purposes of this work will be reported.
Table 5,
Table 6,
Table 7 and
Table 8 report the performance achieved by best classifiers to discriminate benign from malignant lesions.
The best performance considering the CC view (ACC = 0.75; SENS = 0.75; SPEC = 0.74; PPV = 0.75; NPV = 0.74; AUC = 0.82) was reached with a DT trained with LOOCV and balanced data (with ADASYN function) and a subset of three features (by LASSO and λminMSE). The subset of three robust textural features includes MAD, VARIANCE and LRLGE.
The best performance considering the early-MLO (ACC = 0.77; SENS = 0.80; SPEC = 0.73; PPV = 0.75; NPV = 0.78; AUC = 0.83) was reached with a NNET trained with LOOCV and balanced data (with ADASYN function) and a subset of ten features (by LASSO and λminMSE). The subset of ten robust textural features includes MEAN, MAD, RANGE, IQR, VARIANCE, CORRELATION, RLV, COARSNESS, BUSYNESS, and STRENGTH.
The best performance considering the late-MLO (ACC = 0.73; SENS = 0.74; SPEC = 0.71; PPV = 0.71; NPV = 0.75; AUC = 0.82) was reached with an NNET trained with LOOCV and balanced data (with ADASYN function) and a subset of eleven features (by LASSO). The subset of eleven robust textural features includes MODE, MEDIAN, RANGE, RLN, LRLGE, RLV, LZLGE, GLV_GLSZM, ZSV, COARSNESS, and BUSYNESS.
However, the best result (ACC = 0.87; SENS = 0.86; SPEC = 0.87; PPV = 0.88; NPV = 0.86; AUC = 0.90), were obtained considering 144 textural features extracted from all three mammographic projections (CC, early MLO, and late MLO) at the same time with an SVM trained with LOOCV and with balanced data (with ADASYN function).
Figure 9 shows the ROC curve of the best classifiers.
4. Discussion and Conclusions
Numerous studies have demonstrated increased sensitivity and specificity of CEM compared MX, FFDM, Ultrasound, or Digital Breast Tomosynthesis (DBT). In addition, CEM rivals the sensitivity of more costly and time-consuming examinations, such as MRI and molecular breast imaging [
9,
11,
14,
53,
54,
55,
56,
57,
58,
59].
Studies of CEM focused on imaging analysis with an auxiliary system for reporting, and on radiomics features for benignant and malignant differentiation [
60,
61].
The radiomics analysis of tumor features extracted from CEM images combined with qualitative and quantitative information on morphology and functionality represents an important tool for breast tumor characterization [
60,
61,
62,
63]. A summary data was provided in the
Table 9 in order to report the main findings using radiomic and artificial intelligence analysis on CEM.
Marino et al. [
62] investigated the potential of CEM and radiomics analysis for the noninvasive differentiation of breast cancer invasiveness, hormone receptor status, and tumor grade. Radiomics features were derived from first-order histogram (HIS), co-occurrence matrix (COM), run length matrix (RLM), absolute gradient, autoregressive model, the discrete Haar wavelet transform (WAV), and lesion geometry. They reported that radiomics analysis with CEM has potential for noninvasive differentiation of tumors with different degrees of invasiveness, hormone receptor status, and tumor grade.
La Forgia et al. [
63] evaluated radiomics features for predicting histological outcome and two cancer molecular subtypes, namely, Human Epidermal growth factor Receptor 2 (HER2)-positive and triple-negative suggesting an interesting role for radiomics in CEM to predict histological outcomes and particular tumors’ molecular subtype.
In this study, we aimed to evaluate radiomics analysis with texture features extracted by dual-energy CEM images in the classification of malignant and benign breast. We performed the evaluation considering both a univariate analysis and a multivariate analysis using pattern recognition approaches.
The univariate analysis on features extracted from CC view shows statistically positive results for MAD (AUC = 0.72), IQR (AUC = 0.71) and VARIANCE (AUC = 0.72) among first order gray-level statistics, in addition to COARSNESS, BUSYNESS and STRENGTH as regards Neighborhood Gray-Tone Difference Matrix. Considering these results, the univariate analysis showed low accuracy and area under the curve for all considered features in the differentiation of benign and malignant lesions.
As regards multivariate analysis, using the unbalanced dataset showed no significant results; in particular, specificity assumed low values. After a balancing operation (with ADASYN function) and a features selection operation (with LASSO approach) higher values of accuracy, specificity and AUC were measured. A TREE trained with balanced data and a subset of three robust textural predictors (MAD, VARIANCE and LRLGE) achieved the best performance with an accuracy of 0.75 and an AUC of 0.82.
As regards the early MLO view univariate analysis, higher AUC values occurred in correspondence with VARIANCE, BUSYNESS and STRENGTH (AUC = 0.70 in all three cases). With the multivariate analysis, a balancing operation (both with the ADASYN function and the SASYNO function) and a features selection operation allowed to reach good improvements. The best performance (accuracy of 0.77 and AUC of 0.83) was obtained with a NNET trained with balanced data (with ADASYN function) and a subset of robust textural features including MEAN, MAD, RANGE, IQR, VARIANCE, CORRELATION, RLV, COARSNESS, BUSYNESS, and STRENGTH.
With the univariate analysis on features extracted from late MLO view, significant results were measures in correspondence of COARSNESS (AUC = 0.74), BUSYNESS (AUC = 0.71), and STRENGTH (AUC = 0.70), in addition to GLV as regards both the Gray-Level Run-Length Matrix and the Gray-Level Size Zone Matrix (AUC values of 0.70 in either case).
As for the multivariate analysis, the best performance (accuracy of 0.73 and AUC of 0.82) using late MLO features was reached with an NNET trained with balanced data (with ADASYN function) and a subset of robust textural features including MODE, MEDIAN, RANGE, RLN, LRLGE, RLV, LZLGE, GLV_GLSZM, ZSV, COARSNESS and BUSYNESS.
Presently, the determination of the breast malignancies is performed with biopsy. Unfortunately, this examination is invasive and then, there is great interest in alternative, non-invasive and cheaper methods to derive the same information directly from the radiologic images. Another important observation of automated systems for radiomics analysis is that these are not affected by factors such as breast density or background parenchymal enhancement (BPE), which may be limiting breast lesions differentiation. Consequently, these systems also preserve their reliability in more complex breast investigations. Therefore, also that textural analyses is again little used in clinical routine practice; in this context, the radiomics analysis of tumor textural features extracted from CEM images represent an important robust tool for breast tumor characterization.
The small cohort of studied patients represents an initial finding to validate by increasing the sample size of the study in the future. The segmentation of the ROIs slice by slice was manual, and this can be time-consuming, can be operator variable, can be difficult in the segmentation of multicentric lesions or in cases of background parenchymal enhancement. In this study is not reported a comparison of accuracy in the breast lesions detection of CEM respect to FFDM. The future endpoint is to include in the analysis an automatic segmentation of the lesions. The radiomics analysis did not consider tumor histological differences, while the integration of texture metrics combined with histopathology results may provide other important prognostic information for the classification of malignant breast lesions both in early and late phase. This could improve the performance in the classification problem and allow classifying breast lesions according grading and histotype. We did not evaluate the possible impact on texture values considering the different impact of the “void sign” after VAAB vs. post FNAC. This could be evaluated in a future study.
In conclusion, multivariate analyses using pattern recognition approaches optimized by adaptive synthetic sampling and feature selection operations obtained the best results to separate benign and malignant lesions. Overall, textural features extracted from CC view images, showed the best performance. However, the best results (ACC = 0.87; AUC = 0.90) were reached by considering all textural features extracted from dual-energy CEM images, using an SVM trained with balanced data. These findings represented a preliminary experience that should be validate with a larger sample size including patients by different centers.