1. Introduction
Prostate cancer is the second most frequently diagnosed cancer in men, with approximately 1.4 million diagnoses worldwide in 2020 [
1,
2]. The basic methods used in the early diagnosis of prostate cancer are laboratory tests, and, in particular, the determination of the PSA tumor marker. Since its discovery in 1979 and through its clinical use in the 1980s and 1990s, the prostate-specific antigen (PSA) has evolved into an invaluable tool for the discovery, assessment, and monitoring of prostate cancer in men. Increased serum PSA levels are likely a product of disturbed cellular architecture in the prostate gland [
3]. PSA is believed to correlate highly with prostate disease; it is produced by many glands but is also associated with non-malignant diseases [
4]. Moreover, the influence of hormonal therapy influences the detection of prostate cancer [
5]. After the introduction of MR scanners to medical practice, the first attempt to diagnose prostate glands was made by Damadian in the early 1970s; ten years later, it was recognized as a valuable tool for the process of diagnosing prostate diseases [
6,
7]. However, the cost of these examinations has raised some doubts [
8] as to overdiagnosis [
9]; the role of MR is well established in the process of diagnosing prostate diseases [
10], especially after the introduction of diffusion-weighted techniques and contrast enhancement to the protocol [
11,
12]. The experience gained over the years of using of MRI techniques in prostate imaging, especially where correlated with biopsies [
13], resulted in the formation of imaging standards, PIRADS 1 and PIRADS 2, where the PIRADS 4 and PIRADS 5 stages are linked to a high probability of the presence of cancer and associated with high PSA levels [
14]. MRI has taken on an increasingly significant role in the diagnostic process with the development of computed systems for image segmentation and analysis, and algorithms for prostate tumor detection have been proposed [
15].
However, manual analysis is laborious and dependent on well-trained specialists in the field of diagnostic imaging. Therefore, efforts have been made to develop a computer-aided diagnosis (CAD) tool. It has been proven that automatic approaches improve the specificity, as compared to moderately experienced radiologists, for more difficult tumors [
16] and allow for global standardization across radiological centers [
17]. The described CAD systems use various modalities, from T2-weighted MRI (T2W-MRI) [
18] to prostate biomarkers, and diffusion-weighted MRI (dw-MRI) supported with level-sets methods for prostate delineation [
19].
Grayscale MR images are the sum of pixels of different brightness and grayscale levels. Patterns formed by sub-sets of similar pixels can be differentiated with the use of texture maps where pixel distribution changes can be characterized with the application of mathematical formulas [
20,
21,
22]. A plethora of textural features that cannot be recognized by the human eye can be characterized on the basis of mathematical analyses. This encoded information can be revealed by texture feature maps [
22].
Magnetic resonance imaging (MRI) is a non-invasive method of inspecting internal bodily structures. The high quality of the gathered data allows specialists to determine whether the observed tissue corresponds to clinically significant or non-significant prostate cancer [
23,
24].
Over time, the use of artificial intelligence (AI) methods to enable the creation of a CAD system has been widely investigated. In their review, Booven et al. [
25] focus on artificial neural networks (ANNs) as a method of analyzing a variety of data connected with prostate cancer detection.
Starting with prostate-specific antigen (PSA) value analysis, rather than other medical markers, and developing an understanding of MRI images, biomarker diagnosis, histopathological data utilization, etc., the authors conclude that deriving textural features is the most promising approach for prostate cancer description.
Several other reviews support this claim. For instance, Harmon et al. [
26] show that AI enables a correlation between pathological and radiological information about the patient when multiparameter MRI (mp-MRI) is considered. Describing the mp-MRI with radiomics (quantitative imaging of features that are poorly recognized by the human eye, but are understandable for computers) supported with AI algorithms allows for the prediction of cancer aggressiveness [
27] or general prostate cancer analysis [
28]. The influence of deep learning models, which have recently gained popularity, on healthcare professionals working with MRI has been investigated by Liu et al. [
29]; meanwhile, Wildeboer et al. [
30] concentrated on building a CAD system with this technology.
A significant amount of research addresses the problem of determining the Gleason score using the textural features calculated for various modalities of MRI. Wibmer et al. [
31] build a binary classifier using Haralick textural features [
32] for the determination of cancerous and non-cancerous prostate images. A similar approach shows that textural features derived from T2W-MRI, dw-MRI, and dynamic contrast-enhanced MRI (DCE) effectively distinguish the Gleason scores [
33,
34,
35]. More recent research that is still based on first- and second-order histogram features distinguishes up to five grades in the Gleason Grade Group [
36,
37].
Radiomics features are used to determine the aggressiveness of prostate tumors [
38,
39]. When calculated for types of mp-MRI including T2W-MRI, dw-MRI, apparent diffusion coefficient imaging (ADC), and DCE, its performance is comparable with the Prostate-Imaging Reporting and Data System (PIRADS) [
40,
41,
42]. On the other hand, when radiomics features are supported by additional medical risk factors, a model based on multivariate logistic regression allows for the classification of clinically significant and non-significant prostate cancers [
43,
44]. Using mp-MRI data for feature calculation is justified in previous research where support vector machine (SVM) models based on mp-MRI outperformed those constructed using T2W-MRI or ADC data only.
Recently, deep learning approaches have become a popular method of designing models for building a CAD system [
45], detecting and grading prostate cancer [
46], and differentiating clinically significant and non-significant prostate cancers [
47,
48]. In the last case, the performance was comparable to that of the PIRADS system.
There are also methods for automated cancer detection based on T2W-MRI data only [
49,
50], which perform the automatic grading of prostate biopsy specimens in correlation with Gleason grades based on SVM models [
51]. There are also solutions based on textural features derived from mp-MRI for detecting transition zones in prostate tumors [
52], and models that use bi-parametric MRI texture analysis for detecting and evaluating high-grade prostate cancer [
53].
2. Materials and Methods
2.1. Method Overview
The study protocol was developed according to the Declaration of Helsinki and the Declaration of Good Clinical Practice [
54]. All images were anonymized prior to processing to ensure the security of personal data. In addition, written consent from the Local Ethics Committee was obtained to conduct this study, 1072.6120.21.22 (23 February 2022). The study used data from 125 patients aged 27 to 87 years. MRI images were acquired (1.5 T Siemens Avanto, Enlargen, Germany) during normal diagnostic procedures, which met the standard of the PIRADS protocol with the presence of T2-weighted axial sequences with 2 mm slices and a distance factor of 0 and diffusion-weighted sequences with the use of a single shot echo planar sequence (EPI) made with b value equal to 0–800–1500 and a distance factor of 0. Finally, T1-weighted sequences were designed for post-contrast evaluation. The parameters of the T2-weighted sequence utilized in the present study are summarized in
Table 1. Images were converted from 12 bits to 8 bits by linearly scaling from min to a range of 0–255. Then, images that met the stable (repeatable) conditions were selected for the analysis of the texture features. After analyzing clinical data, information on PSA value was obtained for 41 patients (values 0.39–19 ng/mL) (see
Appendix A).
In the present work, we aimed to verify whether it is possible to distinguish with a high probability between patients with prostate cancer and healthy patients. The content of T2W-MRI scans was analyzed, while the PSA was the reference. For the MRI scans, the textural features were calculated using the MaZda software. For the data processing, manually annotated masks were used. As a result, the texture features were calculated for three different regions: the inner prostate only, the outer prostate only, and the inner and outer prostate regions treated as a whole. In order to create a binary classification model, multiple-instance learning (MIL) with a support vector machine (SVM) was adopted. This approach corresponds well with the characteristics of the collected dataset, where one PSA value is known for a patient described with several MRI scans. At the same time, the cancerous tissue may be visible only in one part. It was assumed that PSA = 1, 2, 3 correspond to a healthy person (label = 0) and PSA ≥ 4 is for an ill one (label 1). Finally, the leave-one-out methodology was applied.
2.2. Prostate MRI Dataset
From the initial dataset, 326 MRI images of prostates collected from 41 patients were selected, in which the images were free of imaging artefacts and segmentation and visualization of the lesions were possible despite the rigorous PIRADS protocol. The number of projections per patient, which include the prostate, varies between 4 and 11 scans.
Figure 1 shows the histogram of the number of MRI scans per patient. Each scan is supported with a manual annotation marking the inner (blue) and outer (orange) regions of the prostate, as depicted in
Figure 2. For one patient, the outer prostate region was not annotated because this anatomical structure was absent. The mean age was 64.6 ± 9.8 with median age 65. The mean level of PSA in the cohort was 6.1 ± 3.9 with the median at 5.14. More detailed information on the cohort is given in
Appendix A.
The image data with the corresponding segmentations are all publicly available on the website Zenodo [
55].
2.3. Textural Features
The MaZda software was used to calculate the textural features for the annotated regions in the MRI scans. This approach was chosen due to the ease of setting the number of significant bits per image as a method parameter. This approach is not straightforward, necessitating the implementation of the radiomics Python library, which is also often used for textural feature determination. As we had access to several methods for texture description that may be preprocessed and parameterized in several ways, around 7000 features were calculated to describe a region. A brief overview of the applied methods is presented in this section. Since those methods are well known in the domain of image analysis and are also frequently used as MRI scan descriptors, we refer the reader to the MaZda manual for more details and formulas.
Before the textural features were calculated, the image color space was converted to YUV. Next, data normalization and quantization were undertaken if necessary. This step diminishes the brightness differences between the analyzed images, which could arise due to differences in the acquisition hardware. The system supports several normalization techniques, the details of which are given in
Table 1, and also enables gray-level coding to a chosen number of bits. The preprocessed images were normalized in the region of interest (ROI) according to the methods shown in
Table 2; the textural features were calculated using the following methods. The descriptors are derived only for the selected region: the inner prostate, outer prostate, or inner and outer prostate.
An image brightness histogram (Hist) is the simplest statistical description of the image content. It allows for the calculation of the following textural features: area, mean, variance, skewness, kurtosis, and percentiles. The gradient map features (Grad) describe the rapid illumination changes in small neighborhoods. This feature map is then transformed into a histogram, allowing us to obtain the following textural features: mean, variance, skewness, kurtosis, and non-zero elements. An autoregressive model (Arm) investigates the influence of neighboring pixels on each other. The method searches for the optimal solution by returning information about directional influence described by four theta parameters and an additional sigma parameter that conveys information about the error standard deviation.
The gray-level co-occurrence matrix method [
32] is a more complex approach that concentrates on local neighborhoods. This matrix holds information about the counts of co-occurrence of pixel gray levels next to each other. Then, from this information, several textural features are computed: the angular second moment, contrast, correlation, sum of squares, invariance deformation moment, sum of averages, sum variance, entropy, difference of variance, and difference of entropy. The method acronym is GLCM followed by one letter indicating the direction of runs supplied by the value indicating the distance between pixels considered as neighbors: H—horizontal, V—vertical, Z—diagonal, as shown by the middle part of the letter, N—the other diagonal.
Unlike previously used approaches, the gray-level run-length matrix [
56] describes the content by considering a larger region. Here, the image analysis approach reflects the way humans perceive images. The high-quality images have rapid color changes (short runs), while keeping the same colors next to each other (long runs) is characteristic of images of low quality. In this work, the GRLM acronym followed by one letter indicating the direction of runs (H, V, Z, or N) is given. The information about the run length for pixel gray values creates a base matrix from which several textural features are derived: area, short-run emphasis, long-run emphasis, gray-level non-uniformity, mean gray-level non-uniformity, run-length non-uniformity, mean run-length non-uniformity, and fraction.
In local binary patterns (LBPs) [
57], the neighborhood of each pixel also influences its understanding. The pixel information is coded considering a circular neighborhood of 4, 8, or 12 pixels. A histogram is prepared from the generated codes, and each of its bins becomes a textural feature described by the LBP or circularity analysis method: over-complete, transition, center-symmetric. This is followed by the number of neighbors.
Another popular method for image description uses histograms of oriented gradients (HOGs) [
58]. This technique divides the image into small blocks, for which the gradients are calculated. Their orientations are binned into histograms. For cells (regions composed of several blocks), the histograms are concatenated and normalized to become a feature vector. Using the MaZda software, the HOG method is calculated for the following numbers of bins: 4, 8, 16, and 32.
The frequency components in a local neighborhood are also analyzed using a Gabor transform (Gab). This method is parameterized with the Gaussian envelope, orientation, the period of the sinewave, and the magnitude. On the other hand, the discrete wavelet transform application is also used to derive image information. In this case, the Haar wavelet is used and its energies in the sub-bands become features.
Considering both the large number of methods that can be used to describe the content of MRI prostate scans and the ease of its parameterization, a set of around 7000 features was calculated for each image. Please see the feature naming convention given in
Appendix B.
2.4. Multiple-Instance Learning with a Support Vector Machine
In the present problem, several scans are described by one label. Therefore, multiple-instance learning seems to be the best approach to classification. This is a type of supervised learning; however, instead of having a separate label for each sample, all samples with one label are analyzed together and referred to as a bag. Then, the classifier decides that a bag of samples is negative when all samples are negative, and that it is positive when at least one sample within a bag is positive. When applying this method to our problem, the MIL approach returns a “healthy” label if all the MRI scans in the bag are negative, while it returns an “ill” label when at least one MRI scan depicts cancerous tissue. This approach reflects the fact that cancer might be noticed only in one scan, while the other parts are not infected. This approach works with the support vector machine as a binary classifier. For our purposes, the “mil” Python library was used.
Figure 3 shows an overview of the presented approach.
3. Results
In this study, three sets of experiments were performed. Each of them was concerned with textural features calculated for different prostate regions annotated on the MRI scans, namely, the inner, outer, and inner and outer regions. Some of the textural features could not be calculated for all scans and have therefore been disregarded. For the rest of the scans, the procedure was as follows. First, the Pearson correlation between pairs of textural features was evaluated. Since the number of features is large (6678 features for the prostate inner region, 4898 for the prostate outer region, and 6718 for the total prostate region) in comparison to the number of samples (41, 40, and 40, respectively), it was decided to prepare the MIL-SVM models using only a pair of non-correlated features. Using a smaller number of representative features is also beneficial because it improves the SVM convergence. We assumed that pairs of features with a correlation higher than 50% should be disregarded (around 10% of all possible combinations). Then, the leave-one-out procedure was adopted to verify the model’s generality for each pair of textural features (giving around 60,000/54,000/64,000 possible pairs). This methodology assumes that the model is trained with n-1 samples, using one sample for testing, and that the experiment is repeated n times. The performance of the model was evaluated according to its accuracy, specificity, recall, and F1 score metrics. Both the leave-one-out procedure and SVM had a weight-balanced flag set, since there were 28 positive and 13 negative samples. For SVM, the linear kernel was applied with the regularization factor c = 10.
Table 3,
Table 4 and
Table 5 gather the 20 best results recorded for each type of experiment; the outcomes for the remaining pairs of features are attached as
Supplementary Materials (Table S1). The highest scores are reached when the inner prostate region is considered (
Table 3), with slightly worse outcomes for the whole prostate (
Table 2); meanwhile, the outer region of the prostate seems to be slightly less significant (
Table 4). However, these differences are very small. It is interesting to note that, depending on the region, different textural features play significant roles. This effect might be induced by the shape of the region of interest, as it is circular in the case of the inner prostate and whole prostate and elongated when only the outer part is analyzed. The elongation with a slight rotation of objects on MRI scans must have influenced the calculation of descriptors, as the computation of many of them depends on the chosen angle. Despite these differences and obstacles, very high F1 scores were recorded: 92.86%, 91.23%, and 89.80% for the inner, whole, and outer prostate regions, respectively.
As we can see from
Table 3, the best differentiation between clinically significant and non-significant prostate cancers is achieved when the gray-level co-occurrence matrix summed variance feature is used along with Gabor wavelet magnitude. For the first textural feature, slightly better outcomes (92.86%) are attained when the image is preprocessed with min–max normalization (M) using only six significant bits to calculate the matrix in the horizontal direction (H) with stride equal to five. Changing the direction to diagonal (Z) with a slightly varying stride (4, 5), as well as using a different number of bits (4–8), diminishes the performance to 90.91%, while recall is maintained at the same level (92%) and the precision deteriorates from 92% to 89%. In all cases, the second textural feature is calculated for the original image using the Gabor wavelet with constant Gaussian envelope (24) in the horizontal direction and a sinewave period equal to 12. Since the dataset is highly imbalanced, the AUC was not calculated, as it works well only with balanced data and is too optimistic in other cases. Applying two textural features and concentrating on Gabor frequencies (YN7Gab12N6Mag and YD8Gab24H12Mag or YN8Gab12N6Mag and YD8Gab24H12Mag) also give very high F1 scores: 91% and 89%, respectively. The combination of the Gabor magnitude feature (YM8Gab12H6Mag) with histogram parameters (YS8HistDomn01) and other metrics calculated from the gray-level co-occurrence matrix (YM8GlcmZ5InvDfMom) also performs very well, with an F1 score around 88%.
When analyzing the whole prostate region, the best performance was recorded when combining Gabor frequency features with histogram percentiles, reaching an F1 score of 91.12% with recall of 89.66% and precision of 92.86%. Since there are similar outcomes, we can assume that using quantization (diminishing the number of input bits) in the case of Gabor wavelet calculation does not influence the performance, as similar results are achieved for data using from 4 up to 8 bits. Here, the horizontal orientation of a sinewave with period equal to 8 and Gaussian envelope equal to 16 was applied. The data were normalized using min–max transformation to calculate the histogram percentile using 8-bit data. The following results are derived mostly for pairs of features that focus on the image frequency description and not on some textural features, in an effort to express the human perception of the visual information.
The best outcomes for the outer prostate region are recorded when the correlation feature from the gray-level co-occurrence matrix is supported with the Gabor wavelet feature. Next, the combination of two features is used to describe the following texture frequencies. In the best case, the GLCM method is applied to min–max-normalized images; again, changing the number of significant bits (from 5 to 8) does not influence the F1 score—89.80%—with recall of 100% and precision of 81.48%. The second textural feature is derived from a normalized 8-bit image transformed with a Gabor filter with a Gaussian envelope of 24 radios, calculated in a diagonal orientation (Z) with a sinewave amplitude equal to 12.
4. Discussion
Also known as human kallikrein peptidase 3 (hK3), PSA is a member of the kallikrein gene family [
59]. Ectopic PSA expression has been found at lower concentrations in malignant breast tissue, normal breast tissue, breast milk, and adrenal and kidney cancer. PSA is highly organ specific, as it is mainly produced by prostate epithelial cells. As evidenced by its imperfect performance as a diagnostic biomarker, PSA is not cancer specific and its values can be elevated in men with benign and malignant prostate diseases [
60]. An increase in PSA is secondary to the loss of the barrier provided by the basal layer and the basal membranes of the prostate, which is a mandatory condition for the hormone to enter the circulation. However, loss of the barrier may occur in the case of different prostate diseases (BPH, prostatitis, prostate cancer) and in connection with prostate manipulation of the prostate (prostate massage, prostate biopsy) [
61].
Screening for prostate cancer with the use of multiparametric MRI (mp-MRI) is still one of the most controversial topics in the urological literature [
62]. Screening results reveal a significantly increased diagnosis of prostate cancer with the use of mp-MRI with detection of less advanced prostate cancer, however, with no overall survival benefit was observed [
63]. National USA recommendations against PSA-based screening resulted in a reduction in the use of PSA for early detection and were associated with higher rates of advanced disease [
64]. The inclusion of mp-MRI can improve a screening protocol, as it reduces the number of men who undergo biopsies while detecting more high- and intermediate-grade prostate cancer [
65,
66]. The IP1-PROSTAGRAM study (PSA > 3 ng/mL; MRI Prostate Imaging Reporting and Data System (PIRADS) > 2) showed the highest detection of prostate cancer by MRI compared to transrectal ultrasound-guided prostate biopsy (TRUS) in a population screening setting [
65]. Moreover, standard TRUS is not reliable in detecting prostate cancer and the diagnostic yield of additional biopsies performed on hypoechoic lesions is negligible [
67]. New sonographic modalities such as micro-Doppler, sonoelastography, or contrast-enhanced US provided promising preliminary findings, alone or combined into the so-called “multiparametric US” [
68,
69]. In the multiparametric US vs. multiparametric MRI to diagnose prostate cancer (CADMUS) trial, multiparametric US detected 4.3% fewer prostate cancer cases while submitting 11.1% more patients to biopsy than MRI [
70]. The 68 m labeled prostate-specific antigen (PSA) positron emission tomography (PET PSMA) method is gaining increasing attention in the medical community as a method with increasing clinical potential [
71,
72]. It was proven that the diagnostic performance of PET PSMA is better than multiparametric magnetic resonance (mp-MRI) presented in clinical studies of intermediate-risk cancer, with diagnostic confidence reaching 80% in comparison to 60% for mp-MRI [
73]. For advanced cancer, reported confidence was up to 99% in comparison to 87–97% reported for mp-MRI [
74]. As PET PSMA gains increasing attention and influences treatment change in patients previously diagnosed with MR [
75,
76], there is a need to improve diagnostic techniques based on mp-MRI, especially in regard to the performance of 1.5 T scanners, as obtained results are less significant compared to those presented by 3T scanners [
77,
78].
Therefore, in asymptomatic patients with PSA values of 2–10 ng/mL, additional methods should be used to decide whether to refer the patient for a prostate biopsy, the result of which determines the treatment when cancer is diagnosed. These methods include cancer risk calculators as well as imaging-based diagnostics [
79]. If PSA level is sufficiently high in correlation with the clinical picture (especially palpation methods), the margin for misleading diagnoses is very narrow [
80]. However, in the many cases with intermediate levels of PSA, other diagnostic techniques, including diagnostic imaging, are needed to provide further support.
Magnetic resonance imaging (MRI) has been used for the non-invasive evaluation of the prostate and surrounding tissues since the 1980s. Advances in technology (both in software and hardware) have led to the development of multiparameter MRI (mp-MRI), which combines T2W anatomical imaging with functional and physiological assessments, including diffusion-based imaging (DWI) and its derivatives diffusion coefficient map (ADC) and dynamic MRI with contrast enhancement (DCE), and sometimes other techniques, such as in vivo MR proton spectroscopy [
81].
MRI is currently a highly important step in the process of diagnosing suspected prostate cancer [
23]. On the basis of over 30 years of experience gained in MR prostate studies, classification systems were created and are constantly updated [
82]. However, it is widely acknowledged that a great deal of training is required to determine the image patterns behind prostate cancer, and there are still unclear images where diagnosis is not certain [
83]. Therefore, clinical correlation is highly important, especially in relation to levels of PSA antigens [
84].
Applying a pair of textural features to create an artificial intelligence model for the automatic detection of patients with high (≥4) scores of PSA as a cutoff value for determination of possible malignancy of the lesion is a promising approach, with an F1 score equal to 92%. In our experiments, we verified whether increasing the number of features could improve the outcome. In order to decrease the number of search possibilities, we concentrated on features that participated in generating models with F1 scores higher than 80%. For this approach, all those features were used as input, but principal component analysis was applied to select the most discriminative data. Nonetheless, these results did not outperform the presented ones, so are not discussed in detail.