2.1. Prediction Performance
To predict and characterize anti-angiogenic peptides, it is important to choose a suitable classifier with informative features, both to design an accurate predictor and to gain a good understanding of the anti-angiogenic activities of peptides. In this study, five basic features (i.e., AAC, DPC, PCP, PseAAC, and Am-PseAAC) as well as their combinations (i.e., AAC+PseAAC, AAC+Am-PseAAC, PseAAC+Am-PseAAC, and AAC+PseAAC+Am-PseAAC) were selected as input features for training RF models, followed by identification of a good combination of these features.
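As an illustration of the two simplest sequence features named above, the following minimal sketch computes AAC and DPC for a single peptide and concatenates feature sets into one vector. It assumes that compositions are expressed as fractions of the sequence length; the function names `aac`, `dpc`, and `combine` are hypothetical and are not taken from the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {a: seq.count(a) / len(seq) for a in AMINO_ACIDS}


def dpc(seq):
    """Dipeptide composition: fraction of each of the 400 ordered pairs."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = len(seq) - 1  # number of overlapping dipeptides
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return {k: v / total for k, v in counts.items()}


def combine(*feature_dicts):
    """Concatenate feature dictionaries into one flat vector (sorted keys)."""
    vector = []
    for d in feature_dicts:
        vector.extend(d[k] for k in sorted(d))
    return vector


example = "CELDENNTPMC"  # FSEC sequence quoted later in this section
features = combine(aac(example), dpc(example))
print(len(features))  # 20 AAC values followed by 400 DPC values
```

The same concatenation pattern would apply to combined feature sets such as AAC+PseAAC, provided each descriptor is first computed as a fixed-length vector.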
Performance comparisons of the various feature types were performed for models built via 5-fold CV and an independent validation test on the
dataset that was subjected to 1 random split and 10 rounds of random splits, as shown in
Table 2 and
Table 3, respectively. As seen in
Table 2, the highest test accuracy and MCC of 72.22% and 0.45, respectively, were achieved using the PseAAC feature. Meanwhile, Am-PseAAC and AAC performed well with the second and third highest test accuracies of 72.22% and 72.12%, respectively. To improve prediction performance, we also used combinations of the top 3 features (i.e., AAC, PseAAC, and Am-PseAAC) to train the prediction models. The combination of PseAAC and Am-PseAAC reached a test accuracy and MCC of 77.78% and 0.56, respectively, while the combination of AAC and PseAAC provided the second highest test accuracy and MCC of 75.93% and 0.52, respectively. For the prediction results from 10 rounds of random splits, among the top 3 features,
Table 3 shows that AAC had the best performance with a test accuracy and MCC of 73.33 ± 1.01% and 0.47 ± 0.02, respectively. Meanwhile, the combined features of AAC+PseAAC and AAC+PseAAC+Am-PseAAC yielded the first and second highest test accuracy and MCC of 74.81 ± 1.01%/0.50 ± 0.02 and 74.07 ± 1.31%/0.49 ± 0.02, respectively.
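For reference, the two reported metrics can be computed from a confusion matrix as sketched below, together with a mean ± spread summary over repeated random splits. The confusion counts and per-round accuracies are hypothetical values chosen only for illustration, and the use of the sample standard deviation for the ± term is an assumption, since the text does not state which measure of spread was reported.

```python
import math
import statistics


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified peptides."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def summarize(values):
    """Mean and sample standard deviation over repeated random splits."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical confusion counts for one independent test set
acc = accuracy(tp=40, tn=35, fp=15, fn=10)
m = mcc(tp=40, tn=35, fp=15, fn=10)
print(f"accuracy = {acc:.2%}, MCC = {m:.2f}")

# Hypothetical accuracies from three rounds of random splitting
mean_acc, sd_acc = summarize([0.70, 0.72, 0.74])
print(f"{mean_acc:.2%} ± {sd_acc:.2%}")
```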
As mentioned in the section on the Benchmark dataset, it is not fair to compare our results with existing methods because AntiAngioPred was trained on the
dataset. Therefore, in this study, the
dataset was also utilized to develop the prediction models for comparative purposes. Performance comparisons of the RF models with various sequence features are summarized in
Table 2 and
Table 3. The highest test accuracy and MCC of 77.50 ± 1.77% and 0.56 ± 0.03 were achieved by using the combined features consisting of AAC, PseAAC, and Am-PseAAC. Meanwhile, the AAC feature and the combined feature of AAC+PseAAC performed well, affording the second and third highest test accuracy and MCC of 77.00 ± 2.09%/0.55 ± 0.04 and 75.50 ± 1.12%/0.52 ± 0.02, respectively. As seen in
Table 2 and
Table 3, prediction results for the
dataset are quite consistent with those of the
dataset.
Furthermore, from
Table 2 and
Table 3, the experimental results can be briefly summarized as follows. Each of the three single features (AAC, PseAAC, and Am-PseAAC) is beneficial for predicting anti-angiogenic peptides, with test accuracies of >73% and >77% when evaluated on
and
datasets, respectively. Furthermore, prediction results for the
dataset were better than those of the
dataset, thereby indicating that the position of the first fifteen residues plays a vital role in discriminating anti-angiogenic from non-antiangiogenic peptides (
Table 2 and
Table 3). This observation is consistent with the study of Ramaprasad et al. [
27]. The best prediction performance for both
and
datasets, as evaluated via the independent validation test over 10 rounds of random splits, was achieved by using the combined features of AAC, PseAAC, and Am-PseAAC. For convenience, we will refer to this RF method built with the combined feature of AAC, PseAAC, and Am-PseAAC as TargetAntiAngio.
2.3. Biological Space
The analysis of feature importance can provide a better understanding of the mechanistic details governing the anti-angiogenic activity of peptides. As mentioned above, the informative features of AAC, DPC, and PCP were used in this study to characterize the anti-angiogenic activity of peptides. To select informative features, this study utilized the RF model because of its built-in feature importance estimation and its good prediction performance. The mean decrease of Gini index (MDGI) was adopted to rank and estimate the importance of each AAC and DPC feature. This information was derived from analysis of the
dataset that consists of 137 anti-angiogenic and 137 non-antiangiogenic peptides.
Table 5 lists the percentage values of the 20 amino acids for both anti-angiogenic and non-antiangiogenic peptides, together with the compositional difference between the two classes and their MDGI values. The feature with the highest MDGI value is considered the most important, as it contributes most to the prediction performance. As seen in
Table 5, the 10 top-ranked informative amino acids with the highest MDGI values are Cys, Ser, Val, Ala, Leu, Arg, Glu, Lys, and Pro, which afforded MDGI values of 15.90, 14.43, 9.58, 9.21, 8.41, 8.31, 6.68, 6.59, 6.40, and 5.52, respectively. Meanwhile, among the 10 informative amino acids, analysis of the AAC percentages suggested that Cys, Ser, Arg, and Pro are dominant in anti-angiogenic peptides, while Val, Ala, Leu, Glu, and Ile are dominant in non-antiangiogenic peptides at the significance level of
p-value ≤ 0.05.
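To make the MDGI criterion more concrete, the sketch below computes the Gini impurity decrease produced by a single threshold split on one feature. In a random forest, such decreases are accumulated over every node that splits on the feature and averaged over the trees; the code shows only the per-split quantity and is not the authors' implementation. The feature values (Cys fractions) and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease when samples are split on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical Cys fractions; label 1 = anti-angiogenic, 0 = non-antiangiogenic
cys_fraction = [0.00, 0.00, 0.10, 0.20]
labels = [0, 0, 1, 1]
print(gini_decrease(cys_fraction, labels, threshold=0.05))
```

A feature that separates the two classes cleanly yields a large decrease at each split in which it is used, which is why residues such as Cys can rank highly under this criterion.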
Furthermore, sequence logos of the first and last fifteen residues at the N- and C-terminal regions of both anti-angiogenic and non-antiangiogenic peptides were created to visualize the positional information for each amino acid, as shown in
Figure 3. The overall stack height of each position indicates its sequence conservation while the size of the residue represents its propensity.
Figure 3a,c show that Pro, Ser, Trp, Cys, and Gly as well as Cys, Ser, Gly, Pro, and Arg are abundant in the first 15 residues of the N-terminal region and the last 15 residues of the C-terminal region, respectively, of anti-angiogenic peptides. However, only Leu and Ala are abundant in the last 15 residues of the C-terminal region of non-antiangiogenic peptides. Thus, the sequence logos highlight crucial amino acid residues that could potentially be used for discriminating anti-angiogenic from non-antiangiogenic peptides. Moreover, Cys, Ser, and Arg are seen to be favored by anti-angiogenic peptides, especially at the C-terminal region. These analyses are consistent with the feature importance as estimated using MDGI values, where Cys, Ser, and Arg are ranked 1, 2, and 6, respectively (
Table 6).
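The relation between stack height and letter size in a sequence logo can be sketched as follows. The code assumes the standard definition of information content, log2(20) minus the Shannon entropy of the column, without a small-sample correction, and it simply skips peptides shorter than k residues; the logo software used for Figure 3 may handle both points differently. The toy peptides are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content in bits: log2(alphabet size) minus entropy."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def logo_heights(peptides, k=15, from_end=False):
    """Letter heights per position for the first (or last) k residues."""
    segments = [(p[-k:] if from_end else p[:k]) for p in peptides if len(p) >= k]
    heights = []
    for i in range(k):
        column = [s[i] for s in segments]
        info = column_information(column)  # overall stack height
        n = len(column)
        # each letter's height is its frequency times the stack height
        heights.append({res: (c / n) * info for res, c in Counter(column).items()})
    return heights


toy = ["CSPW", "CTPG", "CSAG"]  # hypothetical peptides
for position, column_heights in enumerate(logo_heights(toy, k=2), start=1):
    print(position, column_heights)
```

A fully conserved position reaches the maximum stack height of log2(20) bits, whereas a position with mixed residues has a lower stack in which the more frequent residue is drawn taller.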
The heatmap of feature importance for the DPC feature can be seen in
Figure 4, from which the 20 top-ranked informative dipeptides with the highest MDGI values are SP, TC, CG, CS, SC, TR, RT, PF, AS, HG, LI, PC, RP, AA, SL, AL, ST, IV, RR, and AD. Among the top 20 informative dipeptides, there are 6 dipeptides (SP, TC, CG, CS, SC, and TR) with MDGI values larger than 1.45. In addition, 4 out of the 6 top-ranked informative dipeptides (TC, CG, CS, and SC) contain Cys, while 3 out of the 20 top-ranked informative dipeptides (TR, RT, and RP) contain Arg. As mentioned previously, Cys and Arg were the first and sixth most important amino acids, with MDGI values of 15.90 and 8.31, respectively. These results reinforce the importance of Cys and Arg for the anti-angiogenic activity of peptides. A detailed analysis of these two amino acids is given below.
Cys provided the largest MDGI value, and the results in
Table 5 show that the percentage composition of Cys residues differs significantly between anti-angiogenic (0.047%) and non-antiangiogenic (0.014%) peptides (p-value < 0.05). Many studies have reported that Cys is the preferred residue for anti-angiogenic activity [
30,
31,
32,
33]. Cys is classified as a polar, non-charged amino acid containing sulfur which, when oxidized, can form a disulfide bond. Such bonds stabilize the three-dimensional structure, which is essential for extracellular proteins that might be exposed to harsh conditions. Peptides containing multiple disulfide bridges are more resistant to thermal denaturation, and these bridges are also crucial for maintaining their biological activity [
34]. In 1997, the globular protein endostatin was first discovered by Folkman and coworkers as an endogenous inhibitor of angiogenesis [
35]. Mass spectrometry demonstrated that endostatin contains two disulfide bonds: Cys162-302 and Cys264-294 [
31,
32]. In addition, histological sections of tumors from saline- versus endostatin-treated Lewis lung carcinomas were analyzed for apoptosis and angiogenesis. The results showed that the apoptotic index of tumor cells increased 7-fold (
p-value < 0.001) while angiogenesis was completely suppressed in tumor cells (
p-value < 0.001) for the endostatin-treated mice [
35]. Furthermore, Hiraki et al. [
36] performed site-directed mutagenesis of chondromodulin-1 (ChM-1) to assess the importance of Cys for the function of ChM-1. The results showed that the ChM-1 mutant in which all eight Cys residues were replaced by Ser lost, owing to the lack of disulfide bonds, its inhibitory effect on the VEGF-A-stimulated migration of human umbilical vein endothelial cells (HUVEC). Remarkably, the mutant with Ser at positions 83 and 99 showed decreased cell migration (150%) as compared to that of VEGF-A (350%). This result indicated that the disruption of one disulfide bond does not abolish the inhibitory effect on migration. In addition, the Δ (Cys83 Cys99) rhChM-1 mutant, which lacks the 17 amino acid residues from Cys83 to Cys99 but retains three disulfide bonds, still appeared to exhibit its inhibitory effect [
37]. Similarly, Chlenski et al. [
20] designed and synthesized two peptides, FSEC (CELDENNTPMC) and FSEN (CQNHAKHGKVC), from FS-E (CQNHCKHGKVCELDENNTPMC) by linking Cys 1 to Cys 3 and Cys 2 to Cys 4, owing to the need to construct simpler peptides with less complex structures. FS-E belongs to the group of secreted protein acidic and rich in cysteine (SPARC). In this study [
20], the authors divided the experimental work into three parts: (i) an endothelial cell migration assay, (ii) inhibition of neuroblastoma tumor growth, and (iii) inhibition of tumor-induced angiogenesis. Firstly, to evaluate the capability of the two peptides to inhibit endothelial cell migration, HUVEC were treated with serial dilutions of FSEC and FSEN, and the percentage of stimulation was monitored relative to basic fibroblast growth factor (bFGF) as a positive control. For the former, the in vitro experiment demonstrated inhibition of HUVEC migration with an EC50 of 1 pM. Secondly, an in vivo experiment was carried out in a mouse model in which mice with subcutaneous neuroblastoma xenografts were treated with the FSEC peptide for 2 weeks. Compared to the control group (PBS), the FSEC-treated mice showed inhibition of tumor growth, as deduced from the decreased tumor weight (
p = 0.01). Lastly, paraffin sections of xenografted mice were stained for green CD31 (PECAM-1)-positive endothelial cells and red SMA-positive pericytes, and the quantity of tumor blood vessels was calculated as the area occupied by staining. Results revealed that this quantity was significantly reduced in FSEC-treated xenografts as compared to the vehicle-treated control (p-value < 0.001). This study also indicated that FSEC, a modified linear peptide containing disulfide bonds, has the ability to completely abrogate angiogenesis, thereby leading to tumor growth inhibition. These results are consistent with previous studies showing that SPARC can inhibit breast cancer progression [
38] and ovarian metastasis [
33] with the overexpression of endogenous angiogenic inhibitors such as somatostatin, angiostatin, and endostatin, which is also negatively correlated with poor prognosis of cancer patients [
39,
40].
Furthermore, Yang et al. [
41] modified wild-type (WT) kringle 5 (K5), which has been shown to have higher anti-angiogenic potential than angiostatin, by disrupting its disulfide bond distribution. K5mut1 was designed by deleting amino acid residues outside the kringle domain, whereas Cys462-Cys451 is still located in the WT K5. Additionally, K5mut2 was constructed by removing Cys462, thereby leading to the loss of one disulfide bond. The effects of WT K5 and its deletion mutants on endothelial cell proliferation, cell apoptosis, and tumor growth were evaluated by the percentage of cell viability, flow cytometry, and tumor weight, respectively. In vitro results showed that K5mut1 decreased endothelial cell proliferation by 2-fold and enhanced endothelial cell apoptosis. Moreover, the in vivo experiment revealed that the weight of liver tumors in a mouse model was gradually decreased compared to mice treated with wild-type K5. Meanwhile, K5mut2, lacking one Cys, lost all its inhibitory effects. In summary, anti-angiogenic peptides containing Cys residues that form disulfide bonds play an important role in (i) inhibiting blood vessel proliferation through the activation of angiostatin, which contributes to a lack of nutrients and blood supply to tumor cells [
20,
42], (ii) increasing anti-angiogenesis via reduction of specific receptors for pro-angiogenic molecules, (iii) inducing cell apoptosis [
35,
43], and (iv) balancing opposing signals in the tumor microenvironment [
44].
Although our prediction model showed that Cys is the most important amino acid for the inhibition of blood vessel proliferation and tumor growth, other peptides that do not contain Cys have also demonstrated anti-angiogenic activity. Recent advances in biotechnology have led to the discovery of numerous biologically active peptides. The challenge is also to improve other properties such as bioavailability; pharmaceutical techniques such as liposomes, hydrogels, nanoparticles, and targeted drug delivery systems should therefore be utilized to improve the potency of anti-angiogenic peptides. For example, tumstatin peptides bind to αvβ3 integrin on proliferating endothelial cells and also localize to the target tumor. Moreover, when combined with bevacizumab (anti-VEGF antibody), an increase in efficacy against tumor progression was observed [
45]. Thus, the design of therapeutic peptides should use appropriate amino acids to bring about the intended effect and to target specific mechanisms of interest.
Arg had the sixth largest MDGI value (
Table 5), and the percentage composition of Arg residues differs significantly between anti-angiogenic (0.088%) and non-antiangiogenic (0.055%) peptides at a significance level of p-value < 0.05. Bae et al. [
46] identified hexapeptides from peptide libraries in order to investigate their effects on the binding of VEGF to its receptors. The authors found that the most important amino acids for inhibitory activity included Arg, Lys, and His. Meanwhile, three peptides, RRKRRR, RKKRKR, and hexa-arginine (RRRRRR), were demonstrated to be the most effective inhibitors, with IC50 values of 2, 3.4, and 3.8 μM, respectively. In addition, the interaction between hexapeptides and VEGF was investigated by monitoring the binding of labeled VEGF165 to endothelial cells. Results showed that Arg-rich (AR) hexapeptides bind directly to VEGF165 (KD = 5, 2 and 22 μM). Furthermore, the proliferation assay also confirmed that AR hexapeptides inhibited VEGF165-induced HUVE cell proliferation in a concentration-dependent manner without cytotoxicity. Moreover, the essential role of hexapeptides containing basic charged amino acid residues was elucidated via blocking the metastasis of human colon carcinoma cells. Results disclosed that RRKRRR decreased the number of metastatic nodules by 16% as compared to that of the control, whereas hexa-Lys (KKKKKK) showed minor inhibitory effects (80% of control). Conversely, the peptide with negative charge (EEFDDA) appeared to show no inhibitory activity at all. In addition, Xiong et al. [
47] demonstrated that treatment of cells with 0.05 mmol/L of L-Arg for 7 days caused endothelial dysfunction, as measured by the enhanced superoxide anion and decreased NO production. Thus, chronic L-Arg supplementation can accelerate endothelial cell senescence with the up-regulation of Arg-II. Moreover, Arg was utilized to create a synthetic RGD (Arg-Gly-Asp) integrin ligand sequence for improving the tumor cell targeting capability of therapeutic peptides [
48]. Xu et al. [
49] synthesized the HM3 peptide (IRRADRAAVPGGGG) and added RGD (IRRADRAAVPGGGG-RGD) to investigate its inhibitory effect. The experimental results showed that HM3 could significantly inhibit the migration of endothelial cells. In addition, Matrigel and aortic ring tests conducted in a mouse model also revealed that HM3 could potentially inhibit angiogenesis. Similarly, Buerkle et al. [
50] explored the effect of a cyclic RGD peptide as an αv-integrin antagonist on angiogenesis, microcirculation, growth, and metastatic formation of solid tumors. Results indicated that the cyclic RGD peptide reduced blood vessel density as well as diminished tumor growth and metastasis. Additionally, Kando et al. [
51] developed a liposomal drug targeted to membrane type-1 matrix metalloproteinase by modification with stearoyl Gly-Pro-Leu-Pro-Leu-Arg (GPLPLR). The authors observed that the modified liposome showed high binding ability to HUVEC and increased accumulation in tumor cells (> 4-fold). In summary, peptides containing Arg induced anti-angiogenic activity and contributed to the inhibition of tumor growth via (i) the binding of peptides to the main body of VEGF including the N- and C-terminal ends [
46] and (ii) increasing the specificity to targeted tumors, as Arg confers a small positive charge, thereby allowing cell binding via electrostatic interactions with the negatively charged cell membranes and thus leading to arrested tumor growth [
52].
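The charge argument above can be illustrated with a crude net-charge estimate for the hexapeptides discussed by Bae et al. The sketch assumes neutral pH, counts +1 for each Lys or Arg and -1 for each Asp or Glu, and ignores His, the terminal groups, and residue-specific pKa values; the function name `net_charge` is hypothetical.

```python
def net_charge(seq):
    """Approximate net charge at neutral pH from side chains only."""
    seq = seq.upper()
    positive = sum(seq.count(r) for r in "KR")  # Lys, Arg
    negative = sum(seq.count(r) for r in "DE")  # Asp, Glu
    return positive - negative


for peptide in ("RRKRRR", "RKKRKR", "RRRRRR", "KKKKKK", "EEFDDA"):
    print(peptide, net_charge(peptide))
```

Under this approximation hexa-Lys carries the same net charge as the Arg-rich hexapeptides, so charge alone would not account for its weaker inhibitory effect reported above.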