3.1. Correlation Analysis and Multiple Linear Regression
The limit values of the log BB index differ among the proposed BBB penetration prediction models [8,9,24]. A log BB of −0.52 is the logical division between BBB+ and BBB−; this value corresponds to a brain-to-plasma ratio of about 30% for the compound [25]. The optimal classification threshold usually lies between 0 and −1 [25,26], and earlier studies defined the BBB+ limit as log BB ≥ −0.9 [27]. The calculations in this study use the BBB penetration data for all 37 observed drugs. The parameter used (B2) is based on log BB and describes the division into BBB+/− [19]; the limit values B2 > −0.90 and B2 > −0.52 were also used. Another applied descriptor, CNS+/−, is described in the literature [17].
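The thresholds discussed above can be converted between log BB and the brain-to-plasma ratio. The sketch below assumes that log BB is the base-10 logarithm of the brain-to-plasma concentration ratio; the function names are illustrative and do not come from the study.

```python
# Assumption: log BB = log10(C_brain / C_plasma).
def brain_to_plasma_ratio(log_bb: float) -> float:
    """Convert a log BB value into a brain-to-plasma concentration ratio."""
    return 10.0 ** log_bb


def is_bbb_plus(log_bb: float, threshold: float = -0.52) -> bool:
    """Classify a compound as BBB+ when log BB lies above the chosen limit.

    The comparison is strict, following the B2 > -0.52 and B2 > -0.90
    notation used in the text.
    """
    return log_bb > threshold


# The two limits used in the text, expressed as ratios.
ratio_at_052 = brain_to_plasma_ratio(-0.52)  # roughly 0.30
ratio_at_090 = brain_to_plasma_ratio(-0.90)  # roughly 0.13
```

A compound with log BB = −0.7 would therefore be BBB+ under the −0.90 limit but BBB− under the −0.52 limit, which illustrates why the choice of threshold matters.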
To combine two pharmacokinetic phenomena, drug penetration across the blood–milk barrier and across the blood–brain barrier, the molecular descriptors (MDs) of 37 compounds were examined. The MDs that, according to the literature, usually affect BBB+/− and M/P are plasma protein binding (PB), the acid–base nature of the compounds (acid/base), and physiological charge (PhCharge). The correlation matrix that describes the relationships between these indicators in the study group is presented in Table 3.
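A correlation matrix of this kind can be computed as follows. This is a minimal sketch that assumes Pearson correlation coefficients; the source does not state which software or coefficient type was used, and the numerical values below are hypothetical placeholders rather than data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict with R for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical descriptor values for five compounds (illustration only).
example = {
    "PB": [0.95, 0.20, 0.60, 0.85, 0.35],   # fraction bound to plasma proteins
    "acid/base": [-1, 1, 0, -1, 1],          # acid = -1, base = 1
    "PhCharge": [-1, 1, 0, 0, 1],            # physiological charge
}
matrix = correlation_matrix(example)
```

Each entry `matrix[a][b]` holds the coefficient for one pair of descriptors; the diagonal is 1 by construction.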
The connection between the B2 and B2 > −0.90 descriptors and PB is clearly visible. Additionally, it was assumed that the strength of binding to plasma proteins correlates best with the B2 descriptor, on the basis of which the availability of drugs to the brain can be determined. The choice between B2 > −0.90 and B2 > −0.52 is subjective, so the range of drug properties related to the B2 measure was investigated.
In the milk penetration correlation analyses (Table 4), APIs indicated as CNS− were removed from the group of 37 cases. Only the potentially CNS+ compounds remain, in line with the theory that centrally acting drugs are the most dangerous. There are 25 such CNS+ compounds, but the introduced LLL H, LactMed, and PB indicators further reduced this number (n = 21).
The significant correlation (inverse relationship) between the bioavailability of the drug in breast milk (M/P) and the values of PBcode and PB (R = 0.52 and R = −0.42, respectively) indicates that a high level of protein binding reduces the risk of drug penetration into breast milk. The danger of treatment during lactation is associated with the acid–base nature of the biologically active compounds. Acids (a; code: −1) are the safest; they bind strongly to human plasma albumin. Bases (b; code: 1) are the least safe; here, the ratio of milk content (M) to plasma content (P) will be the greatest. The correlation matrix also shows that the relationships of acid/base with B2 and with M/P are very similar (R = 0.54). This result confirms the relationship between the strength of drug binding to proteins and drug bioavailability in the compartments isolated by the blood–brain and blood–milk barriers.
The main task was to study the interrelationships between the bioavailability of drugs in the CNS and the dangers of their use in women during breastfeeding. Therefore, an MLR analysis was performed with B2 as the dependent variable. A summary of the regression obtained is presented below. The result is very good, not only because of the high correlation coefficient (R = 0.91; R2 = 0.83, i.e., the model explains 83% of the total variability of B2; n = 33), but also because the M/P parameter enters the model. The relationship between B2 and M/P is, as expected, directly proportional (Figure 1, Equation (1)).
R = 0.91, R2 = 0.83, F(3,29) = 47.05, p < 0.00000, s = 0.2811; n = 33
Q2LOO = 0.78, SDEP = 0.2839, PRESS = 2.893, SPRESS = 0.2875, Q2LMO = 0.80
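The Q2LOO and PRESS statistics come from leave-one-out cross-validation of the regression. The sketch below shows one common way to compute them for a multiple linear regression. It assumes the definition Q2 = 1 − PRESS/SStot with SStot taken around the mean of all observations, which may differ from the convention of the software used in the study. It is written in plain Python to stay dependency-free, and the data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    a = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(a, b)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def q2_loo(X, y):
    """Return (Q2, PRESS) from leave-one-out cross-validation."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    press = 0.0
    for i in range(len(y)):
        # Refit the model without case i, then predict the left-out case.
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / ss_tot, press


# Hypothetical descriptors (two per compound) and responses, illustration only.
X_demo = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2], [6, 5]]
y_demo = [0.1, 2.9, 2.2, 5.8, 5.1, 9.2, 7.9]
q2_demo, press_demo = q2_loo(X_demo, y_demo)
```

Because each left-out case is predicted by a model that never saw it, Q2 is typically lower than R2, as is the case for the values reported above (0.78 versus 0.83).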
The B2 index is most strongly represented by the number of hydrogen bond acceptors (HA) and the molecular surface area of the drug (Sa). Interesting results were also obtained from the MLR model of the dependent variable PB (Equation (2)).
R = 0.76, R2 = 0.57, F(5,27) = 7.244, p < 0.00020, s = 0.2417
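The reported F statistics can be cross-checked against R2, the number of cases, and the number of predictors using the standard relationship for a regression with an intercept. In the sketch below, n = 33 for the PB model is an assumption inferred from the degrees of freedom F(5,27); the function name is illustrative.

```python
def f_statistic(r2: float, n: int, k: int):
    """Overall F for a linear model with k predictors, an intercept, and n cases.

    Returns (F, df_model, df_residual).
    """
    df_model = k
    df_resid = n - k - 1
    f_value = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return f_value, df_model, df_resid


# B2 model: R2 = 0.83, n = 33, three predictors -> F(3, 29)
f_b2 = f_statistic(0.83, 33, 3)
# PB model: R2 = 0.57, five predictors, n = 33 assumed -> F(5, 27)
f_pb = f_statistic(0.57, 33, 5)
```

With the rounded R2 values this gives F of about 47.2 and 7.16, close to the reported 47.05 and 7.244; differences of this size are what rounding R2 to two decimals would produce.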
All MDs associated with the B2 value (HA, M/P, Sa) reappear in the resulting PB model, which suggests that chromatographic data connected with protein binding should be included in explaining the penetration of APIs into the CNS and breast milk. Because of this connection, the chromatographic systems describing protein binding capacity served as analytical models for two pharmacokinetic phenomena: distribution to the CNS and to milk.
Further analyses were based on the chromatographic data: Rf values from TLC (NP and RP mode) and HPLC retention times collected using two columns, HSA and IAM. Initially, the chromatographic data were examined for the set of CNS+/− cases (n = 37), and the results are presented in the correlation matrices (Table 4); then all types of chromatographic systems used were checked for the group of bioavailable CNS+ compounds (n = 25) (Table 5).
Both groups of cases (CNS+/− and CNS+ only) clearly associate bioavailability in breast milk with the behavior of the tested compounds in a thin-layer chromatography environment. This applies both to the stationary phase obtained by modifying silica gel plates (NP) and to silanized silica gel plates (RP). The slight differences between the two groups (CNS+/− and CNS+) result from the sizes of the groups. The HPLC experiments yielded worse results.
An additional modifier was the Rf value of drugs on non-protein-coated plates (control plate, C). The introduction of the Rf/C parameters (NP/C and RP/C) allows the observation of chromatographic effects related only to the presence of the protein in the stationary phase. The modified data are presented separately in the following correlation matrices from the various chromatographic experiments (Table 6).
The observation of the correlation of chromatographic data by type of chromatography indicates RP TLC as the system that best describes the characteristics of drug excretion into breast milk in the group of CNS-active APIs. The RP/C descriptor also has the highest correlation with M/P (R = −0.57).
3.2. Random Forest Regression
RF regression analyses were performed using seven MDs. Four models were created, each using different chromatographic data (Rf values from NP TLC or RP TLC, or log k values obtained from HPLC-HSA or HPLC-IAM experiments). As in the MLR analysis, the dependent variable was the B2 parameter. The molecular descriptors that gave the best results and were therefore used in the models are listed in Table 7.
The predictor importance plot revealed that the TLC data (MD no. 7 on the plots), particularly from the reversed-phase system, made the greatest contribution to the RF model. This impact is comparable to the contribution of log P or HD. The HPLC-HSA log k values have a significantly smaller effect on the regression, and the HPLC-IAM data have a negligible influence (Figure 2).
The most important MD in the RF models is the number of hydrogen bond acceptors (HA). Only PB shows an inversely proportional relationship, which is the expected result. The M/P parameter makes a contribution similar to that of physicochemical MDs such as PhCharge or log P. This creates another link between crossing the BBB and crossing the blood–milk barrier.
RF regression models were ultimately built with only the TLC data, as indicated by the predictor importance plots. The R2 and Q2 values oscillated around 0.85–0.87 and 0.78–0.80, respectively (Figure 3 and Figure 4), which is a satisfactory result. The best fit of the model can be seen in the B2 value range from −1.5 to 0; extreme values are less well correlated.
Each model calculation in the RF method involves a different set of decision trees. Therefore, the RF regression was repeated 20 times to obtain averaged results. Table 8 shows the mean values of R2 and Q2 from the models. The NP TLC data give slightly higher model and cross-validation determination coefficients, but the difference is not significant.
3.3. Cluster Analysis
This analysis was performed to compare the variability of the blood–brain barrier permeation parameters and the chromatographic data describing them.
The parameter CNS+/− was used as an indicator of the bioavailability of drugs in the brain. As an indicator of pharmacotherapy safety during breastfeeding, the M/Pcode parameter was used, which was created by scaling the M/P parameter. The M/Pcode parameter ranges from 1 to 4: the values 1, 2, 3 and 4 are assigned to cases where the determined M/P level does not exceed 0.3, 0.8, 1.1 and 4, respectively. The logical conclusion also remains that drugs with a lower concentration in milk than in blood can be considered safer (M/P−). If the M/P value exceeds 1, the use of pharmacotherapy in a nursing woman is inadvisable or prohibited (M/P+). The study group was examined with the possibility of dividing it into two or four clusters: CNS+ and CNS− drugs, and M/Pcode 1–4, according to the level of safety of use.
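The scaling of M/P into M/Pcode described above can be sketched as a simple binning step. The upper limits are taken from the text; the function name and the handling of values outside the 0 to 4 range are assumptions made for illustration.

```python
from bisect import bisect_left

# Upper limits for M/Pcode 1, 2, 3 and 4 ("does not exceed" is read as <=).
MP_UPPER_LIMITS = (0.3, 0.8, 1.1, 4.0)


def mp_code(mp: float) -> int:
    """Map a milk-to-plasma ratio onto the 1-4 M/Pcode scale."""
    if mp < 0:
        raise ValueError("M/P ratio cannot be negative")
    index = bisect_left(MP_UPPER_LIMITS, mp)
    if index == len(MP_UPPER_LIMITS):
        # Assumption: values above 4 fall outside the published scale.
        raise ValueError("M/P ratio above the highest limit (4)")
    return index + 1


codes = [mp_code(v) for v in (0.25, 0.3, 0.75, 1.0, 2.5)]
```

Note that code 3 spans M/P values on both sides of 1, so the M/Pcode scale and the M/P−/M/P+ division at 1 do not coincide exactly.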
The application of two clusters (Appendix A, Figure A1) is not effective. CNS+/− takes the values 0.5 and 0.8 for the clusters with values 0 and 1, and the M/P values in both clusters are below 1 (instead of <1 and >1); this means that both clusters gather mixed cases. This is due to the considerable complexity of both phenomena, which are not clearly defined, even by biomedical models.
The splitting of the cases into four clusters improved the result (Figure 5). For the M/P values, the four clusters split into two groups: clusters 2 and 4 have M/P− (<1), and clusters 1 and 3 have M/P+ (>1). At the same time, clusters 1 and 3 showed M/P+ (3.4–3.8) values, and clusters 2 and 4 corresponded to M/P− (1.1–2.2). The definition of the CNS+/− value strictly follows this division. An interesting observation is the distribution of the values of the chromatographic data. In both TLC and HPLC formats, all extreme data (the smallest and largest Rf and log k values) correspond to clusters 1 and 3, whereas clusters 2 and 4 are intermediate.
Assuming that the inaccurate division of cases into M/P+ and M/P− may be caused by the particularly difficult and ambiguous differentiation of CNS+/− APIs, a k-means cluster analysis excluding the CNS− drugs was performed.
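For readers unfamiliar with k-means grouping, the following is a minimal sketch of the standard Lloyd iteration. It is not the implementation used in the study: the initialization (random cases as starting centroids), the seed, and the example values are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    """Group points into k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen cases.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical cases described by (M/Pcode, Rf); not data from the study.
cases = [(1, 0.20), (1, 0.25), (2, 0.30), (4, 0.80), (4, 0.85), (3, 0.90)]
labels, centroids = k_means(cases, k=2)
```

The cluster means returned in `centroids` are the kind of per-cluster values that are compared across clusters in the figures.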
An M/P value of less than 1.0 shows that only minimal amounts of the drug are transferred into the milk; such drugs are classified as low risk (LR). Drugs with an M/P value of 1.0 or more may be present in breast milk at a higher concentration than in the mother’s plasma and are classified as high risk (HR). This leads to more of the drug being transported into the infant’s body and, consequently, to side effects [12].
As the results of these experiments depend on various uncontrolled variables such as laboratory conditions, geographic region and lactation time, the reliability of these experimental data is questionable. In addition, there are many drugs for which M/P ratios have not been determined, so it is necessary and useful to develop theoretical methods such as the Quantitative Structure Property Relationship (QSPR) for predicting M/P values. In QSPR approaches, the chemical structures of compounds are quantitatively correlated with their biological activity [28].
In the current analysis, an additional parameter, HR/LR, was introduced, where HR denotes a high risk of the drug passing into milk (code value 1) and LR denotes a low risk (code value 0). The results of the analysis, with four clusters applied to the CNS+ APIs, are shown in Figure 6.
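The HR/LR coding and the restriction to CNS+ compounds can be sketched as follows. The cut-off at M/P = 1.0 follows the LR/HR definition given earlier in this section; the record layout, field names, and drug entries are hypothetical.

```python
def hr_lr_code(mp: float) -> int:
    """Return 1 (HR, high risk) when M/P >= 1.0, otherwise 0 (LR, low risk)."""
    return 1 if mp >= 1.0 else 0


# Hypothetical records for illustration only; not compounds from the study.
drugs = [
    {"name": "drug_A", "cns_plus": True, "mp": 0.4},
    {"name": "drug_B", "cns_plus": True, "mp": 1.6},
    {"name": "drug_C", "cns_plus": False, "mp": 2.1},
    {"name": "drug_D", "cns_plus": True, "mp": 1.0},
]

# Keep only the CNS+ cases, then attach the HR/LR code to each.
cns_plus_codes = {d["name"]: hr_lr_code(d["mp"]) for d in drugs if d["cns_plus"]}
```

The resulting codes can then be added as one more variable alongside the chromatographic data when the CNS+ cases are clustered.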