2. Experimental Data Sets and Computational Methods
The solubility data sets collected from the literature (79 in total) were fitted to the Jouyban–Acree model, and each analysis is described in detail below. Drug concentrations were converted using the molecular masses of the drugs and/or the densities of the saturated solutions. The solvent compositions were converted using the densities of the solvent mixtures taken from the literature. Various combinations of solute and solvent concentration units were analyzed in this work.
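The solvent composition conversions can be sketched as follows. This is a minimal illustration, not the authors' procedure. It converts a solute-free mole fraction of solvent 1 into a mass fraction, and then into a volume fraction. The volume fraction step **assumes** ideal volumes based on pure-solvent densities before mixing, whereas the authors used literature densities of the solvent mixtures. The ethanol and water molar masses and densities (approximate values near 298 K) and the helper names are illustrative.

```python
# Illustrative sketch: helper names are hypothetical.


def mole_to_mass_fraction(x1: float, m1: float, m2: float) -> float:
    """Solute-free mole fraction of solvent 1 -> mass fraction of solvent 1.

    m1, m2: molar masses of solvents 1 and 2 (g/mol).
    """
    mass1 = x1 * m1
    mass2 = (1.0 - x1) * m2
    return mass1 / (mass1 + mass2)


def mass_to_volume_fraction(w1: float, rho1: float, rho2: float) -> float:
    """Mass fraction -> volume fraction of solvent 1.

    Assumption: volumes of the pure solvents before mixing (rho in g/mL),
    i.e. any volume change on mixing is ignored.
    """
    v1 = w1 / rho1
    v2 = (1.0 - w1) / rho2
    return v1 / (v1 + v2)


# Illustrative inputs: ethanol (1) + water (2), approximate values near 298 K
M_ETOH, M_WATER = 46.07, 18.015      # g/mol
RHO_ETOH, RHO_WATER = 0.785, 0.997   # g/mL

x1 = 0.5
w1 = mole_to_mass_fraction(x1, M_ETOH, M_WATER)
phi1 = mass_to_volume_fraction(w1, RHO_ETOH, RHO_WATER)
print(f"x1 = {x1:.2f} -> w1 = {w1:.3f} -> phi1 = {phi1:.3f}")
```

Because ethanol is less dense than water, the volume fraction of ethanol comes out larger than its mass fraction. Using measured mixture densities would change the third step only.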
Table 1 lists the details of these unit combinations. By definition, all fractional concentration units (mole fraction, mass fraction, and volume fraction) vary in the range 0.0–1.0. The minimum molar concentration among the investigated data points was 3.8 × 10^−6 mol/L, or 1.0 × 10^−3 g/L (sulfadiazine in water at 293.2 K, SN = 59). The maximum value (16.2 mol/L, corresponding to 2786.6 g/L) belonged to sulfanilamide in 1,4-dioxane + water (SN = 70, 0.5 + 0.5 mole fractions) at 323.2 K. On the logarithmic scale, most of this wide variation is absorbed by the first two terms of the Jouyban–Acree model, so the resulting excess values vary over a much narrower range.
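The solute concentration conversions reduce to short arithmetic. The sketch below converts molar concentration to g/L using the molar mass. It also converts mole fraction solubility to molar solubility using the density of the saturated solution and the mean molar mass of the solute-free solvent mixture, both of which must be supplied. The helper names are hypothetical. As a check, it reproduces the extremes quoted above using standard molar masses of sulfadiazine (250.28 g/mol) and sulfanilamide (172.21 g/mol).

```python
# Illustrative sketch: helper names are hypothetical.


def molar_to_g_per_l(c_molar: float, molar_mass: float) -> float:
    """Molar concentration (mol/L) -> mass concentration (g/L)."""
    return c_molar * molar_mass


def mole_fraction_to_molar(x_solute: float, m_solute: float,
                           m_solvent_mix: float, rho_solution: float) -> float:
    """Mole fraction solubility -> molar solubility (mol/L).

    m_solute: solute molar mass (g/mol)
    m_solvent_mix: mean molar mass of the solute-free solvent mixture (g/mol)
    rho_solution: density of the saturated solution (g/mL)
    """
    # Mass (g) of one mole of solution (solute + solvent molecules)
    mass_per_mole = x_solute * m_solute + (1.0 - x_solute) * m_solvent_mix
    # Volume (L) occupied by that mass of solution
    volume_l = mass_per_mole / (rho_solution * 1000.0)
    return x_solute / volume_l


M_SULFADIAZINE = 250.28   # g/mol
M_SULFANILAMIDE = 172.21  # g/mol

print(molar_to_g_per_l(3.8e-6, M_SULFADIAZINE))   # ≈ 9.5e-4 g/L, about 1.0e-3 g/L
print(molar_to_g_per_l(16.2, M_SULFANILAMIDE))    # ≈ 2790 g/L
```

The small mismatch with 2786.6 g/L arises because 16.2 mol/L is itself a rounded value.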
The Jouyban–Acree model, reported to be the most accurate cosolvency model [8], is expressed as:
$$\ln x_{m,T} = w_1 \ln x_{1,T} + w_2 \ln x_{2,T} + \frac{w_1 w_2}{T} \sum_{i=0}^{2} J_i \left(w_1 - w_2\right)^i \qquad (1)$$
where $x_{1,T}$, $x_{2,T}$, and $x_{m,T}$ represent the solubility of the solute in mono-solvents 1 and 2 and in the mixed solvent, respectively, at temperature $T$. These solubilities are expressed in various concentration units (in this work, mole fraction, molar, and g/L). $w_1$ and $w_2$ stand for the concentrations of mono-solvents 1 and 2 in the absence of the solute; in this work, they are expressed in various units (mole fraction, mass fraction, and volume fraction). The $J_i$ terms are the model parameters. They are computed by regression analysis of $\left(\ln x_{m,T} - w_1 \ln x_{1,T} - w_2 \ln x_{2,T}\right)$ against $\left(w_1 w_2 / T\right)$, $\left(w_1 w_2 (w_1 - w_2) / T\right)$, and $\left(w_1 w_2 (w_1 - w_2)^2 / T\right)$. The number of parameters (np) is usually two, but up to three or even four can be used in some cases.
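Equation (1) is linear in the J terms, so the regression described above can be sketched as an ordinary least-squares problem. The code below is an illustrative sketch using NumPy's `numpy.linalg.lstsq`, not the authors' software. The synthetic data set, the J values used to generate it, and the helper names are hypothetical. The sketch assumes natural logarithms and a binary mixture with w2 = 1 − w1 on a solute-free basis.

```python
# Illustrative sketch: helper names and data are hypothetical.
import numpy as np


def design_matrix(w1, T, n_terms=3):
    """Columns w1*w2*(w1 - w2)**i / T for i = 0 .. n_terms-1 (w2 = 1 - w1)."""
    w1 = np.asarray(w1, dtype=float)
    T = np.asarray(T, dtype=float)
    w2 = 1.0 - w1
    return np.column_stack([w1 * w2 * (w1 - w2) ** i / T for i in range(n_terms)])


def fit_jouyban_acree(w1, T, x1, x2, xm, n_terms=3):
    """Least-squares estimates of J_0 .. J_{n_terms-1} in Equation (1)."""
    w1 = np.asarray(w1, dtype=float)
    w2 = 1.0 - w1
    # Left-hand side of the regression: ln x_m - w1 ln x1 - w2 ln x2
    y = np.log(xm) - w1 * np.log(x1) - w2 * np.log(x2)
    J, *_ = np.linalg.lstsq(design_matrix(w1, T, n_terms), y, rcond=None)
    return J


def predict_jouyban_acree(w1, T, x1, x2, J):
    """Back-calculated solubility x_m from Equation (1)."""
    w1 = np.asarray(w1, dtype=float)
    w2 = 1.0 - w1
    ln_xm = w1 * np.log(x1) + w2 * np.log(x2) + design_matrix(w1, T, len(J)) @ J
    return np.exp(ln_xm)


# Hypothetical synthetic data set: 9 compositions at 3 temperatures
w1 = np.tile(np.linspace(0.1, 0.9, 9), 3)
T = np.repeat([293.2, 298.2, 303.2], 9)
x1 = np.repeat([1.0e-2, 1.3e-2, 1.7e-2], 9)   # solubility in solvent 1
x2 = np.repeat([1.0e-4, 1.4e-4, 1.9e-4], 9)   # solubility in solvent 2
J_true = np.array([500.0, -150.0, 80.0])
xm = predict_jouyban_acree(w1, T, x1, x2, J_true)

J_fit = fit_jouyban_acree(w1, T, x1, x2, xm)
print(np.round(J_fit, 3))
```

On error-free synthetic data the fit recovers the generating constants. With real data, the back-calculated values from `predict_jouyban_acree` are the inputs to the error criteria.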
The experimental solubility data were fitted to the model. The back-calculated solubility data were then used to compute several error indices: the mean relative deviation in percent (MRD%), the root-mean-square deviation on the arithmetic scale (RMSD1), the RMSD on the logarithmic scale (RMSD2), the error on the arithmetic scale (E1), and the error on the logarithmic scale (E2), computed using Equations (2)–(6):
where N is the number of data points in each set.
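The sketch below implements commonly used forms of three of these criteria. The exact definitions are those of Equations (2)–(6), and the forms here are **assumptions**:
- MRD% is taken as the mean of |x_cal − x_obs|/x_obs × 100.
- RMSD1 is taken as the root-mean-square deviation of the solubilities themselves.
- RMSD2 is taken as the root-mean-square deviation of their base-10 logarithms.

E1 and E2 are not implemented here. The function names and numbers are hypothetical.

```python
# Illustrative sketch: assumed forms of MRD%, RMSD1 and RMSD2; names are hypothetical.
import numpy as np


def mrd_percent(x_obs, x_cal):
    """Mean relative deviation in percent (assumed form)."""
    x_obs, x_cal = np.asarray(x_obs, float), np.asarray(x_cal, float)
    return 100.0 * np.mean(np.abs(x_cal - x_obs) / x_obs)


def rmsd_arithmetic(x_obs, x_cal):
    """Root-mean-square deviation on the arithmetic scale (assumed RMSD1 form)."""
    x_obs, x_cal = np.asarray(x_obs, float), np.asarray(x_cal, float)
    return np.sqrt(np.mean((x_cal - x_obs) ** 2))


def rmsd_log(x_obs, x_cal):
    """Root-mean-square deviation of log10 values (assumed RMSD2 form)."""
    x_obs, x_cal = np.asarray(x_obs, float), np.asarray(x_cal, float)
    return np.sqrt(np.mean((np.log10(x_cal) - np.log10(x_obs)) ** 2))


x_obs = np.array([1.0e-3, 2.0e-3, 4.0e-3])   # hypothetical observed solubilities
x_cal = np.array([1.1e-3, 1.9e-3, 4.0e-3])   # hypothetical back-calculated values
print(mrd_percent(x_obs, x_cal), rmsd_arithmetic(x_obs, x_cal), rmsd_log(x_obs, x_cal))
```

Only MRD% divides each deviation by the observed value. RMSD1 keeps the units of the data, and RMSD2 keeps the units of the logarithms.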
3. Results and Discussion
The solubility data of each drug were expressed as mole fraction, molar, and g/L concentrations in binary solvent mixtures whose compositions were expressed as mole, mass, and volume fractions (codes 1–9; see Table 1 for details), and were fitted to Equation (1). Further details of the collected data sets are listed in Supplementary Table S1. The back-calculated data were used to compute various error criteria. When the overall MRD% values (for codes 1–9) were grouped by drug, the largest value was obtained for the ketoconazole data sets (overall MRD% = 25.7) and the smallest for the dapsone data sets (overall MRD% = 4.9). The MRD% values obtained for each analysis code are listed in Supplementary Table S2. The largest MRD% values for codes 1–3 were observed for the solubility of ketoconazole in carbitol + water (SN = 12). Those for codes 4–9 were observed for ketoconazole in acetonitrile + water (SN = 11).
Figure 1 illustrates the overall MRD% values and their standard deviations (SDs) for the 79 data sets and the different analysis codes. There was no significant difference among the overall MRD% values within codes 1–3 or within codes 4–9; however, the two subgroups differed significantly from each other. These results mean that the unit of drug concentration did not affect the fitting capability of the Jouyban–Acree model when MRD% was used as the error criterion. In contrast, the unit of the solvent composition (in the absence of the drug) might affect the fit of the model to the experimental data. Careful examination of the distributions of the solvent compositions revealed that the mean mole fraction was 0.36, whereas the mean mass and volume fractions were both 0.52. Our earlier observations showed that the Jouyban–Acree model provided the most accurate correlations for evenly spaced compositions (i.e., a mean fraction of 0.50). Therefore, the differences between codes 1–3 (mole fractions) and codes 4–9 (mass or volume fractions) could be attributed to the skewness of the mole fraction distribution. The analyses also differed in the numerical values of the model constants of Equation (1) and in the number of significant J terms. As an example, Table 2 lists the J0, J1, and J2 constants and the MRD% values obtained for the solubility data of sulfadiazine in acetonitrile + methanol mixtures (SN = 60). In this set, the mean mole (0.46), mass (0.50), and volume (0.50) fractions of the solvent composition were nearly equal. Similar analyses were carried out on the solubility data of paracetamol in PEG 400 + water (SN = 55). For this set, the mean mole, mass, and volume fractions of the solvent composition were 0.14, 0.50, and 0.48, respectively. The largest deviation from 0.50 was observed for the mole fractions, and the MRD% for code 1 was 14.2%, whereas the corresponding values for mass (code 4) and volume (code 7) fractions were 3.3% and 3.0%. In another data set, meloxicam in ethanol + ethyl acetate (SN = 55), the mean mole, mass, and volume fractions were 0.50, 0.58, and 0.60, and the MRD% values for codes 1, 4, and 7 were 14.2%, 3.3%, and 3.0%, respectively.
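The skewness argument can be checked with simple arithmetic. The sketch below **assumes** a hypothetical design of 11 evenly spaced mass fractions of PEG 400 in water (0.0, 0.1, ..., 1.0), takes the molar mass of PEG 400 as a nominal 400 g/mol, and converts each composition to a solute-free mole fraction. The mean mass fraction is 0.50, but the mean mole fraction of PEG 400 is well below 0.50. It illustrates only the direction of the effect and does not reproduce the actual compositions of the paracetamol data set.

```python
# Illustrative sketch: hypothetical composition design, nominal PEG 400 molar mass.
import numpy as np

M_PEG400 = 400.0    # g/mol, nominal value (assumption)
M_WATER = 18.015    # g/mol

# Hypothetical design: evenly spaced mass fractions of PEG 400
w_peg = np.linspace(0.0, 1.0, 11)

# Solute-free mole fraction of PEG 400 for each mass fraction
n_peg = w_peg / M_PEG400
n_water = (1.0 - w_peg) / M_WATER
x_peg = n_peg / (n_peg + n_water)

print(f"mean mass fraction = {w_peg.mean():.2f}")
print(f"mean mole fraction = {x_peg.mean():.2f}")
print(f"mole fraction at w = 0.5: {x_peg[5]:.3f}")
```

The large difference between the molar masses of the two solvents is what pushes the mole fractions toward the water-rich end.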
Table 3 lists the effect of different numbers of J terms on the MRD% values for SN = 60. As expected, more accurate correlations were obtained with more curve-fitting parameters (J terms). According to the theoretical justification of the Jouyban–Acree model [8,9], the J terms represent the non-ideal mixing behavior of the solution. For ideal mixing, all J terms are non-significant, and the Jouyban–Acree model reduces to the Yalkowsky model [10]. The Yalkowsky model is a log-linear model that assumes ideal mixing of the solvents without any energy exchange. It is expressed as:
$$\ln x_{m,T} = w_1 \ln x_{1,T} + w_2 \ln x_{2,T}$$
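The effect of the number of J terms, including the Yalkowsky limit in which no J terms are used, can be sketched on synthetic data. The example is illustrative only. The data set, the generating J values, and the helper names are hypothetical. It uses natural logarithms and computes MRD% in its common form (an assumption). The models are nested and fitted by least squares on the log scale, so the residual sum of squares cannot increase as terms are added. MRD% usually, but not necessarily, follows the same trend.

```python
# Illustrative sketch: helper names, data, and J values are hypothetical.
import numpy as np


def basis(w1, T, n_terms):
    """Columns w1*w2*(w1 - w2)**i / T, i = 0..n_terms-1 (w2 = 1 - w1)."""
    w2 = 1.0 - w1
    return np.column_stack([w1 * w2 * (w1 - w2) ** i / T for i in range(n_terms)])


def fit_and_mrd(w1, T, x1, x2, xm, n_terms):
    """Fit n_terms J constants by least squares in ln scale and return MRD%.

    n_terms = 0 gives the Yalkowsky (log-linear) model with no fitted constants.
    """
    w2 = 1.0 - w1
    ln_cal = w1 * np.log(x1) + w2 * np.log(x2)   # Yalkowsky part
    if n_terms > 0:
        A = basis(w1, T, n_terms)
        J, *_ = np.linalg.lstsq(A, np.log(xm) - ln_cal, rcond=None)
        ln_cal = ln_cal + A @ J
    x_cal = np.exp(ln_cal)
    return 100.0 * np.mean(np.abs(x_cal - xm) / xm)   # assumed MRD% form


# Synthetic single-temperature set generated with four J terms
T = np.full(11, 298.2)
w1 = np.linspace(0.0, 1.0, 11)
x1, x2 = 2.0e-2, 5.0e-5
J_true = np.array([700.0, -300.0, 250.0, 120.0])
xm = np.exp(w1 * np.log(x1) + (1.0 - w1) * np.log(x2) + basis(w1, T, 4) @ J_true)

for n in range(5):
    print(f"{n} J terms: MRD% = {fit_and_mrd(w1, T, x1, x2, xm, n):.3g}")
```

The zero-term row shows how far ideal (Yalkowsky) mixing is from a strongly non-ideal system. Each added J term absorbs more of the excess behavior.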
Supplementary Table S3 lists the results obtained with RMSD1 as the accuracy criterion. The overall 10^5 RMSD1 values ranged from 125.3 (code 7) to 8,948,236 (code 3). With respect to the solvent composition units, the order of 10^5 RMSD1 values for the mole fraction solubilities of the drugs was volume fraction (125.3) < mass fraction (132.8) < mole fraction (312.0). The corresponding orders for the molar and g/L drug concentrations were volume fraction (3302.3) < mass fraction (3354.0) < mole fraction (8948.2) and volume fraction (3,302,273) < mass fraction (3,353,971) < mole fraction (8,948,236), respectively. The magnitude of the drug solubility values in the saturated solutions therefore appears to govern the RMSD1 calculations. The overall 10^5 RMSD1 for the mole fraction solubilities was 190.0 (i.e., RMSD1 = 1.90 × 10^−3), whereas those for the molar and g/L solubilities were 5201.5 and 5,201,493, respectively. The largest 10^5 RMSD1 values for codes 1–3, 5, 6, 8, and 9 were observed for the solubility of sulfanilamide in 1,4-dioxane + water (SN = 70). Those for codes 4 and 7 were observed for sulfadiazine in 1-propanol + water (SN = 59).
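Converting the same fit between units shows why RMSD1 follows the magnitude of the solubility values. In this sketch (hypothetical numbers; assumed common forms of RMSD1 and MRD%), converting molar solubilities to g/L multiplies every observed and back-calculated value by the molar mass. RMSD1 is therefore multiplied by the same factor, while MRD% is unchanged.

```python
# Illustrative sketch: hypothetical data; assumed forms of RMSD1 and MRD%.
import numpy as np


def rmsd1(obs, cal):
    """Arithmetic-scale root-mean-square deviation (assumed form)."""
    return np.sqrt(np.mean((cal - obs) ** 2))


def mrd_percent(obs, cal):
    """Mean relative deviation in percent (assumed form)."""
    return 100.0 * np.mean(np.abs(cal - obs) / obs)


molar_mass = 250.28                               # g/mol (sulfadiazine)
obs_molar = np.array([2.0e-3, 5.0e-3, 1.2e-2])    # hypothetical, mol/L
cal_molar = np.array([2.2e-3, 4.6e-3, 1.25e-2])   # hypothetical, mol/L

obs_gl, cal_gl = obs_molar * molar_mass, cal_molar * molar_mass   # g/L

print("RMSD1 ratio (g/L vs mol/L):", rmsd1(obs_gl, cal_gl) / rmsd1(obs_molar, cal_molar))
print("MRD% mol/L:", mrd_percent(obs_molar, cal_molar))
print("MRD% g/L:  ", mrd_percent(obs_gl, cal_gl))
```

The same reasoning explains why RMSD1 values on different concentration scales cannot be compared directly.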
Supplementary Table S4 reports the RMSD2 accuracy criterion for the investigated systems. The overall 100 RMSD2 values ranged from 10.5 (codes 8 and 9) to 19.8 (code 1). With respect to the solvent composition units, the order of 100 RMSD2 values for the mole fraction solubilities of the drugs was volume fraction (10.5) < mass fraction (10.9) < mole fraction (19.8). The corresponding orders for the molar and g/L drug concentrations were volume fraction (10.5) < mass fraction (11.0) < mole fraction (19.4) and volume fraction (10.5) < mass fraction (11.1) < mole fraction (19.4), respectively. Unlike the RMSD1 case, the drug solubility units had little effect on the RMSD2 calculations. The overall 100 RMSD2 values were 13.7, 13.6, and 13.7 for the mole fraction, molar, and g/L solubilities, respectively, and the differences among codes were governed mainly by the solvent composition units. The largest 100 RMSD2 values for codes 1–3 were observed for the solubility of ketoconazole in carbitol + water (SN = 12). Those for codes 4–9 were observed for ketoconazole in NMP + ethanol (SN = 14).
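On the logarithmic scale, a conversion factor that is constant within a data set cancels in log x_cal − log x_obs. The sketch below (hypothetical numbers; assumed log10 form of RMSD2) shows that RMSD2 is then identical in mol/L and g/L. Conversions that involve solution density or mixture molar mass vary slightly with composition, so they do not cancel exactly. This is consistent with the small differences among the overall RMSD2 values reported above.

```python
# Illustrative sketch: hypothetical data; assumed log10 form of RMSD2.
import numpy as np


def rmsd2(obs, cal):
    """Root-mean-square deviation of log10 values (assumed form)."""
    return np.sqrt(np.mean((np.log10(cal) - np.log10(obs)) ** 2))


factor = 250.28                              # constant mol/L -> g/L factor (g/mol)
obs = np.array([2.0e-3, 5.0e-3, 1.2e-2])     # hypothetical, mol/L
cal = np.array([2.2e-3, 4.6e-3, 1.25e-2])    # hypothetical, mol/L

# log10(f*cal) - log10(f*obs) == log10(cal) - log10(obs)
print(rmsd2(obs, cal), rmsd2(obs * factor, cal * factor))
```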
Supplementary Table S5 lists the details of E1 for the different codes; the largest E1 values for codes 1–9 were observed for the solubility of sulfanilamide in 1,4-dioxane + water (SN = 70). Table 4 lists the overall E1 values obtained for the various drugs for each code. Supplementary Table S6 reports the details of the E2 values.
E1 and E2 are the absolute errors (or variances) in the arithmetic and logarithmic scales, respectively. An absolute error is expressed on the same scale as the measured data. Therefore, for solubilities in g/L (especially on the arithmetic scale), large error values can be obtained, which makes comparisons between data sets difficult. RMSD1 and RMSD2 are root-mean-square deviations in the arithmetic and logarithmic scales, respectively. These criteria are the square root of the mean squared deviation and, like the absolute errors, depend on the units of measurement. In contrast, MRD% is a mean relative deviation: each deviation is normalized by the corresponding observed value. This normalization facilitates comparisons between data sets or models with different scales. In this respect, MRD% is similar to %RSD (the relative standard deviation used as a repeatability and reproducibility index for repeated measurements) and may be the most suitable error criterion.
Furthermore, the solubility data of the investigated systems (different solutes and different solvents) lie in different ranges, and highly soluble compounds show large absolute errors on both the arithmetic and logarithmic scales. For these reasons, the other criteria cannot be used to compare systems and identify those with high errors. MRD% is a helpful metric for this comparison, because its normalization puts the data of different systems on a similar, comparable scale.
Correlations of the various error criteria with MRD% are shown in Figure 2, with the code 4 values chosen as the representative case. As presented in Figure 2, E2 and RMSD2 correlated well with MRD%, with the data scattered around the line. In contrast, RMSD1 and E1 showed some significant deviations from the assessment obtained with the MRD% criterion.
In another effort, the effect of outlier data points on the behavior of the error indices was investigated. For this purpose, a datum in several reported data sets was intentionally changed, and the trend of each error metric was studied, again using the code 4 values as the representative case. For example, the solubility of dapsone in ethanol + water (SN = 5) at 298.2 K and an ethanol mass fraction of 0.5 (i.e., 0.000930) was changed to 0.00930. The errors increased as follows:
- MRD%: from 8.9% to 10.5%
- 10^5 RMSD1: from 24.74 to 83.28
- E1: from 1.37 to 2.24
- 100 RMSD2: from 11.52 to 24.46
- E2: from 0.089 to 0.11

In another case, the solubility of naproxen in ethylene glycol + ethanol at 298.2 K and an ethylene glycol mass fraction of 0.5 (i.e., 0.0135) was changed to 1.35. The errors increased as follows:
- MRD%: from 1.7% to 15.46%
- 10^5 RMSD1: from 27.36 to 17959.59
- E1: from 1.97 to 260.33
- 100 RMSD2: from 2.4 to 60.30
- E2: from 0.02 to 0.2

Large deviations were thus observed for the over- or underestimated data points, and all error criteria could be employed to detect outliers.
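The outlier test can be mimicked on synthetic data. In this sketch, the data set, J values, and helper names are hypothetical. The fit uses natural logarithms, and common forms of MRD%, RMSD1, and log10 RMSD2 are **assumed**. The model is first fitted to error-free data. One value at w1 = 0.5 is then multiplied by 10, as in the dapsone example, and the model is refitted before the criteria are recomputed.

```python
# Illustrative sketch: hypothetical data and helper names; assumed criterion forms.
import numpy as np


def fit_predict(w1, T, x1, x2, xm, n_terms=3):
    """Fit Equation (1) by least squares and return back-calculated solubilities."""
    w2 = 1.0 - w1
    ideal = w1 * np.log(x1) + w2 * np.log(x2)
    A = np.column_stack([w1 * w2 * (w1 - w2) ** i / T for i in range(n_terms)])
    J, *_ = np.linalg.lstsq(A, np.log(xm) - ideal, rcond=None)
    return np.exp(ideal + A @ J)


def criteria(obs, cal):
    """MRD%, RMSD1 and RMSD2 in assumed common forms."""
    return {
        "MRD%": 100.0 * np.mean(np.abs(cal - obs) / obs),
        "RMSD1": np.sqrt(np.mean((cal - obs) ** 2)),
        "RMSD2": np.sqrt(np.mean((np.log10(cal) - np.log10(obs)) ** 2)),
    }


# Hypothetical error-free data set at one temperature (J0 = 400, J1 = -100)
T = np.full(11, 298.2)
w1 = np.linspace(0.0, 1.0, 11)
x1, x2 = 5.0e-3, 2.0e-5
xm = np.exp(w1 * np.log(x1) + (1 - w1) * np.log(x2)
            + w1 * (1 - w1) / T * (400.0 - 100.0 * (2 * w1 - 1)))

before = criteria(xm, fit_predict(w1, T, x1, x2, xm))

xm_out = xm.copy()
xm_out[5] *= 10.0             # outlier at w1 = 0.5 (tenfold, as in the dapsone case)
after = criteria(xm_out, fit_predict(w1, T, x1, x2, xm_out))

for key in before:
    print(f"{key}: {before[key]:.3g} -> {after[key]:.3g}")
```

Every criterion rises once the outlier is introduced. The refit can absorb only part of the outlier, because the remaining points constrain the J terms.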