Article

Improved Lipophilicity and Aqueous Solubility Prediction with Composite Graph Neural Networks

1 Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, A-1090 Vienna, Austria
2 Servier Research Institute-CentEx Biotechnology, 125 Chemin de Ronde, 78290 Croissy-sur-Seine, France
3 Inte:Ligand Software Entwicklungs und Consulting GmbH, 74B/11 Mariahilferstrasse, 1070 Vienna, Austria
* Author to whom correspondence should be addressed.
Molecules 2021, 26(20), 6185; https://doi.org/10.3390/molecules26206185
Submission received: 30 August 2021 / Revised: 30 September 2021 / Accepted: 8 October 2021 / Published: 13 October 2021

Abstract

The accurate prediction of molecular properties, such as lipophilicity and aqueous solubility, is of great importance and poses challenges in several stages of the drug discovery pipeline. Machine learning methods, such as graph-based neural networks (GNNs), have shown exceptionally good performance in predicting these properties. In this work, we introduce a novel GNN architecture, called the directed edge graph isomorphism network (D-GIN). It is composed of two distinct sub-architectures (D-MPNN and GIN) and achieves an improvement in accuracy over its sub-architectures across various learning and featurization strategies. We argue that combining models with different key aspects helps make graph neural networks deeper while simultaneously increasing their predictive power. Furthermore, we address current limitations in the assessment of deep-learning models, namely the comparison of single-training-run performance metrics, and offer a more robust solution.


1. Introduction

Oral bioavailability, drug uptake, and ADME-related properties of small molecules are key properties in pharmacokinetics. For drugs to reach their intended target, they need to pass through several barriers, either by passive diffusion or carrier-mediated uptake, processes that are strongly influenced by lipophilicity and aqueous solubility. Compounds with poor solubility often fail to do so and, therefore, pose a higher risk of attrition and higher overall cost during development [1].
Methods based on deep-learning have proven successful in predicting molecular properties [2] and are becoming an increasingly routine part of the modern computer-aided drug design toolbox for molecular design and med-chem decision support. Since molecules can be represented as graphs, an obvious approach is to employ a graph-based architecture for deep-learning, which leads to the utilization of graph-based neural networks (GNNs). These kinds of networks are capable of learning task-specific representations in an automated way and can, therefore, eliminate the complicated feature engineering process in which domain specialists have to select the list of descriptors themselves [3]. They have become increasingly popular in the last few years [4,5,6], especially due to their success in chemical property prediction [5,7,8,9,10,11].
One of the first GNN models used for physicochemical property prediction was introduced by Micheli [12] in 2009; it predicted the boiling point of alkanes with a recursive architecture for structured data input and improved upon the state-of-the-art performance. Lusci et al. [13] were the first to successfully apply an undirected cyclic graph recurrent neural network to the prediction of aqueous solubility. In the following years, several recurrent, spatial, and spectral graph-based neural networks were introduced [3,14,15,16]. One of them was the message passing framework, which was extended to include directed edges [3]. This network, called the directed-edge message passing network (D-MPNN), is one of the most successful GNNs for predicting chemical properties [1].
Despite this success, one important limitation of message passing networks is the graph isomorphism problem, meaning that they are unaware of the structural role of each node or edge [17]. Most standard GNNs, such as the D-MPNN, are incapable of distinguishing between certain types of graph structures and of determining whether two graphs are topologically identical [18]. Compounds such as naphthalene and 1,1-bi(cyclopentane) are perceived as the same structure by these networks. This can be problematic because they have vastly different chemical properties. To address this issue, graph isomorphism networks (GIN), another group of GNNs, have recently received attention [18,19]. They are capable of distinguishing between such compounds by reformulating the message passing framework to incorporate the Weisfeiler–Lehman (WL) hierarchy. They aim to be at least as expressive as the Weisfeiler–Lehman graph isomorphism test (WL-test) [20] and have shown good results in chemical property prediction [18,19], despite often falling short of other frameworks, such as the D-MPNN, with respect to speed and accuracy [21]. Inspired by this key property of the GIN and the success of the D-MPNN framework, we combined the key characteristics of both architectures. By doing so, we not only address the isomorphism problem but also build on one of the most successful and powerful GNN frameworks to improve lipophilicity and aqueous solubility prediction.
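To make the limitation concrete, the following minimal Python sketch (not taken from the accompanying repository) runs 1-WL colour refinement on hand-built carbon skeletons of naphthalene and 1,1-bi(cyclopentane), ignoring hydrogens and bond orders. Both graphs end up with identical colour histograms, so a test based purely on this refinement cannot tell them apart.

```python
from collections import Counter

def wl_colors(adj, iterations=3):
    """1-WL colour refinement: start from node degrees and repeatedly combine each
    node's colour with the sorted multiset of its neighbours' colours."""
    colors = {v: len(nbrs) for v, nbrs in adj.items()}
    for _ in range(iterations):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
    return Counter(colors.values())

def ring(offset, n):
    """Adjacency list of an n-membered carbon ring over atoms offset..offset+n-1."""
    return {offset + i: {offset + (i - 1) % n, offset + (i + 1) % n} for i in range(n)}

# Naphthalene carbon skeleton: two fused six-membered rings sharing the edge 0-5.
naphthalene = ring(0, 6)
for a in range(6, 10):
    naphthalene[a] = set()
for u, v in [(0, 6), (6, 7), (7, 8), (8, 9), (9, 5)]:
    naphthalene[u].add(v)
    naphthalene[v].add(u)

# 1,1'-bi(cyclopentane) carbon skeleton: two five-membered rings joined by the bond 0-5.
bicyclopentane = {**ring(0, 5), **ring(5, 5)}
bicyclopentane[0].add(5)
bicyclopentane[5].add(0)

# Identical colour histograms -> the two skeletons are indistinguishable for 1-WL.
print(wl_colors(naphthalene) == wl_colors(bicyclopentane))   # True
```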
When comparing new machine learning architectures with previously published methods, the standard approach is to compare single performance metrics, such as root mean squared error (RMSE) values on a test set, to show model performance [21,22]. This can lead to reproducibility issues, as stochastic algorithms such as neural networks can vary greatly in their predictions, even without changing their hyperparameters, simply by using different training/validation/test splits or non-deterministic weight initializations [23,24]. One of the reasons for this is the complex loss landscape that optimizers have to navigate in modern machine learning models. In real-world applications, these landscapes can have multiple local minima, and it is especially hard for non-deterministic optimization algorithms such as stochastic gradient descent to find the global minimum; repeated runs therefore often retrieve different results [25]. This problem can be intensified by using small datasets with different random splits for training and evaluation. Such an approach can lead the optimization algorithm into different local minima and makes it almost impossible for the model to generalize [2]. It is, therefore, difficult to compare different deep-learning model architectures with each other even when using the same data [23]. Another challenge is especially prominent in the GNN domain, where the optimal features for node or edge representation are unknown. Deep-learning benchmark studies often use the same data but different representations for their input data, which makes it difficult to make a fair comparison between the models [2,3].
To mitigate these problems, we use the exact same data split to train, evaluate, and test each of the used models with different node and edge features, as well as learning strategies to obtain an average performance independent of the used features and training approaches. Such a procedure is time consuming as multiple models have to be evaluated several times. Nevertheless, obtaining a better overview of the behaviour of GNNs under these different constraints will facilitate the understanding of these architectures and ultimately help advance GNNs beyond the current hype to more explainable and robust models.
Our contribution is a novel graph neural network architecture called the directed edge graph isomorphism network (D-GIN). It extends the directed edge message passing (D-MPNN) framework [1] with the graph isomorphism network (GIN) [18]. An overview of the D-GIN model is shown in Figure 1. Our novel architecture shows improved performance compared to its individual, less complex sub-networks, and we demonstrate that combining models with different key aspects helps make graph neural networks deeper while simultaneously increasing their predictive power. We evaluated our models by applying different learning and featurization strategies and compared their average performance under these different constraints.

2. Materials and Methods

This section gives a detailed overview of the used data, molecular representation, and the different machine learning methods used throughout this work. The most common notations are shown in Table 1.

2.1. Experimental Data

A total of 10,617 molecules annotated with experimentally derived logD and logP values or logS and logP values were used for model training and predictions. The selected molecules were derived from the Delaney lipophilicity dataset containing experimentally evaluated logD and logP values at pH 7.4 [26] and an aqueous solubility set with logS and logP values [27]. Each dataset was evaluated, and the molecules in both sets were neutralized. For the aqueous solubility data, salts were stripped off and molecules with logS values lower than −10.0 or higher than 0.0 were removed. The original, preprocessed, and post-processed data can be found in the GitHub repository [28]. Each dataset was split randomly into three subsets for training, evaluation, and testing in a ratio of 81:9:10. The data splitting was performed with the same seed for each of the models so that they could be compared using the exact same training, evaluation, and test data. The minimum value of each of the logD, logP, and logS properties was used as an offset to ensure only positive property values. The resulting lipophilicity dataset consisted of 4174 compounds: 3380 were used for training, 376 for evaluation and model selection, and 418 for testing. The post-processed solubility dataset contained 6443 molecules: 5219 compounds were allocated for training, 579 for evaluation and model selection, and 645 for testing.
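A minimal sketch of this splitting and offsetting procedure, with stand-in SMILES and property values and an assumed placeholder seed (the real data and seed live in the repositories cited above), could look as follows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in SMILES and logD values; the real data are in the Zenodo/GitHub repositories.
smiles = np.array(["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCN", "c1ccncc1"] * 200)
logd = np.random.default_rng(0).normal(1.5, 1.0, size=len(smiles))

logd = logd - logd.min()   # offset by the minimum so all property values are positive

SEED = 0  # placeholder; the same seed is shared across all models
smi_tmp, smi_test, y_tmp, y_test = train_test_split(smiles, logd, test_size=0.10, random_state=SEED)
smi_train, smi_val, y_train, y_val = train_test_split(smi_tmp, y_tmp, test_size=0.10, random_state=SEED)
# 10% test; 90% * 10% = 9% validation; the remaining 81% is used for training
```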

2.2. Training Approaches

The training strategies differ in the dataset used and the training target (logD, logP, or logS). Under these constraints, seven different strategies were used. The first multi-task learning strategy used a combined approach with logD, logP, and logS values, referred to as “logD/P/S”. Three additional multi-task strategies utilized a combination of two physicochemical properties and are denoted “logD/P”, “logD/S”, or “logS/P”. The three remaining single-task strategies were trained on a single physicochemical property and are referred to as “single-task logD”, “logP”, or “logS”. When physicochemical properties from different datasets were used, the individual datasets were first split into training, evaluation, and test sets. Afterwards, each physicochemical property was evaluated and tested individually so that the evaluation and test results of the multi-task learning approaches could be compared to those of the single-task learning strategies.
When testing either single- or multi-task models, the combined root mean squared error (RMSE) over all properties was calculated as the measure for the best model. For logP, we only used the results from either the first multi-task approach (“multi-task logS/D/P”) or the single-task approach with logP values. The reasoning behind this was to use the same test and evaluation data for all models while trying to avoid an unbalanced data bias in favor of logP values. When training with two physicochemical properties where one was logP, we only used the data that had both properties. For example, when training on the lipophilicity dataset, which has logP and logD values, we did not include logP compounds from the aqueous solubility dataset, and vice versa.

2.3. Molecular Graphs

A graph is defined as $G = (V, E)$, where $V$ is a set of nodes and $E$ denotes a set of edges. Let $v \in V$ be a node with feature vector $x_v$ and $e_{uv} \in E$ be an edge pointing from $u$ to $v$ with feature vector $x_{e_{uv}}$. The adjacency matrix $A$ describes the connectivity of the nodes; in our case it is binary, as we did not weight any connections. It is defined as an $n \times n$ matrix with $A_{uv} = 1$ if $e_{uv} \in E$ and $A_{uv} = 0$ if $e_{uv} \notin E$. We use directed, heterogeneous graphs where $e_{uv} \neq e_{vu}$. Heterogeneous graphs contain different types of nodes and edges with their corresponding featurizations.
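As a small illustration of this graph definition (RDKit is used here for convenience; the authors built their graphs with CDPKit), every bond contributes the two directed edges $e_{uv}$ and $e_{vu}$ and one symmetric pair of entries in the binary adjacency matrix:

```python
import numpy as np
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccc2ccccc2c1")       # naphthalene
n = mol.GetNumAtoms()

A = np.zeros((n, n), dtype=int)                  # binary, unweighted adjacency matrix
directed_edges = []                              # e_uv and e_vu for every bond
for bond in mol.GetBonds():
    u, v = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
    A[u, v] = A[v, u] = 1
    directed_edges += [(u, v), (v, u)]

assert len(directed_edges) == 2 * mol.GetNumBonds()
```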

2.4. Molecular Featurization

Five different types of edge and vertex featurizations $X$ were used for the GNNs. Detailed descriptions of $x_v$ and $x_{e_{uv}}$ can be found in Table A1, Table A2, Table A3, Table A4, Table A5 and Table A6 in the Appendix A. The feature vectors for the non-GNN models consist of 8 different settings: fingerprints (ECFP or MACCSKeys, shown in Table A7 in the Appendix A) used either in combination with standardized RDKit [29] descriptors or without them. The descriptors were a combination of all possible standardized RDKit descriptors, with a total length of 208. The parametrization of the ECFP was either 1024, 1536, or 2048 bits with a radius of 4. Featurizations 3 (Table A1 in the Appendix A) and 4 (Table A2 in the Appendix A) only differ in the way the size of ring systems is represented: either as a float value calculated as 1 divided by the size of the ring or as a one-hot encoding with 10 possibilities. The node and edge featurization 5 (Table A3 in the Appendix A) includes two node features (chemical element and formal charge) and one edge feature (bond order). Featurization 6 (Table A4 in the Appendix A) includes the same node description as 5 and the edge featurization of 3. Featurization 7 (Table A5 in the Appendix A) has the same node featurization as 3 and the same edge featurization as 5. Featurization 8 (Table A6 in the Appendix A) includes a set of optimized node and edge features. These were selected by taking a trained D-GIN model, removing one node or edge feature at a time, and observing the RMSE of the prediction. The five node features and the three edge features with the biggest impact on the RMSE were then taken as the featurization. The graphs and their featurization were implemented using Python version 3.7.8 and the toolkit CDPKit [30].
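As an illustration of how such one-hot node features can be assembled (the authors used CDPKit; RDKit is used here purely for convenience, and only the element, formal charge, and ring-size features are shown), a possible sketch is:

```python
from rdkit import Chem

ELEMENTS = ["H", "C", "N", "O", "S", "F", "P", "Cl", "Br", "I"]
CHARGES = [-2, -1, 0, 1, 2]

def one_hot(value, choices):
    return [1 if value == c else 0 for c in choices]

def node_features(atom, ring_info, one_hot_ring_size=False):
    """Element and formal-charge one-hot vectors plus the two alternative ring-size
    encodings that distinguish featurizations 3 and 4 (all other features omitted)."""
    sizes = [len(r) for r in ring_info.AtomRings() if atom.GetIdx() in r]
    smallest = min(sizes) if sizes else 0
    if one_hot_ring_size:                       # featurization 4: one-hot over ring sizes
        ring_feat = one_hot(smallest, [0, 3, 4, 5, 6, 7, 8, 9, 10, 11])
    else:                                       # featurization 3: single float 1/size
        ring_feat = [1.0 / smallest if smallest else 0.0]
    return (one_hot(atom.GetSymbol(), ELEMENTS)
            + one_hot(atom.GetFormalCharge(), CHARGES)
            + ring_feat)

mol = Chem.MolFromSmiles("c1ccc2ccccc2c1")       # naphthalene
info = mol.GetRingInfo()
node_matrix = [node_features(atom, info) for atom in mol.GetAtoms()]
```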

2.5. Directed-Edge GIN (D-GIN) and Reference Models

D-GIN is an extension of the directed-edge message passing neural network of Yang et al. [1], without the additional feature engineering, in combination with the graph isomorphism network (GIN) of Xu et al. [18]. Its high-level representation can be seen in Figure 1. The construction of the network is given by Equations (1)–(8). First, the directed edges were initialized as

$$h_{uw}^{0} = \tau\left(W_{init}\,\mathrm{cat}(x_u, x_{e_{uw}})\right) \qquad (1)$$

followed by iterations $t \in \{1, \ldots, T\}$ of

$$m_{uw}^{(t+1)} = \begin{cases} \sum_{k \in N(u) \setminus \{w\}} h_{ku}^{0}, & \text{if } t = 0\\ \sum_{k \in N(u) \setminus \{w\}} h_{ku}^{t}, & \text{otherwise} \end{cases} \qquad (2)$$

$$h_{uw}^{(t+1)} = \tau\left(h_{uw}^{0} + W_{m}\, m_{uw}^{(t+1)}\right) \qquad (3)$$

after which the messages of the directed edges were summed for each node as

$$m_{u} = \sum_{w \in N(u)} h_{uw}^{T} \qquad (4)$$

then the message $m_u$ was concatenated as

$$h_{u} = \begin{cases} \mathrm{cat}(m_u, x_u), & \text{if D-GIN}\\ \tau\left(W_{agg}\,\mathrm{cat}(m_u, x_u)\right), & \text{if D-MPNN} \end{cases} \qquad (5)$$

and another message passing over $l \in \{1, \ldots, T_2\}$ was performed by

$$h_{u}^{(l)} = \begin{cases} \sum_{w \in N(u)} h_{w}, & \text{if D-GIN}\\ x_u, & \text{if GIN} \end{cases} \qquad h_{u}^{(l+1)} = W_{agg}\left((1+\epsilon)\, h_{u}^{0} + h_{u}^{(l)}\right) \qquad (6)$$

afterwards the updated feature vectors $h^{T}$ of each node were aggregated over the whole molecule as

$$h_{G} = \sum_{h \in H^{(T)}} h. \qquad (7)$$

The readout phase was then defined as $\hat{y} = f(h_G)$ (Equation (8)), where $f(\cdot)$ was a feed-forward neural network. The D-MPNN consisted of Equations (1)–(5); it used the hidden feature vectors of each node obtained from Equation (5) directly and then immediately applied Equation (7) to encode the whole graph as $h_G$.
GIN, on the other hand, was initialized and trained as shown in Equation (6) in order to update the hidden feature vectors of each node. After $l$ update steps, the hidden feature vector of each node served as the input of Equation (7) to obtain the aggregated representation $h_G$ of the whole graph. D-GIN used all of these functions in the combined way described above (Equations (1)–(8)). The main principle behind this approach was to first use the key aspect of directed-edge message passing to propagate information via directed edges and form messages (Equations (1)–(4)), which then updated the hidden node features (Equation (5)). These updated hidden node features were then used in the GIN message passing to further propagate information (Equations (6) and (7)) while also learning $\epsilon$. These two information propagation phases are the key aspects of the two sub-architectures.
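For illustration, the following NumPy sketch (not the authors' TensorFlow implementation) mimics the directed-edge phase, Equations (1)–(5), on a toy three-atom graph with random weights; in the full model the resulting node vectors would then feed the GIN phase of Equation (6):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)          # stand-in for the non-linearity tau

def directed_edge_phase(x_nodes, x_edges, edges, W_init, W_m, T=3):
    """Minimal sketch of Equations (1)-(4): directed-edge message passing.

    edges   : list of directed edges (u, w); every bond contributes (u, w) and (w, u)
    x_nodes : (n_nodes, d_node) array, x_edges : (n_edges, d_edge) array aligned with edges
    """
    src = [u for u, _ in edges]
    h0 = relu(np.concatenate([x_nodes[src], x_edges], axis=1) @ W_init)   # Equation (1)
    h = h0
    for _ in range(T):
        m = np.zeros_like(h)
        # Equation (2): sum states of incoming edges k->u, excluding the reverse edge w->u
        for i, (u, w) in enumerate(edges):
            for j, (k, u2) in enumerate(edges):
                if u2 == u and k != w:
                    m[i] += h[j]
        h = relu(h0 + m @ W_m)                                            # Equation (3)
    # Equation (4): node-level message = sum of the node's directed-edge states
    m_nodes = np.zeros((x_nodes.shape[0], h.shape[1]))
    for i, (u, _) in enumerate(edges):
        m_nodes[u] += h[i]
    return m_nodes

# Toy propane-like graph: 3 heavy atoms, 2 bonds -> 4 directed edges, random features/weights
x_nodes = rng.normal(size=(3, 4))
x_edges = rng.normal(size=(4, 3))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
W_init = rng.normal(size=(7, 8))
W_m = rng.normal(size=(8, 8))

m_nodes = directed_edge_phase(x_nodes, x_edges, edges, W_init, W_m)
# Equation (5), D-GIN branch: concatenate node messages with the raw node features;
# these vectors then enter the GIN phase of Equation (6) in the full model.
h_nodes = np.concatenate([m_nodes, x_nodes], axis=1)
```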

2.6. Graph Neural Network Implementation, Training, and Hyper-Parameter Search

All GNNs were implemented and trained using Python version 3.7.8 and TensorFlow 2.3.0 [31]. We used TensorFlow's Keras models as our super-class and implemented Equations (1)–(8) within the “fit” method of the Keras model. A hyper-parameter search was conducted to find the best parameters, which were then used to train all models. Further details on the hyper-parameters are given in the corresponding model's configuration files, accessible via the graph_networks GitHub repository [28].
Each GNN model type was trained twice with either 24 different settings when training on the logD or logS property or 12 when training on the logP property; in total, 48 or 24 training runs per model type were performed. Each non-GNN model type was trained with 8 different settings. For training, evaluation, and testing, we split each of the datasets as described in the Experimental Data section. Each of the GNNs was trained for 1600 epochs, and the best-performing model was identified using the RMSE on the validation set. To evaluate the model type performance, we used the model with the best RMSE of the two runs performed for each model setting. When evaluating the average model type performance, the average RMSE of the different model settings was used for the calculation. To evaluate models trained on several properties, we summed all RMSEs. For example, when using logD and logP for training, we summed the RMSE of the logD and logP predictions on the evaluation set to obtain a combined RMSE. Whenever the combined RMSE fell below the previous best combined RMSE, the model weights were saved. These models were then applied to the test set. Each model was run two times and the results with the best test set performance were taken.
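A sketch of the combined-RMSE criterion used for model selection (illustrative prediction and target values only) is shown below:

```python
import numpy as np

def combined_rmse(predictions, targets):
    """Sketch of the model-selection criterion described above: the per-property
    RMSEs (e.g., logD and logP in the logD/P setting) are summed into one score."""
    total = 0.0
    for prop, y_pred in predictions.items():
        err = np.asarray(y_pred) - np.asarray(targets[prop])
        total += float(np.sqrt(np.mean(err ** 2)))
    return total

# usage sketch: keep the weights whenever the combined validation RMSE improves
best = float("inf")
score = combined_rmse({"logD": [1.1, 0.4], "logP": [2.0, 1.5]},
                      {"logD": [1.0, 0.5], "logP": [2.2, 1.4]})
if score < best:
    best = score   # here the authors additionally save the model weights
```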
Additionally, the 95% confidence interval range was calculated by applying bootstrapping 100 times while leaving out 10% of the test dataset.
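The confidence-interval estimate can be sketched as follows; this is one plausible reading of the procedure, recomputing the RMSE on a random 90% subsample of the test set in each of the 100 repetitions:

```python
import numpy as np

def rmse_confidence_interval(y_true, y_pred, n_repeats=100, leave_out=0.10, seed=0):
    """Sketch of the 95% CI estimate: repeat the RMSE calculation 100 times,
    each time leaving out a random 10% of the test set."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    keep = int(round(len(y_true) * (1.0 - leave_out)))
    rmses = []
    for _ in range(n_repeats):
        idx = rng.choice(len(y_true), size=keep, replace=False)
        rmses.append(np.sqrt(np.mean((y_true[idx] - y_pred[idx]) ** 2)))
    return np.percentile(rmses, [2.5, 97.5])

# usage with dummy predictions for a 418-compound test set
y_true = np.random.default_rng(1).normal(size=418)
y_pred = y_true + np.random.default_rng(2).normal(scale=0.6, size=418)
low, high = rmse_confidence_interval(y_true, y_pred)
```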
To generate consensus models between GNN and non-GNN models, we combined the best GNN model for each physicochemical property with the best non-GNN model by averaging the predicted log values of the two models. These hybrid models are named according to their GNN model type plus consensus (e.g., D-GIN cons.).
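The consensus prediction itself reduces to a simple average of the two models' outputs (hypothetical values shown):

```python
import numpy as np

# Hedged sketch of the consensus prediction: average the best GNN and best
# non-GNN predictions for the same test molecules (e.g., D-GIN + SVM for logD).
gnn_pred = np.array([1.2, 0.4, 2.1])      # hypothetical D-GIN test-set predictions
non_gnn_pred = np.array([1.0, 0.6, 1.9])  # hypothetical SVM test-set predictions
consensus_pred = (gnn_pred + non_gnn_pred) / 2.0
```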

2.7. Other Machine Learning Approaches

We used the random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN) implementations of scikit-learn (version 0.23.2 [32]) with default hyperparameters. The featurization is described in Table A7. When using descriptors as input, we standardized them with the scikit-learn StandardScaler. For the fingerprints and descriptors, we used version 2020.09.2 of the RDKit [29] Python package. Each of the models was trained in a single-task manner for each of the property values.
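A sketch of one of these non-GNN settings (ECFP bits concatenated with standardized RDKit descriptors and fed to a default random forest) might look like the following; the SMILES and logD values are purely illustrative, and the Morgan radius is left as a parameter:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

N_BITS = 1024

def fingerprint_plus_descriptors(smiles, radius=2):
    # note: Section 2.4 reports a radius of 4 for the ECFP settings used in the study
    mol = Chem.MolFromSmiles(smiles)
    fp = np.array(AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=N_BITS), dtype=float)
    desc = np.array([fn(mol) for _, fn in Descriptors.descList], dtype=float)
    return np.concatenate([fp, desc])

# purely illustrative molecules and logD values
smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]
logd = np.array([-0.31, 1.46, 0.51])

X = np.vstack([fingerprint_plus_descriptors(s) for s in smiles])
X[:, N_BITS:] = StandardScaler().fit_transform(X[:, N_BITS:])   # standardize descriptors only
model = RandomForestRegressor(random_state=0).fit(X, logd)
print(model.predict(X[:1]))
```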

2.8. Hardware and Run-Time

Calculations were performed on machines within the Department of Pharmaceutical Sciences at the University of Vienna, Austria. We ran each model on a single CPU (Intel(R) Core(TM) i7-8700K CPU @ 3.70 GHz). The run-time to fit the RF, SVM, and KNN models with 3380 compounds on logD property values is approximately 50 s (RF), 25 s (SVM), and 0.5 s (KNN). When training the GNN model types on the 3380 logD compounds, each epoch takes approximately 56 s (D-GIN), 35 s (D-MPNN), and 28 s (GIN).

3. Results and Discussion

For clarity, we define certain terms used throughout this publication that might have ambiguous meanings. The term “model type” refers to different kinds of machine learning algorithms; for example, a model type can be RF, SVM, KNN, D-GIN, GIN, or D-MPNN. The term “model” refers to a trained model instance with a particular training and featurization strategy. The term “training strategy” is used to distinguish between different single- and multi-task training approaches trained with a combination of molecular properties; for example, logD/S/P indicates that logD, logS, and logP values were used during training. The term “featurization strategy” describes the different node and edge features utilized for the models to train on (Table A1, Table A2, Table A3, Table A4, Table A5 and Table A6 in the Appendix A). In addition, we distinguish between consensus (cons.) and non-consensus models. These hybrid models are a combination of the best GNN and best non-GNN models (SVM and D-GIN for logD and logS, and RF and D-GIN for logP). To obtain consensus predictions, the predicted property values of the two models were combined and averaged. The averaged values were used as “new” predictions for the RMSE calculation and are referred to as their GNN model type plus cons. (e.g., D-GIN cons.).
Overall, 6 different machine learning model types were used in this study. The three GNN model types were D-MPNN, GIN, and D-GIN. The three non-GNN model types were random forest (RF) regression, support vector machines (SVM), and the k-nearest-neighbor (KNN) algorithm. Each model type was trained with the same hyperparameters but 7 different learning strategies and 6 different node/edge featurization strategies. We trained each GNN model type for each physicochemical property with all possible strategies twice. Subsequently, the best-performing model of each pair of runs (measured on the evaluation set) was selected, resulting in 24 models for the logD and logS properties and 12 for the logP property, which were then applied to the test set and their performance was reported.
The results of this approach are reported and discussed in two parts. First, we discuss different GNNs and non-GNN methods used in this work to identify the best performing model type according to its average performance across all used strategies (discussed in Section General Model Performance). Subsequently, we investigate the impact of the 6 different training strategies (i.e., multi-task vs. single task learning), as well as different featurizations on the performance (discussed in Sections Impact of Molecular Featurization and Impact of Training Strategies).
A dataset of 10,617 molecular structures with annotations for one of the three physicochemical properties was assembled for model training, evaluation, and testing. It included 4174 logD, 6443 logS, and 10,617 logP experimentally measured values. The same training, evaluation, and test sets were used for all GNN and non-GNN model types.

3.1. General Model Performance

In the following, the reported results vary by model type. Each combination of featurization and training strategy was used to calculate a total of 24 RMSE values for the logD and logS properties, and 12 for the logP property, per model type. This resulted in the RMSE distributions shown in Table 2 and Figure 2. For each of these distributions, the average, minimum, and maximum RMSE were calculated and are reported and discussed subsequently.
Table 2 shows the RMSE distribution average of the different machine learning model types regardless of their training and featurization strategy on the hold-out test set. For each value the standard error of the mean was calculated and added.
For logD property prediction, the D-GIN model type achieved mean, minimum, and maximum logD RMSE values of 0.615 ± 0.039, 0.553, and 0.7048, and the corresponding consensus model 0.575 ± 0.0192, 0.548, and 0.622, making it the best-performing model type (results shown in Table 2 and Figure 2). The consensus GIN performed on average (distribution mean of logD RMSE values of 0.666 ± 0.029) better than the best non-GNN method (distribution mean logD RMSE of 0.740 ± 0.068).
For logS prediction, the best model type was the D-GIN consensus model with an average RMSE value of 0.738 ± 0.028 (shown in Table 2 and Figure 2). It performed on average better than the best-performing non-GNN model type (SVM), which had an average RMSE value of 1.006 ± 0.154 (although the SVM also had a single run with an RMSE value of 0.729, making it the model type with the best single-run performance and highlighting the importance of multiple repetitions when reporting model type performances). The consensus D-MPNN also outperformed the D-GIN.
The consensus D-GIN (average RMSE value of 0.455 ± 0.028) and consensus D-MPNN (average RMSE value of 0.475 ± 0.027) showed the best average performance for logP prediction (Table 2, and Figure 2). The RF and SVM model types also performed with low minimum RMSE values of 0.470 and 0.493, respectively. However, their average RMSE values (RF: 0.681 ± 0.224 and SVM: 0.693 ± 0.134) were higher than the D-GIN and D-MPNN model types.
Consensus models are often used in deep-learning applications, typically combining either different models that were trained on slightly different training data or multiple model architectures with different strengths and weaknesses. Nevertheless, further investigations are required to give a rationale for why, in all our investigated cases, the consensus models performed better than their individual counterparts. Furthermore, it should be noted that a direct comparison between the average performance of the GNNs and the non-GNN models (RF, SVM, and KNN) can be difficult, since the amount of information about a single molecule fed to each of the different model classes is quite different. For example, the non-GNN methods used a wide range of different descriptors and fingerprints, shown in Table A7.
Figure 3, Figure 4 and Figure 5 show the best-performing model architectures for the prediction of each physicochemical property. Each plot shows the RMSE values for each GNN model applying all training and featurization strategies. It should be noted that the performance of many model types with different training strategies or features does not significantly differ, and their CIs overlap. Some trends are still visible: in Figure 3, Figure 4 and Figure 5, regardless of the physicochemical property, the D-GIN model type (shown in blue) performs better overall than the D-MPNN (shown in orange) or the GIN (shown in green).
The reason why the D-GIN outperforms the GIN and D-MPNN could be its higher complexity and network depth. It uses the key aspects of both sub-models and might be able to better abstract higher-order features. This could be facilitated by including skip connections between edge feature extraction mainly performed in the first (D-MPNN) and node feature extraction while learning ϵ in the second (GIN) part. This increased complexity could have helped to perform better than its individual parts.

3.2. Impact of Molecular Featurization

The average performance of each featurization strategy across all model types and training strategies is shown in Table 3. Considering the performance for all physicochemical properties, featurization strategy 5 showed the highest RMSE (mean logD/logS/logP RMSE of 0.813 ± 0.099, 1.099 ± 0.180, and 0.760 ± 0.110). This trend was also observed when separating according to the model type (shown in Table 4 and Figure 6). The reason for the relatively bad performance of featurization 5 might be that it only included two node features (chemical element and formal charge) as well as only a single edge feature (bond order; Table A3 in the Appendix A).
Featurization 6 (Table A4 in the Appendix A) also displayed considerably worse performance than the other strategies when used in combination with the GIN architecture, for which the mean RMSE performance for the logD and logS properties was worse than with featurization strategy 5. One explanation could be that the GIN utilizes node features quite extensively and featurization 6 only included two node feature types, similar to featurization 5. The additional edge features in strategy 6, without the appropriate architecture to deal with them, could push the optimizer of the GIN network in the wrong direction rather than help with the property prediction.
Although it is easy to identify bad featurization strategies, it is difficult to come up with an unambiguous recommendation for the best-performing featurization strategy. The mean RMSE across all training strategies and model types in Table 3 shows that featurizations 3 and 4 (Table A1 and Table A2 in the Appendix A) achieved very good results: for logD with RMSE values of 0.689 ± 0.079 and 0.694 ± 0.072, for logS with RMSE values of 0.954 ± 0.146 and 0.948 ± 0.142, and for logP with RMSE values of 0.596 ± 0.120 and 0.591 ± 0.105, respectively. Both featurization strategies utilize the maximum number of node and edge features used in this work. They only differ in the way molecular ring sizes are described: featurization 3 used a float value calculated as 1 divided by the size of the ring system, whereas featurization 4 used a one-hot encoding of ten instances (0, 3, 4, 5, 6, 7, 8, 9, 10, 11).
Table 4 shows the mean RMSE values with respect to featurization and model type. As the performance criterion for featurization strategies, we used the sum of model ranks in Table 4. Applying this approach, featurization 3, with two models ranked best and three models ranked second-best, achieved a better ranking than featurization 4, with one model ranked best and two models ranked second-best; both strategies perform similarly well. Featurization 8 (shown in Table A6 in the Appendix A) used a set of optimized node and edge features. Node and edge features were optimized by masking single edge and node features at a time and evaluating their impact on the test set RMSE. The five node features and the three edge features with the biggest impact on the RMSE were subsequently used. This approach also revealed that the ring-size node feature appears to be of importance, and it was, therefore, included in featurization 8. Using featurization 8, we achieved the second-best performance twice. It shows good average performance, but not as good as featurizations 3 or 4, even though its edge and node features were selected for maximum impact on the final prediction. The mean RMSEs of featurizations 6 and 7 (Table A4 and Table A5 in the Appendix A) in Table 3 show diminished results compared to featurizations 3 and 4.
When evaluating the rank score, i.e., how often a featurization strategy performs either best or second-best for a physicochemical property, the best featurization strategy was number 6. It was used in four of the best-performing runs and once in a second-best run. However, it only performed well in combination with two GNN architectures (D-GIN and D-MPNN) and strongly underperformed with the GIN. The D-GIN and D-MPNN architecture types primarily use edge features for their information propagation, and featurization strategy 6 provided these. It utilized only two node feature types, potentially reducing the noise for the feature extraction to a minimum in this setting.
On average, featurization strategies 6 and 7 performed similarly well. However, when separating the results at the model type level, it became evident that there was a strong model architecture dependency, so it seems important to choose the features according to the architecture at hand (Figure 3, Figure 4 and Figure 5). Furthermore, for a given architecture, featurization 3 might perform worse than featurization 6 or 7. Nevertheless, when unsure which features to use, simply adding more features could be the safer option rather than using fewer. This observation is also supported by comparing featurizations 3 or 4 to, e.g., 6, 7, or 8.
When analyzing the results for the non-GNN models and their different featurizations, the mean RMSE variance was large in comparison to the GNN models. Moreover, in similar deep-learning benchmark studies that predicted molecular properties, predominantly fingerprints have been used. From Table A13, Table A14 and Table A15 in the Appendix A, one can see that especially featurizations that include descriptors in addition to fingerprints perform exceptionally well. We think that when comparing GNN with non-GNN models, differences in used features should be taken into consideration when trying to identify and understand (deep-learning) method performance.

3.3. Impact of Training Strategies

The impact of the different training strategies is shown in Table 3. The lowest mean logD RMSE was obtained by a multi-task strategy that involves learning on both logD and logP values. This is similar to the best training strategy for the logS property, which is a multi-task approach including the logS and logP properties. As for the logP property, the best approach is a single-task strategy including logP values; however, the multi-task approach that combines all physicochemical properties achieves similarly good performance.
When analyzing the logD/S/P RMSE predictions with respect to training strategy and model type, Table 5 and Figure 7, Figure 8 and Figure 9 show that there is no particularly favorable learning strategy for any of the model types. The datasets used in this study are specific for one particular physicochemical property. When comparing different learning strategies we thus focused on one particular physicochemical property for each model type. Starting with the results for the prediction of the logD property in Table 5, we can see that the overall best model (red asterisk), as well as the two best models for each model type (dark gray), are multi-task models. In particular, the models with a combination of logD and logP properties perform well.
Considering all combinations of training and featurization strategies for each model, the learning strategy with the best average, as well as the best minimum, logD RMSE was the logD/P multi-task training approach, resulting in RMSE values of 0.719 ± 0.105 and 0.553, respectively (Table 3). Yet, using this multi-task learning strategy, we also obtained single-run performances worse than those of a single-task learning strategy with only logD values, showcasing once more the importance of validating multiple learning and featurization strategies. The results are similar for the prediction of logS values: again, the multi-task learning strategies perform better than their single-task counterpart. The best model for logS prediction was obtained by training on logD, logS, and logP values. Considering all combinations of training and featurization strategies for each model, the best average, minimum, and maximum logS RMSE of 0.979 ± 0.166, 0.795, and 1.325, respectively, was observed for the multi-task training with all properties. We should note here that while the average performance seems to be improved by multi-task learning, the variance of model performance is also increased.

4. Conclusions

We introduced the directed-edge graph isomorphism network (D-GIN), a novel graph neural network that showed improved average performance for aqueous solubility and lipophilicity prediction compared with other baseline models. We showed that by combining different models with distinct key characteristics, we can increase the depth of the model while also improving its predictive power. Furthermore, applying different training strategies and featurization constraints enables us to obtain more information regarding general, average model performance. This strategy showed that the D-GIN model outperforms the other machine-learning models on average. We argued that comparing the mean performance rather than single metric values of the best-performing model type gives more insight into the general behavior and ultimately facilitates a better understanding and higher robustness of deep-learning models.
In concurrence with previous publications [33,34,35], we showed that there is a tendency towards multi-task learning approaches for the GNNs utilized in this survey. On average, they performed better than their single-task counterparts for the corresponding physicochemical property. We could not find clear evidence that using more than two properties increases the model's performance.
Furthermore, we highlighted that the usage of additional features did not necessarily improve the GNN model performance. However, we also conclude that very sparse featurization led to the worst performance. In general, it is necessary to be aware of the type of GNN that is used and whether its architecture focuses more on edge or node features. When trying to obtain the best-performing model, it can be advisable to do feature engineering, but when in doubt about which features to use, it can be safer to use more rather than fewer. We showed that this awareness can help improve the predictive power of the GNN at hand.
For the non-GNN models, we concluded that adding descriptors to the molecular fingerprints increases the performance of these models substantially. We further argued that, for future comparisons, it would be advisable to include not only fingerprints but also descriptors in the non-GNN baseline models to make them more competitive.
By combining the best GNN model with the best non-GNN model, we observed a slight improvement in the overall performance in all cases. Consensus models have often been shown to improve performance; however, in this case, further investigations are needed to reach a conclusion on why this is so.
We showed that advanced deep-learning methods such as GNNs do have great potential in the physicochemical property prediction area and, when applied properly, can serve as a promising and robust method for any computer-aided drug discovery pipeline, especially for chemical property prediction.

Author Contributions

Conceptualization: O.W., M.W., T.S. and T.L. Methodology: O.W. Software: O.W. and T.S. Investigation: O.W. Writing—Original Draft: O.W. Writing—Review and Editing: O.W., M.K., M.W., S.D.B., T.L. and T.S. Funding Acquisition: T.L. and C.M. Resources: T.L. and C.M. Supervision: T.S. and T.L. All authors have given approval to the final version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Python package used in this work (release v0.1): https://github.com/spudlig/graph_networks. Data is available on https://zenodo.org/record/5137613#.YQortyWxVhG and was accessed on 26 July 2021.

Acknowledgments

The authors thank Servier Research Institute (IDRS), Inte:Ligand GmbH and the University of Vienna for financial and advisory support. Open Access Funding by the University of Vienna.

Conflicts of Interest

There are no conflicts of interest or disclosures associated with this manuscript.

Appendix A

The Appendix includes supporting material: the featurization of the GNN models is shown in Table A1, Table A2, Table A3, Table A4, Table A5 and Table A6, and that of the non-GNN baseline models in Table A7. Figure A1, Figure A2 and Figure A3 show the individual models and their corresponding names. These are the same figures as in the main body but include the unique identifiers. These identifiers indicate which model type, featurization, and training approach was used and can be looked up in Table A8, Table A9, Table A10, Table A11, Table A12, Table A13, Table A14, Table A15, Table A16, Table A17, Table A18 and Table A19. The run-time to fit the RF, SVM, and KNN models with 3380 compounds on logD is approximately 50 s (RF), 25 s (SVM), and 0.5 s (KNN). When training the GNN model types on the 3380 logD compounds, each epoch takes approximately 56 s (D-GIN), 35 s (D-MPNN), and 28 s (GIN).
Table A1. Node and edge featurization of type 3. Each node or edge featurization vector consisted of a concatenation of the different one-hot encoded or floating point feature vectors according to their possible states (if present) and corresponding size: 36 for nodes and 20 for edges.

Node features:
Feature | Possible States | Size
chemical element | H, C, N, O, S, F, P, Cl, Br, I | 10
calculated formal charge | −2, −1, 0, 1, 2 | 5
CIP configuration | R, S, None, either | 4
hybridization state | sp, sp2, sp3, sp3d, sp3d2, none | 6
amide center | yes, no | 2
present in aromatic ring | yes, no | 2
ring size | 1/size | 1
nr. of hydrogens | 0, 1, 2, 3, 4, 5 | 6

Edge features:
Feature | Possible States | Size
bond order | 1, 2, 3 | 3
conjugated | yes, no | 2
rotate-able | yes, no | 2
amide bond | yes, no | 2
present in aromatic ring | yes, no | 2
present in ring | yes, no | 2
ring size | 1/size | 1
CIP configuration | none, E, Z, trans, cis, either | 6
Table A2. Node and edge featurization of type 4. Each node or edge featurization vector consisted of a concatenation of the different one-hot encoded or floating point feature vectors according to their possible states (if present) and corresponding size: 45 for nodes and 29 for edges.

Node features:
Feature | Possible States | Size
chemical element | H, C, N, O, S, F, P, Cl, Br, I | 10
calculated formal charge | −2, −1, 0, 1, 2 | 5
CIP configuration | R, S, None, either | 4
hybridization state | sp, sp2, sp3, sp3d, sp3d2, none | 6
amide center | yes, no | 2
present in aromatic ring | yes, no | 2
ring size | 0, 3, 4, 5, 6, 7, 8, 9, 10, 11 | 10
nr. of hydrogens | 0, 1, 2, 3, 4, 5 | 6

Edge features:
Feature | Possible States | Size
bond order | 1, 2, 3 | 3
conjugated | yes, no | 2
rotate-able | yes, no | 2
amide bond | yes, no | 2
present in aromatic ring | yes, no | 2
present in ring | yes, no | 2
ring size | 0, 3, 4, 5, 6, 7, 8, 9, 10, 11 | 10
CIP configuration | none, E, Z, trans, cis, either | 6
Table A3. Node and edge featurization of type 5. Each node or edge featurization vector consisted of a concatenation of the different one-hot encoded or floating point feature vectors according to their possible states (if present) and corresponding size: 15 for nodes and 3 for edges.

Node features:
Feature | Possible States | Size
chemical element | H, C, N, O, S, F, P, Cl, Br, I | 10
calculated formal charge | −2, −1, 0, 1, 2 | 5
CIP configuration | - | -
hybridization state | - | -
amide center | - | -
present in aromatic ring | - | -
ring size | - | -
nr. of hydrogens | - | -

Edge features:
Feature | Possible States | Size
bond order | 1, 2, 3 | 3
conjugated | - | -
rotate-able | - | -
amide bond | - | -
present in aromatic ring | - | -
present in ring | - | -
ring size | - | -
CIP configuration | - | -
Table A4. Node and edge featurization of type 6. Each node or edge featurization vector consisted of a concatenation of the different one-hot encoded or floating point feature vectors according to their possible states (if present) and corresponding size: 15 for nodes and 20 for edges.

Node features:
Feature | Possible States | Size
chemical element | H, C, N, O, S, F, P, Cl, Br, I | 10
calculated formal charge | −2, −1, 0, 1, 2 | 5
CIP configuration | - | -
hybridization state | - | -
amide center | - | -
present in aromatic ring | - | -
present in ring | - | -
nr. of hydrogens | - | -

Edge features:
Feature | Possible States | Size
bond order | 1, 2, 3 | 3
conjugated | yes, no | 2
rotate-able | yes, no | 2
amide bond | yes, no | 2
present in aromatic ring | yes, no | 2
present in ring | yes, no | 2
ring size | 1/size | 1
CIP configuration | none, E, Z, trans, cis, either | 6
Table A5. Node and edge featurization of type 7. Each node or edge featurization vector consisted of a concatenation of the different one-hot encoded or floating point feature vectors according to their possible states (if present) and corresponding size: 36 for nodes and 3 for edges.

Node features:
Feature | Possible States | Size
chemical element | H, C, N, O, S, F, P, Cl, Br, I | 10
calculated formal charge | −2, −1, 0, 1, 2 | 5
CIP configuration | R, S, None, either | 4
hybridization state | sp, sp2, sp3, sp3d, sp3d2, none | 6
amide center | yes, no | 2
present in aromatic ring | yes, no | 2
ring size | 0, 3, 4, 5, 6, 7, 8, 9, 10, 11 | 1
nr. of hydrogens | 0, 1, 2, 3, 4, 5 | 6

Edge features:
Feature | Possible States | Size
bond order | 1, 2, 3 | 3
conjugated | - | -
rotate-able | - | -
amide bond | - | -
present in aromatic ring | - | -
present in ring | - | -
ring size | - | -
CIP configuration | - | -
Table A6. Node and edge featurization of type 8. Each node or edge featurization vector consisted of a concatenation of the different one-hot encoded or floating point feature vectors according to their possible states (if present) and corresponding size: 26 for nodes and 7 for edges.

Node features:
Feature | Possible States | Size
chemical element | H, C, N, O, S, F, P, Cl, Br, I | 10
formal charge | −2, −1, 0, 1, 2 | 5
CIP priority rule | R, S, None, either | 4
hybridization state | sp, sp2, sp3, sp3d, sp3d2, none | 6
amide center | - | -
aromaticity | - | -
ring size | float (1/size) | 1
nr. of hydrogens | - | -

Edge features:
Feature | Possible States | Size
bond order | 1, 2, 3 | 3
conjugated | - | -
rotate-able | yes, no | 2
amide bond | - | -
aromaticity | - | -
present in ring | yes, no | 2
ring size | - | -
CIP priority rule | - | -
Table A7. Non-GNN featurization. The identifier is used as reference.

Identifier | Fingerprint | Radius | nr. Bits | Descriptor
10 | ECFP | 4 | 1024 | No
11 | ECFP | 4 | 1536 | No
12 | ECFP | 4 | 2048 | No
13 | MACCSKeys | - | - | No
14 | ECFP | 4 | 1024 | Yes
15 | ECFP | 4 | 1536 | Yes
16 | ECFP | 4 | 2048 | Yes
17 | MACCSKeys | - | - | Yes
Figure A1. The first y-axis specifies the logD RMSE and the secondary axis the corresponding r² values for each GNN model. The D-GIN is colored blue, the D-MPNN orange, and the GIN green. Each of the boxes represents one different model run. The kernel density of each model is shown on the very left side. The red lines correspond to the 95% confidence intervals. The model names are a combination of model type (D-GIN, GIN, D-MPNN), training approach, and featurization type; a detailed description of each model name can be found in Table A8, Table A9, Table A10, Table A11, Table A12, Table A13, Table A14, Table A15, Table A16, Table A17, Table A18 and Table A19 in the Appendix A.
Figure A2. The first y-axis specifies the logP RMSE and the secondary axis the corresponding r² values for each GNN model. The D-GIN is colored blue, the D-MPNN orange, and the GIN green. Each of the boxes represents one different model run. The kernel density of each model is shown on the very left side. The red lines correspond to the 95% confidence intervals. The model names are a combination of model type (D-GIN, GIN, D-MPNN), training approach, and featurization type; a detailed description of each model name can be found in Table A8, Table A9, Table A10, Table A11, Table A12, Table A13, Table A14, Table A15, Table A16, Table A17, Table A18 and Table A19 in the Appendix A.
Figure A3. The first y-axis specifies the logS RMSE and the secondary axis the corresponding r² values for each GNN model. The D-GIN is colored blue, the D-MPNN orange, and the GIN green. Each of the boxes represents one different model run. The kernel density of each model is shown on the very left side. The red lines correspond to the 95% confidence intervals. The model names are a combination of model type (D-GIN, GIN, D-MPNN), training approach, and featurization type; a detailed description of each model name can be found in Table A8, Table A9, Table A10, Table A11, Table A12, Table A13, Table A14, Table A15, Table A16, Table A17, Table A18 and Table A19 in the Appendix A.
Table A8. logD RMSE and r² results for the D-GIN model type used during this survey. The last column consists of the unique identifier. Training strategy 0 represents a training strategy combining logD, logS, and logP; 1 represents a strategy combining logD and logP; 2 stands for a combination of logD and logS; 3 represents a strategy using logS and logP; 4 represents a strategy using logD; 5 represents a strategy using logS; and 6 represents a strategy using logP.

Model Type | Training Strategy | Featurization Strategy | logD RMSE | r² | Unique ID
D-GIN | 2 | 5 | 0.679 ± 0.034 | 0.652 ± 0.035 | dg522
D-GIN | 2 | 8 | 0.596 ± 0.026 | 0.728 ± 0.021 | dg822
D-GIN | 2 | 4 | 0.587 ± 0.029 | 0.736 ± 0.020 | dg422
D-GIN | 2 | 3 | 0.582 ± 0.035 | 0.746 ± 0.027 | dg322
D-GIN | 0 | 3 | 0.582 ± 0.038 | 0.744 ± 0.031 | dg302
D-GIN | 0 | 6 | 0.581 ± 0.028 | 0.755 ± 0.021 | dg602
D-GIN | 0 | 4 | 0.598 ± 0.029 | 0.728 ± 0.027 | dg402
D-GIN | 4 | 3 | 0.601 ± 0.057 | 0.731 ± 0.052 | dg341
D-GIN | 0 | 8 | 0.579 ± 0.043 | 0.745 ± 0.033 | dg801
D-GIN | 1 | 3 | 0.575 ± 0.044 | 0.749 ± 0.035 | dg312
D-GIN | 1 | 8 | 0.605 ± 0.053 | 0.722 ± 0.046 | dg811
D-GIN | 4 | 6 | 0.596 ± 0.058 | 0.729 ± 0.062 | dg642
D-GIN | 1 | 4 | 0.606 ± 0.052 | 0.725 ± 0.053 | dg411
D-GIN | 1 | 7 | 0.615 ± 0.051 | 0.713 ± 0.045 | dg712
D-GIN | 4 | 4 | 0.622 ± 0.065 | 0.707 ± 0.065 | dg442
D-GIN | 2 | 7 | 0.626 ± 0.023 | 0.704 ± 0.022 | dg721
D-GIN | 4 | 8 | 0.629 ± 0.068 | 0.700 ± 0.068 | dg841
D-GIN | 4 | 7 | 0.645 ± 0.043 | 0.685 ± 0.048 | dg741
D-GIN | 4 | 5 | 0.660 ± 0.067 | 0.669 ± 0.071 | dg542
D-GIN | 0 | 7 | 0.661 ± 0.039 | 0.666 ± 0.034 | dg701
D-GIN | 1 | 5 | 0.685 ± 0.052 | 0.643 ± 0.074 | dg512
D-GIN | 1 | 6 | 0.553 ± 0.049 | 0.767 ± 0.053 | dg612
D-GIN | 0 | 5 | 0.704 ± 0.027 | 0.632 ± 0.024 | dg502
D-GIN | 2 | 6 | 0.592 ± 0.030 | 0.734 ± 0.024 | dg622
D-GIN cons. | 2 | 5 | 0.605 ± 0.032 | 0.719 ± 0.031 | dg522_cons
D-GIN cons. | 0 | 5 | 0.622 ± 0.030 | 0.704 ± 0.027 | dg502_cons
D-GIN cons. | 0 | 7 | 0.603 ± 0.030 | 0.722 ± 0.025 | dg701_cons
D-GIN cons. | 1 | 5 | 0.609 ± 0.056 | 0.714 ± 0.062 | dg512_cons
D-GIN cons. | 1 | 6 | 0.548 ± 0.051 | 0.769 ± 0.049 | dg612_cons
D-GIN cons. | 4 | 7 | 0.589 ± 0.045 | 0.734 ± 0.043 | dg741_cons
D-GIN cons. | 1 | 3 | 0.561 ± 0.047 | 0.758 ± 0.039 | dg312_cons
D-GIN cons. | 2 | 4 | 0.557 ± 0.029 | 0.762 ± 0.023 | dg422_cons
D-GIN cons. | 4 | 3 | 0.557 ± 0.052 | 0.762 ± 0.044 | dg341_cons
D-GIN cons. | 2 | 3 | 0.549 ± 0.034 | 0.769 ± 0.026 | dg322_cons
D-GIN cons. | 4 | 5 | 0.590 ± 0.059 | 0.733 ± 0.055 | dg542_cons
D-GIN cons. | 0 | 8 | 0.562 ± 0.032 | 0.758 ± 0.025 | dg801_cons
D-GIN cons. | 0 | 4 | 0.563 ± 0.031 | 0.757 ± 0.027 | dg402_cons
D-GIN cons. | 1 | 7 | 0.566 ± 0.055 | 0.754 ± 0.044 | dg712_cons
D-GIN cons. | 1 | 4 | 0.567 ± 0.051 | 0.754 ± 0.041 | dg411_cons
D-GIN cons. | 0 | 3 | 0.562 ± 0.033 | 0.758 ± 0.024 | dg302_cons
D-GIN cons. | 4 | 8 | 0.575 ± 0.052 | 0.746 ± 0.053 | dg841_cons
D-GIN cons. | 2 | 8 | 0.568 ± 0.030 | 0.753 ± 0.026 | dg821_cons
D-GIN cons. | 2 | 7 | 0.580 ± 0.027 | 0.742 ± 0.023 | dg721_cons
D-GIN cons. | 1 | 8 | 0.580 ± 0.053 | 0.742 ± 0.047 | dg811_cons
D-GIN cons. | 4 | 4 | 0.577 ± 0.052 | 0.744 ± 0.041 | dg442_cons
D-GIN cons. | 4 | 6 | 0.568 ± 0.054 | 0.751 ± 0.051 | dg642_cons
D-GIN cons. | 0 | 6 | 0.569 ± 0.029 | 0.751 ± 0.024 | dg602_cons
D-GIN cons. | 2 | 6 | 0.571 ± 0.031 | 0.750 ± 0.024 | dg622_cons
Table A9. logD RMSE and r² results for the D-MPNN model type used during this survey. The last column consists of the unique identifier. Training strategy 0 represents a training strategy combining logD, logS, and logP; 1 represents a strategy combining logD and logP; 2 stands for a combination of logD and logS; 3 represents a strategy using logS and logP; 4 represents a strategy using logD; 5 represents a strategy using logS; and 6 represents a strategy using logP.

Model Type | Training Strategy | Featurization Strategy | logD RMSE | r² | Unique ID
D-MPNN | 1 | 7 | 0.7836 ± 0.065 | 0.532 ± 0.087 | dmp711
D-MPNN | 1 | 6 | 0.686 ± 0.054 | 0.645 ± 0.071 | dmp612
D-MPNN | 1 | 5 | 0.857 ± 0.060 | 0.442 ± 0.089 | dmp511
D-MPNN | 2 | 3 | 0.759 ± 0.037 | 0.563 ± 0.039 | dmp322
D-MPNN | 1 | 4 | 0.692 ± 0.069 | 0.634 ± 0.080 | dmp411
D-MPNN | 2 | 5 | 0.911 ± 0.029 | 0.371 ± 0.035 | dmp521
D-MPNN | 2 | 6 | 0.721 ± 0.029 | 0.606 ± 0.037 | dmp621
D-MPNN | 0 | 4 | 0.721 ± 0.036 | 0.608 ± 0.034 | dmp401
D-MPNN | 0 | 8 | 0.712 ± 0.039 | 0.613 ± 0.037 | dmp802
D-MPNN | 4 | 8 | 0.713 ± 0.043 | 0.614 ± 0.057 | dmp842
D-MPNN | 1 | 8 | 0.724 ± 0.044 | 0.607 ± 0.062 | dmp811
D-MPNN | 4 | 4 | 0.719 ± 0.057 | 0.614 ± 0.067 | dmp441
D-MPNN | 4 | 6 | 0.719 ± 0.056 | 0.610 ± 0.076 | dmp642
D-MPNN | 4 | 3 | 0.731 ± 0.044 | 0.594 ± 0.062 | dmp342
D-MPNN | 0 | 6 | 0.724 ± 0.030 | 0.601 ± 0.040 | dmp602
D-MPNN | 1 | 3 | 0.703 ± 0.084 | 0.625 ± 0.069 | dmp311
D-MPNN | 0 | 3 | 0.719 ± 0.040 | 0.618 ± 0.034 | dmp302
D-MPNN | 2 | 8 | 0.788 ± 0.027 | 0.532 ± 0.033 | dmp822
D-MPNN | 2 | 4 | 0.728 ± 0.041 | 0.600 ± 0.039 | dmp422
D-MPNN | 2 | 7 | 0.823 ± 0.027 | 0.493 ± 0.039 | dmp721
D-MPNN | 4 | 7 | 0.804 ± 0.058 | 0.517 ± 0.089 | dmp742
D-MPNN | 4 | 5 | 0.881 ± 0.051 | 0.417 ± 0.077 | dmp542
D-MPNN | 0 | 7 | 0.812 ± 0.035 | 0.512 ± 0.037 | dmp702
D-MPNN | 0 | 5 | 0.864 ± 0.026 | 0.432 ± 0.035 | dmp502
D-MPNN cons. | 0 | 4 | 0.633 ± 0.031 | 0.693 ± 0.025 | dmp402_cons
D-MPNN cons. | 4 | 5 | 0.699 ± 0.047 | 0.625 ± 0.056 | dmp542_cons
D-MPNN cons. | 2 | 4 | 0.632 ± 0.031 | 0.694 ± 0.028 | dmp422_cons
D-MPNN cons. | 2 | 6 | 0.632 ± 0.029 | 0.694 ± 0.028 | dmp621_cons
D-MPNN cons. | 2 | 5 | 0.710 ± 0.027 | 0.614 ± 0.024 | dmp521_cons
D-MPNN cons. | 1 | 8 | 0.632 ± 0.050 | 0.693 ± 0.058 | dmp811_cons
D-MPNN cons. | 0 | 3 | 0.625 ± 0.030 | 0.701 ± 0.026 | dmp302_cons
D-MPNN cons. | 4 | 4 | 0.625 ± 0.054 | 0.700 ± 0.052 | dmp441_cons
D-MPNN cons. | 1 | 3 | 0.625 ± 0.057 | 0.700 ± 0.050 | dmp311_cons
D-MPNN cons. | 0 | 8 | 0.624 ± 0.029 | 0.701 ± 0.023 | dmp802_cons
D-MPNN cons. | 4 | 8 | 0.622 ± 0.045 | 0.703 ± 0.050 | dmp842_cons
D-MPNN cons. | 1 | 6 | 0.618 ± 0.045 | 0.706 ± 0.051 | dmp612_cons
D-MPNN cons. | 1 | 4 | 0.613 ± 0.055 | 0.711 ± 0.057 | dmp411_cons
D-MPNN cons. | 4 | 6 | 0.634 ± 0.055 | 0.691 ± 0.066 | dmp642_cons
D-MPNN cons. | 0 | 6 | 0.636 ± 0.030 | 0.690 ± 0.029 | dmp602_cons
D-MPNN cons. | 2 | 3 | 0.646 ± 0.032 | 0.680 ± 0.028 | dmp322_cons
D-MPNN cons. | 1 | 5 | 0.688 ± 0.059 | 0.637 ± 0.057 | dmp512_cons
D-MPNN cons. | 2 | 8 | 0.652 ± 0.030 | 0.674 ± 0.025 | dmp822_cons
D-MPNN cons. | 1 | 7 | 0.654 ± 0.060 | 0.671 ± 0.069 | dmp711_cons
D-MPNN cons. | 4 | 7 | 0.663 ± 0.045 | 0.662 ± 0.058 | dmp742_cons
D-MPNN cons. | 0 | 7 | 0.670 ± 0.028 | 0.656 ± 0.026 | dmp701_cons
D-MPNN cons. | 2 | 7 | 0.674 ± 0.026 | 0.652 ± 0.029 | dmp721_cons
D-MPNN cons. | 0 | 5 | 0.697 ± 0.026 | 0.628 ± 0.024 | dmp502_cons
D-MPNN cons. | 4 | 3 | 0.636 ± 0.042 | 0.689 ± 0.055 | dmp342_cons
Table A10. LogD RMSE and r² results for the GIN model type used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logD RMSE | r² | Unique ID
GIN | 2 | 5 | 0.860 ± 0.035 | 0.440 ± 0.047 | g522
GIN | 2 | 6 | 0.896 ± 0.031 | 0.387 ± 0.039 | g621
GIN | 0 | 4 | 0.791 ± 0.051 | 0.539 ± 0.050 | g401
GIN | 0 | 8 | 0.789 ± 0.031 | 0.526 ± 0.036 | g802
GIN | 2 | 8 | 0.814 ± 0.036 | 0.496 ± 0.040 | g822
GIN | 1 | 8 | 0.786 ± 0.068 | 0.538 ± 0.081 | g812
GIN | 1 | 6 | 0.899 ± 0.061 | 0.384 ± 0.104 | g612
GIN | 4 | 8 | 0.734 ± 0.058 | 0.589 ± 0.079 | g842
GIN | 4 | 5 | 0.856 ± 0.073 | 0.442 ± 0.112 | g542
GIN | 0 | 5 | 0.892 ± 0.035 | 0.400 ± 0.047 | g502
GIN | 4 | 3 | 0.741 ± 0.063 | 0.584 ± 0.080 | g342
GIN | 1 | 7 | 0.756 ± 0.060 | 0.567 ± 0.057 | g711
GIN | 1 | 4 | 0.761 ± 0.076 | 0.561 ± 0.071 | g412
GIN | 4 | 4 | 0.743 ± 0.066 | 0.581 ± 0.061 | g441
GIN | 4 | 7 | 0.747 ± 0.082 | 0.576 ± 0.087 | g742
GIN | 2 | 7 | 0.751 ± 0.041 | 0.581 ± 0.040 | g722
GIN | 2 | 3 | 0.781 ± 0.035 | 0.557 ± 0.041 | g321
GIN | 2 | 4 | 0.766 ± 0.042 | 0.557 ± 0.044 | g421
GIN | 0 | 7 | 0.765 ± 0.040 | 0.558 ± 0.041 | g702
GIN | 0 | 3 | 0.756 ± 0.030 | 0.570 ± 0.033 | g302
GIN | 1 | 3 | 0.742 ± 0.065 | 0.580 ± 0.062 | g311
GIN | 4 | 6 | 0.860 ± 0.063 | 0.437 ± 0.085 | g642
GIN | 0 | 6 | 0.900 ± 0.033 | 0.381 ± 0.041 | g601
GIN | 1 | 5 | 0.911 ± 0.056 | 0.371 ± 0.062 | g512
GIN cons. | 0 | 6 | 0.715 ± 0.020 | 0.609 ± 0.025 | g602_cons
GIN cons. | 4 | 3 | 0.627 ± 0.054 | 0.698 ± 0.059 | g342_cons
GIN cons. | 4 | 8 | 0.632 ± 0.050 | 0.693 ± 0.057 | g842_cons
GIN cons. | 4 | 7 | 0.633 ± 0.061 | 0.692 ± 0.056 | g742_cons
GIN cons. | 4 | 4 | 0.637 ± 0.058 | 0.688 ± 0.060 | g441_cons
GIN cons. | 1 | 3 | 0.640 ± 0.056 | 0.685 ± 0.051 | g311_cons
GIN cons. | 2 | 7 | 0.642 ± 0.033 | 0.685 ± 0.028 | g722_cons
GIN cons. | 1 | 7 | 0.644 ± 0.047 | 0.682 ± 0.053 | g711_cons
GIN cons. | 2 | 4 | 0.645 ± 0.034 | 0.681 ± 0.031 | g421_cons
GIN cons. | 0 | 3 | 0.647 ± 0.029 | 0.679 ± 0.025 | g302_cons
GIN cons. | 1 | 4 | 0.648 ± 0.059 | 0.678 ± 0.056 | g412_cons
GIN cons. | 0 | 4 | 0.652 ± 0.033 | 0.674 ± 0.030 | g401_cons
GIN cons. | 0 | 7 | 0.654 ± 0.029 | 0.673 ± 0.024 | g701_cons
GIN cons. | 2 | 3 | 0.656 ± 0.028 | 0.670 ± 0.030 | g321_cons
GIN cons. | 1 | 8 | 0.657 ± 0.055 | 0.669 ± 0.068 | g812_cons
GIN cons. | 0 | 8 | 0.661 ± 0.034 | 0.665 ± 0.031 | g801_cons
GIN cons. | 2 | 8 | 0.673 ± 0.032 | 0.653 ± 0.029 | g822_cons
GIN cons. | 4 | 5 | 0.689 ± 0.056 | 0.636 ± 0.064 | g542_cons
GIN cons. | 4 | 6 | 0.692 ± 0.052 | 0.633 ± 0.057 | g642_cons
GIN cons. | 1 | 6 | 0.707 ± 0.057 | 0.616 ± 0.061 | g612_cons
GIN cons. | 2 | 6 | 0.707 ± 0.025 | 0.617 ± 0.025 | g622_cons
GIN cons. | 0 | 5 | 0.712 ± 0.033 | 0.612 ± 0.030 | g502_cons
GIN cons. | 1 | 5 | 0.719 ± 0.046 | 0.603 ± 0.052 | g512_cons
GIN cons. | 2 | 5 | 0.692 ± 0.026 | 0.634 ± 0.022 | g522_cons
Table A11. LogD RMSE and r² results for the non-GNN model types used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logD RMSE | r² | Unique ID
KNN logD | – | 10 | 0.979 ± 0.072 | 0.261 ± 0.094 | KNN10
KNN logD | – | 16 | 0.970 ± 0.056 | 0.275 ± 0.097 | KNN16
KNN logD | – | 12 | 0.959 ± 0.066 | 0.293 ± 0.080 | KNN12
KNN logD | – | 11 | 0.993 ± 0.072 | 0.240 ± 0.096 | KNN11
KNN logD | – | 13 | 0.909 ± 0.067 | 0.363 ± 0.069 | KNN13
KNN logD | – | 15 | 1.003 ± 0.057 | 0.226 ± 0.096 | KNN15
KNN logD | – | 14 | 0.996 ± 0.065 | 0.236 ± 0.110 | KNN14
KNN logD | – | 17 | 0.801 ± 0.060 | 0.506 ± 0.058 | KNN17
RF logD | – | 17 | 0.708 ± 0.060 | 0.614 ± 0.055 | rf17
RF logD | – | 15 | 0.699 ± 0.062 | 0.623 ± 0.059 | rf15
RF logD | – | 16 | 0.703 ± 0.061 | 0.620 ± 0.057 | rf16
RF logD | – | 11 | 0.859 ± 0.062 | 0.433 ± 0.062 | rf11
RF logD | – | 14 | 0.706 ± 0.062 | 0.616 ± 0.058 | rf14
RF logD | – | 10 | 0.890 ± 0.060 | 0.390 ± 0.062 | rf10
RF logD | – | 13 | 0.813 ± 0.060 | 0.491 ± 0.059 | rf13
RF logD | – | 12 | 0.863 ± 0.068 | 0.427 ± 0.068 | rf12
SVM logD | – | 12 | 0.782 ± 0.061 | 0.529 ± 0.055 | svm12
SVM logD | – | 16 | 0.707 ± 0.056 | 0.615 ± 0.047 | svm16
SVM logD | – | 10 | 0.810 ± 0.062 | 0.495 ± 0.057 | svm10
SVM logD | – | 15 | 0.698 ± 0.055 | 0.625 ± 0.045 | svm15
SVM logD | – | 14 | 0.674 ± 0.054 | 0.650 ± 0.043 | svm14
SVM logD | – | 17 | 0.639 ± 0.051 | 0.686 ± 0.046 | svm17
SVM logD | – | 11 | 0.793 ± 0.060 | 0.516 ± 0.060 | svm11
SVM logD | – | 13 | 0.814 ± 0.059 | 0.490 ± 0.055 | svm13
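The KNN, RF, and SVM baselines reported in Tables A11, A15, and A19 operate on fixed molecular descriptor vectors rather than learned graph embeddings. The snippet below is a minimal sketch of such a baseline comparison using scikit-learn [32]; the descriptor matrix, the hyperparameters, and the RMSE-based evaluation are illustrative assumptions and not the exact protocol of this survey.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def fit_baselines(X_train, y_train, X_test, y_test):
    """Fit the three non-GNN baselines on a fixed descriptor matrix X (e.g., fingerprints)."""
    models = {
        "KNN": KNeighborsRegressor(n_neighbors=5),               # hyperparameters are placeholders
        "RF": RandomForestRegressor(n_estimators=500, random_state=0),
        "SVM": SVR(kernel="rbf", C=1.0),
    }
    rmse_per_model = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rmse_per_model[name] = float(np.sqrt(mean_squared_error(y_test, pred)))
    return rmse_per_model
```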
Table A12. LogS RMSE and r² results for the D-GIN model type used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logS RMSE | r² | Unique ID
D-GIN | 5 | 7 | 0.897 ± 0.041 | 0.757 ± 0.025 | dg752
D-GIN | 0 | 7 | 0.866 ± 0.034 | 0.771 ± 0.016 | dg701
D-GIN | 5 | 5 | 0.969 ± 0.048 | 0.717 ± 0.030 | dg552
D-GIN | 0 | 5 | 0.988 ± 0.042 | 0.705 ± 0.021 | dg502
D-GIN | 2 | 5 | 1.010 ± 0.040 | 0.694 ± 0.020 | dg522
D-GIN | 0 | 8 | 0.795 ± 0.038 | 0.807 ± 0.019 | dg801
D-GIN | 0 | 4 | 0.807 ± 0.032 | 0.803 ± 0.016 | dg401
D-GIN | 3 | 8 | 0.813 ± 0.044 | 0.813 ± 0.021 | dg832
D-GIN | 5 | 6 | 0.818 ± 0.044 | 0.798 ± 0.025 | dg651
D-GIN | 5 | 3 | 0.848 ± 0.047 | 0.783 ± 0.023 | dg352
D-GIN | 0 | 6 | 0.821 ± 0.042 | 0.796 ± 0.020 | dg601
D-GIN | 2 | 7 | 0.906 ± 0.045 | 0.755 ± 0.017 | dg721
D-GIN | 2 | 4 | 0.822 ± 0.028 | 0.794 ± 0.014 | dg422
D-GIN | 3 | 6 | 0.827 ± 0.063 | 0.794 ± 0.036 | dg632
D-GIN | 5 | 8 | 0.854 ± 0.056 | 0.781 ± 0.027 | dg852
D-GIN | 3 | 4 | 0.829 ± 0.044 | 0.791 ± 0.025 | dg432
D-GIN | 3 | 3 | 0.831 ± 0.050 | 0.790 ± 0.026 | dg332
D-GIN | 0 | 3 | 0.832 ± 0.039 | 0.798 ± 0.016 | dg302
D-GIN | 2 | 3 | 0.833 ± 0.037 | 0.789 ± 0.016 | dg321
D-GIN | 5 | 4 | 0.852 ± 0.050 | 0.789 ± 0.026 | dg451
D-GIN | 3 | 7 | 0.851 ± 0.049 | 0.782 ± 0.027 | dg732
D-GIN | 2 | 6 | 0.837 ± 0.042 | 0.788 ± 0.020 | dg622
D-GIN | 3 | 5 | 1.061 ± 0.061 | 0.662 ± 0.034 | dg532
D-GIN cons. | 2 | 5 | 0.794 ± 0.039 | 0.807 ± 0.014 | dg522_cons
D-GIN cons. | 0 | 5 | 0.779 ± 0.040 | 0.814 ± 0.014 | dg502_cons
D-GIN cons. | 5 | 5 | 0.778 ± 0.049 | 0.816 ± 0.020 | dg552_cons
D-GIN cons. | 3 | 5 | 0.820 ± 0.054 | 0.795 ± 0.024 | dg532_cons
D-GIN cons. | 0 | 8 | 0.705 ± 0.039 | 0.848 ± 0.014 | dg801_cons
D-GIN cons. | 2 | 7 | 0.757 ± 0.044 | 0.825 ± 0.015 | dg721_cons
D-GIN cons. | 2 | 8 | 0.734 ± 0.033 | 0.835 ± 0.012 | dg822_cons
D-GIN cons. | 3 | 7 | 0.733 ± 0.046 | 0.836 ± 0.021 | dg732_cons
D-GIN cons. | 0 | 3 | 0.733 ± 0.040 | 0.836 ± 0.015 | dg301_cons
D-GIN cons. | 0 | 7 | 0.731 ± 0.035 | 0.836 ± 0.010 | dg701_cons
D-GIN cons. | 5 | 3 | 0.730 ± 0.047 | 0.838 ± 0.019 | dg352_cons
D-GIN cons. | 3 | 3 | 0.727 ± 0.048 | 0.839 ± 0.021 | dg332_cons
D-GIN cons. | 5 | 4 | 0.739 ± 0.052 | 0.834 ± 0.023 | dg451_cons
D-GIN cons. | 5 | 7 | 0.748 ± 0.045 | 0.830 ± 0.018 | dg752_cons
D-GIN cons. | 2 | 3 | 0.725 ± 0.041 | 0.839 ± 0.014 | dg321_cons
D-GIN cons. | 3 | 8 | 0.724 ± 0.046 | 0.840 ± 0.018 | dg832_cons
D-GIN cons. | 3 | 4 | 0.722 ± 0.047 | 0.841 ± 0.020 | dg432_cons
D-GIN cons. | 2 | 4 | 0.718 ± 0.031 | 0.842 ± 0.011 | dg422_cons
D-GIN cons. | 3 | 6 | 0.718 ± 0.047 | 0.843 ± 0.021 | dg631_cons
D-GIN cons. | 2 | 6 | 0.716 ± 0.038 | 0.843 ± 0.014 | dg622_cons
D-GIN cons. | 0 | 4 | 0.715 ± 0.035 | 0.843 ± 0.011 | dg401_cons
D-GIN cons. | 5 | 6 | 0.711 ± 0.046 | 0.846 ± 0.021 | dg651_cons
D-GIN cons. | 0 | 6 | 0.724 ± 0.038 | 0.839 ± 0.016 | dg601_cons
D-GIN cons. | 5 | 8 | 0.735 ± 0.053 | 0.835 ± 0.021 | dg852_cons
Table A13. LogS RMSE and r² results for the D-MPNN model type used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logS RMSE | r² | Unique ID
D-MPNN | 3 | 8 | 0.913 ± 0.052 | 0.756 ± 0.026 | dmp832
D-MPNN | 2 | 6 | 0.857 ± 0.035 | 0.782 ± 0.015 | dmp621
D-MPNN | 2 | 3 | 0.865 ± 0.038 | 0.784 ± 0.016 | dmp322
D-MPNN | 0 | 5 | 0.962 ± 0.038 | 0.718 ± 0.018 | dmp501
D-MPNN | 0 | 8 | 0.907 ± 0.049 | 0.757 ± 0.021 | dmp802
D-MPNN | 5 | 8 | 0.921 ± 0.046 | 0.751 ± 0.025 | dmp851
D-MPNN | 2 | 8 | 0.906 ± 0.041 | 0.763 ± 0.020 | dmp821
D-MPNN | 0 | 7 | 0.906 ± 0.049 | 0.764 ± 0.027 | dmp701
D-MPNN | 2 | 7 | 0.902 ± 0.040 | 0.772 ± 0.021 | dmp722
D-MPNN | 3 | 5 | 0.938 ± 0.044 | 0.733 ± 0.029 | dmp532
D-MPNN | 5 | 5 | 0.945 ± 0.041 | 0.734 ± 0.030 | dmp552
D-MPNN | 3 | 4 | 0.896 ± 0.062 | 0.782 ± 0.028 | dmp432
D-MPNN | 0 | 4 | 0.893 ± 0.040 | 0.762 ± 0.022 | dmp401
D-MPNN | 3 | 3 | 0.893 ± 0.054 | 0.775 ± 0.026 | dmp332
D-MPNN | 5 | 6 | 0.879 ± 0.046 | 0.777 ± 0.021 | dmp651
D-MPNN | 5 | 3 | 0.878 ± 0.053 | 0.772 ± 0.027 | dmp351
D-MPNN | 0 | 6 | 0.877 ± 0.039 | 0.780 ± 0.021 | dmp601
D-MPNN | 0 | 3 | 0.874 ± 0.046 | 0.779 ± 0.024 | dmp302
D-MPNN | 2 | 5 | 0.961 ± 0.036 | 0.721 ± 0.021 | dmp521
D-MPNN | 5 | 4 | 0.866 ± 0.054 | 0.778 ± 0.022 | dmp452
D-MPNN | 3 | 7 | 0.865 ± 0.049 | 0.773 ± 0.027 | dmp731
D-MPNN | 2 | 4 | 0.863 ± 0.046 | 0.800 ± 0.020 | dmp422
D-MPNN | 5 | 7 | 0.897 ± 0.050 | 0.763 ± 0.024 | dmp751
D-MPNN | 3 | 6 | 0.861 ± 0.047 | 0.780 ± 0.024 | dmp632
D-MPNN cons. | 5 | 4 | 0.748 ± 0.055 | 0.830 ± 0.020 | dmp452_cons
D-MPNN cons. | 0 | 4 | 0.768 ± 0.040 | 0.819 ± 0.015 | dmp401_cons
D-MPNN cons. | 2 | 8 | 0.769 ± 0.039 | 0.819 ± 0.014 | dmp821_cons
D-MPNN cons. | 3 | 8 | 0.762 ± 0.052 | 0.823 ± 0.022 | dmp832_cons
D-MPNN cons. | 5 | 6 | 0.749 ± 0.053 | 0.829 ± 0.021 | dmp652_cons
D-MPNN cons. | 0 | 8 | 0.770 ± 0.044 | 0.818 ± 0.016 | dmp802_cons
D-MPNN cons. | 3 | 3 | 0.762 ± 0.052 | 0.823 ± 0.021 | dmp332_cons
D-MPNN cons. | 0 | 7 | 0.771 ± 0.044 | 0.818 ± 0.016 | dmp701_cons
D-MPNN cons. | 2 | 7 | 0.771 ± 0.040 | 0.818 ± 0.016 | dmp722_cons
D-MPNN cons. | 5 | 7 | 0.761 ± 0.051 | 0.824 ± 0.021 | dmp751_cons
D-MPNN cons. | 5 | 8 | 0.774 ± 0.048 | 0.817 ± 0.022 | dmp851_cons
D-MPNN cons. | 3 | 5 | 0.777 ± 0.048 | 0.816 ± 0.021 | dmp532_cons
D-MPNN cons. | 5 | 5 | 0.779 ± 0.045 | 0.815 ± 0.021 | dmp552_cons
D-MPNN cons. | 3 | 4 | 0.765 ± 0.056 | 0.822 ± 0.021 | dmp431_cons
D-MPNN cons. | 2 | 5 | 0.784 ± 0.038 | 0.811 ± 0.016 | dmp521_cons
D-MPNN cons. | 0 | 3 | 0.760 ± 0.042 | 0.823 ± 0.016 | dmp302_cons
D-MPNN cons. | 0 | 6 | 0.759 ± 0.039 | 0.823 ± 0.014 | dmp601_cons
D-MPNN cons. | 2 | 3 | 0.756 ± 0.039 | 0.825 ± 0.014 | dmp322_cons
D-MPNN cons. | 2 | 6 | 0.744 ± 0.038 | 0.831 ± 0.013 | dmp621_cons
D-MPNN cons. | 5 | 3 | 0.752 ± 0.055 | 0.828 ± 0.021 | dmp351_cons
D-MPNN cons. | 3 | 7 | 0.744 ± 0.053 | 0.831 ± 0.021 | dmp731_cons
D-MPNN cons. | 2 | 4 | 0.750 ± 0.044 | 0.828 ± 0.016 | dmp422_cons
D-MPNN cons. | 3 | 6 | 0.748 ± 0.055 | 0.830 ± 0.021 | dmp632_cons
D-MPNN cons. | 0 | 5 | 0.785 ± 0.040 | 0.811 ± 0.015 | dmp501_cons
Table A14. LogS RMSE and r² results for the GIN model type used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logS RMSE | r² | Unique ID
GIN | 2 | 4 | 1.126 ± 0.047 | 0.612 ± 0.029 | g421
GIN | 0 | 3 | 1.122 ± 0.049 | 0.615 ± 0.028 | g301
GIN | 0 | 4 | 1.116 ± 0.053 | 0.624 ± 0.037 | g402
GIN | 0 | 7 | 1.089 ± 0.044 | 0.650 ± 0.031 | g701
GIN | 3 | 7 | 1.088 ± 0.066 | 0.643 ± 0.036 | g731
GIN | 2 | 7 | 1.127 ± 0.049 | 0.613 ± 0.028 | g721
GIN | 3 | 4 | 1.147 ± 0.084 | 0.602 ± 0.052 | g431
GIN | 5 | 7 | 1.145 ± 0.060 | 0.602 ± 0.038 | g752
GIN | 5 | 6 | 1.400 ± 0.064 | 0.407 ± 0.045 | g652
GIN | 5 | 5 | 1.362 ± 0.061 | 0.437 ± 0.043 | g552
GIN | 2 | 6 | 1.359 ± 0.045 | 0.441 ± 0.036 | g621
GIN | 2 | 5 | 1.345 ± 0.044 | 0.447 ± 0.030 | g522
GIN | 3 | 6 | 1.334 ± 0.058 | 0.463 ± 0.047 | g631
GIN | 0 | 5 | 1.326 ± 0.049 | 0.474 ± 0.031 | g502
GIN | 0 | 8 | 1.136 ± 0.058 | 0.605 ± 0.034 | g801
GIN | 3 | 5 | 1.326 ± 0.066 | 0.472 ± 0.049 | g532
GIN | 5 | 8 | 1.217 ± 0.062 | 0.554 ± 0.040 | g851
GIN | 5 | 3 | 1.178 ± 0.070 | 0.579 ± 0.042 | g352
GIN | 2 | 8 | 1.173 ± 0.050 | 0.580 ± 0.027 | g821
GIN | 3 | 8 | 1.166 ± 0.059 | 0.603 ± 0.036 | g832
GIN | 5 | 4 | 1.161 ± 0.089 | 0.598 ± 0.057 | g452
GIN | 2 | 3 | 1.151 ± 0.047 | 0.595 ± 0.025 | g321
GIN | 3 | 3 | 1.148 ± 0.064 | 0.610 ± 0.036 | g331
GIN | 0 | 6 | 1.314 ± 0.053 | 0.474 ± 0.034 | g602
GIN cons. | 5 | 6 | 0.969 ± 0.057 | 0.714 ± 0.029 | g652_cons
GIN cons. | 2 | 7 | 0.840 ± 0.036 | 0.784 ± 0.014 | g721_cons
GIN cons. | 5 | 5 | 0.945 ± 0.045 | 0.728 ± 0.023 | g552_cons
GIN cons. | 0 | 3 | 0.842 ± 0.041 | 0.783 ± 0.016 | g301_cons
GIN cons. | 5 | 4 | 0.861 ± 0.065 | 0.774 ± 0.031 | g452_cons
GIN cons. | 2 | 8 | 0.867 ± 0.043 | 0.770 ± 0.017 | g821_cons
GIN cons. | 5 | 3 | 0.872 ± 0.056 | 0.769 ± 0.026 | g352_cons
GIN cons. | 3 | 3 | 0.857 ± 0.054 | 0.776 ± 0.024 | g331_cons
GIN cons. | 0 | 7 | 0.825 ± 0.038 | 0.791 ± 0.016 | g701_cons
GIN cons. | 5 | 8 | 0.887 ± 0.056 | 0.760 ± 0.026 | g851_cons
GIN cons. | 2 | 3 | 0.855 ± 0.043 | 0.776 ± 0.015 | g321_cons
GIN cons. | 5 | 7 | 0.854 ± 0.055 | 0.778 ± 0.026 | g752_cons
GIN cons. | 3 | 7 | 0.828 ± 0.052 | 0.791 ± 0.023 | g731_cons
GIN cons. | 0 | 4 | 0.835 ± 0.040 | 0.786 ± 0.016 | g402_cons
GIN cons. | 3 | 4 | 0.860 ± 0.061 | 0.775 ± 0.029 | g431_cons
GIN cons. | 2 | 4 | 0.835 ± 0.041 | 0.786 ± 0.014 | g421_cons
GIN cons. | 0 | 6 | 0.926 ± 0.045 | 0.737 ± 0.017 | g602_cons
GIN cons. | 0 | 5 | 0.933 ± 0.038 | 0.733 ± 0.015 | g502_cons
GIN cons. | 3 | 5 | 0.934 ± 0.053 | 0.734 ± 0.025 | g532_cons
GIN cons. | 2 | 5 | 0.938 ± 0.040 | 0.730 ± 0.015 | g522_cons
GIN cons. | 3 | 6 | 0.940 ± 0.048 | 0.731 ± 0.025 | g631_cons
GIN cons. | 2 | 6 | 0.945 ± 0.041 | 0.726 ± 0.017 | g621_cons
GIN cons. | 0 | 8 | 0.850 ± 0.042 | 0.779 ± 0.014 | g802_cons
GIN cons. | 3 | 8 | 0.861 ± 0.053 | 0.774 ± 0.018 | g832_cons
Table A15. LogS RMSE and r² results for the non-GNN model types used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logS RMSE | r² | Unique ID
KNN logS | – | 14 | 1.547 ± 0.063 | 0.268 ± 0.054 | KNN14
KNN logS | – | 11 | 1.587 ± 0.063 | 0.230 ± 0.054 | KNN11
KNN logS | – | 10 | 1.587 ± 0.064 | 0.229 ± 0.056 | KNN10
KNN logS | – | 12 | 1.600 ± 0.066 | 0.217 ± 0.055 | KNN12
KNN logS | – | 13 | 1.280 ± 0.066 | 0.499 ± 0.036 | KNN13
KNN logS | – | 15 | 1.676 ± 0.066 | 0.140 ± 0.068 | KNN15
KNN logS | – | 17 | 1.058 ± 0.055 | 0.658 ± 0.032 | KNN17
KNN logS | – | 16 | 1.670 ± 0.065 | 0.147 ± 0.061 | KNN16
RF logS | – | 12 | 1.239 ± 0.056 | 0.530 ± 0.032 | rf12
RF logS | – | 14 | 0.760 ± 0.043 | 0.823 ± 0.020 | rf14
RF logS | – | 15 | 0.764 ± 0.044 | 0.821 ± 0.022 | rf15
RF logS | – | 13 | 1.128 ± 0.061 | 0.611 ± 0.039 | rf13
RF logS | – | 16 | 0.765 ± 0.045 | 0.821 ± 0.024 | rf16
RF logS | – | 17 | 0.770 ± 0.049 | 0.818 ± 0.025 | rf17
RF logS | – | 10 | 1.284 ± 0.055 | 0.495 ± 0.033 | rf10
RF logS | – | 11 | 1.271 ± 0.057 | 0.506 ± 0.035 | rf11
SVM logS | – | 12 | 1.142 ± 0.055 | 0.601 ± 0.035 | svm12
SVM logS | – | 11 | 1.149 ± 0.057 | 0.596 ± 0.035 | svm11
SVM logS | – | 16 | 0.966 ± 0.049 | 0.715 ± 0.024 | svm16
SVM logS | – | 15 | 0.930 ± 0.046 | 0.735 ± 0.023 | svm15
SVM logS | – | 13 | 1.086 ± 0.060 | 0.639 ± 0.034 | svm13
SVM logS | – | 17 | 0.730 ± 0.041 | 0.837 ± 0.020 | svm17
SVM logS | – | 14 | 0.891 ± 0.046 | 0.757 ± 0.023 | svm14
SVM logS | – | 10 | 1.162 ± 0.056 | 0.587 ± 0.036 | svm10
Table A16. LogP RMSE and r² results for the D-GIN model type used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logP RMSE | r² | Unique ID
D-GIN | 0 | 8 | 0.510 ± 0.026 | 0.878 ± 0.012 | dg801
D-GIN | 0 | 3 | 0.496 ± 0.025 | 0.885 ± 0.012 | dg302
D-GIN | 0 | 6 | 0.493 ± 0.024 | 0.889 ± 0.010 | dg601
D-GIN | 6 | 4 | 0.493 ± 0.021 | 0.886 ± 0.011 | dg461
D-GIN | 6 | 7 | 0.541 ± 0.028 | 0.862 ± 0.014 | dg761
D-GIN | 0 | 4 | 0.487 ± 0.023 | 0.891 ± 0.012 | dg401
D-GIN | 6 | 8 | 0.477 ± 0.025 | 0.893 ± 0.011 | dg862
D-GIN | 0 | 7 | 0.560 ± 0.029 | 0.852 ± 0.015 | dg701
D-GIN | 0 | 5 | 0.663 ± 0.034 | 0.793 ± 0.021 | dg501
D-GIN | 6 | 5 | 0.653 ± 0.029 | 0.802 ± 0.018 | dg561
D-GIN | 6 | 3 | 0.472 ± 0.020 | 0.896 ± 0.009 | dg362
D-GIN | 6 | 6 | 0.511 ± 0.027 | 0.878 ± 0.013 | dg662
D-GIN cons. | 6 | 3 | 0.428 ± 0.026 | 0.914 ± 0.010 | dg362 cons.
D-GIN cons. | 6 | 5 | 0.502 ± 0.027 | 0.881 ± 0.013 | dg561 cons.
D-GIN cons. | 0 | 7 | 0.473 ± 0.032 | 0.894 ± 0.013 | dg701 cons.
D-GIN cons. | 0 | 5 | 0.515 ± 0.033 | 0.875 ± 0.014 | dg501 cons.
D-GIN cons. | 0 | 3 | 0.447 ± 0.030 | 0.906 ± 0.012 | dg302 cons.
D-GIN cons. | 6 | 8 | 0.433 ± 0.030 | 0.912 ± 0.012 | dg862 cons.
D-GIN cons. | 0 | 6 | 0.434 ± 0.029 | 0.911 ± 0.011 | dg601 cons.
D-GIN cons. | 0 | 4 | 0.439 ± 0.026 | 0.909 ± 0.011 | dg401 cons.
D-GIN cons. | 6 | 7 | 0.461 ± 0.027 | 0.900 ± 0.012 | dg761 cons.
D-GIN cons. | 0 | 8 | 0.452 ± 0.030 | 0.904 ± 0.014 | dg801 cons.
D-GIN cons. | 6 | 4 | 0.440 ± 0.030 | 0.909 ± 0.012 | dg461 cons.
D-GIN cons. | 6 | 6 | 0.442 ± 0.026 | 0.908 ± 0.011 | dg661 cons.
Table A17. LogP RMSE and r² results for the D-MPNN model type used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logP RMSE | r² | Unique ID
D-MPNN | 6 | 6 | 0.540 ± 0.019 | 0.863 ± 0.012 | dmp661
D-MPNN | 6 | 5 | 0.735 ± 0.026 | 0.746 ± 0.019 | dmp561
D-MPNN | 0 | 8 | 0.583 ± 0.020 | 0.845 ± 0.012 | dmp802
D-MPNN | 0 | 6 | 0.551 ± 0.022 | 0.861 ± 0.012 | dmp601
D-MPNN | 6 | 4 | 0.556 ± 0.019 | 0.856 ± 0.011 | dmp461
D-MPNN | 0 | 5 | 0.717 ± 0.027 | 0.758 ± 0.021 | dmp501
D-MPNN | 6 | 3 | 0.570 ± 0.019 | 0.847 ± 0.012 | dmp362
D-MPNN | 6 | 7 | 0.629 ± 0.023 | 0.815 ± 0.014 | dmp762
D-MPNN | 0 | 7 | 0.605 ± 0.020 | 0.828 ± 0.012 | dmp702
D-MPNN | 6 | 8 | 0.593 ± 0.021 | 0.834 ± 0.014 | dmp862
D-MPNN | 0 | 4 | 0.571 ± 0.022 | 0.851 ± 0.012 | dmp401
D-MPNN | 0 | 3 | 0.552 ± 0.018 | 0.857 ± 0.011 | dmp301
D-MPNN cons. | 6 | 5 | 0.533 ± 0.031 | 0.866 ± 0.015 | dmp562 cons.
D-MPNN cons. | 0 | 6 | 0.444 ± 0.024 | 0.907 ± 0.010 | dmp601 cons.
D-MPNN cons. | 6 | 6 | 0.452 ± 0.026 | 0.904 ± 0.011 | dmp661 cons.
D-MPNN cons. | 0 | 5 | 0.524 ± 0.031 | 0.870 ± 0.015 | dmp501 cons.
D-MPNN cons. | 0 | 3 | 0.461 ± 0.027 | 0.900 ± 0.011 | dmp301 cons.
D-MPNN cons. | 0 | 4 | 0.463 ± 0.027 | 0.899 ± 0.011 | dmp401 cons.
D-MPNN cons. | 6 | 3 | 0.464 ± 0.026 | 0.899 ± 0.012 | dmp362 cons.
D-MPNN cons. | 6 | 8 | 0.471 ± 0.028 | 0.895 ± 0.012 | dmp862 cons.
D-MPNN cons. | 0 | 8 | 0.475 ± 0.027 | 0.894 ± 0.011 | dmp802 cons.
D-MPNN cons. | 0 | 7 | 0.484 ± 0.028 | 0.890 ± 0.012 | dmp701 cons.
D-MPNN cons. | 6 | 4 | 0.454 ± 0.026 | 0.903 ± 0.012 | dmp462 cons.
D-MPNN cons. | 6 | 7 | 0.488 ± 0.027 | 0.888 ± 0.012 | dmp761 cons.
Table A18. LogP RMSE and r² results for the GIN model type used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logP RMSE | r² | Unique ID
GIN | 0 | 3 | 0.752 ± 0.040 | 0.742 ± 0.025 | g302
GIN | 0 | 8 | 0.750 ± 0.029 | 0.738 ± 0.018 | g802
GIN | 0 | 4 | 0.716 ± 0.032 | 0.758 ± 0.019 | g402
GIN | 0 | 5 | 0.902 ± 0.033 | 0.624 ± 0.026 | g502
GIN | 6 | 5 | 0.894 ± 0.036 | 0.626 ± 0.026 | g561
GIN | 0 | 7 | 0.717 ± 0.033 | 0.758 ± 0.019 | g701
GIN | 6 | 7 | 0.727 ± 0.032 | 0.753 ± 0.022 | g761
GIN | 6 | 3 | 0.739 ± 0.030 | 0.744 ± 0.019 | g361
GIN | 6 | 8 | 0.741 ± 0.027 | 0.743 ± 0.018 | g862
GIN | 6 | 6 | 0.871 ± 0.033 | 0.643 ± 0.024 | g662
GIN | 6 | 4 | 0.725 ± 0.037 | 0.753 ± 0.025 | g462
GIN | 0 | 6 | 0.883 ± 0.036 | 0.640 ± 0.026 | g601
GIN cons. | 6 | 7 | 0.537 ± 0.035 | 0.864 ± 0.016 | g761 cons.
GIN cons. | 6 | 8 | 0.548 ± 0.034 | 0.858 ± 0.016 | g861 cons.
GIN cons. | 0 | 4 | 0.539 ± 0.035 | 0.863 ± 0.015 | g402 cons.
GIN cons. | 0 | 7 | 0.539 ± 0.034 | 0.863 ± 0.015 | g701 cons.
GIN cons. | 6 | 4 | 0.534 ± 0.039 | 0.865 ± 0.018 | g462 cons.
GIN cons. | 0 | 6 | 0.612 ± 0.038 | 0.823 ± 0.019 | g601 cons.
GIN cons. | 6 | 6 | 0.606 ± 0.037 | 0.827 ± 0.019 | g662 cons.
GIN cons. | 6 | 3 | 0.544 ± 0.035 | 0.860 ± 0.015 | g362 cons.
GIN cons. | 0 | 3 | 0.547 ± 0.036 | 0.859 ± 0.015 | g302 cons.
GIN cons. | 0 | 8 | 0.555 ± 0.033 | 0.855 ± 0.014 | g802 cons.
GIN cons. | 6 | 5 | 0.614 ± 0.038 | 0.822 ± 0.019 | g561 cons.
GIN cons. | 0 | 5 | 0.618 ± 0.037 | 0.820 ± 0.018 | g502 cons.
Table A19. LogP RMSE and r² results for the non-GNN model types used in this survey. The last column contains the unique identifier. Training strategy 0 combines logD, logS, and logP; 1 combines logD and logP; 2 combines logD and logS; 3 combines logS and logP; 4 uses only logD; 5 uses only logS; 6 uses only logP.
Model Type | Training Strategy | Featurization Strategy | logP RMSE | r² | Unique ID
KNN logP | – | 13 | 0.939 ± 0.029 | 0.586 ± 0.024 | KNN13
KNN logP | – | 14 | 0.995 ± 0.029 | 0.534 ± 0.025 | KNN14
KNN logP | – | 15 | 1.065 ± 0.032 | 0.467 ± 0.027 | KNN15
KNN logP | – | 10 | 1.082 ± 0.035 | 0.450 ± 0.023 | KNN10
KNN logP | – | 12 | 1.087 ± 0.033 | 0.445 ± 0.023 | KNN12
KNN logP | – | 16 | 1.100 ± 0.034 | 0.431 ± 0.026 | KNN16
KNN logP | – | 11 | 1.103 ± 0.035 | 0.428 ± 0.026 | KNN11
KNN logP | – | 17 | 0.744 ± 0.019 | 0.740 ± 0.014 | KNN17
RF logP | – | 13 | 0.814 ± 0.029 | 0.688 ± 0.023 | RF13
RF logP | – | 15 | 0.479 ± 0.022 | 0.892 ± 0.010 | RF15
RF logP | – | 17 | 0.472 ± 0.023 | 0.895 ± 0.010 | RF17
RF logP | – | 14 | 0.472 ± 0.024 | 0.895 ± 0.011 | RF14
RF logP | – | 12 | 0.893 ± 0.030 | 0.625 ± 0.021 | RF12
RF logP | – | 16 | 0.470 ± 0.024 | 0.896 ± 0.011 | RF16
RF logP | – | 10 | 0.921 ± 0.031 | 0.601 ± 0.022 | RF10
RF logP | – | 11 | 0.928 ± 0.029 | 0.595 ± 0.022 | RF11
SVM logP | – | 14 | 0.572 ± 0.021 | 0.846 ± 0.012 | SVM14
SVM logP | – | 12 | 0.809 ± 0.027 | 0.692 ± 0.018 | SVM12
SVM logP | – | 16 | 0.628 ± 0.020 | 0.815 ± 0.012 | SVM16
SVM logP | – | 17 | 0.493 ± 0.030 | 0.886 ± 0.013 | SVM17
SVM logP | – | 10 | 0.833 ± 0.029 | 0.673 ± 0.020 | SVM10
SVM logP | – | 11 | 0.827 ± 0.028 | 0.678 ± 0.020 | SVM11
SVM logP | – | 13 | 0.782 ± 0.029 | 0.713 ± 0.020 | SVM13
SVM logP | – | 15 | 0.602 ± 0.021 | 0.830 ± 0.013 | SVM15

References

1. Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M.; et al. Analyzing Learned Molecular Representations for Property Prediction. J. Chem. Inf. Model. 2019, 59, 3370–3388.
2. Hu, W.; Liu, B.; Gomes, J.; Zitnik, M.; Liang, P.; Pande, V.S.; Leskovec, J. Strategies for Pre-training Graph Neural Networks. arXiv 2019, arXiv:1905.12265.
3. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural Message Passing for Quantum Chemistry. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017.
4. Wieder, O.; Kohlbacher, S.; Kuenemann, M.; Garon, A.; Ducrot, P.; Seidel, T.; Langer, T. A compact review of molecular property prediction with graph neural networks. Drug Discov. Today Technol. 2020.
5. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24.
6. Zhou, K.; Dong, Y.; Lee, W.S.; Hooi, B.; Xu, H.; Feng, J. Effective Training Strategies for Deep Graph Neural Networks. arXiv 2020, arXiv:2006.07107.
7. Shang, C.; Liu, Q.; Chen, K.S.; Sun, J.; Lu, J.; Yi, J.; Bi, J. Edge Attention-based Multi-Relational Graph Convolutional Networks. arXiv 2018, arXiv:1802.04944.
8. Liao, R.; Zhao, Z.; Urtasun, R.; Zemel, R.S. LanczosNet: Multi-scale deep graph convolutional networks. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019; pp. 1–18.
9. Withnall, M.; Lindelöf, E.; Engkvist, O.; Chen, H. Building attention and edge message passing neural networks for bioactivity and physical-chemical property prediction. J. Cheminform. 2020, 12.
10. Yuan, H.; Ji, S. StructPool: Structured Graph Pooling via Conditional Random Fields. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020.
11. Hu, W. Strategies for Pre-training Graph Neural Networks. arXiv 2020, arXiv:1905.12265v3.
12. Micheli, A. Neural network for graphs: A contextual constructive approach. IEEE Trans. Neural Netw. 2009, 20, 498–511.
13. Lusci, A.; Pollastri, G.; Baldi, P. Deep architectures and deep learning in chemoinformatics: The prediction of aqueous solubility for drug-like molecules. J. Chem. Inf. Model. 2013, 53, 1563–1575.
14. Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral Networks and Locally Connected Networks on Graphs. arXiv 2014, arXiv:1312.6203.
15. Duvenaud, D.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R.P. Convolutional networks on graphs for learning molecular fingerprints. Adv. Neural Inf. Process. Syst. 2015, 2015, 2224–2232.
16. Coley, C.W.; Barzilay, R.; Green, W.H.; Jaakkola, T.S.; Jensen, K.F. Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction. J. Chem. Inf. Model. 2017, 57, 1757–1772.
17. Bouritsas, G.; Frasca, F.; Zafeiriou, S.; Bronstein, M.M. Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020.
18. Xu, K.; Jegelka, S.; Hu, W.; Leskovec, J. How powerful are graph neural networks? In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019; pp. 1–17.
19. Morris, C.; Ritzert, M.; Fey, M.; Hamilton, W.L.; Lenssen, J.E.; Rattan, G.; Grohe, M. Weisfeiler and Leman Go Neural: Higher-Order Graph Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4602–4609.
20. Weisfeiler, B.Y.; Leman, A.A. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Tech. Informatsia 1968, 2, 2–16.
21. Dwivedi, V.P.; Joshi, C.K.; Laurent, T.; Bengio, Y.; Bresson, X. Benchmarking Graph Neural Networks. arXiv 2020, arXiv:2003.00982.
22. Mayr, A.; Klambauer, G.; Unterthiner, T.; Steijaert, M.; Wegner, J.K.; Ceulemans, H.; Clevert, D.A.; Hochreiter, S. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem. Sci. 2018, 9, 5441–5451.
23. Errica, F.; Podda, M.; Bacciu, D.; Micheli, A. A Fair Comparison of Graph Neural Networks for Graph Classification. arXiv 2019, arXiv:1912.09893.
24. Shchur, O.; Mumme, M.; Bojchevski, A.; Günnemann, S. Pitfalls of Graph Neural Network Evaluation. arXiv 2018, arXiv:1811.05868.
25. Neal, B. On the Bias-Variance Tradeoff: Textbooks Need an Update. arXiv 2019, arXiv:1912.08286.
26. Wu, Z.; Ramsundar, B.; Feinberg, E.N.; Gomes, J.; Geniesse, C.; Pappu, A.S.; Leswing, K.; Pande, V. MoleculeNet: A Benchmark for Molecular Machine Learning. arXiv 2018, arXiv:1703.00564v3.
27. Cui, Q.; Lu, S.; Ni, B.; Zeng, X.; Tan, Y.; Chen, Y.D.; Zhao, H. Improved Prediction of Aqueous Solubility of Novel Compounds by Going Deeper With Deep Learning. Front. Oncol. 2020, 10, 121.
28. Graph Networks. Available online: https://github.com/spudlig/graph_networks (accessed on 1 August 2021).
29. RDKit. Available online: https://www.rdkit.org/ (accessed on 1 September 2021).
30. CDPKit. Available online: https://github.com/aglanger/CDPKit (accessed on 30 January 2021).
31. TensorFlow. Version 2.3.0. Available online: https://tensorflow.org (accessed on 15 January 2020).
32. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
33. Xie, Y.; Gong, M.; Gao, Y.; Qin, A.K.; Fan, X. A Multi-Task Representation Learning Architecture for Enhanced Graph Classification. Front. Neurosci. 2020, 13, 1395.
34. Seltzer, M.L.; Droppo, J. Multi-task learning in deep neural networks for improved phoneme recognition. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 6965–6969.
35. Hashimoto, K.; Xiong, C.; Tsuruoka, Y.; Socher, R. A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks. arXiv 2016, arXiv:1611.01587.
Figure 1. High-level representation of the directed-edge graph isomorphism network (D-GIN) architecture for physicochemical property prediction (logD, logS, or logP). (a) High-level workflow depicting how a graph and its nodes and edges are featurized and then fed into the D-GIN to generate a molecular graph embedding. (b) The D-GIN architecture at a low level. Steps involved in generating the input used to make predictions: (1) Initial hidden directed-edge features (h_uv^0) are initialized by concatenating the corresponding node (x_v) and directed-edge (x_uv) features. (2) Directed-edge messages (m_uv) are used to update the hidden directed-edge features (h_uv^t). (3) Directed messages are combined with their corresponding hidden node features (h_v) and (4) iteratively updated with an additional trainable parameter (epsilon). (5) Hidden node features are aggregated to generate the molecular embedding (h_G), which is used as input for (6) the feed-forward neural network.
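As a rough illustration of the numbered steps in Figure 1b, the NumPy sketch below builds a molecular embedding from node and directed-edge features. The weight matrices (W_init, W_edge, W_node), the ReLU activations, the number of message-passing steps, and the plain summation aggregation are assumptions made for readability only; the published implementation is the graph_networks repository [28], which may differ in all of these details.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def d_gin_embedding(x_node, x_edge, edges, W_init, W_edge, W_node, eps, num_steps=3):
    """Illustrative walk through steps (1)-(5) of Figure 1b with hypothetical weights/shapes.

    x_node : (n, d) array of node features
    x_edge : dict mapping a directed edge (u, v) to its (b,) feature vector
    edges  : list of directed edges (u, v); both directions of each bond are listed
    """
    c = W_init.shape[0]

    # (1) initialise hidden directed-edge features from the source-node and edge features
    h_edge = {(u, v): relu(W_init @ np.concatenate([x_node[u], x_edge[(u, v)]]))
              for (u, v) in edges}

    # (2) update hidden directed-edge features with directed-edge messages
    for _ in range(num_steps):
        m_edge = {}
        for (u, v) in edges:
            # message to edge (u, v): sum over incoming edges (w, u), excluding the reverse edge
            incoming = [h_edge[(w, t)] for (w, t) in edges if t == u and w != v]
            m_edge[(u, v)] = np.sum(incoming, axis=0) if incoming else np.zeros(c)
        h_edge = {e: relu(W_edge @ m_edge[e] + h_edge[e]) for e in edges}

    # (3) combine directed messages with the node features and (4) scale with (1 + epsilon)
    h_node = []
    for v in range(len(x_node)):
        incoming = [h_edge[(u, w)] for (u, w) in edges if w == v]
        m_v = np.sum(incoming, axis=0) if incoming else np.zeros(c)
        h_node.append(relu(W_node @ np.concatenate([(1.0 + eps) * x_node[v], m_v])))

    # (5) aggregate hidden node features into the molecular embedding h_G (input to the FFNN, step 6)
    return np.sum(h_node, axis=0)
```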
Figure 2. LogD, logS, and logP property prediction results for GNN and non-GNN model types with different featurization and training strategies. The GNN architectures are colored in blue (D-GIN), orange (D-MPNN), and green (GIN); the non-GNN architectures in gray (SVM), salmon pink (RF), and red (KNN). For logD and logS, 24 individual RMSE values were calculated for each model type; for logP, 12 individual RMSE values were calculated. The individual boxplots show the average value of each model type as a white dot and the median as a dark gray line. The values are listed in Tables A8–A19 in Appendix A.
Figure 3. LogD prediction results for each GNN model instance. The left y-axis gives the logD RMSE and the secondary y-axis on the right the corresponding r² values for each GNN model. D-GIN is colored blue, D-MPNN orange, and GIN green. Each bar represents a different trained model; a detailed description can be found in Figure A1 in Appendix A. The accumulated kernel density for each model type is shown on the far left. The red lines correspond to the 95% confidence intervals. The model names are a combination of model type (D-GIN, GIN, D-MPNN), training approach, and featurization type; a detailed description of each model name can be found in Tables A8–A19 in Appendix A.
Figure 4. LogS prediction results for each GNN model instance. The left y-axis gives the logS RMSE and the secondary y-axis on the right the corresponding r² values for each GNN model. D-GIN is colored blue, D-MPNN orange, and GIN green. Each bar represents a different trained model; a detailed description can be found in Figure A3 in Appendix A. The accumulated kernel density for each model type is shown on the far left. The red lines correspond to the 95% confidence intervals. The model names are a combination of model type (D-GIN, GIN, D-MPNN), training approach, and featurization type; a detailed description of each model name can be found in Tables A8–A19 in Appendix A.
Figure 5. LogP prediction results for each GNN model instance. The left y-axis gives the logP RMSE and the secondary y-axis on the right the corresponding r² values for each GNN model. D-GIN is colored blue, D-MPNN orange, and GIN green. Each bar represents a different trained model; a detailed description can be found in Figure A2 in Appendix A. The accumulated kernel density for each model type is shown on the far left. The red lines correspond to the 95% confidence intervals. The model names are a combination of model type (D-GIN, GIN, D-MPNN), training approach, and featurization type; a detailed description of each model name can be found in Tables A8–A19 in Appendix A.
Figure 6. LogD, logS, and logP prediction results for all GNN model types depending on the featurization used (see Section 2.4 for a detailed description). The mean is shown as a white dot, whereas the median is shown as a dark gray line. Exact values are listed in Tables A8–A19 in Appendix A.
Figure 7. LogD prediction results for all GNN model types according to the training strategy used. The blue box shows the performance of the multi-task training strategy using logD, logS, and logP. The green and orange boxes show the results for training on logD and logP, and on logD and logS, respectively. The salmon pink box shows the results using only logD for training. The mean is shown as a white dot, whereas the median is shown as a dark gray line. Exact values are listed in Tables A8–A11 in Appendix A.
Figure 8. LogS prediction results for all GNN model types according to the training strategy used. The blue box shows the performance of the multi-task training strategy using logD, logS, and logP. The gray and orange boxes show the results for training on logS and logP, and on logD and logS, respectively. The red box shows the results using only logS for training. The mean is shown as a white dot, whereas the median is shown as a dark gray line. Exact values are listed in Tables A12–A15 in Appendix A.
Figure 9. LogP prediction results for all GNN model types according to the training strategy used. The blue box shows the performance of the multi-task training strategy using logD, logS, and logP. The dark orange box shows the results using only logP for training. The mean is shown as a white dot, whereas the median is shown as a dark gray line. Exact values are listed in Tables A16–A19 in Appendix A.
Table 1. Common notations used throughout this publication.
Notation | Definition
τ | A non-linear function (e.g., sigmoid or ReLU)
cat(·,·) | Vector concatenation
t | Iterator over t steps
G | A graph
V | Set of nodes
E | Set of edges
v | Node v ∈ V
e_uv | Edge e_uv ∈ E between nodes u and v
N(u) | Neighbors of node u
N(u)/w | Neighbors of node u except w
n | The number of nodes
m | The number of edges
d | The dimension of a node feature vector
b | The dimension of an edge feature vector
X ∈ ℝ^(n×d) | Feature matrix of a graph
x_v ∈ ℝ^d | Feature vector of node v
x_e_uv ∈ ℝ^b | Feature vector of edge e_uv
h_v ∈ ℝ^c | Hidden feature vector of node v
m_v ∈ ℝ^c | Message feature vector to node v
h_G ∈ ℝ^c | Feature vector of the graph G
h_uv ∈ ℝ^d | Hidden feature vector of edge e_uv
m_uv ∈ ℝ^d | Message feature vector to edge e_uv
W | Weight matrix of a neural network
A ∈ {0, 1}^(n×n) | Adjacency matrix
RMSE | Root mean squared error
GNN | Graph neural network
GIN | Graph isomorphism network as in [18]
ϵ | Epsilon as described in [18]
D-MPNN | Directed-edge message passing network as in [1]
D-GIN | Directed-edge graph isomorphism network
CI | 95% confidence interval calculated via bootstrapping
f(·) | Feed-forward neural network
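Table 1 defines CI as a 95% confidence interval obtained via bootstrapping, and the red lines in Figures 3–5 show such intervals around the reported RMSE values. Below is a minimal sketch of a percentile bootstrap for the RMSE of a set of test-set predictions; the number of resamples, the random seed, and the percentile method are assumptions and not necessarily the authors' exact procedure.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def bootstrap_rmse_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap (1 - alpha) confidence interval for the RMSE."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample molecules with replacement
        stats.append(rmse(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return rmse(y_true, y_pred), (lo, hi)
```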
Table 2. Overview of the best-performing machine learning model types, independent of training and featurization strategy, for the prediction of logD, logS, and logP. The performance was calculated as the distribution average over the root mean squared error (RMSE) values of all models used. In total, 24 models were used for the logD and logS properties and 12 for the logP property. RMSE values highlighted in dark and light gray show the best and next-best models. Red asterisks mark the lowest RMSE among the non-consensus models for each property prediction.
Molecular Property | Model Type | Mean RMSE | Min RMSE | Max RMSE
logD | D-GIN | 0.615 ± 0.039 * | 0.553 | 0.704
logD | D-MPNN | 0.762 ± 0.065 | 0.686 | 0.911
logD | GIN | 0.804 ± 0.061 | 0.738 | 0.911
logD | RF | 0.780 ± 0.084 | 0.699 | 0.890
logD | SVM | 0.740 ± 0.068 | 0.639 | 0.814
logD | KNN | 0.951 ± 0.067 | 0.801 | 1.003
logD | D-GIN cons. | 0.575 ± 0.019 | 0.548 | 0.622
logD | D-MPNN cons. | 0.647 ± 0.028 | 0.613 | 0.710
logD | GIN cons. | 0.666 ± 0.029 | 0.627 | 0.719
logS | D-GIN | 0.867 ± 0.070 * | 0.795 | 1.061
logS | D-MPNN | 0.896 ± 0.030 | 0.857 | 0.961
logS | GIN | 1.210 ± 0.102 | 1.088 | 1.400
logS | RF | 0.997 ± 0.253 | 0.760 | 1.284
logS | SVM | 1.006 ± 0.154 | 0.729 | 1.162
logS | KNN | 1.500 ± 0.217 | 1.057 | 1.676
logS | D-GIN cons. | 0.738 ± 0.028 | 0.705 | 0.820
logS | D-MPNN cons. | 0.762 ± 0.012 | 0.743 | 0.785
logS | GIN cons. | 0.881 ± 0.045 | 0.825 | 0.969
logP | D-GIN | 0.529 ± 0.064 * | 0.472 | 0.662
logP | D-MPNN | 0.600 ± 0.063 | 0.540 | 0.734
logP | GIN | 0.784 ± 0.077 | 0.716 | 0.901
logP | RF | 0.681 ± 0.224 | 0.470 | 0.928
logP | SVM | 0.693 ± 0.134 | 0.493 | 0.833
logP | KNN | 1.014 ± 0.123 | 0.743 | 1.102
logP | D-GIN cons. | 0.455 ± 0.028 | 0.428 | 0.515
logP | D-MPNN cons. | 0.475 ± 0.027 | 0.443 | 0.532
logP | GIN cons. | 0.566 ± 0.034 | 0.533 | 0.618
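Table 2 lists consensus ("cons.") variants next to the individual GNN model types. The snippet below is a minimal sketch of such a consensus prediction, assuming it is formed by averaging the predictions of repeatedly trained instances of one model configuration; the exact aggregation used in this survey may differ.

```python
import numpy as np

def consensus_prediction(run_predictions):
    """Average the per-run predictions of one model configuration.

    run_predictions : list of 1-D arrays, one per trained model instance,
                      each holding predictions for the same test molecules.
    """
    return np.mean(np.stack(run_predictions, axis=0), axis=0)

# e.g., averaging three hypothetical training runs of the same configuration
runs = [np.array([1.2, -0.3]), np.array([1.0, -0.1]), np.array([1.1, -0.2])]
y_cons = consensus_prediction(runs)   # -> array([1.1, -0.2])
```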
Table 3. Impact of the featurization and training strategies on the different molecular properties, independent of the GNN model type used. For each endpoint, the mean, minimum, and maximum RMSE are shown. Dark gray boxes show the best RMSE for the particular property; light gray the second best.
Featurization Strategy | logD RMSE Mean | Min | Max | logS RMSE Mean | Min | Max | logP RMSE Mean | Min | Max
3.0 | 0.689 ± 0.079 | 0.575 | 0.781 | 0.954 ± 0.146 | 0.831 | 1.177 | 0.596 ± 0.120 | 0.472 | 0.751
4.0 | 0.694 ± 0.072 | 0.587 | 0.791 | 0.948 ± 0.142 | 0.807 | 1.160 | 0.591 ± 0.105 | 0.487 | 0.725
5.0 | 0.813 ± 0.099 | 0.660 | 0.911 | 1.099 ± 0.180 | 0.938 | 1.361 | 0.760 ± 0.110 | 0.652 | 0.901
6.0 | 0.727 ± 0.132 | 0.553 | 0.900 | 1.015 ± 0.250 | 0.818 | 1.400 | 0.641 ± 0.183 | 0.493 | 0.883
7.0 | 0.732 ± 0.075 | 0.615 | 0.823 | 0.961 ± 0.113 | 0.851 | 1.144 | 0.629 ± 0.077 | 0.541 | 0.727
8.0 | 0.706 ± 0.083 | 0.579 | 0.814 | 0.970 ± 0.155 | 0.795 | 1.216 | 0.609 ± 0.114 | 0.477 | 0.750
Training Strategy | logD RMSE Mean | Min | Max | logS RMSE Mean | Min | Max | logP RMSE Mean | Min | Max
logD/P/S | 0.730 ± 0.102 | 0.579 | 0.900 | 0.979 ± 0.166 | 0.795 | 1.325 | 0.639 ± 0.129 | 0.487 | 0.901
logD/P | 0.719 ± 0.105 | 0.553 | 0.911 | – | – | – | – | – | –
logD/S | 0.737 ± 0.106 | 0.582 | 0.911 | 0.993 ± 0.176 | 0.821 | 1.359 | – | – | –
logS/P | – | – | – | 0.988 ± 0.173 | 0.812 | 1.333 | – | – | –
logD | 0.722 ± 0.087 | 0.596 | 0.881 | – | – | – | – | – | –
logS | – | – | – | 1.004 ± 0.187 | 0.818 | 1.400 | – | – | –
logP | – | – | – | – | – | – | 0.637 ± 0.130 | 0.472 | 0.894
Table 4. Impact of the featurization on the different molecular properties. For each property and model type, the mean, minimum, and maximum RMSE are shown. Dark gray boxes represent the best RMSE for the particular property and model type; light gray the second best. The red asterisk marks the overall best RMSE for the particular property.
Model Type | Featurization Strategy | logD RMSE Mean | Min | Max | logS RMSE Mean | Min | Max | logP RMSE Mean | Min | Max
D-GIN | 3.0 | 0.585 ± 0.011 | 0.575 | 0.601 | 0.835 ± 0.007 | 0.831 | 0.847 | 0.484 ± 0.017 * | 0.472 | 0.496
D-GIN | 4.0 | 0.603 ± 0.014 | 0.587 | 0.622 | 0.827 ± 0.018 | 0.807 | 0.851 | 0.490 ± 0.004 | 0.487 | 0.493
D-GIN | 5.0 | 0.682 ± 0.018 | 0.660 | 0.704 | 1.007 ± 0.039 | 0.969 | 1.061 | 0.657 ± 0.006 | 0.652 | 0.662
D-GIN | 6.0 | 0.580 ± 0.019 * | 0.553 | 0.596 | 0.825 ± 0.008 * | 0.818 | 0.836 | 0.502 ± 0.012 | 0.493 | 0.511
D-GIN | 7.0 | 0.637 ± 0.020 | 0.615 | 0.661 | 0.880 ± 0.025 | 0.851 | 0.906 | 0.550 ± 0.013 | 0.541 | 0.560
D-GIN | 8.0 | 0.602 ± 0.020 | 0.579 | 0.629 | 0.826 ± 0.027 | 0.795 | 0.854 | 0.493 ± 0.023 | 0.477 | 0.509
D-MPNN | 3.0 | 0.728 ± 0.023 | 0.703 | 0.759 | 0.835 ± 0.007 | 0.831 | 0.847 | 0.561 ± 0.012 | 0.552 | 0.570
D-MPNN | 4.0 | 0.715 ± 0.016 | 0.692 | 0.728 | 0.879 ± 0.017 | 0.862 | 0.896 | 0.563 ± 0.010 | 0.556 | 0.570
D-MPNN | 5.0 | 0.878 ± 0.023 | 0.857 | 0.911 | 0.951 ± 0.011 | 0.938 | 0.961 | 0.725 ± 0.012 | 0.716 | 0.734
D-MPNN | 6.0 | 0.712 ± 0.017 | 0.686 | 0.724 | 0.868 ± 0.011 | 0.857 | 0.879 | 0.545 ± 0.007 | 0.540 | 0.551
D-MPNN | 7.0 | 0.805 ± 0.016 | 0.783 | 0.823 | 0.892 ± 0.018 | 0.865 | 0.905 | 0.616 ± 0.017 | 0.604 | 0.629
D-MPNN | 8.0 | 0.734 ± 0.036 | 0.712 | 0.788 | 0.911 ± 0.007 | 0.905 | 0.921 | 0.588 ± 0.007 | 0.581 | 0.593
GIN | 3.0 | 0.755 ± 0.018 | 0.741 | 0.781 | 1.149 ± 0.022 | 1.122 | 1.177 | 0.745 ± 0.009 | 0.738 | 0.751
GIN | 4.0 | 0.765 ± 0.019 | 0.743 | 0.791 | 1.137 ± 0.020 | 1.116 | 1.160 | 0.720 ± 0.006 | 0.716 | 0.725
GIN | 5.0 | 0.880 ± 0.026 | 0.856 | 0.911 | 1.339 ± 0.017 | 1.325 | 1.361 | 0.897 ± 0.005 | 0.894 | 0.901
GIN | 6.0 | 0.889 ± 0.019 | 0.860 | 0.900 | 1.351 ± 0.037 | 1.314 | 1.400 | 0.877 ± 0.008 | 0.871 | 0.883
GIN | 7.0 | 0.755 ± 0.007 | 0.747 | 0.765 | 1.112 ± 0.028 | 1.088 | 1.144 | 0.722 ± 0.007 | 0.716 | 0.727
GIN | 8.0 | 0.781 ± 0.033 | 0.734 | 0.814 | 1.172 ± 0.033 | 1.135 | 1.216 | 0.745 ± 0.006 | 0.741 | 0.750
Table 5. Impact of the training strategies on the different molecular properties. Each model type is evaluated separately. For each property and model type, the mean, minimum, and maximum RMSE are shown. Dark gray boxes represent the best RMSE for the particular property and model type; light gray the second best. The red asterisk highlights the overall best RMSE for the particular property.
Model Type | Training Strategy | logD RMSE Mean | Min | Max | logS RMSE Mean | Min | Max | logP RMSE Mean | Min | Max
D-GIN | logD/P/S | 0.617 ± 0.052 | 0.579 | 0.704 | 0.851 ± 0.071 * | 0.795 | 0.987 | 0.534 ± 0.067 | 0.487 | 0.662
D-GIN | logD/P | 0.607 ± 0.044 * | 0.553 | 0.685 | – | – | – | – | – | –
D-GIN | logD/S | 0.610 ± 0.036 | 0.582 | 0.679 | 0.875 ± 0.072 | 0.821 | 1.010 | – | – | –
D-GIN | logS/P | – | – | – | 0.868 ± 0.095 | 0.812 | 1.061 | – | – | –
D-GIN | logD | 0.625 ± 0.024 | 0.596 | 0.660 | – | – | – | – | – | –
D-GIN | logS | – | – | – | 0.872 ± 0.053 | 0.818 | 0.969 | – | – | –
D-GIN | logP | – | – | – | – | – | – | 0.524 ± 0.067 * | 0.472 | 0.652
D-MPNN | logD/P/S | 0.759 ± 0.063 | 0.712 | 0.864 | 0.903 ± 0.031 | 0.873 | 0.961 | 0.596 ± 0.062 | 0.551 | 0.716
D-MPNN | logD/P | 0.741 ± 0.066 | 0.686 | 0.857 | – | – | – | – | – | –
D-MPNN | logD/S | 0.788 ± 0.070 | 0.721 | 0.911 | 0.892 ± 0.039 | 0.857 | 0.960 | – | – | –
D-MPNN | logS/P | – | – | – | 0.894 ± 0.029 | 0.860 | 0.938 | – | – | –
D-MPNN | logD | 0.761 ± 0.067 | 0.713 | 0.881 | – | – | – | – | – | –
D-MPNN | logS | – | – | – | 0.897 ± 0.030 | 0.865 | 0.944 | – | – | –
D-MPNN | logP | – | – | – | – | – | – | 0.603 ± 0.071 | 0.540 | 0.734
GIN | logD/P/S | 0.815 ± 0.063 | 0.756 | 0.901 | 1.183 ± 0.106 | 1.089 | 1.325 | 0.786 ± 0.083 | 0.716 | 0.901
GIN | logD/P | 0.809 ± 0.075 | 0.742 | 0.911 | – | – | – | – | – | –
GIN | logD/S | 0.811 ± 0.056 | 0.751 | 0.896 | 1.213 ± 0.108 | 1.126 | 1.359 | – | – | –
GIN | logS/P | – | – | – | 1.201 ± 0.102 | 1.088 | 1.333 | – | – | –
GIN | logD | 0.780 ± 0.060 | 0.734 | 0.860 | – | – | – | – | – | –
GIN | logS | – | – | – | 1.243 ± 0.109 | 1.144 | 1.400 | – | – | –
GIN | logP | – | – | – | – | – | – | 0.782 ± 0.077 | 0.725 | 0.894
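Tables 3 and 5 compare single-task and multi-task training strategies (e.g., joint training on logD, logS, and logP). The sketch below shows one common way to implement such a strategy as a masked multi-output regression loss in TensorFlow [31]; the three-column layout, the masking scheme, and the equal task weighting are assumptions for illustration only and not necessarily the loss used in this survey.

```python
import tensorflow as tf

def masked_multitask_mse(y_true, y_pred, mask):
    """MSE over logD/logS/logP output heads, ignoring molecules without a label for a task.

    y_true, y_pred : (batch, 3) float tensors, columns = (logD, logS, logP)
    mask           : (batch, 3) float tensor, 1.0 where a label is available, else 0.0
    """
    # zero-out missing targets so that placeholder values (e.g., NaN) cannot leak into the loss
    y_true = tf.where(mask > 0, y_true, tf.zeros_like(y_true))
    y_pred = tf.where(mask > 0, y_pred, tf.zeros_like(y_pred))
    sq_err = tf.square(y_true - y_pred)
    # average each task over its labelled molecules only, then average over the three tasks
    per_task = tf.reduce_sum(sq_err * mask, axis=0) / tf.maximum(tf.reduce_sum(mask, axis=0), 1.0)
    return tf.reduce_mean(per_task)
```

Dropping a column from the mask (or the corresponding head) reduces this joint loss to one of the two-property or single-property strategies compared in Table 5.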
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
