2.1. Semi-Exhaustive Approach
The semi-exhaustive approach follows an FBDD technique. Mainly, we use a DRA for the computations of new ligands. In
Figure 1, we can observe the stages of the algorithm. In the following, we explain each step:
Stage 1. Deconstruction. A set of
Hsp90 ligands, called known ligands (KL), is selected for the fragmentation stage. Each ligand is cut off in sub-ligands considering the following constraints extracted from RO3 [
15]: (i) the molecular weight must be <300; (ii) cLogP must be ≤3; (iii) the number of hydrogen bond donors must be ≤3 and (iv) the number of hydrogen bond receptors must be ≤3 [
26]. Let
be the number of known ligands. Each ligand
is fragmented in
sub-ligands (fragments). Then a total of
fragments are generated from KL. Then the number of sub-ligands stored in the folder FL is
.
Stage 2. Duplicity and Cleaning. For each sub-ligand obtained in the first stage,
, we apply a duplicity check process. The process compares each sub-ligand with the rest of sub-ligands in the set to check if a duplication exists. For this goal, we use the Tanimoto coefficient, where the similarity between each pair of sub-ligands is computed [
27]. If there are duplicated sub-ligands, they are deleted. Let
be the number of duplicated sub-ligands, therefore the number of sub-ligands now is
. It is important to note that if a sub-ligand appears several times, all their copies are deleted, and just one remains.
Stage 3. Sub-Ligands. The folder SL* contains all sub-ligands obtained in the previous stage. Each sub-ligand that includes an asterisk identifier indicates that it contains a broken bond. Another folder called SL contains each sub-ligand without the asterisk identifiers. In this set, we have the same sub-ligands from SL*, but without the identifiers. We recall that each of these folders contain sub-ligands.
Stage 4. Reconstruction. The folder SL* contains the header of each sub-ligand, i.e., the sub-ligand with the asterisks identifying broken bonds. The folder SL contains the body of the sub-ligands, i.e., the sub-ligands without the asterisk identifiers.
The process of union between sub-ligands starts taking a header from set SL* and a body from set SL. Let consider two sub-ligands, sub-ligand A (from SL*) and B (from SL).
The body of sub-ligand
B bonds into the first asterisk identifier on the header of sub-ligand
A. Then, if asterisk identifiers remain in the sub-ligand
A, a random sub-ligand from SL is coupled. This last step is repeated until no free identifiers remain in
A. The process is repeated for each sub-ligand
B in SL with each sub-ligand
A in SL*. Each new ligand obtained from the union process is stored in the folder NL. Let recall that we have
sub-ligands in each folder, SL* and SL, then the reconstruction process will generate
new ligands. If a new ligand does not satisfy Lipinski’s Rule of Five [
18] and Veber’s rules [
19], then it is rejected. Let
be the number of new ligands which are rejected, then the total of new ligands is
.
Stage 5. Duplicity and Cleaning. As in Stage 2, a duplicity and cleaning process is applied in the same way. Each new ligand is compared with the rest of the ligands in set NL, checking for duplication. If so, then all the duplicated new ligands are deleted, and just one remains. Let
be the number of duplicated new ligands, then the number of new ligands now is
. Therefore the total number of new ligands obtained from known ligands is:
where
is the total fragments from KL library,
is the number of deleted fragments due to duplicity,
is the number of new ligands deleted from duplicity, and
is the number of rejected new ligands.
Stage 6. Validation. The validation stage consists of computing the binding energy (
) of each new ligand stored in NL. The process consists in calculating the interaction between the Hsp90 protein and a specific ligand. Then, considering several possible docking positions, the BE is computed. If the size of a ligand is larger than the grid specified for docking with the Hsp90, the evaluation is not possible. These ligands remain valid but not available. From all docking positions, it considers the most negative energy. The molecular docking software Autodock [
28] is used in this stage.
Deconstruction example. The ligand PU3 is presented in
Figure 2. If this ligand is cut off (deconstruction), then a total of five sub-ligands can be obtained. In
Figure 2 the corresponding sub-ligands are presented as well.
The open-source software OpenBabel [
29] is used to convert a ligand, from PDB (protein data bank) format to SMILES (simplified molecular-input line-entry system) format [
30]. The line notation of SMILES format allows to work the molecular representation of the ligands in a simpler manner. The reading of files with chemical structures in this format becomes easier to handle than with PDB format. We use the SMILES format from stages 1 to 5 (deconstruction, clean/duplicity, sub-ligands/reconstruction, and clean/duplicity). The open-source software PyMol [
31] is used for the visualization.
Automated docking was used to locate new compound appropriate binding orientations and conformations into the Hsp90 binding pocket. The structure of Hsp90 used in molecular docking was obtained from X-ray crystal data in RCSB Protein Data Bank (PDB ID: 5LQ9). The molecular docking calculations were carried out by using the AutoDock4 software. The protein structure was prepared by removing water, adding Kollman charges and polar hydrogen. First, the residues around 6Å of the co-crystallized ligand were considered the binding site for docking calculations. Note that this box centered in the co-crystallized ligand involved two characteristic regions: (i) Site 1, corresponding to ATP-binding site and (ii) Site 2, which represents an internal region (see
Figure 3). All the molecules from the data set were docked into the active site of Hsp90. Then, a grid map was generated by the auxiliary program AutoGrid4 using x, y, and z coordinate to find a small but more representative grid center for the active site. The grid dimensions were set to
points, with a grid spacing of
Å. The number of docking runs was set to 100. The population in the genetic algorithm was 150. After docking, the 100 docked poses were clustered into groups with RMS deviations lower than 1Å. Among the entire cluster of complexes predicted by AutoDock4, the most populated cluster conformation, together with the lowest energy conformation for the most active compound docked to the receptor.
The computation equipment used in the experiments is an Intel Core i7 with 3.2 GHz of processing on Ubuntu 16.04.
2.2. Heuristic Approach
We propose a heuristic-based approach to find new high-quality ligands from the sub-ligands obtained from the deconstruction process shown in
Figure 1 in reduced computation times. This process works as a replacement for the reconstruction process shown in stage 4 in
Figure 1.
The objective is to reduce the time involved in the reconstruction process using a local search approach capable of quickly constructing high-quality new ligands. Next, we present the representation and evaluation function used in our algorithm. Also, it shows the main structure of the local search approach and the search operators used.
2.2.1. Representation
A list of fragments represents a solution. Each solution contains a unique header and one or more bodies. The number of bodies depends on the number of available bonds (asterisk identifiers) in the header structure.
Figure 4 shows an example of the representation of a solution. The solution contains the header in the first position, followed by two bodies. It uses SMILES format for the elements in the list. In the lower area, the two-dimensional model is shown as well.
2.2.2. Evaluation Function
The evaluation starts checking the feasibility of the list of fragments and then evaluates its quality. Feasibility is a filter according to the Lipinski’s rules to determine that each new ligand is valid in a pharmacological environment [
18]. Once validated, it computes the energy of the new ligand using the
AutoDock Vina [
32] software and the same conditions described in
Section 2.1. The number of positions to test is considered a parameter of the algorithm (
).
2.2.3. General Structure
Algorithm 1 shows the structure of the local search algorithm proposed here. It corresponds to a simulated annealing-based approach that uses two repairing operators to generate a new ligand at each iteration. It can accept worse quality solutions based on a probability controlled by a cooling schedule. This cooling schedule allows more deteriorations in the early stages of the search process and reduces their acceptance at the end of the process. The algorithm requires the parameter values for reduction factor () and initial and final temperature ( and ).
The algorithm performs a fixed number of times the whole simulated annealing procedure. This number of repetitions is required to change the header structure of the current solution during the process. The parameter
controls the number of repetitions. In each step, a local search process iterates from the given initial temperature to the final temperature (lines 5 to 22). At each iteration, temperature is reduced by
factor (line 21).
Algorithm 1 Simulated Annealing Algorithm |
- 1:
BestLigand ; - 2:
while (tries < max_tries) do - 3:
current_temp ← initial_temp; - 4:
CurrentLigand ← generate_initial_ligand(); - 5:
while (current_temp > final_temp and not stuck) do - 6:
NewLigand ← Change (CurrentLigand); - 7:
if (f(NewLigand) ≤f(CurrentLigand)) then - 8:
CurrentLigand ← NewLigand; - 9:
NewLigand ← Swap (CurrentLigand); - 10:
if (f(NewLigand) ≤f(CurrentLigand)) then - 11:
CurrentLigand ← NewLigand; - 12:
end if - 13:
else - 14:
if (rand ) then - 15:
CurrentLigand ← NewLigand; - 16:
end if - 17:
end if - 18:
if (f(CurrentLigand) ≤f(BestLigand)) then - 19:
BestLigand ← CurrentLigand; - 20:
end if - 21:
current_temp ← current_temp; - 22:
end while - 23:
end while
|
Each simulated annealing process incorporates a stuck criterion to reduce the computational effort dedicated to exploring local search neighborhoods. The parameter
controls the number of iterations without improvement. Variable
stores the best solution found during the whole process (line 1). Simulated annealing starts each iteration from a new random solution (line 4). To select a high-quality initial solution, a competition between random ligands is performed. The description of this procedure is presented in
Section 2.2.4.
The approach implements two movements:
change and
swap. At each iteration, the process searches for the best combination of bodies for a given header. Hence, the first movement selects a random body from the solution and changes it by a different body available in the library (line 6). If the movement generates an improvement in the energy of the current ligand, a second movement is applied to evaluate the distribution of these bodies in the available bonds of the header (line 9). Movements are explained in
Section 2.2.5 and
Section 2.2.6.
The algorithm performs the simulated annealing cooling scheme to accept worsening solutions only during the use of the change operator (line 14). It is expected that this operator generates more significant changes in the structure of solutions, and hence, it is allowed to generate deteriorations in some possible ligands as well.
2.2.4. Initial Solution Generation
In this step, the algorithm performs a competition between random solutions. First, it generates a solution choosing its header structure from the available ones. From its available bonds, the algorithm selects bodies to complete a new ligand. The selection of these bodies is random.
2.2.5. Change Movement
This movement consists of changing one of the bodies in the current solution. A random body from the current solution is selected and replaced by a new random body. To reduce the neighborhood of this movement, we use the Tanimoto indicator to determine the similarity between bodies. A Tanimoto similarity higher than defines that two bodies belong to the same neighborhood.
Figure 5 shows an example of the application of this movement. An input ligand and its corresponding output are in the upper and lower part, respectively. Each one is shown with its SMILES format and its two-dimensional representation. Moreover, it also shows the two-dimensional structure of both corresponding ligands on the right of the figure. In this example, the first body is replaced by a new body randomly selected using its Tanimoto similarity neighborhood from the SL set.
2.2.6. Swap Movement
This movement changes the order of the locations of bodies in the available header’s bonds. This step occurs after applying the change operator to the bodies. The goal of this movement is to improve the quality of the solution only if a promising new ligand was found. It is important to remark that the movement is not applied if the current solution contains only one body.
Figure 6 shows an example of the use of this movement. It shows the same information as in
Figure 5. This input ligand corresponds to the output ligand obtained in the change movement example. The movement only changes the location of two bodies, but this can lead to essential changes in the final ligand structure. It shows these changes in the right part of
Figure 6 with the two-dimensional representation of the resulting ligands.