1. Introduction
Crystal structure solution from powder diffraction data is more limited than solution from single-crystal data: diffraction peak overlap, the uncertain estimate of the background and possible preferred orientation can prevent the complete and correct interpretation of the experimental diffraction pattern. Solution by powder diffraction is extensively applied to small structures and much less to macromolecules. Nevertheless, in the last 25 years there has been a growing interest in powder diffraction: innovative theories, methodological approaches, experimental and computational capabilities, and advanced software have been developed. This progress is also documented in the recent publication of the International Tables for Crystallography, Volume H, Powder Diffraction [1], which covers all aspects of powder diffraction and is the key reference for researchers working in this area. The Tables also point out the important role of powder diffraction in crystal structure determination.
In this scenario, a relevant contribution has also come from the software EXPO [2], which is used for solving organic, inorganic and metal-organic structures. EXPO is a computer program born more than 20 years ago as single-crystal-like software for the ab-initio solution from powder diffraction data and progressively improved with robust theory, methodological approaches, and computational and graphical tools. A notable feature of the program is its capacity to carry out all the steps of the powder solution process: determination of unit-cell parameters and space group; extraction of the experimental structure factor moduli; solution by Direct Methods and/or Simulated Annealing; refinement by the Rietveld method; editing and visualization of molecules. One or more of these steps can be executed by following standard processes, which are mainly automatic, or non-default approaches whose application is guided by the EXPO user-friendly graphic tools. The authors of EXPO also aim to develop computational methods that minimize the execution time of the solution process: the solution by Direct Methods is usually performed in only a few minutes, whereas the solution by Simulated Annealing can be slower and may even require days for quite complex structures. The standard approaches, which are the result of intensive tests on hundreds of already solved and published structures of different complexity, suggest solution strategies that are successful in most cases. Several non-default procedures are available in case the default run fails. This paper focuses on applications of EXPO for the solution in reciprocal space [3], specifically by Direct Methods (the default choice), in real space [4], in particular by Simulated Annealing, or by both.
The choice of the best solution strategy mainly depends on: data quality, especially overall diffracted intensity, experimental resolution and peak overlap; structure complexity, expressed in terms of the number of non-hydrogen atoms in the asymmetric unit and/or the degrees of freedom of the structure; and the available information on the expected structure model. In particular, peak overlap [5] is unavoidably present and can make the interpretation of a powder diffraction pattern far from straightforward. Altomare et al. [6] proposed a method for evaluating the degree of overlap of a powder diffraction profile, warning that the larger the overlap, the smaller the number of independent observations, with consequences for the success rate of Direct Methods solution procedures and least-squares refinement processes.
In this paper, we discuss and compare the conditions required for a successful solution by Direct Methods and by Simulated Annealing in EXPO, and we present two case studies that clarify when one method is preferable to the other and explain the strength of using both, when possible. In particular, we show that the structure model obtained at the end of the Simulated Annealing solution process can be profitably exploited in an original way, both to improve the solution by Direct Methods and to confirm the correctness of the Simulated Annealing model.
2. Solving Structures in the Reciprocal Space by EXPO: Case Studies
Structure solution from X-ray powder diffraction data through reciprocal-space methods [3] is a single-crystal-like process: the structure factor moduli are extracted from the experimental diffraction pattern; then Direct Methods, Patterson Methods, Maximum Entropy Methods or Charge Flipping determine the phases of the reflections; finally, the electron density map is derived. The structure solution process in reciprocal space by EXPO consists of a very fast sequence of steps: extraction, from the powder profile, of the experimental structure factor moduli through the Le Bail method [7]; progressive use of the moduli by Direct Methods for solving the phase problem, evaluating the phases of the structure factors on a statistical and probabilistic basis; calculation of the inverse Fourier transform of the structure factors so determined in both moduli and phases; interpretation of peak positions and intensities in the electron density map to obtain the structure model; optimization of the structure model by least-squares and Fourier-recycling methods. In addition to speed, another great advantage of the Direct Methods ab-initio solution is the minimal initial information required: only the chemical formula and the diffraction pattern. Moreover, when a chemically plausible structure model is obtained by Direct Methods, its reliability is usually high. The solution by Direct Methods is the automatic choice in EXPO because it is fast and, if successful, reliable.
Reaching the correct solution by Direct Methods requires conditions that, unfortunately, are not always fulfilled: well-estimated structure factor moduli extracted from the diffraction pattern and atomic experimental resolution.
Because of diffraction peak overlap, the accuracy of the extracted moduli is usually poor, even though it can be improved by using prior information in the Le Bail algorithm [8] in EXPO. Peak overlap can be reduced, but not completely eliminated. In this section we consider the set of 155 test structures, taken from our database of structures already solved (by different available methods and/or software) and published, on which the ability of EXPO to reach the solution, using default and non-default strategies, has been proved. The set is made up of organic, metal-organic and inorganic structures and covers a broad range of research fields. For each of them, Table 1 provides the experimental resolution (RES), the structure complexity defined by the number of non-hydrogen atoms in the asymmetric unit (NA-noH), and the type of data (laboratory X-ray or synchrotron); the content of the last column is described below. The molecular formula and the reference for each test structure are listed as Supplementary Information in Table S1.
Figure 1 gives details about the errors on the structure factor moduli estimates extracted from the experimental powder diffraction profile: for each test structure reported in Table 1, the ordinate axis displays the corresponding RF reliability parameter (the structures are ranked by increasing value of RF),

$$R_F = 100 \times \frac{\sum_{h} \bigl|\, |F_h|_{LB} - |F_h|_{calc} \,\bigr|}{\sum_{h} |F_h|_{calc}},$$

where the summation is over the reflections in the experimental diffraction pattern, $|F_h|_{LB}$ is the structure factor modulus of reflection h extracted by the Le Bail method in EXPO and $|F_h|_{calc}$ is the modulus calculated from the correctly refined model. The RF values, which are calculated in an a-posteriori analysis after the structures have been solved, show that the errors range from 21% to 91% and that very large errors may occur: 62% of the structures have RF > 40%, and 35% have RF > 50%. This explains why the ab-initio solution from powder diffraction is still a challenge.
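The RF figure of merit can be illustrated with a minimal sketch. It assumes that RF is the sum of the absolute differences between extracted and calculated moduli, normalized by the sum of the calculated moduli and expressed as a percentage; the function name and the four moduli are hypothetical and serve only as an illustration, not as EXPO output.

```python
def r_factor(f_extracted, f_calculated):
    """Return RF (in percent) between extracted and model-calculated moduli.

    Assumption: the residual is normalized by the sum of the calculated moduli.
    """
    if len(f_extracted) != len(f_calculated):
        raise ValueError("both lists must hold one modulus per reflection")
    numerator = sum(abs(fe - fc) for fe, fc in zip(f_extracted, f_calculated))
    denominator = sum(f_calculated)
    return 100.0 * numerator / denominator


# Hypothetical moduli for four reflections (illustrative values only).
extracted = [12.0, 8.0, 5.0, 15.0]
calculated = [10.0, 10.0, 5.0, 15.0]
print(round(r_factor(extracted, calculated), 1))  # 10.0
```

Two of the four reflections are each off by 2 units out of a total calculated modulus of 40, which gives RF = 10%. The values of 40% and above reported for the test structures therefore correspond to much larger discrepancies.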
One of the causes of the uncertainties on the experimental structure factor moduli is undoubtedly peak overlap, so it is useful to quantify its extent in a powder diffraction pattern. We developed a method that evaluates the degree of peak overlap, specifically the percentage of independent observations in an experimental powder diffraction profile [6]: the larger the overlap, the smaller the number of independent observations and the lower the success rate of the solution. The percentage value can be used as one of the predictive indicators of the success of the solution process by Direct Methods. In Figure 2, the percentage of independent reflections (IRP) is shown for each of the 155 test structures reported in Table 1: on average, IRP decreases as RF increases; the correlation between IRP and RF is −0.75, which indicates a strong relationship between the degree of peak overlap and the uncertainties on the structure factor moduli estimates. Only 36% of the reported structures have IRP larger than 50%.
The average values of RF and IRP for the structures with synchrotron data are 42% and 52%, respectively, while the corresponding values for the structures with laboratory X-ray data are 47% and 44%.
In the last column of Table 1, the ratio IR/NA-noH is supplied, where IR is the number of independent reflections, obtained by multiplying the number of reflections in the experimental pattern (NRexp) by IRP. Direct Methods usually require that the number of reflections actively used in the phasing process (NRact) be at least 7 times the number of non-hydrogen atoms in the asymmetric unit, to ensure a sufficient number of phase relationships and a good-quality electron density map [6]. NRact is usually much smaller than the number of measured reflections: for the test structures in Table 1, 10% < (NRact/NRexp)⋅100 < 46%. This means that when IRP is small the condition required by Direct Methods is hardly satisfied.
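The arithmetic behind the IR/NA-noH indicator and the 7× requirement can be sketched as follows. The function names are hypothetical; the input numbers are those reported later in this section for the first case study (919 reflections, IRP = 38%, NA-noH = 32).

```python
def independent_reflections(nr_exp, irp_percent, na_noh):
    """Return (IR, IR/NA-noH) from the reflection count, IRP (%) and NA-noH."""
    ir = nr_exp * irp_percent / 100.0
    return ir, ir / na_noh


def min_active_reflections(na_noh, factor=7):
    """Minimum NRact suggested for Direct Methods: factor x NA-noH."""
    return factor * na_noh


ir, ratio = independent_reflections(nr_exp=919, irp_percent=38, na_noh=32)
print(round(ir), round(ratio, 1))      # 349 10.9
print(min_active_reflections(32))      # 224
```

For this structure the phasing process would need at least 224 active reflections, which is about 24% of the 919 reflections in the pattern.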
When approaching the solution of a new structure by Direct Methods, a preliminary estimate of the probability of success can be useful. The probability of success increases with increasing IRP and IR/NA-noH, and with decreasing RES and NA-noH. According to the analysis of our test structures, Table 2 summarizes the empirically determined limits of these conditions for the two extreme cases of high and low success rates. When a structure satisfies neither the high nor the low success rate conditions, the advice is in any case to attempt the EXPO ab-initio solution process. Structures unsolved by Direct Methods can still be successfully solved by real-space methods.
Organic structures are the most resistant to solution attempts by Direct Methods because the rapid decay of the scattering factors of light atoms usually limits the experimental resolution; therefore, two case studies of organic structures are discussed below. Both fall outside the limits of the conditions for high and for low success rates.
First, we consider the compound 5-(5-nitro furan-2-ylmethylen), 3-N-(2-methoxy phenyl), 2-N’-(2-methoxyphenyl) imino thiazolidin-4-one [9], with chemical formula C22H17O6N3S. X-ray diffraction data, shown in Figure 3, were collected at room temperature (293 K) using an automated Rigaku RINT2500 laboratory diffractometer (50 kV, 200 mA), equipped with an asymmetric Johansson Ge(111) crystal to select the monochromatic Cu Kα1 radiation (λ = 1.54056 Å). A silicon strip Rigaku D/teX Ultra detector was used. Data were measured over the diffraction angle (2θ) range from 7° to 70° with a step size of 0.02° and a counting time of 6 s/step. The value of RES is 1.34 Å.
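The quoted resolution can be checked with a short sketch, assuming that RES is the minimum d-spacing given by Bragg's law, d = λ / (2 sin θ), at the upper limit of the measured 2θ range. The function name is hypothetical.

```python
import math


def resolution_from_two_theta(two_theta_max_deg, wavelength):
    """Minimum d-spacing (same unit as wavelength) reached at the given 2-theta."""
    theta = math.radians(two_theta_max_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


CU_KALPHA1 = 1.54056  # wavelength in angstrom, as quoted in the text

print(round(resolution_from_two_theta(70.0, CU_KALPHA1), 2))  # 1.34
```

With 2θ = 70° and λ = 1.54056 Å the sketch gives 1.34 Å, in agreement with the RES value reported for these data.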
The pattern was automatically and successfully indexed by EXPO: triclinic, with cell parameters a = 11.476 Å, b = 10.912 Å, c = 8.809 Å, α = 103.663°, β = 91.490°, γ = 84.140°. Assuming P as space group, NA-noH = 32. The number of reflections in the experimental pattern is 919 and the number of independent reflections (IR) calculated by EXPO according to [6] is 349 (IRP = 38%), with IR/NA-noH equal to 10.9. Under these conditions, the ab-initio solution process by Direct Methods in EXPO is not promising, but it can be tried. Indeed, the default run did not provide a feasible solution. Therefore, to tackle the peak overlap problem, a possible cause of the failure, a non-default attempt was carried out using the EXPO strategy based on a random approach in the Le Bail extraction step [10], activated by a special directive in the EXPO input file. The EXPO input file for this compound is given as Supplementary Information in Table S2. We obtained a structure model that was reasonable but approximate (false atom positions and chemical labels, poorly accurate bond distances and angles) and incomplete. The model was improved by applying default [11] and non-default [12] structure model optimization strategies, leading to the complete and correct solution. The overall execution time for obtaining the correct Direct Methods solution, starting from only the chemical formula and the experimental diffraction pattern, was about 5 min [Intel(R) Core(TM) i7-4510U CPU @ 2.00 GHz 2.60 GHz]. In an a-posteriori analysis, after the structure had been solved, we verified by comparison with the true model that the error on the structure factor moduli extracted by the Le Bail method is about 46%. The hydrogen atoms were geometrically placed in the Direct Methods model using the tool available in EXPO, and the structure was then refined by the Rietveld method (Rp = 2.034, Rwp = 2.777). Figure 4 shows the observed, calculated and difference profiles, as well as the background.
The second case is the compound 2-(5,6-dimethylimidazo[2,1-b]thiazol-3-yl)-1-morpholinoethanone, with chemical formula C13H17N3O2S (shorthand code MORPHOLIN) and Z’ = 2, previously solved from single-crystal diffraction data [13]. X-ray powder diffraction data, shown in Figure 5, were collected at room temperature (293 K) using an automated Rigaku RINT2500 laboratory diffractometer (50 kV, 200 mA), equipped with an asymmetric Johansson Ge(111) crystal to select the monochromatic Cu Kα1 radiation (λ = 1.54056 Å). A silicon strip Rigaku D/teX Ultra detector was used. Data were measured over the diffraction angle (2θ) range from 8° to 70° with a step size of 0.02° and a counting time of 6 s/step. The value of RES is 1.34 Å.
The cell parameters and the space group were automatically and successfully determined by EXPO: monoclinic, with cell parameters a = 14.237 Å, b = 19.197 Å, c = 10.085 Å, β = 99.082°; space group P21/c. NA-noH is 38. The number of reflections in the experimental pattern is 1175 and the number of independent reflections (IR) calculated by EXPO is 331 (IRP = 28%), with IR/NA-noH equal to 8.7. Attempts to solve the structure by Direct Methods in EXPO using default and non-default strategies were unsuccessful, so we decided to solve the structure in real space. In an a-posteriori analysis, we verified that the error on the structure factor moduli extracted by the Le Bail method was nearly 47%.
3. Solving Structures in Real Space by EXPO: A Case Study
Structure solution in real space [4], developed to overcome the limits of solution in reciprocal space, is a valid alternative to Direct Methods, especially when information on the expected molecular geometry is available; it can be carried out successfully without requiring either atomic experimental resolution or the extraction of the structure factor moduli. The structure complexity is described by the degrees of freedom (DoFs): the external DoFs of position (3 DoFs) and orientation (3 DoFs) of each structure fragment in the expected model, and the internal DoFs of the torsion angles. A very large number of DoF variations are randomly generated and the corresponding cost function (CF) values are monitored. CF depends on the agreement between the observed and calculated diffraction profiles. Global optimization methods are used to reach the global minimum of CF, which should correspond to the best real-space solution, possibly the correct structure. The efficiency of the real-space solution process depends on the discrepancy between the starting bond distances and angles and their correct values, and on the number of DoFs. The execution time, which is usually longer than for the reciprocal-space solution, mainly depends on how well the starting model is built, on the number of DoFs and on symmetry. In our experience, structures containing only one fragment with up to 7 internal DoFs are usually easily solved. For more complex structures, it can be effective to use fast computers and parallel versions of the software.
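The DoF bookkeeping described above can be written as a small sketch. The function name is hypothetical; the example numbers correspond to the MORPHOLIN case discussed in this section (two fragments and six torsion angles in total).

```python
def total_dofs(n_fragments, n_torsions):
    """Total DoFs: 3 positional + 3 orientational per fragment, plus torsions."""
    external = (3 + 3) * n_fragments
    return external + n_torsions


print(total_dofs(n_fragments=2, n_torsions=6))  # 18
```

Each additional independent fragment adds six external DoFs, so the search space grows quickly when more than one molecule is present in the asymmetric unit.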
The solution in real space by EXPO searches for the global minimum of CF by Simulated Annealing (SA), which has been suitably modified to improve efficiency and reduce execution time [14,15]. The CF used in EXPO is Rwp [2].
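The general idea of Simulated Annealing can be illustrated with a minimal textbook sketch based on the Metropolis acceptance criterion and a geometric cooling schedule. This is not the modified algorithm implemented in EXPO: the cooling schedule, the step size, all parameter names and the toy cost function (a stand-in for Rwp) are assumptions made only for illustration.

```python
import math
import random


def simulated_annealing(cost, start, step=0.5, t_start=1.0, t_end=1e-3,
                        cooling=0.95, moves_per_temp=50, seed=0):
    """Generic SA: random moves accepted with the Metropolis criterion."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_temp):
            # Randomly perturb every parameter (every DoF) of the trial model.
            trial = [x + rng.uniform(-step, step) for x in current]
            trial_cost = cost(trial)
            delta = trial_cost - current_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = trial, trial_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        temperature *= cooling  # geometric cooling (an assumption)
    return best, best_cost


def toy_cost(params):
    """Hypothetical cost with a single minimum at (1.0, -2.0, 0.5)."""
    target = [1.0, -2.0, 0.5]
    return sum((p - t) ** 2 for p, t in zip(params, target))


best, best_cost = simulated_annealing(toy_cost, [0.0, 0.0, 0.0])
print(best, best_cost)
```

In a real-space structure solution the parameter vector would hold the positional, orientational and torsional DoFs, and each cost evaluation would require computing a diffraction profile. This is why the execution time grows with the number of DoFs and with the number of SA runs.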
The MORPHOLIN compound was solved by SA in EXPO, neglecting the hydrogen atoms. Omitting the H atoms, which do not contribute significantly to X-ray diffraction, effectively decreases the time needed to evaluate CF for each trial structure. The angular range 8° < 2θ < 45.30° was used (RES = 2.0 Å). The starting model of the compound was assembled using the sketching facilities of ACD/ChemSketch [16] and its geometry was optimized with MOPAC [17], run through the EXPO graphical interface. Two molecules were positioned in the unit cell. Figure 6 shows the starting model obtained by MOPAC (Figure 6a) and the SA solution (Figure 6b).
A total of 18 DoFs (12 external and 6 torsion angles for the two molecules) were optimized by EXPO during the SA minimization process. No feasible solution was obtained in a default run (20 runs of the algorithm). The global optimization algorithm was then run in a non-standard way, 100 times, on a Linux workstation. The number of iterations was increased (niter directive set to 2000 in the EXPO input file) to achieve the optimal crystal structure with a reasonable success rate. The overall calculation time was about 277 hours (Intel(R) Xeon(R) CPU E5-2690 @ 2.90GHz). The best solution, with the lowest profile cost function (Rwp = 7.34), was selected. The root mean square displacement calculated for all non-H atoms after overlay on the structure solution obtained from single-crystal data [13] was 0.0436 Å. The hydrogen atoms were geometrically placed using the tool available in EXPO and the structure was then refined by the Rietveld method (Rp = 1.625, Rwp = 2.11). Figure 7 shows the observed, calculated and difference profiles, as well as the background.