Article

Identifications of False Positives Amongst Sodium(I) Cations in Protein Three-Dimensional Structures—A Validation Approach Extendible to Any Alkali or Alkaline Earth Cation and to Any Monoatomic Anion

Department of Chemistry, University of Pavia, viale Taramelli 12, 27100 Pavia, Italy
Crystals 2024, 14(11), 918; https://doi.org/10.3390/cryst14110918
Submission received: 4 October 2024 / Revised: 18 October 2024 / Accepted: 20 October 2024 / Published: 24 October 2024
(This article belongs to the Special Issue Protein Crystallography: The State of the Art)

Abstract

Validation of the data deposited in the Protein Data Bank is of the utmost importance, since many other databases, data mining processes, and artificial intelligence tools are strictly grounded on them. The present paper is divided into two parts. The first part describes and analyzes validation methods that have been designed and used by the structural biology community. Everything began with the Ramachandran plot, with its allowed and disallowed types of backbone conformations, and evolved in different directions, with the inclusion of additional stereochemical features, analyses of the distributions of structural moieties, and scrutiny of structure factor amplitudes across the reciprocal lattice. The second part of the paper is focused on the largely unexplored problem of the high number of false positives amongst the sodium(I) cations observed in protein crystal structures. It is demonstrated that these false positives, which are atoms wrongly identified as sodium, can be detected by using electrostatic considerations and it is anticipated that this approach can be extended to other alkali and alkaline earth cations or to monoatomic anions. In the end, I think a global initiative, accessible to all volunteers and possibly overseen by the Protein Data Bank, should take the place of the numerous web servers and software applications by providing the community with a select few reliable and widely accepted tools.

1. Introduction

1.1. Overview of the Validation Tools

Validation of scientific observations is an essential aspect of every scientific procedure: it implies not only the estimation of the accuracy but also the estimation of the level of anomaly in the observation [1].
The Ramachandran plot marked the beginning of validation in structural biology [2,3]: by examining the values of phi and psi torsions, it is possible to determine if a protein’s 3D structure is normal or aberrant. Normal structures have phi and psi values within the permissible ranges, while abnormal structures have phi and psi values outside of these ranges. The validation technique was made accessible to structural biologists by the implementation of ProCheck, a computer application [4]: Google Scholar reports that the initial release of ProCheck, a publication that is almost 30 years old, has received over 27,100 citations.
Despite its user-friendly interface and clarity, ProCheck suffers from a significant drawback that is common among other similar software programs: it lacks the ability to differentiate between actual faults and functionally significant abnormalities, such as the accumulation of energy in the form of strain. In other words, ProCheck has the ability to detect abnormal backbone structures, even in situations where the abnormality is not necessarily a mistake.
Similar approaches were adopted in other computer programs, like AQUA and Procheck-NMR [5], ProSA [6,7], MolProbity [8,9,10,11], What_Check, and the use of RAMA-Z [12,13].
Additional structural features were included in the validation: for example, in MolProbity [8,9,10,11], the position of the hydrogen atom bound to the Calpha atom is computed and any potential collisions with other atoms are searched for; Afonine and his colleagues suggested utilizing the distribution of stereochemical hydrogen bond parameters to detect potential inaccuracies in low-resolution structures, where the reliability of Ramachandran plot analyses is reduced due to restraints imposed on backbone torsions [14].
Differently from the above methods, which primarily examine stereochemical features, SFCHECK [15] examines the structure factor amplitudes across the reciprocal lattice, assessing data completeness and the global and local agreement between the atomic model and experimental data.
In addition, there exist computational algorithms specifically developed to evaluate the quality of 3D models and choose the most credible option from a given set of options [16,17,18,19,20]. A novel non-parametric method has been tailored to detect outliers amongst the PDB entries [21] and a novel confidence score, derived from the real space correlation coefficients between residue coordinates and experimental electron density, was created to identify local anomalies within a single PDB entry [22]. Pereira and Lamzin proposed that Calpha atoms exhibiting atypical local stereochemistry can be distinguished by their positioning in a three-dimensional space defined by the first three orthogonal components computed using eigen analysis of the distance matrix of the Calpha(i − 1)—O(i − 1)—Calpha(i)—O(i)—Calpha(i + 1) atom locations [23].
While most validation strategies have focused on X-ray protein crystal structures, there have also been advancements in methods applicable to structures determined using other techniques. These include Procheck-NMR [5], which is specialized for solution NMR structures, the pioneering validation of results obtained from small angle scattering experiments [24], and EMRinger, a tool that assesses the fitting of an atomic model into the map of cryo-EM structures [25].
Concluding this concise and unavoidably incomplete review of 3D protein structure validation approaches, it is essential to remember the variety of methods employed in the PDB to assess the accuracy of each entry [26,27]. Structures obtained by diverse experimental approaches, such as crystallography, solution NMR, and cryo-EM, require the employment of distinct techniques. Crystal structures, for example, are monitored using five main parameters: the free R-factor, the frequency of interatomic clashes, the frequency of outliers in the Ramachandran plot, the frequency of side-chain outliers, and the real-space R-factor Z-score. Each parameter is compared to its calculated distribution in either the entire set of crystallographic structures or only in the subset of structures with similar resolution to the entry.
Three decades of efforts in developing protein structure validation tools have improved tremendously the quality of the data deposited in the PDB. For instance, I recall encountering a PDB entry about 30 years ago where the biological source was mistakenly labeled as “omo sapiens”, a simple typo that would likely have confused the search engines of that time. Nowadays, such an error would not happen.

1.2. Aberrant and Inaccurate Structures

However, there is still potential for improvements.
There exists a substantial body of work in the literature about mistakes in protein structures and many strategies that have been suggested for the purpose of discovering and rectifying them. Zbigniew Dauter and colleagues wrote in 2014 that “Common errors can be traced to negligence and a lack of rigorous verification of the models against electron density, creation of non-parsimonious models, generation of improbable numbers, application of incorrect symmetry, illogical presentation of the results, or violation of the rules of chemistry and physics” [28].
One year later, it was noted that the majority of protein complexes with the anticancer medicines cisplatin and carboplatin exhibited substantial issues, either related to crystallography or chemistry [29]. Analogously, wrong zinc(II) sites were discovered in several protein structures [30]. It was also found that, in a few structures of metallo-beta-lactamase, the interpretation of the bound ligands (e.g., inhibitors, substrate/product analogs) is doubtful or even incorrect [31].
For obvious reasons, the quality of the structures of proteins related to COVID-19 has been carefully checked [32,33,34]. Also, the structures of L-asparaginases, enzymes that include approved drugs for the treatment of certain types of leukemia, were verified [35]. Analogously, the quality of large-scale crystallographic fragment screening projects, the purpose of which is the identification of very low-occupancy small molecule ligands in macromolecular complexes, has been questioned [36]. A systematic analysis of the quality of structures from structural genomics was published too [37].
Other issues are important and largely unexplored. One example is the identification of abnormal B-factors, whether they are isotropic, anisotropic, or produced by TLS refinements, by examining B-factor expected values and distributions and by considering features such as resolution, crystal packing, and solvent accessibility. Another issue is related to metal cations, such as sodium(I) or potassium(I), and monoatomic anions, such as chloride or bromide, which are occasionally misidentified.
This communication specifically addresses the latter problem. There are mistakes, as all of us know. For example, it has been reported that several metal cations are completely isolated in the crystal lattice, with no atoms bound to them: they are totally naked, something that is simply absurd [38].
Here, I will not provide a comprehensive method for evaluating the chemical nature of metal cations and monoatomic anions in protein structures. From my perspective, while this endeavor is intriguing and has the potential to be valuable to the scientific community, it is not suitable to be presented as one more of the numerous (maybe excessive) computer applications and servers that exist on the internet. Instead, I present a proof of concept showing that electron density peaks that may have been mistakenly assigned as metal cations or monoatomic anions can be identified reliably. I would like to propose that the PDB initiate a research program, open to the scientific community, aimed at preventing most, if not all, cation and anion assignment errors. In fact, as reported by Rupp and colleagues, preventing the publication of structures containing mistakes should be the outcome of a collaborative effort [39,40].
Such an initiative for validating metal cations and monoatomic anions can be grounded, of course, on the literature in this field, relatively abundant for metal cations and less for anions, where there are numerous studies and quite a few databases [41,42,43,44,45,46,47,48,49,50,51,52,53].
Significantly, numerous attempts have been made to validate metal biosites in metalloproteins [43,54], where the cations can perform diverse functions such as adjusting thermodynamic stability, enabling oxygen transport and storage, and catalyzing chemical reactions. Conversely, the validation of metal cations or monoatomic anions that occur by chance in the structure due to their presence in the crystallization mixture remains mainly unexplored. From a purely biological standpoint, these cations and anions are of lesser significance. Nevertheless, it is advisable to demand that they be accurately assigned.
This study investigates the Coulombic potential energy of the chemical environment around sodium(I) cations. The analysis is conducted on ten subsets of protein crystal structures that have been randomly selected from the PDB. The potential energy distributions enable the identification of outliers that may not be sodium(I) cations but rather something distinct. It is expected that the same method can be readily applied to other alkali or alkaline earth cations and monoatomic anions.
This approach has the advantage of being useful not only in crystallography, where, in many cases, the chemical nature of the atoms can be inferred from anomalous scattering, but also in other structural biology methods (cryo-EM, NMR), where, in general, there is no way to make qualitative analyses about the chemical nature of the atoms.
Preventing these errors is not a trivial focus on minor but still significant details, as anything that is stored in the Protein Data Bank, and subsequently in other databases that use its data, can contaminate the results of statistical analysis, particularly in the era of artificial intelligence. The issue of bias in data mining the Protein Data Bank (PDB) caused by varying degrees of inaccuracies in protein structures has been extensively documented [55].
Misidentifying sodium(I) cations within three-dimensional protein structures can have significant consequences, introducing noise in the analysis of protein hydration layers and potentially altering the local protein surface. These errors can negatively affect docking studies and virtual screening outcomes, making it crucial to minimize them as much as possible.
An effective analogy may be drawn from the culinary world: a meticulously refined 3D structure can be likened to a Michelin-starred restaurant, whereas a sloppy structure resembles a university cafeteria where the dishes are merely palatable but lack distinction. Alternatively, one might highlight the difference between a mass-produced prêt-à-porter dress and a custom-made haute couture dress.
It is worth remembering how a sense of accomplishment can rescue people from the most heinous situations, as Primo Levi showed in “Se questo è un uomo [If This is a Man]” [56,57]. Levi tells the story of Lorenzo, an Italian mason imprisoned in a Nazi concentration camp who, despite the awful circumstances, continues to build a wall with commitment “solo per il piacere di fare un lavoro ben fatto [just for the satisfaction of doing a job well]” as a symbolic act of defiance against the camp’s atrocities. The same should happen in science.

2. Results

All protein crystal structures containing sodium(I) cations were downloaded from the Protein Data Bank and ten subsets were randomly assembled, each containing 100 structures (see Table S1 in the Supplementary Material), according to the procedure described by Carugo [58]. Thus, one may perform ten analyses and estimates in parallel and then average the results. This enables the calculation of a standard error for each average and, thus, an assessment of the reliability of the predictions.
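The subset-and-average scheme described above can be sketched as follows. This is only an illustration: it assumes that the ten subsets are disjoint and drawn without replacement, which may differ from the procedure of ref. [58], and the function names `random_subsets` and `mean_and_standard_error` are hypothetical.

```python
import math
import random
import statistics


def random_subsets(items, n_subsets=10, size=100, seed=0):
    """Draw n_subsets disjoint random subsets of `size` items each.

    Assumption: sampling without replacement; the cited procedure may differ.
    """
    rng = random.Random(seed)
    picked = rng.sample(list(items), n_subsets * size)
    return [picked[k * size:(k + 1) * size] for k in range(n_subsets)]


def mean_and_standard_error(values):
    """Average one estimate per subset and return (mean, standard error)."""
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation / sqrt(N).
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se
```

Any quantity estimated once per subset (for example, the position of the maximum of a distribution) can be passed to `mean_and_standard_error` to obtain the average and its uncertainty.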
The n non-hydrogen atoms within 3.5 Å of the sodium(I) cation were retained, their electrical charges q_i were taken from the CHARMM force field (see Table S2 in the Supplementary Material; [59]), and the Coulombic potential energy V of the sodium(I) cation was computed as
$$V = \sum_{i=1}^{n} \frac{q_i}{d_i}$$
where d_i is the distance of the i-th atom from the sodium(I) cation. Crystal packing was explicitly considered, as described by Djinovic-Carugo and Carugo [38], to include atoms belonging to adjacent asymmetric units.
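As a minimal sketch of this calculation, the sum of charge over distance for the neighbors within 3.5 Å can be written as below. The charge table holds placeholder values chosen for illustration, not the CHARMM-derived values of Table S2, and the function name and the input layout are assumptions. Symmetry-related atoms are assumed to have been added to the atom list beforehand.

```python
import math

# Placeholder partial charges (in units of e); NOT the values of Table S2.
PLACEHOLDER_CHARGES = {
    ("ASP", "OD1"): -0.8,
    ("ASP", "OD2"): -0.8,
    ("HOH", "O"): -0.8,
    ("LYS", "NZ"): 0.3,
}

CUTOFF = 3.5  # Å, the threshold used in the text


def coulombic_potential(ion_xyz, atoms, charges=PLACEHOLDER_CHARGES, cutoff=CUTOFF):
    """Return (V, n), where V = sum(q_i / d_i) over the n atoms within `cutoff`.

    `atoms` is an iterable of (residue_name, atom_name, (x, y, z)) tuples for
    non-hydrogen atoms. V is None when a neighbor has no tabulated charge,
    mirroring the rule that such cations are discarded.
    """
    neighbors = []
    for res, name, xyz in atoms:
        d = math.dist(ion_xyz, xyz)
        if 0.0 < d <= cutoff:
            neighbors.append((res, name, d))
    if any((res, name) not in charges for res, name, _ in neighbors):
        return None, len(neighbors)
    v = sum(charges[(res, name)] / d for res, name, d in neighbors)
    return v, len(neighbors)
```

A result with n equal to 0 corresponds to an isolated ("naked") cation, for which V is trivially zero.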
The threshold of 3.5 Å is slightly larger than the sum of the sodium(I) ionic radius and the oxygen van der Waals radius (2.5–2.9 Å). This is necessary for taking into account the intrinsic inaccuracies in protein crystal structures, especially at the protein surface where misidentifications are more probable, and to guarantee adequate coverage of the volume surrounding the cation.
Sodium(I) cations where it was impossible to assign the electric charge to one of their neighbors were discarded.
A sodium(I) cation surrounded by negatively charged atoms is expected to have V < 0. On the contrary, a sodium(I) cation with V close to 0 or even >0 is expected to have been erroneously identified and it is probably not a sodium(I) cation and, perhaps, it is not a cation at all.
A list of all sodium(I) cations and of the atoms close to them is available on request.
Figure 1a illustrates the distributions of the Coulombic potential energy V across the ten data subsets. The mean distribution, calculated by averaging the ten subsets, is shown in Figure 1b. The ten distributions are, with minor deviations, similar, with a maximum around −1.1. The same feature is seen in the averaged distribution. Very few values of V are found below −2 (4.37%) or above 0 (0.06%).
These data may be used to assess a novel sodium(I) cation. If its V is higher than 0, it may be classified as an outlier with a 99.94% probability, since just 0.06% of the sodium(I) cations in the PDB have V values above 0.
To enhance the efficacy of this method, it is advisable to evaluate the new sodium(I) cation inside each of the ten subsets. The probability of being an outlier must be computed in each subset and, subsequently, the mean probability may be calculated.
Figure 2 illustrates three examples of atoms that were deposited as sodium(I) cations but are likely to be something entirely different. The randomly chosen instances are not meant to denigrate the authors but only to demonstrate how easily such mistakes can arise, especially when the focus of a structural study is mostly biological and little attention is paid to apparently trivial aspects.
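The per-subset evaluation can be illustrated with a short sketch. It assumes that each subset is available as a plain list of V values and that the score of interest is the fraction of reference cations whose V exceeds that of the candidate; the function names are hypothetical.

```python
import statistics


def fraction_above(reference_values, v_query):
    """Fraction of reference V values that are strictly greater than v_query."""
    return sum(v > v_query for v in reference_values) / len(reference_values)


def mean_fraction_above(subsets, v_query):
    """Average the fraction over the subsets; return (mean, standard error)."""
    fractions = [fraction_above(s, v_query) for s in subsets]
    mean = statistics.fmean(fractions)
    if len(fractions) > 1:
        se = statistics.stdev(fractions) / len(fractions) ** 0.5
    else:
        se = 0.0
    return mean, se
```

A small mean fraction indicates that the candidate lies in the upper tail of the reference distributions and should therefore be regarded with suspicion.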
In the first case (Figure 2a), a sphere with a radius of 3.5 Å, centered on the cation, encompasses a carboxylic side-chain oxygen atom, a backbone oxygen atom, a backbone carbon atom, and two backbone nitrogen atoms. The Coulombic potential energy (−0.534) is rather elevated, as deduced from a comparison with the data shown in Figure 1. Quantitatively, just 8.74% of the sodium(I) cations from the ten subsets shown in Figure 1 possess Coulombic potential energy values above −0.534. This indicates that the cation shown in Figure 2a is likely not a sodium(I) cation and, presumably, is not a cation at all.
In the two further cases (Figure 2b,c), even the termini of lysine side-chains are within 3.5 Å of the sodium(I) cations, and the Coulombic potential energy values are greater than in the first example (−0.414 and −0.444). Furthermore, only 6.54% of the sodium(I) cations from the ten subsets shown in Figure 1 exhibit Coulombic potential energy values exceeding these values. This indicates that an incorrect assignment has occurred, implying that most likely these atoms are not sodium(I) cations.
Conversely, Figure 3 presents two instances in which the sodium(I) cations have almost certainly been correctly identified. In both cases, they are located at crystal packing contacts and are coordinated by carboxylate side-chains and by several water molecules. Their Coulombic potential energy values (−2.05 and −2.12) are low and the large majority of the sodium(I) cations from the ten subsets shown in Figure 1 exhibit Coulombic potential energy values exceeding these values.
These examples unequivocally demonstrate that anomalous cations may be identified in PDB entries by a simple method grounded on electrostatic interactions.

3. Discussion

Errors in identifying sodium(I) cations within three-dimensional protein structures can lead to substantial repercussions, as they introduce noise in the analysis of hydration layers surrounding proteins and may modify the local structure of the protein surface, adversely affecting docking and virtual screening results. It is, consequently, important to minimize these errors as much as possible.
Clearly, the strategy outlined above can be improved, for instance, by including hydrogen atoms, which are currently disregarded, even though their locations are often indeterminate. Nonetheless, this raises the question of the extent to which non-experimental knowledge might be used in the validation of experimental results.
Also, unusual B-factor values may indicate misassignments of sodium(I) cations, which could be more accurately interpreted as heavier or lighter elements or groups, or may suggest that the occupancies are <1 [63,64].
Moreover, the percentages calculated in Figure 2 are likely overestimated. In fact, several sodium(I) cations included in the ten subsets of Figure 1 are likely not authentic sodium(I) cations and may be false positives and, consequently, the proportion of authentic sodium(I) cations with Coulombic potential energy above that shown in Figure 2 is likely less than 8.74% or 6.54%. This reinforces the warning that the examples shown in Figure 2 are outliers and, hence, do not represent sodium(I) cations. Nonetheless, any a priori assumption used to eliminate false positives from the ten subgroups shown in Figure 1 has inherent risks, since it lacks direct experimental evidence.
It is prudent to eliminate any isolated sodium(I) cations. These cations lack nearby atoms and have been described as “naked metal cations swimming in protein crystals”. These are probable mistakes in the interpretation of the electron density maps, in the absence of additional experimental evidence.
These considerations imply an additional remark.
The performance of prediction algorithms is often assessed by whole or partial cross-validation. This necessitates the creation of two data sets, one designated as positive and the other as negative, followed by a determination of the amounts of true positives, true negatives, false positives, and false negatives. Thus, with these four integer values, one may evaluate the accuracy of the predictive algorithms together with their sensitivity, specificity, and Matthews correlation coefficient.
The first phase of this approach is obstructed here, since it is infeasible to create two distinct datasets: one including metal sites that are unequivocally sodium(I) sites and the other consisting of metal sites that are definitively not sodium(I) sites. Therefore, I am unable to provide strong and accurate assessments of the success of the prediction approach outlined in this report. This strategy may assist in preventing errors: sodium(I) candidates with substantial Coulombic potential energy should be regarded with caution, particularly when the proportion of occurrences with greater energy is minimal.
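For reference, if labelled positive and negative sets were available, the four counts mentioned above would give the usual performance measures. The sketch below uses the standard textbook definitions; the function name is hypothetical and the numbers in any call would be invented, since no such labelled sets exist for sodium(I) sites.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and Matthews correlation coefficient.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives, and false negatives.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Ratios are undefined (NaN) when the corresponding class is empty.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```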

4. Conclusions

Validation of results is fundamental to science, and in structural biology, it involves identifying errors and inaccuracies in macromolecular structures. For over three decades, significant efforts have been made in this area, addressing various and different aspects such as the analysis of backbone phi and psi torsions and the distribution of structure factors.
It is impossible to eliminate all potential errors in structural biology, as in any other human endeavor. However, it is possible to enhance our technology to reduce mistakes to a minimum. Interestingly, minor and major mistakes also occur in crystallographic analyses of small molecules, where the ratio between experimental information and the number of parameters to be determined and refined is much higher than in protein crystallography. Several examples can be found in the literature, from the pioneering works of Marsh, who highlighted erroneous space group determinations (see for example [65,66,67]; for a long time, the expression “to be marshed” indicated someone who published a wrong structure), to more recent publications (see the numerous examples reported in [68,69]).
Someone with a chemical background will readily observe in Figure 2 that the atoms listed in the PDB as sodium(I) cations are not actually sodium(I) cations, nor can they be classified as any other form of cation. However, inexpert structural biologists or scientists lacking proper education in crystallography and structural chemistry might not realize this mistake, especially if under stress to publish quickly to secure a tenured position, spurred on by greedy faculty deans.
In addition to using validation methods, the responsibilities of the reviewers must be reinforced. Numerous protein crystal structures, particularly those deemed “hot” within the molecular biology community, are published in scientific journals that emphasize biological novelty above structural findings. As a result, it is possible that none of the reviewers has expertise in crystallography, NMR, or cryo-EM. Consequently, erroneous structures may be disseminated rather than being refined under the reviewers’ supervision. It is preferable that at least one reviewer have extensive experience in crystallography or relevant areas of structural biology as needed.
Many computer programs have been published during the last three decades in the field of protein structure validation. Although the Protein Data Bank has implemented a rigorous validation strategy using a limited set of tools, several other methods are available, which can result in some uncertainty and disorientation. In my opinion, the creation of a more unified and standardized approach is advisable.
This initiative, potentially coordinated by the Protein Data Bank, should welcome contributions from all scientists and remain flexible, allowing for the incorporation of new validation methods and assessments of new structural features, such as the false sodium cations discussed in this communication. Additionally, it should be adaptable to structures determined not only by X-ray crystallography, solution NMR, and cryo-EM, but also by future technologies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cryst14110918/s1, Table S1: List of the PDB files used to compute the Coulombic energy of the Na ions. They are divided into 10 sets, each containing 100 PDB files. Table S2: Charges of the non-hydrogen atoms of amino acids, N-terminal and C-terminal moieties, and water.

Funding

O.C. acknowledges support from the Ministero dell’Università e della Ricerca (MUR) and the University of Pavia through the program “Dipartimenti di Eccellenza 2023−2027”.

Data Availability Statement

All data are available in public databases.

Acknowledgments

O.C. thanks K. Djinovic for helpful discussions, T. Albinoni for constant support, and V. Capossela for the help in preparing figures.

Conflicts of Interest

There are no conflicts of interest regarding the publication of this work.

References

  1. Popper, K. Logik der Forschung; Verlag von Julius Springer: Heidelberg, Germany, 1934. [Google Scholar]
  2. Carugo, O.; Djinovic-Carugo, K. Half a century of Ramachandran plots. Acta Crystallogr. 2013, D69, 1333–1341. [Google Scholar] [CrossRef] [PubMed]
  3. Ramachandran, G.; Ramakrishnan, C.; Sasisekharan, V. Stereochemistry of polypeptide chain conformations. J. Mol. Biol. 1963, 7, 95–99. [Google Scholar] [CrossRef] [PubMed]
  4. Laskowski, R.A.; MacArthur, M.W.; Moss, D.S.; Thornton, J.M. PROCHECK: A program to check the stereochemical quality of protein structures. J. Appl. Cryst. 1993, 26, 283–291. [Google Scholar] [CrossRef]
  5. Laskowski, R.A.; Rullmann, J.A.C.; MacArthur, M.W.; Kaptein, R.; Thornton, J.M. AQUA and PROCHECK-NMR: Programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 1996, 8, 477–486. [Google Scholar] [CrossRef]
  6. Sippl, M.J. Recognition of errors in three-dimensional structures of proteins. Proteins 1993, 17, 355–362. [Google Scholar] [CrossRef]
  7. Wiederstein, M.; Sippl, W.M. ProSA-web: Interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 2007, 35, W407–W410. [Google Scholar] [CrossRef]
  8. Chen, V.B.; Arendall, W.B., 3rd; Headd, J.J.; Keedy, D.A.; Immormino, R.M.; Kapral, G.J.; Murray, L.W.; Richardson, J.S.; Richardson, D.C. MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr. 2010, D66, 12–21. [Google Scholar] [CrossRef]
  9. Davis, I.W.; Murray, L.W.; Richardson, J.S.; Richardson, D.C. MOLPROBITY: Structure validation and all-atom contact analysis for nucleic acids and their complexes. Nucleic Acids Res. 2004, 32, W615–W619. [Google Scholar] [CrossRef]
  10. Hintze, B.J.; Lewis, S.M.; Richardson, J.S.; Richardson, D.C. Molprobity’s ultimate rotamer-library distributions for model validation. Proteins 2016, 84, 1177–1189. [Google Scholar] [CrossRef]
  11. Williams, C.J.; Headd, J.J.; Moriarty, N.W.; Prisant, M.G.; Videau, L.L.; Deis, L.N.; Verma, V.; Keedy, D.A.; Hintze, B.J.; Chen, V.B.; et al. MolProbity: More and better reference data for improved all-atom structure validation. Protein Sci. 2018, 27, 293–315. [Google Scholar] [CrossRef]
  12. Hooft, R.W.; Sander, C.; Vriend, G. Objectively judging the quality of a protein structure from a Ramachandran plot. Comput. Appl. Biosci. 1997, 13, 425–430. [Google Scholar] [CrossRef] [PubMed]
  13. Sobolev, O.V.; Afonine, P.V.; Moriarty, N.W.; Hekkelman, M.L.; Joosten, R.P.; Perrakis, A.; Adams, P.D. A Global Ramachandran Score Identifies Protein Structures with Unlikely Stereochemistry. Structure 2020, 28, 1249–1258.e1242. [Google Scholar] [CrossRef] [PubMed]
  14. Afonine, P.V.; Sobolev, O.V.; Moriarty, N.W.; Terwilliger, T.C.; Adams, P.D. Overall protein structure quality assessment using hydrogen-bonding parameters. Acta Cryst. 2023, D79, 684–693. [Google Scholar] [CrossRef] [PubMed]
  15. Vaguine, A.A.; Richelle, J.; Wodak, S.J. SFCHECK: A unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Crystallogr. 1999, D55, 191–205. [Google Scholar] [CrossRef]
  16. Benkert, P.; Biasini, M.; Schwede, T. Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 2011, 27, 343–350. [Google Scholar] [CrossRef]
  17. Benkert, P.; Tosatto, S.C.; Schomburg, D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008, 71, 261–277. [Google Scholar] [CrossRef]
  18. Praznikar, J.; Tomic, M.; Turk, D. Validation and quality assessment of macromolecular structures using complex network analysis. Sci. Rep. 2019, 9, 1678. [Google Scholar] [CrossRef]
  19. Studer, G.; Biasini, M.; Schwede, T. Assessing the local structural quality of transmembrane protein models using statistical potentials (QMEANBrane). Bioinformatics 2014, 30, i505–i511. [Google Scholar] [CrossRef]
  20. Studer, G.; Rempfer, C.; Waterhouse, A.M.; Gumienny, R.; Haas, J.; Schwede, T. QMEANDisCo-distance constraints applied on model quality estimation. Bioinformatics 2020, 36, 1765–1771. [Google Scholar] [CrossRef]
  21. Shao, C.; Liu, Z.; Yang, H.; Wang, S.; Burley, S.K. Outlier analyses of the Protein Data Bank archive using a probability-density-ranking approach. Sci. Data 2018, 5, 180293. [Google Scholar] [CrossRef]
  22. Shao, C.; Bittrich, S.; Wang, S.; Burley, S.K. Assessing PDB macromolecular crystal structure confidence at the individual amino acid residue level. Structure 2022, 30, 1385–1394. [Google Scholar] [CrossRef]
  23. Pereira, J.; Lamzin, V.S. A distance geometry-based description and validation of protein main-chain conformation. IUCrJ 2017, 4, 657–670. [Google Scholar] [CrossRef] [PubMed]
  24. Trewhella, J.; Hendrickson, W.A.; Kleywegt, G.J.; Sali, A.; Sato, M.; Schwede, T.; Svergun, D.I.; Tainer, J.A.; Westbrook, J.; Berman, H.M. Report of the wwPDB Small-Angle Scattering Task Force: Data requirements for biomolecular modeling and the PDB. Structure 2013, 21, 875–881. [Google Scholar] [CrossRef] [PubMed]
  25. Barad, B.A.; Echols, N.; Wang, R.Y.; Cheng, Y.; DiMaio, F.; Adams, P.D.; Fraser, J.S. EMRinger: Side chain-directed model and map validation for 3D cryo-electron microscopy. Nat. Methods 2015, 12, 943–946. [Google Scholar] [CrossRef]
  26. Gore, S.; Sanz Garcia, E.; Hendrickx, P.M.S.; Gutmanas, A.; Westbrook, J.D.; Yang, H.; Feng, Z.; Baskaran, K.; Berrisford, J.M.; Hudson, B.P.; et al. Validation of Structures in the Protein Data Bank. Structure 2017, 25, 1916–1927. [Google Scholar] [CrossRef]
  27. Smart, O.S.; Horský, V.; Gore, S.; Svobodová Vařeková, R.; Bendová, V.; Kleywegt, G.J.; Velankar, S. Worldwide Protein Data Bank validation information: Usage and trends. Acta Cryst. 2018, D74, 237–244.
  28. Dauter, Z.; Wlodawer, A.; Minor, W.; Jaskolski, M.; Rupp, B. Avoidable errors in deposited macromolecular structures: An impediment to efficient data mining. IUCrJ 2014, 1, 179–193.
  29. Shabalin, I.; Dauter, Z.; Jaskolski, M.; Minor, W.; Wlodawer, A. Crystallography and chemistry should always go together: A cautionary tale of protein complexes with cisplatin and carboplatin. Acta Cryst. 2015, D71, 1965–1979.
  30. Raczynska, J.E.; Wlodawer, A.; Jaskolski, M. Prior knowledge or freedom of interpretation? A critical look at a recently published classification of “novel” Zn binding sites. Proteins 2016, 84, 700–776.
  31. Raczynska, J.; Shabalin, I.G.; Minor, W.; Wlodawer, A.; Jaskolski, M. A close look onto structural models and primary ligands of metallo-β-lactamases. Drug Resist. Updat. 2018, 40, 1–12.
  32. Brzezinski, D.; Kowiel, M.; Cooper, D.R.; Cymborowski, M.; Grabowski, M.; Wlodawer, A.; Dauter, Z.; Shabalin, I.G.; Gilski, M.; Rupp, B.; et al. Covid-19.bioreproducibility.org: A web resource for SARS-CoV-2-related structural models. Protein Sci. 2021, 30, 115–124.
  33. Jaskolski, M.; Dauter, Z.; Shabalin, I.G.; Gilski, M.; Brzezinski, D.; Kowiel, M.; Rupp, B.; Wlodawer, A. Crystallographic models of SARS-CoV-2 3CLpro: In-depth assessment of structure quality and validation. IUCrJ 2021, 8, 238–256.
  34. Wlodawer, A.; Dauter, Z.; Shabalin, I.; Gilski, M.; Brzezinski, D.; Kowiel, M.; Minor, W.; Rupp, B.; Jaskolski, M. Ligand-centered assessment of SARS-CoV-2 drug target models in the Protein Data Bank. FEBS J. 2020, 287, 3703–3718.
  35. Wlodawer, A.; Dauter, Z.; Lubkowski, J.; Loch, J.; Brzezinski, D.; Gilski, M. Towards a dependable dataset of structures for L-asparaginase research. Acta Cryst. 2024, D80, 506–527.
  36. Jaskolski, M.; Wlodawer, A.; Dauter, Z.; Minor, W.; Rupp, B. Group depositions to the Protein Data Bank need adequate presentation and different archiving protocol. Protein Sci. 2022, 31, 784–786.
  37. Domagalski, M.J.; Zheng, H.; Zimmerman, M.D.; Dauter, Z.; Wlodawer, A.; Minor, W. The quality and validation of structures from structural genomics. Meth. Mol. Biol. 2014, 1091, 297–314.
  38. Djinovic-Carugo, K.; Carugo, O. Naked Metal Cations Swimming in Protein Crystals. Crystals 2019, 9, 581.
  39. Rupp, B.; Wlodawer, A.; Minor, W.; Helliwell, J.R.; Jaskolski, M. Correcting the record of structural publications requires joint effort of the community and journal editors. FEBS J. 2016, 283, 4452–4457.
  40. Wlodawer, A.; Dauter, Z.; Minor, W.; Stanfield, R.; Porebski, P.; Jaskolski, M.; Pozharski, E.; Weichenberger, C.X.; Rupp, B. Detect, Correct, Retract: How to manage incorrect structural models. FEBS J. 2018, 285, 444–466.
  41. Brown, I.D.; Wu, K.K. Empirical Parameters for Calculating Cation-Oxygen Bond Valences. Acta Cryst. 1975, B32, 1957–1959.
  42. Carugo, O. Buried chloride stereochemistry in the protein data bank. BMC Struct. Biol. 2014, 14, 19.
  43. Gucwa, M.; Bijak, V.; Zheng, H.; Murzyn, K.; Minor, W. CheckMyMetal (CMM): Validating metal-binding sites in X-ray and cryo-EM data. IUCrJ 2024, 11, 871–877.
  44. Harding, M.M. The geometry of metal-ligand interactions relevant to proteins. Acta Cryst. 1999, D55, 1432–1443.
  45. Harding, M.M. The geometry of metal-ligand interactions relevant to proteins. II. Angles at the metal atom, additional weak metal-donor interactions. Acta Cryst. 2000, D56, 857–867.
  46. Harding, M.M. Geometry of metal-ligand interactions in proteins. Acta Cryst. 2001, D57, 401–411.
  47. Harding, M.M. The architecture of metal coordination groups in proteins. Acta Cryst. 2004, D60, 849–859.
  48. Harding, M.M. Small revisions to predicted distances around metal sites in proteins. Acta Cryst. 2006, D62, 678–682.
  49. Harding, M.M.; Nowicki, M.W.; Walkinshaw, M.D. Metals in protein structures: A review of their principal features. Cryst. Rev. 2010, 16, 247–302.
  50. Hsin, K.; Sheng, Y.; Harding, M.M.; Taylor, P.; Walkinshaw, M.D. MESPEUS: A database of the geometry of metal sites in proteins. J. Appl. Cryst. 2008, 41, 963–968.
  51. Lin, G.-Y.; Su, Y.-C.; Huang, Y.L.; Hsin, K.-Y. MESPEUS: A database of metal coordination groups in proteins. Nucl. Acids Res. 2024, 52, D483–D493.
  52. Mueller, P.; Koepke, S.; Sheldrick, G.M. Is the bond-valence method able to identify metal atoms in protein structures? Acta Cryst. 2003, D59, 32–37.
  53. Nayal, M.; Di Cera, E. Valence Screening of Water in Protein Crystals Reveals Potential Na+ Binding Sites. J. Mol. Biol. 1996, 256, 228–234.
  54. Bazayeva, M.; Andreini, C.; Rosato, A. A database overview of metal-coordination distances in metalloproteins. Acta Cryst. 2024, D80, 362–376.
  55. Minor, W.; Dauter, Z.; Helliwell, J.R.; Jaskolski, M.; Wlodawer, A. Safeguarding Structural Data Repositories against Bad Apples. Structure 2016, 24, 216–220.
  56. Levi, P. If This Is a Man and the Truce; Abacus: London, UK, 2003.
  57. Levi, P. Se Questo è un Uomo; Einaudi: Turin, Italy, 2014.
  58. Carugo, O. Random sampling of the Protein Data Bank: RaSPDB. Sci. Rep. 2021, 11, 24178.
  59. Brooks, B.R.; Bruccoleri, R.E.; Olafson, B.D.; States, D.J.; Swaminathan, S.; Karplus, M. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 1983, 4, 187–217.
  60. Anjanappa, R.; Garcia-Alai, M.; Kopicki, J.D.; Lockhauserbaumer, J.; Aboelmagd, M.; Hinrichs, J.; Nemtanu, I.M.; Uetrecht, C.; Zacharias, M.; Springer, S.; et al. Structures of peptide-free and partially loaded MHC class I molecules reveal mechanisms of peptide selection. Nat. Commun. 2020, 11, 1314.
  61. Thomas, M.E.; Grinshpon, R.; Swartz, P.; Clark, A.C. Modifications to a common phosphorylation network provide individualized control in caspases. J. Biol. Chem. 2018, 293, 5447–5461.
  62. Podvalnaya, N.; Bronkhorst, A.W.; Lichtenberger, R.; Hellmann, S.; Nischwitz, E.; Falk, T.; Karaulanov, E.; Butter, F.; Falk, S.; Ketting, R.F. piRNA processing by a trimeric Schlafen-domain nuclease. Nature 2023, 622, 402–409.
  63. Rupp, B. Biomolecular Crystallography: Principles, Practice, and Application to Structural Biology; Garland Science: New York, NY, USA, 2010.
  64. Wlodawer, A.; Minor, W.; Dauter, Z.; Jaskolski, M. Protein crystallography for non-crystallographers, or how to get the best (but not more) from published macromolecular structures. FEBS J. 2008, 275, 1–21.
  65. Connick, W.B.; Henling, W.M.; Marsh, R.E. Revision of structure of (bipyridyl-N,N’)dicyanoplatinum(II). Acta Cryst. 1996, B52, 817–822.
  66. Marsh, R.E. P1 or P-1? Or something else? Acta Cryst. 1999, B55, 931–936.
  67. Marsh, R.E. Space groups P1 and Cc: How are they doing? Acta Cryst. 2009, B65, 782–783.
  68. Meng, A.Q.; Diment, L.A.; Abdi, A.; Hubbs, V.J.; Jeffreys, E.A.; O’Dell, M.; Ou, X.; Park, K.A.; Quillin, B.T.; Dickie, D.A. Using data from the Cambridge Structural Database to practice crystallographic skills and revise erroneous structures. Cryst. Growth Des. 2024, 24, 4690–4696.
  69. Thompson, A.J.; Whittaker, J.J.; Brock, A.J.; Baanoon, H.A.; Sankalpa, A.-F.K.; Arachichage, A.; Pfunder, M.C.; McMurtrie, J.C. Is a crystal structure enough? Reflecting on the reliability of SCXRD in an age of automation. Cryst. Growth Des. 2024, 24, 5349–5354.
Figure 1. (a) Distribution of the Coulombic potential energy values in the 10 subsets extracted from the Protein Data Bank. (b) Average distribution, with standard errors in parentheses, of the Coulombic potential energy.
Figure 2. Three examples of anomalous sodium cations: NA 504 A of entry 6TDS [60] (a), NA 202 B of entry 6BGK [61] (b), and NA 405 A of entry 8BY5 [62] (c). For each example, the image on the left shows the atoms within 3.5 Å of the cation. These atoms and their distances from the cation are listed in the center; the Coulomb potential (V) is calculated from this information. On the right, the percentage of sodium cations with a Coulomb potential greater than V is given for each of the 10 subsets of the PDB, followed by the average percentage.
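The calculation summarized in this caption (collect the atoms within 3.5 Å of the cation, derive a Coulomb potential V from their charges and distances, then report the percentage of sodium cations with a larger potential in each PDB subset) can be sketched in a few lines of Python. This is a minimal illustration and not the author's implementation: the bare sum of q/r without any physical prefactor, the partial charges in the example, and all function names are assumptions introduced here.

```python
from statistics import mean


def coulomb_potential(neighbors, cutoff=3.5):
    """Sum q_i / r_i over the neighbours within `cutoff` angstroms.

    `neighbors` is a list of (partial_charge, distance) pairs. Charges are
    in elementary-charge units and distances in angstroms. No physical
    prefactor is applied (an assumption), so only relative values matter.
    """
    return sum(q / r for q, r in neighbors if 0.0 < r <= cutoff)


def percent_greater(value, reference):
    """Percentage of reference potentials strictly greater than `value`."""
    return 100.0 * sum(1 for v in reference if v > value) / len(reference)


def average_percent_greater(value, subsets):
    """Average of percent_greater over several reference subsets."""
    return mean(percent_greater(value, subset) for subset in subsets)


if __name__ == "__main__":
    # Hypothetical neighbours: (partial charge, distance in angstroms).
    # The last atom lies beyond the 3.5 angstrom cutoff and is ignored.
    site = [(-0.5, 2.5), (-0.8, 2.0), (0.3, 3.0), (-1.0, 4.0)]
    v = coulomb_potential(site)

    # Hypothetical reference potentials for two subsets of sodium sites.
    subsets = [[-2.0, -1.0, -0.4, 0.1], [-1.5, -0.6, -0.3, -0.2]]
    print(v, average_percent_greater(v, subsets))
```

A cation whose potential is exceeded by only a small percentage of the reference sites sits in an unusually unfavourable electrostatic environment, which is the signal used in the figure to flag it as anomalous.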
Figure 3. Two examples of sodium cations: NA 403 A of entry 6MR8 (a) and NA 202 A of entry 6ONT (b). For each example, the image on the left shows the atoms within 3.5 Å of the cation (in italics when they belong to symmetry-related asymmetric units). These atoms and their distances from the cation are listed in the center; the Coulomb potential (V) is calculated from this information. On the right, the percentage of sodium cations with a Coulomb potential greater than V is given for each of the 10 subsets of the PDB, followed by the average percentage.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Carugo, O. Identifications of False Positives Amongst Sodium(I) Cations in Protein Three-Dimensional Structures—A Validation Approach Extendible to Any Alkali or Alkaline Earth Cation and to Any Monoatomic Anion. Crystals 2024, 14, 918. https://doi.org/10.3390/cryst14110918