Here, a step-by-step description is given of how models that resemble fibers in bones can be devised.
First, starting from the sequence of amino acids and an available fibrillar structure, the CLG Fibril model was devised through homology modeling. Then, a structure of the CLG fibril that requires Periodic Boundary Conditions (PBCs) only along the z direction was extracted and labeled CLG NanoFiber. When the latter is replicated along the x and y directions, surrounded by an EFV, and subjected to PBCs in the x, y, and z directions, the newly devised model is labeled CLG Fiber. Finally, adding
and HA both in the EFV and IFV of the CLG Fiber gives origin to the Bone Fiber model. See
Figure 2 for a schematic view of this modeling process, described in detail throughout this section.
2.1.1. CLG Fibril
Different from most proteins, CLG is not found isolated and fully solvated in bones, and it does not completely fold to perform a specific function. It is the association of CLGs under physiological conditions into fibrils and, consequently, fibers, which confer CLG-based tissues with their remarkable macroscale mechanical properties, such as high tensile strength. Thus, it is crucial to reproduce the fibrillar and fiber structure in MD simulations when studying the CLG mechanical properties.
Definition 1 (Physiological Conditions). In biochemistry, reactions are usually studied under physiological conditions, that is, an electrically neutral aqueous solution at 1 atm pressure, ~37 °C temperature, 0.16 mol/L salt concentration ( and ions), ”enantiomer specific”, and a specific pH.
To date, only the amino acid sequence, i.e., the primary protein structure, of human type I CLG has been fully determined. This can be found at the Universal Protein Resource (UniProt) website [
34] under the codes COL1A1_human (P02452) and COL1A2_human (P08123) for the alpha-1 and alpha-2 chains, respectively. However, to perform MD simulations, the spatial position of every atom, i.e., at least the tertiary protein structure, is required. Several high-resolution structures such as 1WZB [
35], which periodically reproduces a common amino acid pattern of the CLG, can be found in the Protein Data Bank (PDB) [
36] and can be used as approximations of the type I CLG human structure. However, as mentioned before, it is crucial to reproduce the fibrillar and fiber structure, i.e., the quaternary protein structure, when studying the CLG mechanical properties.
Definition 2 (High-Resolution and Low-Resolution Protein Structures). Low-resolution structures usually contain only the position of the alpha carbons (CA). All other atomic positions, e.g., side-chain atoms, must be inferred. High-resolution structures usually contain the position of every non-hydrogen atom.
Unfortunately, there is no experimentally determined molecular structure of the quaternary protein structure of the human type I CLG available in the PDB. An alternative for modeling the human type I CLG structure is homology modeling.
Definition 3 (Homology Modeling).
Also labeled comparative modeling of protein 3D structures, homology modeling is a procedure that produces a previously unknown 3D protein structure by associating an amino acid sequence (labeled the target) with a known experimentally determined 3D atomic-resolution structure (labeled the template) of a homologous sequence. Two amino acid sequences are considered homologous when they are very similar, e.g., they display a high sequence identity value, meaning that they share a common evolutionary ancestry. Homologous sequences display similar structures and, frequently, similar functions [37]. The PDB structure 3HR2 [
6], an experimentally determined low-resolution crystal structure for type I CLG of rat tail tendons, is, to our knowledge, the only structure available in the PDB that encompasses the fibrillar structure of type I CLG. It reproduces the fibrillar structure as a crystal, with a unit cell (UC) that is periodically replicated along the x, y, and z directions.
When aligned, the type I CLG amino acid sequences of the human—Uniprot P02452 and P08123—and rat—PDB 3HR2—exhibit sequence identity above 90%, indicating that they are highly homologous. Hence, they are appropriate for comparative structural modeling. If the 3HR2 structure were a high-resolution structure, it could be directly used for the MD simulations proposed here. However, since it contains only the positions of the CA atoms of the amino acids, the position of the non-CA atoms must be inferred. Homology modeling allows the inference of the positions of the non-CA atoms.
MODELLER 9.25 [
38] was used to build a homology model that correlates the human amino acids sequences—Uniprot P02452 and P08123—with the rat fibrillar CLG structure—PDB 3HR2. In
Appendix A, the necessary steps to build this model are described. All the necessary files and scripts for its reproduction together with further details are also provided in the
Supplementary Materials.
When compressed in the crystal-like triclinic UC determined by Ref. [
6] (
a = 39.970 Å;
b = 26.950 Å;
c = 677.900 Å;
= 89.24
;
= 94.59
;
= 105.58
; see
Figure 3), and periodically replicated in space through periodic boundary conditions (PBCs) (see
Figure 4), the built homology model reproduces the type I CLG fibrillar structure experimentally determined by Ref. [
6]. This new model is labeled CLG Fibril throughout this paper. It can be devised by performing three steps:
Importing the homology model, built as described in
Appendix A, into VMD [
39,
40] (
http://www.ks.uiuc.edu/Research/vmd/, accessed on 30 January 2022). The H atoms can be kept or not. The models built here did not keep the H atoms, since they can be added later using the VMD software when generating a PSF file, as described in
Appendix C;
Setting triclinic UC dimensions using the “pbc set {39.970 26.950 677.900 89.24 94.59 105.58}” command of the VMD PBCTools Plugin in the VMD TkConsole;
Wrapping all atoms into the defined UC using the ”pbc wrap” command of the VMD PBC Tools Plugin in the VMD TkConsole.
Models such as the CLG Fibril, which combine the human amino acids sequences with the rat fibrillar CLG structure, have been previously reported; see Refs. [
8,
9,
10,
11,
12,
13,
17,
41,
42]. Ref. [
11], followed by Refs. [
12,
13,
14,
15,
17], also performed homology modeling using
MODELLER and provided a structural framework used in this work.
Highlight 1 (Devising more realistic models).
As described in Section 2.1.2 and Section 2.1.3, the CLG Fibril model was improved into Bone Fiber, which is a better representation of the experimentally determined nanostructure of bone presented in Refs. [18,20,22]. Remark 2 (D-period).
The CLG Fibril, which is derived from the 3HR2 PDB from Ref. [6], exhibits the D-period of the CLG structure along the direction of its principal axis (z) [8], Figure 1. This means that at least one gap and one overlap zone are present in the CLG Fibril’s UC, and consequently in the CLG Fiber and Bone Fiber models described next. 2.1.2. CLG Fiber
As previously mentioned, the deposition of HA in the IFV yields the mCLGf. However, as shown in Refs. [
18,
19,
20,
22,
23], it is important to emphasize that most of the HA is found not in the IFV, but between and around fibrils, in the EFV. The results of Refs. [
18,
19] corroborate estimations exhibited in Ref. [
21]; for cortical bone, about 70–80% of the HA content is situated in the EFV in a plate-like shape.
To the best of our knowledge, there are, as of yet, no available studies reporting MD simulations of the bone structure while taking into consideration the HA content in the EFV. There are probably two main reasons for this:
- (a)
The 3HR2 structure (and others derived from it, such as the presented CLG Fibril model) does not directly allow the deposition of HA in the EFV, but only within the fibril. That is because the UC defined by [
6] possesses CLG covalent bonds that require PBCs in all directions. There is no room left for molecules in the EFV, and if the UC is expanded along the x and y directions to make space for such molecules, these would block the path of the CLG covalent bonds that require PBCs in the radial directions (x and y);
- (b)
Including HA in the EFV means devising a very large system (much larger than the UC of the 3HR2 structure), which implies computationally more expensive simulations.
Refs. [
12,
14,
15], for example, do include HA in their models, but only in the IFV; i.e., the mCLGf is modeled by inserting HA crystals to the UC of a homology model similar to the CLG Fibril described here.
Open Issue 1 (Coarse-Grained Models).
An alternative to simulate the CLG fiber structure without requiring a prohibitively large number of atoms is to use coarse-grained models where an entire group (typically from three up to five atoms) is treated as a single interacting entity [7,8,43,44]. Reference [43] presents a coarse-grained model of CLG molecules (including the non-standard amino acid HYP) using an extended version of the MARTINI force field [45]. Coarse-grained models combining CLG, , and HA are still an open field of research. The first step to create a model that resembles the structure of the fibers present in bone is to extract from the CLG Fibril a structure that requires no PBCs along the x and y direction, labeled here as CLG NanoFiber, as shown in
Figure 2. After that, the desired model is obtained by replicating the latter along the x and y directions and inserting it into an EFV, i.e., a volume large enough to contain extra-fibrillar
and HA, the boundaries of which are subjected to PBCs. In
Appendix B, a description is given on how to devise this structure, labeled as CLG Fiber.
2.1.3. Bone Fiber
When
and HA molecules are added to the CLG Fiber model described in
Section 2.1.2, which contains an EFV, the newly devised model is labeled Bone Fiber.
Remark 3 (CLG Fiber vs. Bone Fiber models). A bundle of axially aligned CLGfs and mCLGfs surrounded by and HA characterizes bone at the nanoscale. The literature usually refers to this bundle as a CLG fiber. Throughout this paper, to avoid misunderstanding and to facilitate the understanding of the modeling process, a CLG Fiber model refers to a bundle of CLGfs (without and HA). A Bone Fiber model refers to a bundle of hydrated mCLGf surrounded by and HA in the EFV. Thus, here a Bone Fiber model refers to the CLG Fiber model plus and HA, i.e., a composite material composed of fibers (CLG, , and HA) and a matrix ( and HA).
The mechanical properties of bones at the nanoscale are affected by the relative fractions of their constituents. All models presented here consider bone to be constituted of CLG, HA, and H
O only; i.e., they consider the whole organic phase to be CLG and the whole inorganic phase to be HA. Four models were devised, each with a specific percentage of mass based on the reference values in
Section 1.2 [
22,
30,
31,
32], as shown in
Table 1.
Open Issue 2 (Even More Realistic Bone Models).
The presented models consider bone to be constituted of CLG, HA, and only. However, about 10% of the bone organic phase exhibits an association of other collagen types (III and VI), and non-collagenous proteins (NCPs) [2]. Furthermore, parts of the mineral phase may exhibit some deficiencies in hydroxyl, and also substitutes for hydroxyl which leads to the formation of other types of minerals, not only what is commonly labeled hydroxyapatite [4,46]. Both these variations may not represent a large fraction of the total organic and mineral phase and they are not simple to model, but they might affect the computed mechanical properties. Recently, Ref. [47] reported the implications of extra-fibrillar NCPs on the bone mechanical properties. Packmol, a package distributed as free software for building initial configurations for MD simulations [
48], was used to add HA and
molecules to the CLG Fiber model, obeying the percentages of mass shown in
Table 1. Note that the devised Bone Fiber models were labeled based on their HA concentration. Details on how to devise these models, and on how to compute the number of molecules of each constituent to be added to the simulation box are provided in
Appendix C and
Appendix D.
In all devised models, the total number of HA molecules was added to the simulation box such that 80% belong to the EFV, and only 20% to the IFV, as Refs. [
18,
19,
20,
21,
22,
23] point out. Packmol allows the creation of different geometries, including parallelepiped, sphere, cylinder, and other geometric shapes within which the new molecules will be inserted. The IFV was defined as a parallelepiped region within the larger simulation box, where CLG fibrils are mostly inside.
The devised IFV displays the x, y, and z dimensions Å, and the simulation box dimensions Å. This indicates that the length of the simulated fibers is 679 Å.
Note that 20% of the HA molecules were added into the IFV box, and the remaining 80% were outside the IFV, but inside the simulation box. The EFV is defined as the volume of the simulation box subtracted from the volume of the IFV box.
Open Issue 3 (EFV vs. IFV).
By visually identifying the volume mostly occupied by the CLG fibrils, two different boxes were created that define the IFV and EFV. However, there may be more accurate ways to define the IFV and EFV for MD simulations. This paper presents a realistic model of a bone fiber (not fibril), i.e., the first model to reproduce fibrils and to insert HA molecules both in the IFV and in the EFV. However, modeling both the IFV and EFV can be considered an open issue.
All the devised models display a salt concentration of 0.16 mol/L. This was assured by adding a total of 132 chloride ions and 0 sodium ions to the models (these 132 atoms are already included in the number of atoms shown in
Table 1).
Figure 7 and
Figure 8 show a devised Bone Fiber model, i.e., mCLGfs immersed in water and surrounded by HA inside the EFV. HA molecules from the INTERFACE force field (IFF) [
49] database were used.
Appendix E provides more detail about the used HA PDB file.
Notice that Ref. [
42],
Figure 1c, and Ref. [
50],
Figure 1d also extracted a CLG NanoFiber from the 3HR2 PDB structure provided by Ref. [
6]; see
Appendix B, Step 2. However, they do not further develop the model into a bone fiber structure, i.e., into a model such as the presented CLG Fiber or Bone Fiber.