1. Introduction
According to the approach initially developed by Shannon in his theory of communication [1], the complexity of a message consisting of symbols depends on the probability of occurrence of each symbol in the message. In particular, the information content of a message, in bits per symbol, is given by

$$ H = -\sum_{i=1}^{k} p_i \log_2 p_i , $$

where p_i is the probability of the i-th symbol appearing in the given message and k is the number of distinct symbols. Any graph with certain types of vertices may also be considered a message. Finite graphs corresponding to molecules belong to a wide class of so-called chemical graphs. Approaches to measuring their information content were introduced by Trucco in the 1950s [2] and reviewed by Bonchev in the early 2000s [3]. Commonly, the vertices of a chemical graph G are referred to as equivalent if they belong to the same orbit of the automorphism group Aut(G) of the graph, which is isomorphic to the maximal symmetry group of the graph. The information measures of molecules and their ensembles were recently reviewed by Sabirov and Shepelevich [4]. The information content of molecules is of particular interest for studying chemical reactions [5] and molecular aggregation processes [6], for searching for the reason why the first bioorganic molecules appeared [7], etc.
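As a minimal illustration of this definition, the information content of a message can be computed from the observed symbol frequencies. This is a sketch, not code from the cited works, and the function name information_content is hypothetical. To apply it to a graph, label each vertex with its equivalence class (orbit) and pass the list of labels.

```python
from collections import Counter
from math import log2


def information_content(message):
    """Shannon information content, in bits per symbol, of a sequence of symbols.

    Each distinct symbol i contributes -p_i * log2(p_i), where p_i is its relative frequency.
    """
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())


print(round(information_content("AAB"), 3))   # 0.918
```

The same call accepts any sequence of hashable labels, for example a list of orbit names for the vertices of a chemical graph.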
A crystal structure can also be represented by a finite graph, the so-called quotient graph, introduced by Chung et al. in the 1980s [8]. In fact, the quotient graph is a “molecule” of a non-molecular crystal. The quotient graph of a crystal structure maps atoms onto vertices and chemical bonds onto edges or loops, and it reflects the connectivity of the reduced unit cell of the structure. The quotient graph is a useful tool for enumerating nets occurring in crystal structures [9] and for performing a topological analysis of the underlying nets [10,11]. The cyclomatic number n of the quotient graph equals the dimensionality of the Euclidean space E^n in which the net derived from the quotient graph can be embedded so as to be periodic in n linearly independent directions. In such a space, the deletion of any edge lattice of the net leads to a disconnected net, and the net is referred to as minimal [12,13,14]. For instance, the diamondoid net is minimal in E^3, while the quartz net is minimal in E^4. Embeddings of some typical nets in E^3 are enumerated in the Reticular Chemistry Structure Resource (RCSR) [15], where each net is characterized by the maximal possible symmetry achieved by a barycentric placement of the vertices [16,17].
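The cyclomatic number mentioned above is E − V + c. Here E and V are the numbers of edges (including loops and parallel edges) and vertices of the quotient graph, and c is its number of connected components. The sketch below is an illustration only. Its edge lists use the vertex and edge counts that follow from 4-connectivity: diamond has 2 vertices and 4 edges, and quartz (Si atoms only) has 3 vertices and 6 edges. The particular pairing of the quartz edges is an assumption; it does not affect the result for a connected graph.

```python
def cyclomatic_number(n_vertices, edges):
    """Cyclomatic number E - V + c of a multigraph; loops and parallel edges are counted."""
    parent = list(range(n_vertices))

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:              # edge joins two components
            parent[ru] = rv
            components -= 1
    return len(edges) - n_vertices + components


print(cyclomatic_number(2, [(0, 1)] * 4))                                      # diamond: 3
print(cyclomatic_number(3, [(0, 1), (0, 1), (1, 2), (1, 2), (2, 0), (2, 0)]))  # quartz: 4
```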
The amount of information stored by the quotient graph of a crystal structure was introduced by Krivovichev [18,19] to quantify the information content of the crystal. In this case, the probability p_i from (2) is calculated as p_i = m_i/v, where m_i is the multiplicity of the i-th crystallographic orbit occupied by atoms and v is the number of atoms in the reduced unit cell. Later, Hornfeck [20] complemented this measure with terms accounting for the degrees of freedom associated with the translational motion of an atom along its Wyckoff position. Kaußler and Kieslich [21] adapted the measure to positionally disordered crystals. However, for molecular crystals, the information content calculated in this way reflects the complexity of the molecule itself rather than that of the crystal structure. A possible way to avoid this problem was proposed in [22]:

$$ H(2N, \mathrm{CN_{mol}}) = -\frac{2N}{2N + \mathrm{CN_{mol}}}\log_2\frac{2N}{2N + \mathrm{CN_{mol}}} - \frac{\mathrm{CN_{mol}}}{2N + \mathrm{CN_{mol}}}\log_2\frac{\mathrm{CN_{mol}}}{2N + \mathrm{CN_{mol}}}, $$

$$ H_{\mathrm{molNet}} = \frac{2N}{2N + \mathrm{CN_{mol}}}\,H_{\mathrm{mol}} + \frac{\mathrm{CN_{mol}}}{2N + \mathrm{CN_{mol}}}\,H_{\mathrm{edge}} + H(2N, \mathrm{CN_{mol}}), $$

where:
- N is the number of atoms in the molecule;
- CNmol is the molecular coordination number;
- Hmol is the information content of the molecule;
- Hedge is the information content of the edge net of the molecular net;
- HmolNet is the combination of Hmol and Hedge with the property of strong additivity [20,23].

The value of HmolNet is meaningful even for highly symmetric molecular structures in which the atoms occupy a single orbit (e.g., I2, S6, and α-N2) [22]. It should be noted that a molecule in a crystal structure is commonly distorted. In more than 90% of cases, the only symmetry operation the molecule retains is the inversion center [24], which is required to preserve dense packing according to Kitaigorodskii [25]. Generally, greater conformational lability of a molecule promotes a more diverse set of contacts in its coordination shell and should increase the complexity of the molecular net. On the other hand, certain molecular fragments can form specific intermolecular interactions, such as H-bonds, π…π interactions, Hal…Hal contacts, etc. In such cases, a small subset of interactions often predominates in the crystal structure and, in fact, bears the entire net of intermolecular contacts. However, the subset of bearing contacts may include excessive interactions and thus be redundant. A crystal engineering technique aims to reproduce the targeted bearing contacts. The portion of the bearing-subnet complexity attributable to these targeted (engineered) interactions may therefore serve as an indicator of the technique's effectiveness.
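Strong additivity here is the grouping property of the Shannon entropy. Suppose the atomic and edge partitions are merged into a single partition, with the atomic classes scaled by the weight 2N and the edge classes by CNmol. Its entropy then equals the weighted sum of Hmol and Hedge plus the entropy H(2N, CNmol) of the two weights.

The minimal sketch below checks this numerically. The weights 2N and CNmol are an assumption, inferred because they reproduce the acrylic acid values quoted in Section 2. The helper names are hypothetical.

```python
from math import log2


def entropy(sizes):
    """Shannon entropy in bits of a partition given by its (non-negative) class sizes."""
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes if s > 0)


def h_molnet(n_atoms, cn_mol, h_mol, h_edge):
    """Assumed grouping rule: weighted Hmol and Hedge plus H(2N, CNmol) of the weights."""
    total = 2 * n_atoms + cn_mol
    return (2 * n_atoms * h_mol + cn_mol * h_edge) / total + entropy([2 * n_atoms, cn_mol])


def merged_entropy(n_atoms, cn_mol, atom_orbits, edge_orbits):
    """Entropy of one merged partition: atom classes scaled to 2N, edge classes to CNmol."""
    a_tot, e_tot = sum(atom_orbits), sum(edge_orbits)
    merged = ([2 * n_atoms * m / a_tot for m in atom_orbits]
              + [cn_mol * m / e_tot for m in edge_orbits])
    return entropy(merged)


atoms = [4] * 9                   # acrylic acid: nine atomic orbits e (multiplicity 4)
edges = [4] * 4 + [2] * 4         # edge net e4dcba
lhs = merged_entropy(9, 12, atoms, edges)
rhs = h_molnet(9, 12, entropy(atoms), entropy(edges))
print(round(lhs, 3), round(rhs, 3))   # 4.04 4.04
```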
In this work, the formalism (3)–(5), previously discussed in [26], is tested on a set of more than 4000 homomolecular crystals with the general formula CxHyOz belonging to the structural class P21/c, Z = 4(1). This notation indicates that there is exactly one symmetrically unique molecule, occupying a general orbit in the space group P21/c. This structural class is of special interest because it is the most widespread among organic crystals, accounting for ~1/3 of all homomolecular structures and more than 1/2 of homomolecular racemates [27]. The aim of this work is to investigate the partitioning of intermolecular contacts from the coordination shell of the molecule into equivalence classes and to obtain the distribution type and the descriptive statistics of HmolNet.
2. Methods
The initial set of crystal structures was extracted from the Cambridge Structural Database (CSD) [28] using the following restrictions: presence of atomic coordinates, absence of errors and/or disorder, and R-factor < 5%. Out of 4249 high-quality molecular crystal structures [26] selected from CSD ver. 5.41 (with updates), a set of 4152 structures without duplicates was retained for further investigation. A structure was considered a duplicate if it had the same cell dimensions (within a tolerance of 2σ), the same chemical composition, the same space group, and the same Wyckoff sequence.
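The duplicate criteria can be written as a simple pairwise test. The sketch below is illustrative only. The record layout (class Entry) is hypothetical. Reading "within 2σ" as |p − q| ≤ 2·sqrt(σp² + σq²) for each of the six cell parameters is an assumption. The CSD tools used for the actual selection are not reproduced here.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Entry:
    """Hypothetical minimal record of one CSD structure."""
    refcode: str
    formula: str
    space_group: str
    wyckoff: str
    cell: tuple       # (a, b, c, alpha, beta, gamma)
    esd: tuple        # standard uncertainties of the six cell parameters


def same_cell(x, y, k=2.0):
    # Assumed reading of "within 2 sigma": |p - q| <= k * sqrt(s_p^2 + s_q^2) for every parameter
    return all(abs(p - q) <= k * sqrt(sp ** 2 + sq ** 2)
               for p, q, sp, sq in zip(x.cell, y.cell, x.esd, y.esd))


def is_duplicate(x, y):
    """Same composition, space group, Wyckoff sequence and cell within tolerance."""
    return (x.formula == y.formula and x.space_group == y.space_group
            and x.wyckoff == y.wyckoff and same_cell(x, y))


def deduplicate(entries):
    """Keep the first occurrence of each group of mutual duplicates."""
    kept = []
    for entry in entries:
        if not any(is_duplicate(entry, other) for other in kept):
            kept.append(entry)
    return kept
```

A greedy first-occurrence rule is used here for simplicity. Which member of a duplicate group was retained in the actual data set is not stated in the text.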
The molecular nets were constructed with the ToposPro program [29] by calculating the solid angles of the faces of the molecular Voronoi–Dirichlet polyhedron (VDPmol). According to Blatov [30], VDPmol is the superposition of the atomic VDPs in a molecule, and the solid angle Ω corresponding to an intermolecular contact arises from the interatomic contacts as

$$ \Omega = \frac{\sum_{ij} \Omega_{ij}}{\Omega_{\Sigma}} \cdot 100\%, $$

where Ω_ij is the solid angle for the interatomic contact ij between atom i of the given molecule and atom j of the adjacent molecule, and the sum runs over all such contacts with that molecule. Ω_Σ is the sum of the solid angles for all interatomic contacts of the given molecule with the adjacent ones. Interatomic contacts with Ω_ij < 1.5% of 4π steradians are omitted. In the same way, intermolecular contacts with Ω < 1.5% were omitted in this work. The adjacent molecules with Ω ≥ 1.5% are considered the coordination shell of the central molecule (Figure 1). As a rule, for non-specific van der Waals interactions, the descending order of Ω corresponds to decreasing interaction energy. This makes it possible to avoid energy calculations when assessing the supramolecular arrangement [26].
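The aggregation of interatomic solid angles into intermolecular ones can be sketched as follows. This is a minimal illustration, not the ToposPro implementation, and it rests on three assumptions:
- the interatomic solid angles are given as percentages of 4π sr;
- the 1.5% cut-off for interatomic contacts is applied before Ω_Σ is summed;
- the function name molecular_shell is hypothetical.

```python
from collections import defaultdict


def molecular_shell(atom_contacts, atom_cut=1.5, mol_cut=1.5):
    """Aggregate interatomic VDP solid angles into intermolecular contacts (sketch).

    atom_contacts: iterable of (neighbour_id, omega_ij), omega_ij in % of 4*pi sr,
    for every interatomic contact between the central molecule and its neighbours.
    Returns {neighbour_id: Omega in %} for the neighbours kept in the coordination shell.
    """
    per_neighbour = defaultdict(float)
    for neighbour, omega_ij in atom_contacts:
        if omega_ij >= atom_cut:                  # omit minor interatomic faces
            per_neighbour[neighbour] += omega_ij
    omega_sigma = sum(per_neighbour.values())     # assumed: Omega_Sigma over retained contacts
    if omega_sigma == 0:
        return {}
    shell = {n: 100.0 * s / omega_sigma for n, s in per_neighbour.items()}
    return {n: om for n, om in shell.items() if om >= mol_cut}


result = molecular_shell([("A", 10.0), ("A", 5.0), ("B", 4.0), ("B", 1.0), ("C", 1.2)])
print({n: round(om, 1) for n, om in result.items()})   # {'A': 78.9, 'B': 21.1}
```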
To derive the molecular net, the atoms of each molecule were contracted to its center of mass. The molecular coordination number CNmol is marked by a prime (CNmol′) when only symmetrically independent intermolecular contacts are counted. For example, acrylic acid (ACRLAC04 [31]) has CNmol = 12 (cuboctahedron) and CNmol′ = 8. The subset of bearing contacts generating the so-called critical net for a given molecule was defined in [32].

In a monosystemic crystal structure, the center of gravity of each molecule is connected with the centers of gravity of CNmol adjacent molecules. The VDP faces are ordered by solid angle, Ω_1 > Ω_2 > Ω_3 > … > Ω_n, and symmetrically equivalent contacts have the same Ω. For any value of n, there is a k (1 ≤ k ≤ n) such that removing all edges with the solid angles Ω_k, Ω_k+1, …, Ω_n disconnects the net. The value max(k) is called the “critical coordination number with a prime” (CNcrit′). If all symmetrically equivalent contacts are counted, the corresponding value is called the “critical coordination number without a prime” (CNcrit). For instance, acrylic acid (ACRLAC04) has CNcrit = 5 (square pyramid) and CNcrit′ = 4.

To derive CNcrit, all edges of the net of intermolecular contacts with Ω < 15% were first removed, so that only contacts with Ω ≥ 15% were retained. In all cases, this reduced the dimensionality of the net from 3D to 2D, 1D, or 0D. Then the contacts with Ω = 14.5–15.0% were returned to the adjacency matrix of the centers of gravity of the molecules, and the dimensionality of the net was checked again. If it was 3D, the returned contacts were designated as Ωcrit = Ωmax(k), and the constructed 3D net was considered the net of bearing contacts. If the dimensionality of the net did not increase to 3D, the contacts with Ω = 14.0–14.5% were added to the adjacency matrix, and the dimensionality was checked again. This procedure was repeated with a step of 0.5% until Ωcrit was found for all the structures. Smaller steps are not reliable, since the measurement error is about 0.5%. The resulting distribution of the crystal structures of the considered set is close to normal (Figure 2).
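The threshold-lowering procedure can be sketched as follows. The sketch assumes the net is given as a labelled quotient graph. Its vertices are the Z molecules of the unit cell. Each edge (u, v, t) links molecule u in the origin cell to molecule v in the cell shifted by the lattice vector t, and carries the Ω of its contact; symmetry-equivalent edges share the same Ω.

Dimensionality is taken as the rank of the lattice translations accumulated around the cycles of the quotient graph, a standard approach for periodic nets. This is an illustrative substitute, not the ToposPro routine used in this work. All function names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def lattice_rank(vectors):
    """Rank of a list of integer 3-vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank = 0
    for col in range(3):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def net_dimensionality(n_vertices, edges):
    """Return (quotient graph connected?, periodicity of the net containing vertex 0).

    edges: (u, v, t) means molecule u in the origin cell touches molecule v in cell t.
    """
    adjacency = {i: [] for i in range(n_vertices)}
    for u, v, t in edges:
        adjacency[u].append((v, t))
        adjacency[v].append((u, tuple(-x for x in t)))
    offset = {0: (0, 0, 0)}      # cell reached by each vertex along a BFS spanning tree
    cycle_vectors = []
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v, t in adjacency[u]:
            reached = tuple(a + b for a, b in zip(offset[u], t))
            if v not in offset:
                offset[v] = reached
                queue.append(v)
            else:                # edge closes a cycle: record its net lattice translation
                cycle_vectors.append(tuple(a - b for a, b in zip(reached, offset[v])))
    return len(offset) == n_vertices, lattice_rank(cycle_vectors)


def critical_omega(n_vertices, contacts, start=15.0, step=0.5):
    """Lower the Omega threshold until contacts with Omega >= threshold form a connected 3D net.

    contacts: (omega_percent, u, v, t) for every edge of the quotient graph.
    Returns the threshold reached, or None if no threshold down to `step` gives a 3D net.
    """
    threshold = start
    while threshold > 0:
        kept = [(u, v, t) for omega, u, v, t in contacts if omega >= threshold]
        connected, dim = net_dimensionality(n_vertices, kept)
        if connected and dim == 3:
            return threshold
        threshold -= step
    return None


# Toy example: primitive cubic net, one vertex per cell, three loops with decreasing Omega
toy = [(20.0, 0, 0, (1, 0, 0)), (16.0, 0, 0, (0, 1, 0)), (14.2, 0, 0, (0, 0, 1))]
print(critical_omega(1, toy))   # 14.0
```

Exact rational arithmetic keeps the rank test free of floating-point tolerance issues. The step of 0.5 is exactly representable, so the thresholds do not drift.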
The nets of intermolecular contacts in their most symmetric embedding in E^3 are classified according to RCSR [15]. When an RCSR classification is lacking, the TopCryst database [33] is used instead. Nets that remain unclassified in both RCSR and TopCryst to date are characterized by a point symbol. The net for the crystal structure of acrylic acid has the RCSR code fcu (cubic closest packing), while its net of bearing contacts has the code sqp (Figure 3). A CN-coordinated net has CN(CN − 1)/2 angles at each vertex, and the shortest cycle containing each angle should be identified. The point symbol in the form $A^{a}.B^{b}\ldots C^{c}$ indicates that a angles are A-cycles, b angles are B-cycles, etc. (A < B < … < C) [34]. For instance, the fcu net has 12∙11/2 = 66 angles at each vertex, and its point symbol is $3^{24}.4^{36}.5^{6}$. The sqp net has 5∙4/2 = 10 angles at each vertex, and its point symbol is $4^{4}.6^{6}$.
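The angle count and the point symbol can be checked with a breadth-first search on integer coordinates for nets whose vertices form a single lattice (one vertex per primitive cell, as in fcu, bcu, or pcu). The shortest cycle of an angle a–O–b is two plus the length of the shortest path from a to b that avoids O.

The sketch below covers only this special case. It does not handle sqp or other nets with several vertices per cell, and its function names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations, product

ORIGIN = (0, 0, 0)


def shortest_cycle(a, b, steps, max_len=8):
    """Length of the shortest cycle through the edges ORIGIN-a and ORIGIN-b of a lattice net."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist + 2                        # path a..b plus the two edges at ORIGIN
        if dist + 3 > max_len:                     # any longer cycle would exceed max_len
            continue
        for s in steps:
            nxt = tuple(x + y for x, y in zip(node, s))
            if nxt != ORIGIN and nxt not in seen:  # the cycle must not pass through ORIGIN again
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("no cycle found up to max_len")


def point_symbol(steps):
    """Point symbol A^a.B^b... of a net whose neighbours of ORIGIN are the vectors in steps."""
    sizes = Counter(shortest_cycle(a, b, steps) for a, b in combinations(steps, 2))
    n = len(steps)
    assert sum(sizes.values()) == n * (n - 1) // 2   # CN(CN - 1)/2 angles per vertex
    return ".".join(f"{k}^{m}" if m > 1 else str(k) for k, m in sorted(sizes.items()))


PCU = [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 1]
FCU = [v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2]
BCU = list(product((-1, 1), repeat=3))

print(point_symbol(FCU))   # 3^24.4^36.5^6
```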
If a net has p kinds of vertices and q kinds of edges, it is called p,q-transitive. For instance, the fcu and sqp nets are 1,1- and 1,2-transitive, respectively. In fact, p and q denote the minimal numbers of orbits occupied by the molecular centers of gravity and by the contacts, respectively. They are therefore related to the complexity of the molecular net in its most symmetric embedding in E^3.
The complexity of a molecular net was calculated using (3)–(5). The structural information content (SIC, ranging from 0 to 1) [4], which has the same meaning as the normalized informational complexity [19], was calculated as

$$ \mathrm{SIC} = \frac{H}{\max(H)}, $$

where max(H) is the maximal possible value of H, attained when each vertex constitutes its own equivalence class: max(Hmol) = log2 N; max(Hedge) = log2 CNmol; max(HmolNet) = log2(2N + CNmol).
The molecule of acrylic acid has N = 9 atoms, all of them symmetrically unique (the Wyckoff set e9 in the space group P21/c), with m_i = 4 and v = 36. Consequently, Hmol = −9∙4/36∙log2(4/36) = 3.170 bits/atom.

The edge net of the CNmol-coordinated molecular net is generated by adding a midpoint to each edge of the molecular net. Two midpoints are connected if and only if they are adjacent to the same vertex, and the vertices of the initial net are then removed. The resulting edge net is CNedge-connected, with CNedge = 2(CNmol − 1). For acrylic acid, the edge net is 22-connected and contains 8 symmetrically independent vertices with the Wyckoff sequence e4dcba and v = 24. The resulting values are:
- Hedge = −4∙4/24∙log2(4/24) − 4∙2/24∙log2(2/24) = 2.918 bits/contact;
- H(2N, CNmol) = H(18, 12) = 0.971 bits/d.f. (per degree of freedom);
- HmolNet = 4.040 bits/d.f.;
- SICmolNet = 0.823;
- HmolNet,tot = 4.040∙15 = 60.60 bits/molecule.

If only bearing contacts are included in the net, the edge net is 8-connected and contains only 4 of the 8 independent vertices, with the Wyckoff sequence edca and v = 10. In this case, Hedge = −4/10∙log2(4/10) − 3∙2/10∙log2(2/10) = 1.922 bits/contact. This net is of an unknown topological type.
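The acrylic acid numbers can be reproduced with a few lines of code. The sketch below assumes that HmolNet weights Hmol by 2N/(2N + CNmol) and Hedge by CNmol/(2N + CNmol) and adds H(2N, CNmol). This combination is inferred because it reproduces every value quoted above. The helper names are hypothetical.

```python
from math import log2


def entropy_from_multiplicities(mults):
    """Information content with p_i = m_i / v (v = sum of multiplicities), in bits per point."""
    v = sum(mults)
    return -sum(m / v * log2(m / v) for m in mults)


def mol_net_information(n_atoms, cn_mol, h_mol, h_edge):
    """Assumed HmolNet combination; returns (H(2N, CNmol), HmolNet, SICmolNet)."""
    total = 2 * n_atoms + cn_mol
    h_mix = entropy_from_multiplicities([2 * n_atoms, cn_mol])    # H(2N, CNmol)
    h_net = (2 * n_atoms * h_mol + cn_mol * h_edge) / total + h_mix
    sic = h_net / log2(total)                                     # max(HmolNet) = log2(2N + CNmol)
    return h_mix, h_net, sic


# Acrylic acid (ACRLAC04): nine atomic orbits e (m = 4); edge net e4dcba
h_mol = entropy_from_multiplicities([4] * 9)
h_edge = entropy_from_multiplicities([4] * 4 + [2] * 4)
h_mix, h_net, sic = mol_net_information(9, 12, h_mol, h_edge)
print(f"{h_mol:.3f} {h_edge:.3f} {h_mix:.3f} {h_net:.3f} {sic:.3f}")
# 3.170 2.918 0.971 4.040 0.823

# Edge net of the bearing contacts only (Wyckoff sequence edca)
print(f"{entropy_from_multiplicities([4, 2, 2, 2]):.3f}")   # 1.922
```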
The discriminatory power D of H is based on the probability that two unrelated objects are characterized as the same type. It was calculated according to the following equation [35]:

$$ D = 1 - \frac{1}{N(N-1)}\sum_{j=1}^{s} x_j\,(x_j - 1), $$

where N is the number of tested crystal structures, s is the number of different types of structures with respect to H, and x_j is the number of objects belonging to the j-th type. Correlations between the calculated values were sought in the Mathematica software, ver. 11.0 [36].
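The discriminatory power D = 1 − Σ x_j(x_j − 1)/[N(N − 1)] can be computed as sketched below. This is the Hunter–Gaston form, which matches the definitions of N, s, and x_j given above. Grouping values by rounding them to three decimals, the precision at which H is reported here, is an assumption about how "the same type" is decided. The function name is hypothetical.

```python
from collections import Counter


def discriminatory_power(values, ndigits=3):
    """D = 1 - sum_j x_j (x_j - 1) / (N (N - 1)); types are values equal after rounding."""
    counts = Counter(round(x, ndigits) for x in values)   # x_j for each of the s types
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two objects are needed")
    same_type_pairs = sum(x * (x - 1) for x in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))


print(round(discriminatory_power([1.0, 1.0, 2.0, 2.0]), 4))   # 0.6667
```

A single dominant type drives D down even when s is large. This is why the skew of the distribution matters as much as the number of distinct values.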
3. Results and Discussion
The distribution of the analyzed crystal structures over CNmol generally agrees with the earlier results obtained by Carugo et al. [37]. More than half of the crystal structures have CNmol = 14, and the second most frequent value is CNmol = 16. The most frequent values of CNcrit are 5, 4, and 6, but, unlike CNmol, CNcrit shows no sharp peak at any value (Figure 4). More than half of the structures are characterized by CNmol′ = 9: with e5, 2355 structures; with e6, 84 structures; with e4, 13 structures; e7ba, 1 structure (refcode HINSOM [38]). The most abundant CNcrit′ is its least value, 3: eba, ecb, e…, 856 structures; e2a, e2b, e2…, 737 structures; e3, 81 structures.
In the structural class P21/c, Z = 4(1), each molecule can form contacts with multiplicity 1 or 2. The former corresponds to a so-called involution, i.e., a symmetry element of order 2. The only involution present in the space group P21/c is the inversion center, and the midpoint of such a contact occupies the Wyckoff position a, b, c, or d. All other contacts are formed via a 21 screw axis, a c glide plane, or a translation along some direction, and the midpoint of such a contact occupies the Wyckoff position e. Each of the four molecules in the unit cell has CNmol contacts, and each contact is shared by two molecules, so the number of contact midpoints per cell is v = 2CNmol. Hence the average multiplicity, which lies between 1 and 2, equals CNmol/CN′ = v/(2CN′).

According to a two-sample t-test, the mean multiplicities for all contacts and for the bearing contacts differ significantly (p-value < 0.001). Moreover, the minimal average multiplicity is 1.375 (3 structures) for the whole net of molecular contacts, compared with 1.000 (1 structure, refcode KOLRAF [39]) for the critical subnet (Table 1). Thus, the subnet of bearing contacts is, on average, more enriched in inversion centers than the whole molecular net. As shown above, 2355 structures have the Wyckoff sequence e5dcba (or similar) for the edge net, and the mean multiplicity in this case is (5∙2 + 4)/9 = 1.556. Motherwell [40] previously studied the patterns obtained by projecting the coordination shell of a molecule into 2D in different space groups with none of the special positions occupied. The majority of the projection patterns in P21/c contained at least one contact via an inversion center.
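The edge-net entropy, CNmol = v/2, and the average multiplicity CNmol/CN′ = v/(2CN′) can all be computed directly from a Wyckoff sequence of contact midpoints. The sketch below assumes the site multiplicities of P21/c (4 for e, 2 for a–d) and uses max(Hedge) = log2 CN for SIC. The parser is a hypothetical helper that accepts only compact strings such as e5dcba.

```python
import re
from math import log2

# Site multiplicities of contact midpoints in P21/c: general position e, inversion centres a-d
SITE_MULTIPLICITY = {"e": 4, "a": 2, "b": 2, "c": 2, "d": 2}


def parse_wyckoff(sequence):
    """'e5dcba' -> ['e', 'e', 'e', 'e', 'e', 'd', 'c', 'b', 'a'] (one letter per orbit)."""
    orbits = []
    for letter, count in re.findall(r"([a-e])(\d*)", sequence):
        orbits.extend([letter] * (int(count) if count else 1))
    return orbits


def contact_statistics(sequence):
    """CN, CN', average multiplicity, Hedge (bits/contact) and SIC_edge for one sequence."""
    mults = [SITE_MULTIPLICITY[o] for o in parse_wyckoff(sequence)]
    v = sum(mults)                       # contact midpoints per unit cell (= 2 CN)
    cn_prime = len(mults)                # symmetry-independent contacts, CN'
    cn = v // 2                          # all contacts of one molecule, CN
    h_edge = -sum(m / v * log2(m / v) for m in mults)
    sic = h_edge / log2(cn) if cn > 1 else 1.0
    return {"CN": cn, "CN'": cn_prime, "mean_mult": v / (2 * cn_prime),
            "H_edge": h_edge, "SIC_edge": sic}


s = contact_statistics("e5dcba")
print(s["CN"], s["CN'"], round(s["mean_mult"], 3), round(s["H_edge"], 3))   # 14 9 1.556 3.093
```

The same function gives Hedge,crit = 1.500 for eba, 1.522 for e2a, and 2.252 for edcba. For e3 (CNcrit = 6) it gives SIC = 1 − 1/log2 6 ≈ 0.613, the minimum attainable when every contact has multiplicity 2.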
The distribution of the molecular nets in the considered series over topological types generally follows the trend previously found by Carugo et al. for 105 549 packings of small molecules [37]. The most widespread topological type is bcu-x. It is derived from the body-centered cubic lattice by extending the coordination shell of each vertex with the second coordination shell (CNmol = 8 + 6).

The topological density TD10 is the total number of vertices in the first 10 coordination shells. bcu-x has the lowest TD10 among all 14-coordinated nets reported for the following groups:
- centrosymmetric [41] and non-centrosymmetric [32] crystalline hydrocarbons;
- some inorganic molecular crystals [22] (e.g., 14T191 in orthorhombic sulfur, α-S8);
- the most popular nets among all small-molecule crystals [37].

The recently studied crystal structure of 2-(tert-butyl)-4-chloro-6-phenyl-1,3,5-triazine, with 2 symmetrically independent molecules [42], has the 14T319 topological type after contacts with Ω ≤ 2% are neglected. This type lies at the opposite end of the range of 14-coordinated molecular nets with respect to TD10 (Table 2). A larger 2nd or 3rd coordination number does not imply larger 4th and 5th coordination numbers. For instance, the 2nd coordination shell contains 54 vertices in the 14T134 topological type and 53 in 14T10; nevertheless, TD10 for 14T10 is slightly higher. Note that the TopCryst database has been extended in recent years by many new topological types with large CNs, including CN = 14. At the same time, the 14T134 topological type found in 2019 [32] in the crystal structure of spiropentane (refcode VAJGOC [43]) has no reference code in the TopCryst database. The corresponding molecular net in its most symmetric embedding in E^3 is 1,6-transitive and has the space group R-3c, with only the general position occupied by the centers of gravity of the molecules.
Consider three typical examples of molecular nets, realized in the following structures:
- α-methyl-trans-cinnamic acid (refcode: BEJVOB [45]), C10H10O2;
- 5-methoxyindan-1-one (refcode: KACSOX01 [46]), C10H10O2, an isomer of the former (Figure 5);
- (1RS,3SR,4SR)-trispiro(2.0.0.2.1.1)nonane-1-carboxylic acid (refcode: FAFDEW [47]), C10H12O2.

The Wyckoff sequences for the molecules are e22 for BEJVOB and KACSOX01 and e24 for FAFDEW. This leads to slightly different values of Hmol: Hmol = log2 22 = 4.459 bits/atom for BEJVOB and KACSOX01, and Hmol = log2 24 = 4.585 bits/atom for FAFDEW. All other structures in the set of 4152 structural files show the same distribution of atoms over general positions, i.e., they have the maximal Hmol for the given N (SIC = 1). Indeed, if the molecular center of gravity occupies a general position, no atom can occupy an inversion center. Otherwise, the remaining atoms would have to be related by that inversion center, and the molecule would either occupy a special position or show symmetry-induced disorder; the latter was excluded by the structure selection. In the analyzed set, the linear correlation coefficients are 0.936 between N and the molecular mass, 0.959 between N and Hmol, and 0.889 between the molecular mass and Hmol. Consequently, there is a strong positive linear correlation between the molecular mass, N, and Hmol.
All three crystal structures have CNmol = 14 but are characterized by different topological types (Table 3): bcu-x, gpu-x, and tcg-x. Furthermore, all of them have CNmol′ = 9 and the same Wyckoff sequence for the midpoints of intermolecular contacts (e5dcba). This implies the same Hedge = 3.093 bits/contact as for the other 2352 structures whose Wyckoff sequence contains e5, including e5dcba (2324 structures).
The critical nets for BEJVOB, KACSOX01, and FAFDEW are of different topological types. It was shown in [26] that the value Δ = CNcrit′ − minCNcrit′ follows an almost normal discrete distribution, with Δ ≤ 2 for 92% of the structures. For the set of crystalline hydrocarbons, this portion was even higher, 95% [48]. A minimal generating set of the space group P21/c contains 3 elements [49]. Suppose a molecule occupies a special position whose site-symmetry group contains a generating element of the space group (-1 in P21/c). Then fewer intermolecular contacts along the other symmetry elements could suffice to generate the molecular net. However, for the structural class P21/c, Z = 4(1), minCNcrit′ = 3. For KACSOX01 and FAFDEW, the critical molecular nets are parsimonious (CNcrit′ = 3), while for BEJVOB the net is not parsimonious (CNcrit′ = 5). The latter contains two redundant contacts via inversion centers.

Any two inversion centers separated by a translation generate this translation. However, if the pair is accompanied by a contact with multiplicity 2 along the same direction, the pair of inversion centers becomes redundant. Conversely, a sole contact with multiplicity 2 in the critical net cannot be redundant. The reason is that a triplet of inversion centers generates only translations lying in the plane of the triplet and thus can never generate a 3D space group. Owing to this redundancy, the critical net in BEJVOB is more complex than those in KACSOX01 and FAFDEW (Hedge,crit = 2.252, 1.500, and 1.522 bits/contact, respectively). Hedge,crit thus amounts to about half of Hedge (3.093 bits/contact) for KACSOX01 and FAFDEW, and to more than 2/3 of it for BEJVOB. The nets are shown in Figure 6.
The topological types of the molecular nets and of the critical nets, which are subnets of the former, are shown in Figure 7. Surprisingly, for BEJVOB the prototype molecular net bcu-x has 2 kinds of edges, while the prototype critical net sxa has three. This is because some Wyckoff positions split when the symmetry group descends from Im-3m (bcu-x) to Cmme (sxa). The group Cmme has five elements in a minimal generating set [49], and sxa has Z = 4(mm2) equivalent vertices. As the point group mm2 has two generators, the vertex configuration of sxa can be generated by 5 − 2 = 3 “contacts” of the vertices. Therefore, the net sxa could be realized even for CNcrit′ = 3. In contrast, another similar 6-coordinated net, sxb, of the structural class Cccm, Z = 4(2/m), could not be realized in any space group at CNcrit′ = minCNcrit′, since Cccm is generated by just a pair of elements. Recently, sxb was found in the metal-organic framework (MOF) [Mg3(btdc)3(dmf)4] [50]. This MOF was synthesized upon heating by a topotactic reaction from [Mg3(btdc)3(dmf)4]∙DMF of the pcu type; thus, the former MOF is in principle not parsimonious.
The set of combinatorially distinct critical nets depends on the topology of the initial molecular net. For bcu-x, gpu-x, and tcg-x in the three crystal structures mentioned above, all subsets of edges that may correspond to CNcrit′ = minCNcrit′ = 3 were enumerated (Table 4). As all the initial nets have edges with the Wyckoff sequence e5dcba, each has 4 involutions and 5 contacts with multiplicity 2. In BEJVOB (bcu-x), KACSOX01 (gpu-x), and FAFDEW (tcg-x), these comprise four contacts along the pair of 21 screw axes, four contacts across the pair of c-glide planes, and two contacts along the translation vector. These contacts combine with the four involutions differently in the different topological types. This leads to different numbers and types of critical subnets.

The complexity of the partition of subnets into combinatorially distinct Wyckoff sequences (e, e2, and e3) and into topological types (dia, cds, dmp, etc.) can be measured in terms of (1) and (2), but this is beyond the scope of this work. In fact, the coordination shell of a molecule may be referred to as fuzzy [51], because different subsets of bearing contacts arise simultaneously during crystallization. Overall, the subnets of gpu-x are more diverse and include such exotic topological types as the 4-coordinated 4T19 (2 subnets) and the 5-coordinated 5T12 (2 subnets). Meanwhile, the leading topological type of the subnet in all cases is the diamondoid type dia.

In the structural class P21/c, Z = 4(1), dia, like any other 4-coordinated subnet, is formed by two involutions and one contact of multiplicity 2 (i.e., two symmetry-equivalent contacts). dia can be formed in only two combinatorially different ways [48]:
- the generators are the c-glide plane and inversion centers located at a distance of b/4 from it along y;
- a 21 screw axis located at a distance of c/4 from one of the inversion centers is involved (Figure 8, top).

If one of the inversion centers in the first dia subtype is shifted by b/2, the subnet transforms into the cds type; for the second subtype, it transforms into dmp. Like dia, the bnn subnet exists in two subtypes (Figure 8, bottom). Each subtype has a single contact via an inversion center and 2 contacts along the translation a. The subtypes differ only in the remaining contact of multiplicity 2, which is formed either across the c-glide plane or along the 21 screw axis. If the contacts along a are replaced by contacts along the 21 axis located at a distance of (a/2 + c/4) from the initial inversion center, the bnn subtypes transform into nov and sqp, respectively. Finally, the pcu subnet of each initial net is generated by three contacts with multiplicity 2.
Of course, some critical nets in P21/c, Z = 4(1), have CNcrit′ > minCNcrit′, for instance:
- noz (5-coordinated);
- acs, bsn, and sxd (6-coordinated);
- the net of simple hexagonal packing, hex (8-coordinated);
- the body-centered cubic net with an unextended coordination shell, bcu (8-coordinated);
- etc.

Nevertheless, these topological types may be represented as extensions of some of the 5 minimal nets in E^3 without collisions and with equal vertex degrees (CNs): dia, cds, ths, pcu, and srs [13]. The last minimal net of this kind, the 3-coordinated srs, has not yet been observed as a net of bearing contacts in any crystal structure. Similarly, the 3-coordinated net ths has not been observed in P21/c, Z = 4(1), but it is possible in some other monoclinic structural classes, such as C2/c, Z = 8(1). To date, the TopCryst database exemplifies it not by a molecular crystal but by a MOF, the crystal structure with the refcode RAGFAJ [52]. The quotient graph of any critical net, including a redundant one, may be derived by adding an edge to the undirected quotient graph of some minimal net (Figure 9).

Note that Δ = CNcrit′ − minCNcrit′ = 0 does not necessarily correspond to a minimal net, because deleting an edge lattice and deleting all symmetrically equivalent edges are not exactly the same operation. The deletion of all equivalent edges implies the deletion of translationally equivalent edges, but the converse is not true. As a result, a series of non-minimal nets, such as bnn, sqp, dmp, and nov (Figure 8), also corresponds to Δ = 0.
The contribution of Hedge,crit to Hedge varies from 33.9 to 100% (Table 5). Indeed, there are 4 crystal structures with CNcrit = CNmol and Hedge,crit = Hedge; these are the extreme cases with the most redundant critical net. On average, Hmol, Hedge, and H(2N, CNmol) contribute 78.9, 9.5, and 11.6% to HmolNet, respectively, with σ of a few percent. Thus, the complexity of the molecular net is largely determined by Hmol, but the impact of Hedge and H(2N, CNmol) is meaningful. The ranges (max − min) of the contributions of Hedge and H(2N, CNmol) to HmolNet are much larger than σ, which means that outliers are few.

The values of SIC, calculated using (7), also show different variances. As mentioned above, since no atoms are in special positions, SICmol = 1 for all the structures. As the maximal multiplicity of a contact is 2, the theoretical minimum of SICedge,crit is reached when the CNcrit contacts form CNcrit/2 orbits of multiplicity 2:

$$ \min \mathrm{SIC}_{\mathrm{edge,crit}} = \frac{-\frac{\mathrm{CN_{crit}}}{2}\cdot\frac{2}{\mathrm{CN_{crit}}}\log_2\frac{2}{\mathrm{CN_{crit}}}}{\log_2 \mathrm{CN_{crit}}} = 1 - \frac{1}{\log_2 \mathrm{CN_{crit}}}. $$

All structures in the set with average multiplicity 2 have CNcrit = 6; consequently, the minimal SICedge,crit = 1 − 1/log2 6 = 0.613. The maximal SICedge,crit = 1 corresponds to average multiplicity 1. It is found in the structure of (6-methoxycarbonylmethoxynaphthalen-1-yloxy)acetic acid methyl ester (refcode: KOLRAF) [39], with the Wyckoff sequence of edges dcba and the critical net dia. The values of SICedge and SICmolNet have much smaller σ than SICedge,crit.
The distribution of the crystal structures over Hmol and HmolNet is shown in Figure 10. Both quantities are best approximated by a logistic distribution. This distribution has also been used to model the degrees of pneumoconiosis in coal miners, the dependence of chronic obstructive respiratory disease prevalence on smoking, the survival time of diagnosed leukemia patients, etc. [53]. In general form, its probability density function is

$$ f(x; \mu, \beta) = \frac{e^{-(x-\mu)/\beta}}{\beta\left(1 + e^{-(x-\mu)/\beta}\right)^{2}}, $$

where μ is the location parameter (expected value) and β > 0 is the scale parameter. For Hmol, μ ≈ 5.252 and β ≈ 0.30; for HmolNet, μ ≈ 5.572 and β ≈ 0.25. Thus, the expected values μ differ by about 0.320 bits/d.f. Since the variance of the logistic distribution is π²β²/3, the variance for Hmol is greater than that for HmolNet.
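The density and the variance of the logistic distribution, f(x) = exp(−(x − μ)/β) / (β(1 + exp(−(x − μ)/β))²) and π²β²/3, can be evaluated for the fitted parameters as follows. This sketch only restates these standard formulas with the μ and β values quoted above; it does not refit the data.

```python
from math import exp, pi


def logistic_pdf(x, mu, beta):
    """Probability density of the logistic distribution with location mu and scale beta."""
    z = exp(-(x - mu) / beta)
    return z / (beta * (1 + z) ** 2)


def logistic_variance(beta):
    """Variance of the logistic distribution: pi^2 * beta^2 / 3."""
    return pi ** 2 * beta ** 2 / 3


for name, mu, beta in [("Hmol", 5.252, 0.30), ("HmolNet", 5.572, 0.25)]:
    # The peak density 1/(4 beta) and the variance both follow from beta alone
    print(name, round(logistic_pdf(mu, mu, beta), 3), round(logistic_variance(beta), 3))
# Hmol 0.833 0.296
# HmolNet 1.0 0.206
```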
The discriminatory powers D for Hmol, Hedge,crit, Hedge, H(2N, CNmol), and HmolNet are listed in Table 6. The simple combinatorial complexity Hmol distinguishes only 99 values, whereas HmolNet distinguishes 531. Surprisingly, Hedge has the lowest D = 0.6372 and distinguishes only 26 values, while Hedge,crit has a much higher D = 0.8762 with s = 28. The reason for such a large difference in D at a small difference in s is the strongly non-uniform distribution of the structures over the values. As shown above, 2355 structures have e5dcba or a similar Wyckoff sequence for the edge net, giving Hedge = 3.093 bits/contact. Meanwhile, the most widespread Wyckoff sequence for the critical net, eba (or similar), occurs in only 856 structures (Hedge,crit = 1.500 bits/contact), almost three times less frequently. H(2N, CNmol) has remarkably high values of s and D compared with Hmol, because CNmol may vary at equal N, i.e., at equal Hmol.