1. Introduction
Consider an undirected simple connected graph G with n nodes and m edges. The set of nodes adjacent to a node v in G is denoted $N_G(v)$. The degree of v, denoted $d_v$, is the number of edges incident to v. The maximum degree of G is denoted $\Delta$.
Chemical graph theory [1] is a specialized area of computational chemistry [2] that applies graph-theoretical concepts to model the structures of molecules. In this field, atoms are represented as nodes (vertices), and chemical bonds are depicted as edges connecting these nodes. This representation allows chemists to analyze and predict a variety of molecular properties through numerical descriptors known as topological indices (TIs), such as the Wiener index, the Randić index and the Balaban index. These indices capture crucial information about the molecule’s connectivity, path lengths and substructures, providing valuable insights into its overall properties and potential behavior.
Quantitative Structure–Activity Relationship (QSAR) and Quantitative Structure–Property Relationship (QSPR) models use these graph-derived descriptors as input features to predict specific chemical or biological properties. These models apply statistical and machine learning approaches to establish a relationship between the molecular structure and its properties, such as melting points, boiling points, solubility, and toxicity. The use of machine learning algorithms allows complex patterns to be identified in large datasets, improving the accuracy of predictions and aiding in the analysis of new compounds.
The integration of chemical graph theory with QSPR involves constructing molecular graphs and calculating relevant descriptors that will be used as features in predictive models. This process includes data preparation, model training, validation and evaluation, which are all crucial steps to ensure the robustness and generalizability of the model. Once validated, these QSPR models can be applied to predict properties for new, untested molecules, facilitating virtual screening and aiding in the design of new compounds with desired properties. The combined use of chemical graph theory and QSPR enables researchers to streamline the development of new substances and optimize existing ones, offering a powerful tool in modern chemistry and pharmaceutical research.
During the 1970s, many degree-based topological indices were proposed as part of chemical graph theory to better understand the structure–activity relationships of molecules. These indices use the degree of each node in a molecular graph to derive numerical values that help predict the chemical and physical behavior of molecules. In [3], the author provides a summary of the key properties of degree-based topological indices and conducts a critical comparative analysis of these indices. That work aims to highlight their distinct characteristics and how they compare in terms of their application and effectiveness in modeling molecular properties.
The Wiener index, introduced by Harry Wiener in 1947 [4], is recognized as the first TI. The index, defined as the sum of shortest-path distances between every pair of atoms in the graph representation of a molecule, was shown to predict the boiling points of alkanes (paraffins). This work established a predictive tool for characterizing and distinguishing chemical properties based on molecular topology, laying new groundwork in theoretical and computational chemistry. The Randić index, introduced by M. Randić in 1975 [5], is considered one of the first degree-based topological indices. It sums, over all bonds, the reciprocal square root of the product of the degrees of the two bonded atoms, and it has been influential in studying the correlation between molecular structure and chemical properties such as boiling points and stability. It continues to attract both theoretical and practical attention, and researchers have proposed numerous modifications of the Randić connectivity index to improve its performance.
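The two classical indices mentioned above can be illustrated with a minimal sketch. It assumes the standard definitions (Wiener index: sum of shortest-path distances over all unordered node pairs; Randić index: sum over edges of $1/\sqrt{d_u d_v}$). The function names and the edge-list input format are illustrative choices, not taken from the cited works.

```python
from collections import deque
from math import sqrt

def adjacency(n, edges):
    """Adjacency lists for nodes 0..n-1 of an undirected graph."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def wiener_index(n, edges):
    """Sum of shortest-path distances over all unordered node pairs (BFS from each node)."""
    adj = adjacency(n, edges)
    total = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2  # every unordered pair was counted twice

def randic_index(n, edges):
    """Sum over edges uv of 1 / sqrt(d_u * d_v)."""
    adj = adjacency(n, edges)
    return sum(1 / sqrt(len(adj[u]) * len(adj[v])) for u, v in edges)

# Carbon skeleton of n-butane: a path on four nodes.
butane = [(0, 1), (1, 2), (2, 3)]
print(wiener_index(4, butane))   # 10
print(randic_index(4, butane))   # 2/sqrt(2) + 1/2
```

The Wiener index depends on distances and therefore needs a traversal of the whole graph, whereas the Randić index only needs the degrees of the two endpoints of each edge. The same is true of the other degree-based indices discussed in this paper.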
Albalahi et al. (2023) [6] introduced and examined the Harmonic–Arithmetic (HA) index, a new graph-theoretical descriptor derived from the harmonic and arithmetic means of node degrees, to offer a different perspective on molecular structure. The authors investigated its properties and established bounds and applications in studying the structural attributes of molecular systems. They report that the HA index gives enhanced insight into molecular structures and physicochemical properties, and they suggest further research on generalizations, sharper bounds and wider applications.
It is defined as
$$HA(G) = \sum_{uv \in E(G)} \frac{4\, d_u d_v}{(d_u + d_v)^2},$$
where $d_u$ and $d_v$ are the degrees of nodes u and v in G.
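As a minimal sketch of how the index can be computed, the snippet below assumes that the HA index is the sum over edges of the ratio of the harmonic mean to the arithmetic mean of the end-node degrees, that is, $4 d_u d_v/(d_u + d_v)^2$. The function name `ha_index` and the edge-list input are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction

def ha_index(edges):
    """HA index of a graph given as a list of undirected edges (u, v).

    Assumed definition: sum over edges of 4*d_u*d_v / (d_u + d_v)**2.
    Exact rational arithmetic avoids rounding noise when comparing trees.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(
        Fraction(4 * deg[u] * deg[v], (deg[u] + deg[v]) ** 2) for u, v in edges
    )

path4 = [(0, 1), (1, 2), (2, 3)]   # path on four nodes
star4 = [(0, 1), (0, 2), (0, 3)]   # star with three leaves
print(ha_index(path4))  # 25/9
print(ha_index(star4))  # 9/4
```

Each edge contributes at most 1, and exactly 1 when its two end nodes have equal degree, so edges joining nodes of very different degree lower the index.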
In [7], the authors present new theoretical results on the HA index that establish bounds and relationships between the HA index and other well-known topological indices, contributing to the understanding of its behavior and applications in analyzing molecular structures. Refs. [7,8,9] motivated us to provide some new results using the HA index.
The primary objective of this work is to derive bounds for the HA index of trees in terms of their number of nodes and maximum degree. In addition, we compare our results with those of [10], in which the H index performed strongly. In this paper, the HA index shows similar potential: it attains nearly the same correlation values as the H index while demonstrating enhanced predictive accuracy. The use of curvilinear models, such as quadratic and cubic regressions, further underscores the HA index’s efficacy in reducing the root mean squared error (RMSE). This highlights the role of the HA index in advancing QSPR methodologies, particularly for analyzing drugs used to treat Parkinson’s disease [11] and supporting drug development efforts.
2. Trees with Bounds on the Harmonic–Arithmetic Index
The following terminology and notation are used in this section. In a rooted tree, one particular node is designated as the root. A node of degree one is called a pendant node or leaf. A node adjacent to a leaf is called a support node. If a node is adjacent to at least two leaves, it is called a strong support node.
A starlike tree is characterized by having a single node with a degree of at least three, known as the center. In a starlike tree, a leg is defined as the path from the center node to a leaf. When all the legs in a starlike tree are of length one, it is referred to as a star.
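The terminology above can be made concrete with a short sketch that builds a starlike tree from its leg lengths and then identifies leaves, support nodes and strong support nodes. The helper names and the convention that node 0 is the center are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def starlike_tree(leg_lengths):
    """Edge list of a starlike tree whose center is node 0.

    leg_lengths[i] is the number of edges on the i-th leg (hypothetical input format).
    """
    if len(leg_lengths) < 3 or min(leg_lengths) < 1:
        raise ValueError("a starlike tree needs at least three legs of length >= 1")
    edges, next_node = [], 1
    for length in leg_lengths:
        prev = 0  # every leg starts at the center
        for _ in range(length):
            edges.append((prev, next_node))
            prev, next_node = next_node, next_node + 1
    return edges

def classify_nodes(edges):
    """Return (leaves, support nodes, strong support nodes) of a tree."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    leaves = {v for v, nbrs in adj.items() if len(nbrs) == 1}
    # Count how many leaves each node is adjacent to.
    leaf_count = Counter(adj[leaf][0] for leaf in leaves)
    support = set(leaf_count)
    strong = {v for v, k in leaf_count.items() if k >= 2}
    return leaves, support, strong

def is_star(leg_lengths):
    """A starlike tree is a star when every leg has length one."""
    return all(length == 1 for length in leg_lengths)

legs = [1, 1, 3]                      # two legs of length one, one of length three
tree = starlike_tree(legs)
leaves, support, strong = classify_nodes(tree)
print(tree)                           # [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]
print(leaves, support, strong)        # {1, 2, 5} {0, 4} {0}
```

In this example the center is a strong support node because it is adjacent to two leaves, while node 4 supports only the leaf at the end of the long leg.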
Let T represent a rooted tree at node f where . Additionally, denotes the collection of all starlike trees with a maximal degree and n nodes.
Lemma 1. If a node in with the exception of f has a degree greater than two, then there is such that .
Proof. Let be a root node which contains the maximum degree in T with . Assume that with and , such that lies between the nodes f and g with the maximum possible distance.
Case 1: g is a strong support node of T.
Let
and
be two pendant nodes of
T. Assume that
is obtained by joining
to the edge
(see
Figure 1).
Figure 1.
In Case 1, the tree transformation occurs when g is an end-support node.
Case 2: g is a support node of T.
Consider
to be a leaf, and
is a path in a tree
T where
(
). Let
be a tree obtained from
T by removing the edge
and adding the path
(refer to
Figure 2).
Figure 2.
In Case 2, the tree transformation occurs when g is a support node.
Case 3: g is not a support node.
Consider two paths
and
in a tree
T and
. Let
be the tree obtained from
T by removing the edge
and adding the path
(refer to
Figure 3).
Figure 3.
In Case 3, the tree transformation occurs when g is not a support node.
The proof that is concluded by the aforementioned three cases. □
Lemma 2. Let be a starlike tree with and be the maximum degree of a node. If there are at least two legs, each with a length greater than one, in T, then there is , such that .
Proof. Let
T be a starlike tree with a root node of
and
. Assume that
and
,
are the paths of two legs such that
and
. Let
be obtained by joining
to the edge
(refer to
Figure 4).
Figure 4.
Illustration of transformation.
Finally,
Thus,
completes the proof. □
3. Main Results
Theorem 1. If of order , Moreover, the lower bound is attained if and only if a node has a degree greater than two except ∆ and the equality is attained if and only if a starlike tree with a maximum of one leg with a length exceeding one is denoted as T.
Proof. Assume that if
then
is a path graph and
as required.
Let
be a tree with
for
. By the choice of
T from Lemma 1,
T is a starlike tree with centre
f. Now, assume that
T has only one leg with a length exceeding one; then,
Hence, the proof is complete. □
Theorem 2. Consider of order , Moreover, the upper bound is attained if and only if a node has a degree greater than one at least for two legs and the equality is attained if and only if a starlike tree with a maximum of one leg with a length exceeding one is denoted as T.
Proof. Assume that if
then
T is a path graph and Equation (1) holds. Let
be a tree with
for
. By the choice of
from Lemma 2,
is denoted as a starlike tree with centre f. Now, assume that
has exactly one leg with a length exceeding one; then, Equation (2) holds.
Hence, the proof is complete. □
Observation: For any tree
of order
,
and the equality is attained if and only if
T is a star.
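As a worked check, the HA values of the two trees named in this section can be written in closed form. This is a sketch under the assumption that the HA index is the edge sum of $4 d_u d_v/(d_u+d_v)^2$; the symbols $S_n$ (star) and $P_n$ (path) are notation introduced here for illustration, and the expressions are not claimed to be the bounds stated in the results above.

```latex
% Star S_n: each of the n-1 edges joins the center (degree n-1) to a leaf (degree 1).
HA(S_n) = (n-1)\cdot\frac{4\,(n-1)\cdot 1}{\bigl((n-1)+1\bigr)^{2}}
        = \frac{4\,(n-1)^{2}}{n^{2}}

% Path P_n, n >= 3: two end edges join degrees 1 and 2,
% and the remaining n-3 edges join two nodes of degree 2.
HA(P_n) = 2\cdot\frac{4\cdot 1\cdot 2}{(1+2)^{2}} + (n-3)\cdot\frac{4\cdot 2\cdot 2}{(2+2)^{2}}
        = \frac{16}{9} + (n-3) = n - \frac{11}{9}
```

For n = 4 these give $HA(S_4) = 9/4$ and $HA(P_4) = 25/9$, so under this definition the star has the smaller value.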
4. Application: Chemical Connection of HA Index
A QSPR analysis based on the Harmonic–Arithmetic (HA) topological index can be applied to a variety of drugs. Here it is used to identify the physical and chemical properties of anti-Parkinson’s drugs that are highly correlated with the index. The correlation coefficients of the Harmonic index reported in [10] are also compared with those of the HA index.
This section concentrates on the computation of the Harmonic–Arithmetic (HA) topological index for 18 anti-Parkinson’s drugs, namely Opicapone, Rivastigmine, Rotigotine, Istradefylline, Safinamide, Trihexyphenidyl, Tolcapone, Ropinirole, Procyclidine, Pramipexole, Piribedil, Pergolide, Levodopa, Entacapone, Carbidopa, Biperiden, Apomorphine and Amantadine. The QSPR analysis of the topological index is carried out using curvilinear regression models.
The molecular structures (see Figure 5) and physicochemical property data along with the HA index values of the listed drugs are presented in Table 1. Both the physicochemical property data and molecular structures were obtained from ChemSpider.
4.1. Regression Models
Six physical properties of the eighteen anti-Parkinson’s drugs are modelled as functions of the Harmonic–Arithmetic (HA) topological index: boiling point (BP), enthalpy (E), flash point (FP), molar refractivity (MR), polarity (P) and molar volume (MV) (refer to Figure 5).
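The following regression models are considered for our comparative study:
$$Z = a + b_1\,TI \quad \text{(linear)},$$
$$Z = a + b_1\,TI + b_2\,TI^2 \quad \text{(quadratic)},$$
$$Z = a + b_1\,TI + b_2\,TI^2 + b_3\,TI^3 \quad \text{(cubic)},$$
where Z, a, $b_i$ and TI are the physical property of the drug (dependent variable), the constant, the regression coefficients and the topological index (independent variable), respectively.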
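The fitting step can be sketched as an ordinary least-squares polynomial fit of degree 1, 2 and 3, followed by the RMSE and the correlation between observed and fitted values. This is a minimal illustration and not the authors' workflow: the data points are hypothetical, the solver uses the normal equations for brevity, and taking r as the Pearson correlation between observed and fitted values is an assumption.

```python
from math import sqrt

def polyfit(x, y, degree):
    """Least-squares coefficients [a, b1, ..., b_degree] via the normal equations."""
    k = degree + 1
    # Normal equations A c = rhs with A[i][j] = sum x^(i+j) and rhs[i] = sum y * x^i.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution.
    coef = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (rhs[i] - s) / A[i][i]
    return coef

def predict(coef, t):
    return sum(c * t ** i for i, c in enumerate(coef))

def rmse(observed, fitted):
    return sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed))

def pearson_r(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))

# Hypothetical (index value, property) pairs, NOT data from the paper.
ti = [8.0, 10.0, 12.0, 15.0, 18.0, 20.0, 23.0, 25.0]
prop = [40.1, 47.9, 55.2, 63.0, 69.8, 72.5, 76.9, 78.0]

for degree, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    coef = polyfit(ti, prop, degree)
    fitted = [predict(coef, t) for t in ti]
    print(name, round(rmse(prop, fitted), 3), round(pearson_r(prop, fitted), 4))
```

Because the three models are nested, the in-sample RMSE cannot increase as the degree grows, so comparisons between them are most informative when read together with the significance of the added terms.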
4.2. Regression Models for Harmonic–Arithmetic Index HA(G)
- (i) The linear regression models for the Harmonic–Arithmetic index are
$BP = 193.557 + 13.24\,HA(G)$
$E = 42.734 + 1.5699\,HA(G)$
$FP = 71.17 + 7.8829\,HA(G)$
$MR = 11.578 + 3.3007\,HA(G)$
$P = 4.5884 + 1.3085\,HA(G)$
$MV = 47.177 + 8.965\,HA(G)$
- (ii) The quadratic regression models for the Harmonic–Arithmetic index are
$BP = 421.193 - 11.451\,HA(G) + 0.629\,HA(G)^2$
$E = 72.521 - 1.6611\,HA(G) + 0.0823\,HA(G)^2$
$FP = 248.11 - 11.31\,HA(G) + 0.4891\,HA(G)^2$
$MR = -17.392 + 6.443\,HA(G) - 0.0801\,HA(G)^2$
$P = -7.1538 + 2.5822\,HA(G) - 0.0325\,HA(G)^2$
$MV = -96.811 + 24.583\,HA(G) - 0.398\,HA(G)^2$
- (iii) The cubic regression models for the Harmonic–Arithmetic index are
$BP = -1485.1 + 296.4\,HA(G) - 15.2\,HA(G)^2 + 0.2614\,HA(G)^3$
$E = -124.39 + 30.139\,HA(G) - 1.5528\,HA(G)^2 + 0.027\,HA(G)^3$
$FP = -771.55 + 153.36\,HA(G) - 7.978\,HA(G)^2 + 0.1398\,HA(G)^3$
$MR = -133.23 + 25.151\,HA(G) - 1.042\,HA(G)^2 + 0.0159\,HA(G)^3$
$P = -53.032 + 9.9914\,HA(G) - 0.4134\,HA(G)^2 + 0.0063\,HA(G)^3$
$MV = -530.44 + 94.613\,HA(G) - 3.9988\,HA(G)^2 + 0.0595\,HA(G)^3$
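To show how such fitted models are used, the sketch below evaluates the three boiling-point models at a single index value. It assumes that the successive coefficients in the equations above multiply HA(G), HA(G)^2 and HA(G)^3; the input value 20.0 is hypothetical and is not taken from Table 1.

```python
# Coefficients copied from the BP equations in the text, lowest power first.
# Assumption: the k-th coefficient multiplies HA(G)**k.
BP_MODELS = {
    "linear": [193.557, 13.24],
    "quadratic": [421.193, -11.451, 0.629],
    "cubic": [-1485.1, 296.4, -15.2, 0.2614],
}

def evaluate(coef, ha):
    """Horner evaluation of coef[0] + coef[1]*ha + coef[2]*ha**2 + ..."""
    result = 0.0
    for c in reversed(coef):
        result = result * ha + c
    return result

ha_value = 20.0  # hypothetical HA value used only for illustration
for name, coef in BP_MODELS.items():
    print(name, round(evaluate(coef, ha_value), 3))
```

Under these assumptions the three models give boiling points of roughly 458, 444 and 454 at this input. Such agreement should only be expected inside the range of HA values spanned by the fitted drugs, since polynomial models can diverge quickly outside it.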
The curvilinear regression models gave lower RMSE values and higher r values than the linear model, with p-values below 0.005, which indicates strong statistical significance in a QSPR model; we therefore regard them as the superior regression models. In Table 2, Table 3 and Table 4, the correlation between the HA topological index and the six physical properties of anti-Parkinson’s drugs is represented by the statistical parameters. The curvilinear variation between MR and the HA index is illustrated in Figure 6a, that between P and the HA index in Figure 6b, and that between MV and the HA index in Figure 6c.
In [10], a statistical analysis of anti-Parkinson’s drugs was carried out using linear regression models, and the authors reported strong positive correlations between various topological indices and the physical properties of these drugs. Among the 14 topological indices analyzed, the symmetric division degree index (SDD) and the Harmonic index (H) exhibited highly significant positive correlations. Upon comparing the H index in [10] with the HA index, we observed that the two indices have the most similar correlation coefficient values for MR, P and MV. This comparison is based on choosing the maximum correlation coefficient (r) and the minimum RMSE value in the linear regression model (refer to Figure 7).
Observation:
Similar to the linear regression model, the quadratic and cubic regression models yield the highest correlation coefficient r for the H index and HA index in relation to physical properties (PP), with nearly identical values.
The correlation coefficient values for the quadratic and cubic regression models are as follows (see Figure 8):
The correlation coefficients for the H index with MR and the HA index with MR are as follows: and , respectively.
The correlation coefficients for the H index with P and the HA index with P are as follows: and , respectively.
Also, the correlation coefficients for the H index with MV and the HA index with MV are as follows: and , respectively.
5. Discussion
The HA index is distinctive in the way it quantifies molecular and graph structures, which sets it apart from other topological indices. In Section 2, this paper highlighted the application of the HA index in comparing different tree structures. While previous research established that these trees have a minimum first irregularity Sombor index [12], this study demonstrates that they can exhibit both maximum and minimum HA indices, showcasing the versatility of the HA index in characterizing graph structures.
In Section 4, the QSPR analysis of the Harmonic–Arithmetic (HA) topological index on various drugs reveals its potential in identifying physical and chemical properties strongly correlated with the topological structure of anti-Parkinson’s drugs. The comparison of correlation coefficients between the Harmonic index in [10] and the HA index demonstrates that both indices are effective in correlating with the drugs’ physical properties. The quadratic and cubic regression models, like the linear model, yield similarly high correlation coefficients r for the HA index and the H index, indicating their strong relevance to property prediction. This highlights the value of the HA index as a tool in QSPR analyses for evaluating and predicting the physicochemical attributes of drug compounds.
6. Conclusions
In this paper, we computed bounds for the Harmonic–Arithmetic (HA) index of starlike trees with a given maximum node degree. We also conducted a comparative study between the H index and the HA index using QSPR models, which apply regression techniques to predict molecular properties in fields such as drug design, materials science and environmental chemistry, with the aim of supporting chemists in discovering novel drugs. Future research could explore the HA index’s application to a broader range of molecular structures and its integration with machine learning algorithms to enhance predictive modeling. Additionally, investigating the HA index’s performance across different classes of drugs and comparing it with other emerging indices could provide further insights into its reliability and adaptability. Finally, extending this study to real-world pharmacokinetic and pharmacodynamic data would deepen our understanding of its practical implications in drug development.