1. Introduction
The field of genetics and genomics has evolved greatly in the wake of ongoing technological advancements [
1,
2,
3]. Consequently, diverse methods have arisen to investigate genetic diversity. Some of these methods gain popularity and momentum only to be replaced by subsequent more effective and faster techniques, e.g., allozymes [
4] and Restriction Fragment Length Polymorphisms (RFLPs [
5]), while some methods such as microsatellites endure the tests of time [
6,
7]. The development and subsequent availability of high-throughput sequencing methods (“next generation sequencing”-NGS) has once again changed the game for conservation genetics, providing highly informative and precise genetic information for diverse applications [
1,
3]. These methods, such as SNPs, genotyping-by sequencing (GBS), restriction-site associated DNA sequencing (RADseq [
8]), multiplexed ISSR genotyping-by-sequencing (MIG-seq [
9]) etc., provide invaluable insights into aspects such as population diversity, dynamics, and viability, identifying taxonomic and management units, detecting local adaptation, and predicting genetic effects, all of which can greatly inform conservation decisions [
1,
3,
10].
However, these high throughput NGS methods are often prohibitively expensive in developing countries due to a lack of infrastructure, skills required for the methodological complexity of these techniques, and unfavourable exchange rates for equipment and consumables [
2,
11,
12]. Consequently, these countries have to resort to cheaper genetic markers [
7]. These countries are coincidentally also often biodiverse, containing disproportionately high levels of rare taxa, and they experience great human- or climate-related pressures, further increasing extinction risks of endemic or rare species in these countries [
13].
Because of this financial and skills inequity, conservation research is scarcely performed in areas where it is most needed [
11,
12]. Conservation spending (and budgets) correlate to the level of research performed and reductions in the rate of biodiversity loss in these countries [
12,
14]. It is thus imperative to provide support for these countries, while also investigating and implementing simpler and more cost-effective genetic methods more accessible to local researchers and institutions [
2,
12].
One such method is Inter Simple Sequence Repeats (ISSRs [
15,
16]). ISSRs are multilocus markers that are generated from PCR reactions using flanking microsatellite (SSR) regions as priming sites [
15,
16]. The ubiquity and variability of microsatellites in eukaryotes mean many priming sites are available throughout the genome leading to increased resolution and almost full genome coverage, all whilst requiring no a priori sequence information [
15,
16,
17,
18,
19,
20,
21,
22]. The main benefits of ISSRs are their cost, speed, and simplicity compared to other methods [
6,
17,
23,
24]. Moreover, the use of PCR allows for the rapid generation of large volumes of markers from only a small amount of DNA [
20,
24]. ISSRs are also highly sensitive markers suitable for discriminating closely related species and investigating intraspecies variation [
21,
25,
26]. ISSRs thus offer a higher degree of resolution compared to other “fingerprinting” molecular methods [
21]. Compared to RFLPs and RAPDs, ISSR markers provide similar results but produce more extensive and informative datasets for less cost, time, and labour [
17,
24]. In contrast, methods, such as AFLP markers, although more reproducible and accurate, are more costly and complicated [
17]. ISSRs are also more reproducible than RAPD markers [
17] but are less reproducible than RFLPs [
24]. ISSRs are thus very useful molecular markers in ecological, genetic diversity, and even systematic studies due to their hypervariable nature and low cost [
20]. As a consequence of these factors, ISSRs are used widely in developing countries for a range of purposes. A search of the SCOPUS database (11 March 2024) using the search terms (“inter AND simple AND sequence AND repeat*) OR ISSR*” found 7852 publications. An analysis of the author affiliations of these papers using VOSViewer Ver. 1.6.15 [
27] indicated that the vast majority of these authors or co-authors are from developing countries, many of which are also mega-biodiverse (
Table 1).
ISSRs have proven to be valuable in a wide range of applications, including hybridisation and taxonomic studies [
21,
25], phylogeny reconstruction [
28], population genetic studies [
29,
30,
31,
32], demographics [
33], the investigation of the mating systems and reproduction of plants [
34], sex determination [
35,
36], and distinguishing ecotypes [
37], as well as studies on crops, crop relatives, medicinal plants [
38,
39,
40,
41], and identifying markers for traits such as toxin production or phenotypes [
35,
36,
42]. Of particular relevance to our study, this method has also been applied to rare and endangered or endemic species [
29,
43,
44,
45], as well as widespread and common species [
46].
The vast majority of ISSR studies utilise conventional agarose gel visualisation of banding patterns, a very cheap and readily available technology, which perhaps explains the extensive use of this method in developing countries. However, another benefit of ISSR fingerprinting is that the primers can be modified by labeling with fluorescent dyes that allow for the automated detection of bands using DNA-sequencing machines [
47]. This modification of the primers and use of slightly more costly automated detection systems provides greater sensitivity and resolution of bands, as well as the ability to accurately size much larger ISSR fragments, resulting in larger datasets and more accurate fragment sizing potentially able to differentiate fragments with as little difference as a single nucleotide [
48,
49]. Owing to the higher sensitivity of the automated process, much larger datasets are produced, but possibly with lower marker informativeness [
37]. However, despite these advantages, it has not been widely used. However, automated ISSR fingerprinting has been used effectively in plantains (
Musa L. sp. [
37]), cotton (
Gossypium L. [
50]),
Vachellia karroo (Hayne) Banfi and Galasso [
51], the endemic and widespread species within
Tolpis Adans. (Asteraceae [
52]), and endangered
Faucaria tigrina Schwantes (Aizoaceae [
45]).
Based on the above considerations and the merits of ISSRs, this study employs automated ISSR fingerprinting to determine the genetic diversity of the African cycad
Encephalartos eugene-maraisii I. Verd. species complex and to ascertain whether genetic diversity corresponds to currently defined taxonomic groups in this complex. Of relevance to this study is the fact that ISSRs have previously been used in cycads for a wide range of applications (
Table S1), but their use, along with automated fragment detection, has yet to be applied to cycads.
The Conservation Status of Cycads in Africa–Encephalartos as a Case Study
The cycads (Order Cycadales) comprise a group of dioecious gymnosperms with large cones and a palm-like appearance that is proposed to have emerged during the late Carboniferous approximately 300 million years ago (Mya) [
46,
47,
53,
54,
55,
56]. While all extant cycads originate from a more recent radiation in the late Miocene 12 Mya [
55,
56,
57,
58], these plants represent the oldest living seed plants, making them useful study organisms for investigating the evolution of seed reproduction and the emergence of angiosperms [
46,
53,
59,
60,
61].
The African cycad genus
Encephalartos Lehm. is considered the most threatened cycad genus globally and the most threatened group of organisms in South Africa, with 12 of 37 (32%) species in South Africa listed as critically endangered (compared to the global average of 17% in cycads), and an additional four of which are endangered [
62,
63]. Moreover, the five cycad species that are listed as extinct in the wild by the IUCN are from the genus
Encephalartos, all of which once occurred within the borders of South Africa (
E. brevifoliolatus Vorster,
E. nubimontanus P.J.H.Hurter,
E. woodii Sander., and
E. heenanii R. A. Dyer) or landlocked Eswatini (=Swaziland,
E. relictus P.J.H.Hurter). Additionally, South Africa is an important cycad diversity hotspot and site of endemism containing 58% of
Encephalartos species, of which 29 (79%) are endemic [
64,
65,
66,
67].
This South African cycad extinction crisis [
65,
67,
68] may result in South Africa losing 50% of its species within 2–10 years [
69]. This extinction is driven by poaching for the ornamental plant trade, the harvest of specimens for medicinal, recreational, and magical purposes [
62,
63,
64,
69,
70,
71,
72,
73], as well as pathogens [
74], herbivory [
75], and pollinator extinction [
66,
76]. Moreover, climate change leads to greater environmental stochasticity. Subsequent susceptibility to pests and pathogens [
13,
75,
77] also poses a threat, as well as habitat fragmentation and destruction, the spread of alien invasive species, and reproductive failure [
64,
65,
78]. The conservation of this group has thus never been more urgent.
Despite much activity in South African cycad conservation and research [
62,
65,
70,
72,
79,
80], there remains limited knowledge about even the most basic aspects of cycad biology or population size and trends for many species [
64,
81,
82,
83]. In addition, research directed at assessing the genetic diversity of South African cycads is required, as little work has been done on these taxa [
84,
85]. Moreover, the taxonomic relationships between some species, especially among closely related taxa, need to be resolved, thereby allowing for the correct designation of conservation status for these taxonomic units [
64,
81,
86,
87,
88]. Much of the taxonomically unresolved portions of the genus occur within species complexes containing recently diverged taxa [
64,
65].
Species complexes comprise groups of closely related species that often co-occur or have close geographical proximity. Owing to morphological and genetic similarities, members of these complexes are often difficult to distinguish, which can lead to unclear or biased species delimitation or incorrect designation of conservation units [
88]. These complexes are additionally enigmatic in that morphological distinctness does not necessarily correlate with the genetic differentiation of species, with the opposite occasionally true [
89]. Several examples of cycad species complexes exist [
90,
91,
92]. In the genus
Encephalartos, such complexes include the
E. hildebrandtii A.Braun and C.D.Bouché species complex of East Africa [
93], as well as a group of mostly glaucous
Encephalartos species in the Eastern Cape Province of South Africa [
89], and the glaucous cycads comprising the
Encephalartos eugene-maraisii I. Verd. complex occurring in the northern escarpment of South Africa, comprising six species [
94].
The Encephalartos eugene-maraisii complex
The
Encephalartos eugene-maraisii complex is a group of closely related cycads with glaucous foliage occurring mainly in the Limpopo and Mpumulanga provinces of South Africa (
Figure 1). Members of the complex comprise
E. eugene-maraisii,
E. dolomiticus Lavranos and D.L.Goode,
E. middelburgensis Vorster, Robbertse and S.van der Westh.,
E. dyerianus Lavranos and D.L.Goode,
E. cupidus R.A. Dyer, and
E. nubimontanus P.J.H.Hurter (
Table S2). In this complex, the taxonomic relationships are uncertain, and there is considerable morphological variation within the complex, with some species having as many as 11 different variants recognised (formally and informally) by collectors and growers ([
95]
Table S2). Within this pool of variation, there may lie undescribed species, or, alternatively, species that require merging. Due to the tendency among cycad taxonomists for the excessive subdivision of species [
96,
97], many taxa in such complexes may be possible artefacts of over-ambitious taxonomy.
Broad taxonomic and phylogenetic studies on
Encephalartos place the
E. eugene-maraisii complex into a single clade with little to no resolution and weak support between the member species [
86,
98,
99]. The morpho-geographical classification of
Encephalartos proposed by Vorster (2004) [
94] places the complex, with the tentative inclusion of
E. hirsutus P.J.H.Hurter, in the same grouping. Molecular studies by Stewart et al. (2023) [
99] and Mankga et al. (2020) [
98] supported the exclusion of
E. hirsutus from the
E. eugene-maraisii complex but provided no further insight into the molecular or taxonomic relationships between members of the complex. Species members of this complex were not well represented in these studies comprising singletons, pairs, or being absent entirely. This likely had consequences for phylogenetic resolution for this group, particularly since these taxa are closely related [
100].
2. Materials and Methods
2.1. Sampling
Young, but hardened-off leaflets from the six Encephalartos species of the E. eugene-maraisii complex, as well as two samples of E. hirsutus, were sourced from a private cycad collection in White River in the Mpumalanga province of South Africa and at the cycad gene bank of the South African National Biodiversity Institute’s (SANBI) Lowveld Botanical Gardens (Mbombela, Mpumalanga, South Africa) on 30 September and 1 October, 2021. Additional samples from University of Pretoria cycad collection, and several private collections in Pretoria, were collected on 11 May 2022. It was decided to include E. hirsutus in the study as a reference point to better conceptualise whether observed differences in genetic diversity between members of the complex were substantial when compared to a related but obviously different species. To ensure correct species identification, selected plants were cross-referenced with specimen records from each of the gardens and species identity was confirmed visually by Mr A. W. Frisby (curator of the University of Pretoria’s cycad collection). Care was taken to sample plants originating from as many disjunct localities as possible (where locality data for the individuals were available). Suspected hybrids were omitted from the study. Collected leaflets were temporarily stored in paper envelopes and refrigerated until they could be transferred to individual ziplock bags containing silica gel for desiccation.
2.2. DNA Extraction
Approximately 30 mg of silica-dried material per sample were ground with metal beads using the Geno/Grinder 2010 (Cole-Parmer, Metuchen, New Jersey) and extracted in two batches of 96 samples using Sbeadex Maxi Plant kit and the Oktopure robot (LGC Biosearch Technologies, Hoddesdon, United Kingdom) in the labs of the Forest Molecular Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria. These runs additionally included duplicate sample pairs representing material of the same plant but were extracted in both batches to test for consistency in extraction runs. DNA purity and contamination were determined spectrophotometrically by calculating the ratio of absorbance at 260 nm to that of 280 nm, and the ratio of absorbance at 260 nm to that of 230 nm (Nanodrop, Thermofischer Scientific, Waltham, Massachusetts).
2.3. PCR Optimisation and Selection of ISSR-PCR Primers
Six ISSR primers manufactured with 5′ fluorescent labels were screened for their suitability in amplifying cycad DNA (
Table 2). Primers in the trial PCR runs that consistently produced mostly bright, clear bands for a wide range of samples when viewed under UV on 1% agarose gel were selected for this study (
Figure S1). Selected primers were also optimised under different MgCl
2 concentrations.
2.4. PCR Reaction Conditions
Following primer selection, bulk amplification of sets of 96 samples was performed in a Bio-Rad T100 thermal cycler under the following reaction conditions: the 25 μL reaction mixture constituted 4 μL of DNA (approximately 1200 ng), 1 μL of 25 μM ISSR primer, 12.5 μL of 2× Ampliqon Master mix (Ampliqon Taq DNA polymerase, 0.4 mM each of dNTP, Tris-HCl pH 8.5, (NH
4)
2SO
4, 3 mM MgCl
2, 0.2% Tween™ 20, inert red dye, and stabilizer) and 4 μL of distilled water. It must be noted that the amount of DNA template used is higher than generally recommended, but as a very high annealing temperature was used in the PCR cycling, the effects of this increased template quantity is offset by the very stringent PCR conditions. The PCR reaction was performed with an initial denaturation phase at 96 °C for 2.5 min, followed by 30 cycles of 96 °C denaturation for 30 s, 52 °C annealing for 30 s, 72 °C extension for 2 min, and ending with a 2-min extension at 72 °C. It must be noted that the annealing temperature (Ta) for the PCR conditions used was higher than the primer melting temperatures (Tm). This was done in order to ensure stringent amplification of PCR amplicons and avoid spurious primer annealing, thereby increasing PCR specificity. However, using a Ta higher than Tm will reduce the yield of PCR products, but as a very sensitive detection system is used in our study, this is not considered to be a limitation [
101].
PCR products of each sample using each selected primer were visualised under UV on 1% agarose stained with ethidium bromide to discern amplification success, and samples with no visible bands for any of the selected primers were omitted from subsequent analyses (
Figure S2, Table S3). If a sample yielded adequate bands for at least one of the selected primers, all PCR products of this sample (comprising each of the selected primers) were retained in the study. These PCR products were sent to Central Analytical Facility, Stellenbosch University for capillary electrophoresis and automated detection using an ABI 3500XL sequencer equipped with fragment profiling software. The 1200LIZ size standard was used, allowing for fragment size estimation between 20 and 1200 base pairs. The electropherogram from each sample was analysed using Genemapper software Version 5 (Applied Biosystems Waltham, Massachusetts, United States).
2.5. Construction of Datasets for Analysis
Three datasets per primer were produced using Genemapper, which was used to select bands based on the user-defined fluorescence cut-off values of 50, 100, and 200 relative fluorescence units (rfu). These datasets represent varying levels of sensitivity to band intensity, with 50 rfu cut-off being the most sensitive, scoring faint to bright bands, and potentially including inconsistently amplified bands, and the least sensitive, 200 rfu, which scored only bright bands. Bands brighter than the cut-off values were scored “1” for presence and “0” for absence, and invariant alleles were omitted from subsequent analyses. The resultant binary datasets were saved as spreadsheets for further analysis. Pooled datasets at each cut-off level containing the binary data for all primers were then created. These data were examined for the distribution range of band number (considered to an indicator of amplification success). Based on these analyses, a quarter (25%) of the samples that had the lowest number of detected bands were excluded from the subsequent analyses, as the PCR was deemed to be only partially successful or unsuccessful.
2.6. Methods to Assess Genetic Similarity and Diversity
Four different methods were chosen to analyse the data. These were cluster analysis (also called numerical taxonomy or phenetics [
102]), median-joining network analysis [
103], and STRUCTURE analysis [
104] using Bayesian Markov chain Monte Carlo (MCMC) estimation, as well as statistical analysis employing Analysis of Molecular Variance (AMOVA) and Tajima’s D statistic [
105].
2.6.1. Cluster Analysis
Genetic distance matrices were computed using various distance coefficients in NTSYS-PC Version 2.02k [
106] (
Table S4). Distance matrices were clustered using the Unweighted Pair Group method with Arithmetic Averages (UPGMA), and the Neighbor-joining (NJ) method. To determine the appropriate clustering method and distance coefficient for the data, dendrograms were compared visually for the “logical” clustering of samples (i.e., the somewhat subjective assessment of grouping of samples as species clusters), as well as computationally through cophenetic correlation analysis and normalised Mantel test [
107]. Additionally, although its application is not especially commonplace in dendrograms (see, for instance [
108,
109]), we used bootstrap resampling to assess the robustness of clusters. UPGMA trees were generated with 1000 bootstrap resamples using PAST software Version 4.1.7 [
110], and bootstrap resampling for NJ trees was performed using Darwin Version 6.0.2.1 [
111].
2.6.2. Statistical Analysis
Analysis of Molecular Variance (AMOVA) was also performed with PopART to assess the distribution of observed genotype variation and Φ
ST values calculated for 1000 permutations of ISSR haplotypes among populations. Species were also grouped into pairs to determine variation between and within species pairs, namely
E. cupidus and
E. dolomiticus;
E. dyerianus and
E. eugene-maraisii; and
E. middelburgensis and
E. nubimontanus, while
E. hirsutus was assigned its own group. Tajima’s D statistic was also computed in PopART to detect the presence of non-random evolution in the gene pool [
105].
2.6.3. STRUCTURE Analysis (Bayesian MCMC)
The Bayesian clustering of the populations was assessed using STRUCTURE software Version 2.3.2.1 [
104] for each of the three pooled datasets. Ten independent runs with a 10,000 iteration burn-in and MCMC chain of 100,000 generations were run with the number of populations (K) ranging from 1 to 10. Alleles were treated as haploid, and the allele frequencies were set to be correlated using the admixture protocol. Another 10 independent runs were performed on these K-values using the LOCPRIOR model in STRUCTURE, which accounts for locality data prior to the commencement of the run. For the sake of this study, and due to a lack of precise locality information for all samples, each species was considered to be a single locality. All runs were performed with the admixture model setting, and with allele frequencies correlated. The optimal K-value, generally considered the smallest K for which the probability of the observed data is maximised, was determined using STRUCTURE HARVESTER [
112] based on the method developed by Evanno et al. (2005) [
113].
2.6.4. Network Analysis
Haplotype network analysis was performed in PopART software Version 1.7 (
http://popart.otago.ac.nz (accessed on 15 May 2024); [
114]) using the Median-Joining method with epsilon set to zero [
103].
5. Conclusions
Using the automated ISSR-detection method and a range of analytical approaches, we were able to distinguish some of the species within the E. eugene-maraisii complex as distinct lineages. However, we recommend additional sampling and further optimisation of DNA extraction and PCR-amplification procedures for some of the currently recognised species, as these taxa may not warrant recognition at this rank. In addition, the use of additional primers may be necessary to improve resolution and elucidate the relationships among E. nubimontanus and E. cupidus, as well as the taxonomic validity of E. middelburgensis.
Our study has, moreover, highlighted the importance of using a variety of datasets and analytical methods to explore the signal in the data and to determine which datasets best suit each analysis.
Finally, we demonstrate the suitability of automated ISSR fingerprinting as a rapid, simple, and cost-effective method to investigate genetic diversity and taxonomic limits in closely related and range-restricted Encephalartos species, and potentially many other taxa. This method thus holds great potential in the application of conservation genetics and taxonomy of all taxa for scientists in developing countries.