1. Introduction
Species diversity is one of the most remarkable aspects of our planet, and sufficient knowledge of global biodiversity underpins both biodiversity conservation and our understanding of nature. The total number of species on Earth has been estimated at between 2 and 10 million [1,2,3], while the total number of described species has reached approximately 1.8 million [4]. Biologists broadly agree that many species still await discovery and description. This poor knowledge of the identity of species on Earth, with many species yet to be described and recorded, is called the Linnean shortfall [5]. Moreover, the geographical distributions of many species are also poorly understood and contain numerous gaps [5,6]; this is the so-called Wallacean shortfall [7]. The lack of distributional data has been considered a serious problem for reserve design [8,9], especially for systematic conservation planning [10]. The discovery and description of species therefore remain a major endeavor of taxonomy. To overcome the Linnean and Wallacean shortfalls, taxonomic effort to discover new species and new records in different regions has increased. Costello et al. reported an increase in the number of authors describing species since the 1950s [3], and the average discovery rate of species has increased from 17,500 per year to more than 18,000 per year since 2006 [11]. However, new species are often described from a single specimen or a single locality, especially from degraded parts of biodiversity hotspots. On the one hand, a single-specimen/locality species provides the information needed to make a new species known to taxonomists and to compare it with related taxa. On the other hand, the incomplete distribution data and the unknown morphological variation associated with a single specimen/locality usually require further taxonomic effort.
In general, species are identified based on several qualitative or quantitative morphological characters that clearly distinguish them from other species [12]. A detailed morphological description should be derived from observing stable and clear characters across enough specimens. Unfortunately, few specimens, or even a single specimen, are often used to describe new species in some taxa. Because intraspecific character variation often occurs across environments (e.g., host plants, temperatures, and altitudes) and geographical regions, a few specimens can hardly represent the full range of character diversity within a species, which in turn affects the reliability of species descriptions. Dayrat even proposed that new species should never be named or described based on a single specimen [13], although this position is controversial [14,15]. Although the phenomenon of single-specimen/locality species has been mentioned in previous literature [16,17], comprehensive empirical data on the current situation and temporal trend of this issue remain scarce.
With the development of DNA sequencing techniques, methods based on DNA barcoding are now commonly used for species delimitation [18]; these methods include the generalized mixed Yule coalescent (GMYC) [19], the Poisson tree processes (PTP) model, and automatic barcode gap discovery (ABGD) [20,21]. Compared with traditional morphological identification, these methods provide numerical or quantitative analyses and can detect slight differences in gene sequences (even a single base change) for species identification. These techniques also often require larger sample sizes for accurate species delimitation. For example, DNA barcoding requires multiple samples per species (ideally at least 20 individuals, as suggested by Luo et al., although this is hard to achieve when describing most new species) to calculate intra- and interspecific genetic distances and thereby reveal distinct species boundaries [22]. It is therefore of interest to know whether the use of DNA data can help decrease the rate of new species described from only one specimen/locality.
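The barcode-gap idea behind these distance-based comparisons can be sketched in a few lines: compute uncorrected pairwise p-distances within and between species, then check whether the largest intraspecific distance is smaller than the smallest interspecific distance. This is a minimal illustration under simplifying assumptions (pre-aligned sequences of equal length, no gaps or ambiguity codes, made-up toy sequences and species labels). It is not an implementation of GMYC, PTP, or ABGD, which use model-based or recursive partitioning approaches.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites in two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

def barcode_gap(samples: dict[str, str], species_of: dict[str, str]) -> tuple[float, float, bool]:
    """Return (max intraspecific distance, min interspecific distance, gap present?)."""
    intra, inter = [], []
    for s1, s2 in combinations(samples, 2):
        d = p_distance(samples[s1], samples[s2])
        (intra if species_of[s1] == species_of[s2] else inter).append(d)
    # With only one specimen per species there are no intraspecific pairs,
    # so intraspecific variation is unobservable and defaults to 0.0 here.
    max_intra = max(intra) if intra else 0.0
    min_inter = min(inter)
    return max_intra, min_inter, max_intra < min_inter

# Hypothetical toy sequences (10 bp) for two hypothetical species.
samples = {
    "A1": "ACGTACGTAC",
    "A2": "ACGTACGTAT",
    "B1": "ACGAACTTAC",
    "B2": "ACGAACTTGC",
}
species_of = {"A1": "sp_A", "A2": "sp_A", "B1": "sp_B", "B2": "sp_B"}
print(barcode_gap(samples, species_of))  # (0.1, 0.2, True)
```

Running the same function on one specimen per species (for example, only `A1` and `B1`) returns a maximum intraspecific distance of 0.0, so a "gap" appears trivially; this is one way to see why barcoding analyses benefit from several individuals per species.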
Insects are the most diverse animal group on Earth, accounting for approximately half of all described species [3,4]. To date, the number of described insect species has reached 854,031 [4]. Insects are therefore ideal candidates for assessing current taxonomic practices in species description. In the present study, we compiled a dataset of 4811 species described in 1261 articles published between 2009 and 2017 in ZooKeys, one of the top-ranked journals for the number of new insect taxa published. We then examined the prevalence of new species described from a single specimen/locality and investigated whether the use of DNA data in species descriptions is accompanied by a lower rate of single-specimen/locality species. This empirical analysis may improve our understanding of single-specimen/locality species and related issues in taxonomy.
2. Materials and Methods
Our data collection was based on articles describing new insect taxa published from 2009 to 2017 in the taxonomic journal ZooKeys, which ranks second among the top 10 journals publishing the greatest number of new zoological taxa in Thomson Reuters’ Index to Organism Names (as detailed on the ZooKeys website). Because ZooKeys is an open-access journal from which all papers can be downloaded, data extraction was also convenient and efficient.
A literature survey was carried out on the ZooKeys website using the search text ‘new species’, and the results were then filtered by taxon (Insecta) to keep only articles reporting new insect species. The following data were extracted: (i) date of publication of each new species, (ii) locality of the holotype, (iii) numbers of type specimens and collection localities, (iv) countries of the first and corresponding authors, (v) use of DNA data, and other information. A total of 57 fossil insect species were excluded from the final dataset. The paper by Blahnik and Holzenthal was also excluded because it used 11,562 specimens for Oecetis houghtoni sp. n. [23], an extreme outlier, whereas the specimen numbers in all other papers were below 2000. Our final dataset contained 1261 papers with 4811 new species published between 2009 and 2017 (Supporting Information Table S1). In this work, the additional material examined listed in Table S1 was not included when calculating specimen numbers, because the proportion of species with additional material examined was very low (335/4811 ≈ 7%) and some of these values were very large (e.g., Mathis and Zatwarnicki examined 987 specimens for Polytrichophora adarca sp. n. [24]), which might bias the results.
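The exclusion steps above can be sketched as a small filtering function. This is a hypothetical illustration, not the authors' workflow: the `SpeciesRecord` fields are invented stand-ins for the extracted variables (not the actual Table S1 columns), and the numeric cut-off `MAX_SPECIMENS` is an assumption standing in for the case-by-case exclusion of one extreme record.

```python
from dataclasses import dataclass

@dataclass
class SpeciesRecord:
    # Hypothetical record layout; field names are illustrative only.
    name: str
    year: int
    is_fossil: bool
    type_specimens: int       # holotype plus paratypes
    additional_material: int  # "additional material examined", kept separately
    localities: int
    uses_dna: bool

# Assumption: a numeric cut-off approximating the manual outlier exclusion.
MAX_SPECIMENS = 2000

def build_dataset(records: list[SpeciesRecord]) -> list[SpeciesRecord]:
    """Keep extant (non-fossil) species whose type series is not an extreme outlier."""
    return [r for r in records
            if not r.is_fossil and r.type_specimens <= MAX_SPECIMENS]

def share_with_additional_material(records: list[SpeciesRecord]) -> float:
    """Fraction of species that list any additional material examined."""
    return sum(r.additional_material > 0 for r in records) / len(records)

# Hypothetical toy records.
records = [
    SpeciesRecord("sp. 1", 2012, False, 1, 0, 1, False),
    SpeciesRecord("sp. 2", 2015, False, 12, 40, 3, True),
    SpeciesRecord("fossil sp.", 2013, True, 1, 0, 1, False),
    SpeciesRecord("outlier sp.", 2016, False, 11562, 0, 9, False),
]
kept = build_dataset(records)
print([r.name for r in kept])                # ['sp. 1', 'sp. 2']
print(share_with_additional_material(kept))  # 0.5
```

Keeping additional material in its own field lets specimen counts be computed with or without it, which mirrors the distinction the analysis draws between type specimens and additional material examined.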
The numbers of specimens and localities per new species were summarized, and their temporal trends from 2009 to 2017 were investigated. To show the geographical pattern in the use of single specimens/localities, we also analyzed the rates of such species published by authors from different countries; data for the top 10 countries were plotted as histograms. In addition, we compared the rates of single-specimen/locality species among insect orders.
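The per-year and per-country summaries described above amount to grouping species and computing two percentages per group. The sketch below assumes a hypothetical minimal record type (`Species`) and, as an assumption for illustration, uses the corresponding author's country as the grouping variable; the data are invented.

```python
from collections import defaultdict
from typing import NamedTuple

class Species(NamedTuple):
    # Hypothetical minimal record for this sketch.
    year: int
    author_country: str  # assumption: country of the corresponding author
    specimens: int       # type specimens only
    localities: int

def single_rates(records, key):
    """Per group: (% single-specimen species, % single-locality species)."""
    total = defaultdict(int)
    one_spec = defaultdict(int)
    one_loc = defaultdict(int)
    for r in records:
        g = key(r)
        total[g] += 1
        one_spec[g] += r.specimens == 1   # a bool adds as 0 or 1
        one_loc[g] += r.localities == 1
    return {g: (100 * one_spec[g] / n, 100 * one_loc[g] / n)
            for g, n in sorted(total.items())}

# Hypothetical toy data.
data = [
    Species(2009, "X", 1, 1),
    Species(2009, "X", 4, 2),
    Species(2010, "Y", 1, 1),
    Species(2010, "Y", 2, 1),
]
print(single_rates(data, key=lambda r: r.year))
# {2009: (50.0, 50.0), 2010: (50.0, 100.0)}
print(single_rates(data, key=lambda r: r.author_country))
# {'X': (50.0, 50.0), 'Y': (50.0, 100.0)}
```

Passing the grouping as a `key` function keeps one routine for the temporal, country, and order comparisons; only the lambda changes.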
To test whether the use of DNA data can decrease the rate of single-specimen/locality species, the proportion and temporal trend of species and papers with DNA data from 2009 to 2017 were analyzed. We expected the rates of single-specimen/locality species to be lower in papers with DNA data. We also calculated the average number of specimens per species (including additional material examined) with and without DNA data.
A global distribution map of the holotypes was constructed to show the taxonomic effort published in ZooKeys. In addition, the distribution patterns of holotypes of new species published by corresponding authors from the four countries that described the greatest numbers of species were mapped. Finally, a Venn diagram was constructed to show the relationships among three variables, the first author’s country (FAC), the corresponding author’s country (CAC), and the collecting country of the holotype (CC), which can indicate the extent of international collaboration in insect species descriptions.
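The Venn categories for FAC, CAC, and CC can be derived by classifying each species according to which of the three countries coincide. The sketch below is a hypothetical illustration with invented (FAC, CAC, CC) triples; the category labels are our own naming, not the authors'.

```python
from collections import Counter

def agreement_pattern(fac: str, cac: str, cc: str) -> str:
    """Classify one species by which of FAC, CAC and CC coincide."""
    if fac == cac == cc:
        return "all three same"
    if fac == cac:
        return "FAC = CAC only"   # holotype collected in a third country
    if fac == cc:
        return "FAC = CC only"
    if cac == cc:
        return "CAC = CC only"
    return "all different"

# Hypothetical (FAC, CAC, CC) triples, one per species.
rows = [
    ("China", "China", "China"),
    ("USA", "USA", "Brazil"),
    ("Canada", "Canada", "Costa Rica"),
    ("Germany", "USA", "Indonesia"),
]
counts = Counter(agreement_pattern(*row) for row in rows)

same_author_country = counts["all three same"] + counts["FAC = CAC only"]
cc_differs_from_both = counts["FAC = CAC only"] + counts["all different"]
print(same_author_country, cc_differs_from_both)  # 3 3
```

Because the five categories are mutually exclusive, summary figures such as "first and corresponding authors from the same country" or "holotype collected outside both author countries" are simple sums of category counts.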
3. Results
Among all 4811 new insect species, 1036 (21.53%) were described from a single specimen and 2419 (50.28%) from fewer than five specimens (Figure 1A). Regarding collection localities, 1046 species (21.74%) were described from specimens collected at a single locality, and 2620 species (54.46%) from specimens collected at two or fewer localities (Figure 1B). Furthermore, the rate of species described from a single specimen or very few specimens remained basically constant over time (Figure 1C), indicating that the reliance on few specimens did not change between 2009 and 2017. Similarly, species from only a single locality or a few localities made up a relatively stable proportion throughout this period (Figure 1D).
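As a quick consistency check, the percentages above follow directly from the reported counts divided by the 4811 species in the dataset:

```python
TOTAL_SPECIES = 4811

# Counts reported for Figure 1A,B.
reported = {
    "single specimen": 1036,
    "fewer than five specimens": 2419,
    "single locality": 1046,
    "two or fewer localities": 2620,
}

def percent(count: int, total: int = TOTAL_SPECIES) -> float:
    """Share of all new species, in percent, rounded to two decimals."""
    return round(100 * count / total, 2)

for label, count in reported.items():
    print(f"{label}: {percent(count):.2f}%")
# single specimen: 21.53%
# fewer than five specimens: 50.28%
# single locality: 21.74%
# two or fewer localities: 54.46%
```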
The proportions of single-specimen/locality species varied among countries. The top 10 countries described 84.46% of single-specimen species and 84.32% of single-locality species (Figure 2A). The proportions of single-specimen species were highest in Egypt, Belgium, and Costa Rica, at 100.00%, 66.67%, and 62.86%, respectively. Although the USA and China had the highest numbers of single-specimen species of all countries, their rates of such species were not very high (approximately 20%) (Figure 2A). Similar results were found for the rates of single-locality species (Figure 2B). The top 10 insect orders by number of species described were Coleoptera, Hymenoptera, Diptera, Lepidoptera, Hemiptera, Trichoptera, Blattodea, Orthoptera, Plecoptera, and Neuroptera. Across these orders, the mean rate of single-specimen species was 21.30% (highest: 24.43%, Hymenoptera; lowest: 17.18%, Lepidoptera) (Figure 3A), and the mean rate of single-locality species was 20.88% (Figure 3B).
The number of species with DNA data increased from 2009 and then declined rapidly after 2014 (Figure 4A). The year with the highest proportion of species with DNA data (39.34%) was 2014. A possible reason is that a few papers, such as those of Fernandez-Triana et al., Staines and García-Robledo, and Riedel et al. [25,26,27], included a large number of species with DNA data, which made the trend unstable. Meanwhile, the proportion of papers including DNA data was relatively stable, with a mean value of 11.44% (Figure 4B). The rate of single-specimen species was lower among species described with DNA data (15.06%) than among those without DNA data (23.43%) (Figure 5), and a similar difference appeared in the rates of single-locality species. The average number of specimens per species with DNA data was 30.61 (standard deviation (SD): 72.91; 95% confidence interval (CI): 26.28–34.95), which was significantly higher (p = 0.00012) than that of species without DNA data (20.89; SD: 73.04; 95% CI: 18.54–23.23).
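The reported intervals are consistent with a normal-approximation 95% CI, mean ± 1.96·SD/√n. The sketch below reproduces them from the summary statistics and computes a Welch-type (unequal-variance) statistic with a large-sample normal p-value. Two assumptions apply: the statistical test actually used is not named here, so the Welch-type choice is ours, and the group sizes (about 1089 species with DNA data and 3722 without) are back-calculated from the 77.36% share of species without DNA data, not taken from Table S1.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution

def normal_ci(mean: float, sd: float, n: int) -> tuple[float, float]:
    """Normal-approximation 95% CI for a mean: mean +/- z * SD / sqrt(n)."""
    half_width = Z_95 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def welch_z(m1: float, sd1: float, n1: int,
            m2: float, sd2: float, n2: int) -> tuple[float, float]:
    """Unequal-variance statistic with a two-sided large-sample normal p-value."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (m1 - m2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided

# Assumption: group sizes back-calculated from "77.36% of 4811 species without DNA data".
N_DNA, N_NO_DNA = 1089, 3722

print(normal_ci(30.61, 72.91, N_DNA))     # roughly (26.28, 34.94)
print(normal_ci(20.89, 73.04, N_NO_DNA))  # roughly (18.54, 23.24)
z, p = welch_z(30.61, 72.91, N_DNA, 20.89, 73.04, N_NO_DNA)
print(f"z = {z:.2f}, p = {p:.1e}")
```

Under these assumptions z ≈ 3.87 and p ≈ 1e-4, in line with the reported p = 0.00012; the small differences in the last decimal of the CI bounds are consistent with rounding of the published means and SDs or with slightly different group sizes.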
The distribution map of the holotypes of all species showed that holotype collection localities were very unevenly distributed, although they covered 136 countries (Figure 6). The top five countries accounted for 51.10% of all holotype records (China: 23.61%; Costa Rica: 9.00%; USA: 7.20%; Australia: 6.29%; and Brazil: 5.00%). We also analyzed the distribution of holotypes of species published by corresponding authors from the four countries with the highest numbers of species (Figure 2; USA 27.95%, China 21.37%, Canada 11.54%, and Germany 7.84%). The holotypes described by USA authors came from 80 countries, especially from the Americas (USA: 19.42%; Brazil: 8.67%; Mexico: 6.75%) and Australia (11.05%) (Figure 7A). Although Chinese authors published the second-highest number of new species, 93.58% of their holotypes were collected in China (Figure 7B). The holotypes described by Canadian and German authors came from 38 and 28 countries, respectively. Costa Rica (45.95%) was the major collecting country for Canadian authors (Figure 7C), while Papua New Guinea (23.42%), Indonesia (21.38%), and China (17.31%) were the main collection areas for German authors (Figure 7D). As the Venn diagram shows (Figure 8), the first and corresponding authors of 4040 of the 4811 species (83.97%) were from the same country. For 1694 species (35.21%), the first author’s country (FAC), the corresponding author’s country (CAC), and the collecting country of the holotype (CC) were all the same, indicating that congruence between the country of holotype collection and the countries of the first and corresponding authors was relatively uncommon. Meanwhile, for more than half of all species (2844, 59.11%), the holotype was collected in a country different from those of the first and corresponding authors.
4. Discussion
Considering that many species remain undiscovered and undescribed [3,28], it is necessary to increase taxonomic efforts and improve current taxonomic practices so that more species and their associated information can be described. In traditional insect taxonomy, if a specimen falls outside the intraspecific variability range of closely related species, most taxonomists describe it as a new species [17]. However, for species with high phenotypic plasticity (the change in the expressed phenotype of a genotype as a function of the environment) [29], a small number of samples cannot adequately represent the full continuum of morphological variation, which causes uncertainty in species descriptions and may lead to synonyms [13,30]. It has been estimated that 20% of currently recognized species are undiscovered synonyms [11,31], and examining too few specimens might contribute to these synonyms. Our results show a higher proportion of single-specimen species (21.53% of 4811 insect species) than the 17.7% (123 of 695 species) reported in the survey by Lim et al. [17]. The constant temporal trend in the rate of species described from single or few specimens (Figure 1C) suggests that this problem has not received sufficient attention.
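The sampling argument can be illustrated with a toy simulation: draw measurements of a hypothetical trait (for example, forewing length) from a population with fixed variation, and track the observed range as more specimens are examined. The normal distribution, its parameters, and the random seed are arbitrary assumptions chosen only for illustration; real morphological variation need not be normally distributed.

```python
import random

def observed_ranges(sample_sizes, mean=10.0, sd=1.0, seed=1):
    """Observed trait range (max - min) among the first n simulated specimens."""
    rng = random.Random(seed)
    largest = max(sample_sizes)
    # One simulated series of specimens; smaller samples are its prefixes,
    # mimicking a taxonomist who examines more and more specimens.
    values = [rng.gauss(mean, sd) for _ in range(largest)]
    return {n: max(values[:n]) - min(values[:n]) for n in sample_sizes}

ranges = observed_ranges([1, 5, 20, 200])
for n, width in ranges.items():
    print(f"n = {n:>3}: observed range = {width:.2f}")
```

With a single specimen the observed range is zero by construction, so nothing about intraspecific variation can be inferred; the range can only widen as further specimens are added to the same series.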
As described by Whittaker et al., many regions of the world remain seriously under-collected for most taxa, and only some accessible parts of the Earth’s surface have undergone robust analyses of diversity patterns [6]. We found that more than half of new species were described from specimens collected at only one or two localities, and this pattern persisted over time (Figure 1B,D). To overcome the Wallacean shortfall, more distribution data need to be obtained for every new species. In fact, geographical morphological variation within species is common. For example, in Trilophidia annulata (Orthoptera: Oedipodidae), differences among populations in the size of the forewing and hindwing and in the shape of the forewing are easily observed [32]. Both the biological and the morphological characteristics of Parthenolecanium corni (Hemiptera: Coccoidea) populations vary geographically, and even between host plants within the same region [33]. Therefore, using more specimens from different localities helps capture geographical morphological variation for species identification.
Our results also showed that the rate of single-specimen species was lower among species described with DNA data (15.06%) (Figure 5). This result suggests that incorporating DNA data into species descriptions may help decrease the proportion of single-specimen species. Meanwhile, the average number of specimens per species with DNA data was significantly higher than that of species without DNA data, which also suggests that having fewer specimens might reduce the probability that DNA methods are used. Another explanation may be a lack of funding and facilities, as DNA data are hard to obtain for many taxonomic researchers. With the development of molecular techniques, many new methods based on DNA sequencing have been introduced to delimit species boundaries, and these commonly require sufficiently large sample sizes for accurate species identification. For example, insufficient sample sizes are considered to lead to possible species misidentification in DNA barcoding [34], and larger samples improve the efficacy and accuracy of DNA barcoding [35]. Yet approximately half of the new species in our dataset were described from fewer than five individuals, and 77.36% of the 4811 new species did not include DNA data. New types of data yielded by new technologies, such as DNA barcoding, and integrative taxonomic practices can help improve the quality and efficiency of taxonomy [36,37]. We advocate introducing more DNA data into taxonomic practice, as they provide an important complement to morphological identification and encourage taxonomists to collect and describe more specimens for each new species.
The numbers of holotypes from European countries were not high from 2009 to 2017 (Figure 6), which is understandable given the long history of taxonomy in this region and its well-studied insect fauna. However, by collaborating to collect specimens from other geographical regions, taxonomists from these countries, such as Germany, can still publish many new species, which also promotes international collaboration. Authors from the USA and Canada not only describe many new local species but also collaborate extensively with other countries (Figure 7A,C). In contrast, Chinese authors describe most new species from native specimens. One possible reason is that the taxonomic history in China is relatively short, and many new species there remain to be discovered by Chinese taxonomists. Another possibility is that international collaboration opportunities for Chinese taxonomists, especially for describing species collected from other regions, are still insufficient. As Figure 8 shows, the first and corresponding authors are from the same country for most species (83.97%), which also indicates that international collaboration could be improved. We suggest that government agencies provide policies and funding opportunities to improve international collaboration in insect taxonomy, for example, funding programs open to taxonomists from other countries and taxonomy training courses for young researchers from developing countries.
5. Conclusions
In conclusion, our study shows that single-specimen/locality species are prevalent in insect taxonomy. This problem may hinder a deep understanding of insect diversity patterns in many regions. The average shelf life of a species, from the collection of the first specimen to its formal description as a new species, is 20.7 years [38]. This would seem to leave enough time for taxonomists to collect more specimens before publishing a new species; in practice, however, it may also reflect that taxonomists cannot gather enough specimens to describe and publish new species. Several practical constraints may prevent taxonomists from obtaining more specimens. For instance, some taxonomists may lack sufficient funding, and the high cost of collecting specimens from distant areas may limit the number of specimens used. The lack of precise locality or habitat information for some old specimens also hinders further collection. Large funding expenditures have accelerated scientific research in China [39], which is the second-ranked country for the number of new species described in ZooKeys (Figure 2). Although our study examined the prevalence of single-specimen/locality species, we do not deny the necessity and value of single-specimen species in taxonomic work. For example, if additional specimens are very difficult to obtain and a single specimen differs clearly from all other taxa (in stable and clear morphological or genetic characteristics), the single-specimen/locality species should still be published. However, species descriptions based on a single specimen/locality are sometimes unwarranted or deficient and may require further taxonomic revision, incurring future time and economic costs. By reporting this empirical analysis, we hope to stimulate more discussion on this issue. We suggest that more taxonomists adopt practices such as increasing the number of specimens and the geographical coverage of sampling, using DNA data and integrative taxonomy, and engaging in international collaboration when describing new insect species.