1. Introduction
In field experiments, estimates of crop yield or any other crop metric are derived from measurements taken for all the plants within an experimental plot, rather than from individual plant measurements [
1,
2,
3]. It is common practice to exclude plants from the guard rows and harvest only those from the middle row or two middle rows of the plot, depending on the number of sowing lines [
4]. This approach eliminates the “margin effect”, which refers to differences in yield or other characteristics that may occur in plants located at the edge of an experimental plot, and this, in turn, increases the reliability of the estimate [
5,
6,
7]. Plants located at the edge of a plot are typically not considered representative. Therefore, measurements for these plants should be excluded when statistically processing the data from the respective experiments [
4]. It has been observed that these “margin” plants exhibit yield differences of up to 12% or even 15% when compared to other plants from the inner rows [
5,
6,
7]. The yield difference in these “margin” plants is reflected in their fresh weight relative to inner-row plants. The underlying causes are still being studied. Nevertheless, increased sunlight exposure and improved ventilation likely play significant roles in this difference, and it may also be linked to both intraspecific and interspecific plant competition [4].
Excluding the “margin” rows of a plot raises questions about the representativeness of the sample, consisting of plants from the middle lines, when compared to the entire population of plants in the experimental plot. The accuracy of crop estimates is essential for approximating real crop values and those expected under large-scale farming conditions. The exclusion of “margin” plants can be viewed as a sampling method for estimating crop yields, and various sampling designs can be employed for this purpose. The following sections in this study outline the primary components affecting maize yield and introduce key sampling designs based on Random Sampling principles [
8,
9,
10].
The objective of this study is to assess the performance of nine spatial sampling designs in estimating specific maize yield traits, including plant height (cm), fresh weight per plant (g), dry weight per plant (g), and total ear (cob) weight per plant (g).
1.1. Components of Maize Yield Estimates
In various agro-climatic environments, maize plant yields are influenced by a range of agronomic factors, including plant population density, shelling percentage, and grain moisture content [
11]. The primary components contributing to maize yield estimates include: (a) plant density and kernel count, (b) moisture content, (c) maize harvest and shelling percentage, (d) harvest area [
11].
Plant density, determined by the total number of plants cultivated in a given area, is a pivotal factor in yield estimation. It can be calculated based on row spacing, the number of plants, and the distance between consecutive plants, or by dividing the total number of plants harvested in a specific plot by the designated soil plot area [
12]. Maize yield, in turn, is strongly and positively correlated with the number of kernels, which depends on the number of ears per maize plant [13]. To attain higher yields, hybrids are usually employed because some of them can withstand greater plant density [
14]. The moisture content at the time of harvest, defined as the amount of water in the grains, can also serve as a representative measure of the grown plants. This measure can be easily estimated either by employing a moisture meter or by calculating it based on the moisture content of grains in the laboratory from at least 10 randomly sampled ears [
15]. In addition to ear weight, which can be measured in kilograms after harvesting, another valuable yield measure is the shelling percentage. Maize shelling is defined as the process of removing kernels from the ear. It has been observed that maize with a moisture content of around 12% tends to exhibit the best shelling performance. Typically, the average shelling percentage of maize ears is approximately 80% when plants are harvested with a moisture content ranging from 20% to 25%, although this can vary depending on local environmental conditions [
16]. Recently, for the measurement of harvest areas and crop metrics, especially over large areas, the Global Positioning System (GPS) and satellite image data, in conjunction with Geographic Information Systems (GIS), have proven to be effective tools [
11].
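The two ways of computing plant density described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the first call uses the row and in-row spacings reported for the plots in this study (80 cm and 17 cm), and the plant count and plot area in the second call are made-up values.

```python
def density_from_spacing(row_spacing_m: float, plant_spacing_m: float) -> float:
    """Plants per square metre, assuming each plant occupies one
    row_spacing x plant_spacing rectangle of soil."""
    return 1.0 / (row_spacing_m * plant_spacing_m)


def density_from_count(n_plants: int, plot_area_m2: float) -> float:
    """Plants per square metre from a harvested count and the plot area."""
    return n_plants / plot_area_m2


# Spacing-based estimate with 0.80 m between rows and 0.17 m within rows.
per_m2 = density_from_spacing(0.80, 0.17)
per_ha = per_m2 * 10_000  # 1 ha = 10,000 square metres

# Count-based estimate with hypothetical numbers (120 plants on 16 square metres).
counted = density_from_count(120, 16.0)

print(f"{per_m2:.2f} plants/m2, {per_ha:.0f} plants/ha, counted: {counted:.2f} plants/m2")
```

The two approaches need not agree exactly, because the count-based figure depends on how the plot boundary is drawn around the outer rows.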
The key parameters for the yield components are based on the estimates for plant height, fresh and dry weight, as well as the number and weight of maize ears. Therefore, the comparison of the nine spatial sampling designs in this study was focused on these specific maize yield traits.
1.2. Factors Affecting the Maize Yield
The stage of maturity of maize plants is the most crucial parameter for silage harvesting [
16]. In practice, the selection of the harvest time can be viewed as a trade-off among three main attributes: maize yield, nutritive value, and silage quality [
17,
18].
Farmers typically harvest maize plants 50–55 days after the “ear silking” stage when the plants have attained physiological maturity [
17,
18,
19,
20]. Two visual indicators for this stage are crucial in the grain: (a) the milk line (the boundary between the dent and liquid parts of the grain) and (b) the black layer (visible at the kernel’s base once the plant has reached maturity). Given this, the optimal timing for harvesting is when the milk line falls between 1/3 and 2/3 of the grain from the top. If the black layer appears, the grain should be ground to prevent a loss of digestibility [
17]. Consequently, the timing of harvesting is critical as it directly impacts silage quality, animal digestibility, and the ultimate product quality. For silage, hybrids that result in plants with a high percentage of dry weight, corresponding to tall plants with high yields, are considered the most suitable [
21]. Nitrogen fertilizers can contribute to improving the plant height of maize. However, it is important to note that several photosynthetic factors and environmental conditions can have a negative impact on plant height. The height of maize plants is considered one of the most crucial factors for estimating the total crop yield in a given area [
22]. Ear weight is a significant factor that influences yield and it is affected by the rate of effective “filling time”, during which the ear develops. This is why total ear weight has been chosen as one of the primary traits for comparing various sampling designs in this current study. Filling time can vary and is primarily dependent on the genotype. Limited availability of photosynthetic products, caused by climate changes or different weather conditions, can shorten the time required for efficient filling. Fresh weight is another critical plant trait for maize yield and serves as an indicator for estimating the moisture content of each plant. A higher moisture content before fermentation results in better silage quality in the final product.
1.3. Sampling Designs under Consideration
In this study, we assessed the performance of nine spatial sampling designs to calculate estimates for critical maize plant characteristics. Among several sampling methods, Simple Random Sampling is the fundamental approach, where each element φ (sampling unit) selected from a set K (target population) has the same probability of being chosen as any other element from the same set [8,9,10,23].
The Simple Random Sampling design is widely used across various scientific fields, primarily because of its simplicity in terms of statistical evaluation and inductive inference. In addition to its independent use, this technique also serves as a basis for more complex sampling designs, such as Stratified Random Sampling and Cluster Sampling. In practice, Simple Random Sampling can be applied as follows. First, the Sampling Frame is defined: a complete, current list of the K units of the target population, which are then numbered sequentially. Next, the minimum required sample size, denoted “n”, is predetermined. Finally, randomization methods, such as random number tables or software that generates random numbers, are employed to select these “n” units, while the unit selection process involves replacement. Simple Random Sampling is used in cases where the units of the target population exhibit homogeneity concerning the considered characteristics (primary variables or outcomes). The samples obtained through Simple Random Sampling are representative of the respective target population only if the condition of homogeneity among the population units is met.
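The selection procedure above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the `replace` flag, the seed, and the frame of 450 numbered units are all hypothetical, and the flag is included so that both selection with and without replacement can be tried.

```python
import random


def simple_random_sample(frame, n, seed=None, replace=False):
    """Draw n units from a numbered sampling frame, each unit being
    equally likely to be chosen at every draw."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same unit may be drawn more than once.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: every selected unit is distinct.
    return rng.sample(frame, n)


# Hypothetical frame: 450 units numbered sequentially from 1.
frame = list(range(1, 451))
sample = simple_random_sample(frame, 45, seed=1)
sample_wr = simple_random_sample(frame, 45, seed=1, replace=True)
print(sorted(sample)[:5])
```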
Systematic Random Sampling is another sampling method that introduces a systematic approach to the process of selecting population units. For example, one unit is randomly selected from the ordered and numbered sampling frame. This is the first unit of the sample. Next, after determining a fixed periodic interval of, say, m units, the next selected unit is the mth unit after the first one, and the process continues until the desired sample size is reached. In this method, the resulting samples can be deemed representative only if the units of the Sampling Frame are registered randomly and not based on a specific criterion within the given population.
Taking another step forward, we find another valuable sampling design: Stratified Random Sampling. In this sampling design, a population of size N is divided into k internally homogeneous sub-populations of sizes N1, N2, …, Nk, referred to as “strata”. From each of these k strata, a random sample of size ni (where i = 1, 2, …, k) is independently selected. This sampling can be carried out using either the Simple Random Sampling or Systematic Random Sampling method. The final sample, of size n = n1 + n2 + … + nk, results from combining these k independent random samples, and it is known as a “stratified random sample”. Samples obtained through Stratified Random Sampling are representative of the respective target population when there is maximum possible homogeneity of sampling units within each stratum and maximum possible heterogeneity among the strata, with respect to the characteristics or variables under consideration. Stratified Random Sampling is used in agricultural experiments mainly for measuring soil properties [24,25].
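A minimal sketch of the two designs just described may help. The function names, seeds, and toy data are assumptions made for illustration. The systematic version uses one common variant in which the random start is drawn within the first interval and the interval is m = N // n.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Random start in the first interval, then every m-th unit."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)
    m = len(frame) // n          # fixed periodic interval
    start = rng.randrange(m)     # index of the first selected unit
    return [frame[start + i * m] for i in range(n)]


def stratified_sample(strata, n_per_stratum, seed=None):
    """Independent simple random samples of size n_i from each stratum,
    combined into one sample of size n = n_1 + ... + n_k."""
    rng = random.Random(seed)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum[name]))
    return sample


# Hypothetical data: 100 ordered units, and two strata of sizes 100 and 50.
sys_sample = systematic_sample(list(range(100)), 10, seed=0)
strata = {"A": list(range(0, 100)), "B": list(range(100, 150))}
strat_sample = stratified_sample(strata, {"A": 10, "B": 5}, seed=0)
print(sys_sample, len(strat_sample))
```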
Cluster Random Sampling involves dividing the population into heterogeneous clusters and then randomly selecting a number of these clusters to include all individuals within them in the sample (census of units). This method is used when creating a sampling frame for the entire population is impractical mainly due to its size. Cluster sampling is particularly useful when a list of all individuals in the population is not available, making it a practical and feasible approach in various research settings [
26]. Samples obtained through Cluster Random Sampling are representative of the respective target population when there is maximum possible heterogeneity of sampling units within each cluster and maximum possible homogeneity among the clusters, with respect to the characteristics or variables under consideration.
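Cluster selection followed by a census of the chosen clusters can be sketched as below. The names and the toy layout (six rows treated as clusters of 25 positions each) are assumptions for illustration, not the configuration of any design in this study.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [u for c in chosen for u in clusters[c]]
    return chosen, units


# Hypothetical clusters: rows 1..6, each holding positions 1..25.
rows = {row: [(row, pos) for pos in range(1, 26)] for row in range(1, 7)}
chosen_rows, units = cluster_sample(rows, n_clusters=2, seed=3)
print(chosen_rows, len(units))
```

Note that no frame of individual units is needed up front; only the list of clusters has to be known before selection.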
Multistage Random Sampling is a design that combines several sampling methods. In each stage, a random sampling method is used. For example, in the first stage, the population is divided into strata. In the next stage, a number of clusters within strata are randomly selected. At the final stage, a random sample of units is selected using simple or systematic random sampling (or another sampling method). Multistage sampling can be less expensive and time-consuming than other sampling techniques, and it can create a more representative sample of the population. This method is often used when the population is large and (usually) geographically dispersed, making it difficult to create a sampling frame that lists every member of the population. However, a larger sample size is required for multistage sampling to achieve the same statistical inference properties as simple random samples. Overall, multistage sampling is a useful approach for large-scale survey research and can help limit the aspects of a population that need to be included within the frame for sampling [27].
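The three stages in the example above (strata, then clusters within strata, then units within clusters) can be sketched as follows. The nested-dictionary layout, the function name, and the choice of two clusters and five units are hypothetical.

```python
import random


def multistage_sample(population, clusters_per_stratum, units_per_cluster, seed=None):
    """population maps stratum -> cluster -> list of units.
    Stage 1: every stratum is visited.
    Stage 2: clusters are drawn at random within each stratum.
    Stage 3: units are drawn at random within each chosen cluster."""
    rng = random.Random(seed)
    sample = []
    for stratum in sorted(population):
        clusters = population[stratum]
        for c in rng.sample(sorted(clusters), clusters_per_stratum):
            sample.extend(rng.sample(clusters[c], units_per_cluster))
    return sample


# Hypothetical population: 3 strata, each with 6 clusters of 25 units.
population = {
    s: {c: [(s, c, u) for u in range(1, 26)] for c in range(1, 7)}
    for s in (1, 2, 3)
}
ms = multistage_sample(population, clusters_per_stratum=2, units_per_cluster=5, seed=42)
print(len(ms))
```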
2. Materials and Methods
A maize hybrid (AGN720, Italy) was chosen for the comparative analysis of several sampling methods to effectively estimate maize yield. This hybrid is known for producing plants with a high percentage of dry weight, making it suitable for silage. The maize crop was cultivated in an experimental field at Aristotle University of Thessaloniki (AUTH) farm, located at latitude 40°53′48.75″ N and longitude 22°99′18.98″ E in 2016 (as shown in
Figure 1). The seedbed preparation followed local agricultural practices, and the selected fertilization plan involved 200 kg N and 100 kg P/ha.
The maize field was sown on 27 April 2016, using a 4-row pneumatic sowing machine called “Gaspardo”. Herbicide Modett 25/28 SE (containing 25% terbuthylazine and 28% dimethenamid-P) was used for weed control at the recommended rate. Irrigation of the maize plants was carried out according to the crop’s specific requirements. After the crop emerged, three plots with dimensions of 4 m × 4.25 m were randomly selected and marked. Each plot consisted of six rows, each containing a maximum of 25 plants. The distance between the plots was 2.5 m, and the crop rows were spaced 80 cm apart, with a 17 cm gap between the plants in the same row. In total, each plot contained 6 rows and 150 plants, giving 450 plants across the three plots. For the purposes of this study, all individual plants within the three plots were considered representative units of the target statistical population. Therefore, the assessment of the sampling methods was based on all the samples collected from the three experimental plots.
In the current study, the following sampling designs were considered: (a) CS: Cluster Sampling with systematic constraints; (b) SS: Stratified Sampling with systematic constraints; (c) MS: Multi-Stage Sampling with systematic constraints; (d) SR: Stratified Random Sampling; (e) RS: Simple Random Sampling; (f) RC: Cluster Random Sampling (Table 1). To examine the effect of different constraints on the efficiency of sampling methods, systematic constraints were set. We focused on the same type of method and applied different constraints to assess its performance on the metrics of the main maize yield components (plant height, fresh weight, dry weight, and ear weight). Therefore, the term “systematic constraints” refers to the pattern of selecting or harvesting plants from specific rows or plants with specific codes in each row (Table 1).
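To make the idea of selecting plants by row or by position concrete, the sketch below builds a sampling frame from the plot layout and applies two selection patterns described in the Results: all plants from rows 2 to 5 of each plot (as in D1) and six random plants from every row of every plot (as in D6). The function names and seed are hypothetical, and the frame assumes every row holds its maximum of 25 plants, so counts in a real plot with missing plants can be lower.

```python
import random

PLOTS, ROWS, PLANTS_PER_ROW = 3, 6, 25

# Each unit is identified by (plot, row, position within the row).
frame = [
    (plot, row, pos)
    for plot in range(1, PLOTS + 1)
    for row in range(1, ROWS + 1)
    for pos in range(1, PLANTS_PER_ROW + 1)
]


def inner_rows(frame, rows=(2, 3, 4, 5)):
    """Systematic constraint: keep every plant in the listed rows."""
    return [u for u in frame if u[1] in rows]


def random_per_row(frame, k, seed=None):
    """Each (plot, row) is a stratum; draw k plants at random from it."""
    rng = random.Random(seed)
    by_row = {}
    for unit in frame:
        by_row.setdefault(unit[:2], []).append(unit)
    sample = []
    for key in sorted(by_row):
        sample.extend(rng.sample(by_row[key], k))
    return sample


d1_like = inner_rows(frame)
d6_like = random_per_row(frame, k=6, seed=2016)
print(len(frame), len(d1_like), len(d6_like))
```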
For each sampling design (D0–D9) and each maize plant characteristic (plant height, fresh weight, dry weight, and ear weight), the following statistics were computed: minimum (min) and maximum (max) values, mean values, standard deviations, standard errors, and coefficient of variation (CV) values; 95% confidence intervals were calculated for the means of D1–D9 only, since D0 represents the whole target population. Comparisons between sampling designs D1–D9 and D0 were based on: (a) the relative difference (%) between their mean values and the corresponding average values of the target population (D0); (b) the corresponding CV values, and (c) whether the corresponding 95% confidence intervals clearly include the mean value of D0. Statistical analysis was conducted using IBM SPSS Statistics version 23.0 software. The purpose of the study was to determine the most appropriate sampling method for each of the main maize crop metrics. Our evaluation of the sampling methods was based on a three-stage assessment: (a) the first stage involved determining whether the confidence intervals in each method included the corresponding mean of the target population (D0); (b) the second stage determined which of the methods identified in the first stage had the smaller coefficient of variation (CV) value, indicating lower variability; and (c) the third stage identified which of the methods specified at the previous stage had a lower distance (difference %) from the corresponding mean of the target population (D0).
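The statistics and the three-stage assessment can be sketched as follows. This is an illustrative approximation rather than the SPSS procedure used in the study. The confidence interval uses the normal quantile (about 1.96) instead of a t quantile. The text does not state how many designs are retained at the second stage, so keeping those at or below the median CV of the first-stage survivors is an assumption. All names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev


def summarize(values, level=0.95):
    """Descriptive statistics for one design and one trait."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)                          # sample SD (n - 1 denominator)
    se = sd / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal approximation
    return {"n": n, "min": min(values), "max": max(values), "mean": m,
            "sd": sd, "se": se, "cv": 100 * sd / m,
            "ci": (m - z * se, m + z * se)}


def relative_difference(sample_mean, pop_mean):
    """Signed difference (%) from the population mean."""
    return 100 * (sample_mean - pop_mean) / pop_mean


def three_stage_choice(summaries, pop_mean):
    # Stage 1: keep designs whose CI contains the population mean.
    stage1 = {d: s for d, s in summaries.items()
              if s["ci"][0] <= pop_mean <= s["ci"][1]}
    if not stage1:
        return None
    # Stage 2 (assumed rule): keep designs with CV at or below the median CV.
    cutoff = median(s["cv"] for s in stage1.values())
    stage2 = {d: s for d, s in stage1.items() if s["cv"] <= cutoff}
    # Stage 3: smallest absolute relative difference from the population mean.
    return min(stage2,
               key=lambda d: abs(relative_difference(stage2[d]["mean"], pop_mean)))


# Hypothetical measurements for three designs; population mean assumed to be 200.
data = {
    "A": [190, 200, 210, 205, 195],
    "B": [150, 155, 160, 145, 150],
    "C": [180, 220, 170, 230, 205],
}
summaries = {d: summarize(v) for d, v in data.items()}
choice = three_stage_choice(summaries, pop_mean=200)
print(choice)
```

With these toy values, design B fails the first stage because its interval excludes the population mean, and design C is dropped at the second stage because of its higher CV.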
3. Results
Figure 2 provides a graphical representation of the nine sampling methods (D1–D9) employed across the three plots. The considered sampling designs included: (a) CS: Cluster Sampling with Systematic Constraints (in D1, D2); (b) SS: Stratified Sampling with Systematic Constraints (in D3); (c) MS: Multi-Stage Sampling with Systematic Constraints (in D4); (d) SR: Stratified Random Sampling (in D5, D6); (e) RS: Simple Random Sampling (in D7, D8); (f) RC: Cluster Random Sampling (in D9).
The comparison results of the nine sampling design methods, concerning the yield components under consideration (height per plant in cm, fresh weight per plant in g, dry weight per plant in g, and total ear weight per plant in g) of maize, are presented in Table 2, Table 3, Table 4 and Table 5. The number of ears (cobs) was not included in the comparison of the sampling designs because, in most cases (roughly 80%), the plants had only one ear.
Based on the information presented in Table 2 and Figure 3, it can be inferred that at the first stage of assessment of the sampling designs for efficient estimation of the average maize plant height, all the sampling methods D1–D9 can be considered as “competitive” methods to be employed (marked as *), since their corresponding 95% confidence intervals clearly include the mean value of the target population (D0). Moving to the second stage, the D1, D3, D4, D6, and D9 methods exhibit the lower CV (%) values among those identified in stage 1 (marked as **). Regarding the third stage of assessment, the D1 sampling design (involving the harvest of plants from rows 2, 3, 4, and 5 in each plot using Cluster Sampling with systematic constraints) has a lower distance from the corresponding mean of the target population (D0). Therefore, Cluster Sampling with systematic constraints can be considered as the most appropriate sampling method for measuring plant height.
Based on the information presented in Table 3 and Figure 4, it can be inferred that at the first stage of assessment of the sampling designs for efficient estimation of the average maize fresh weight, the sampling methods D5, D6, and D9 can be considered as “competitive” methods to be employed (marked as *), since their corresponding 95% confidence intervals clearly include the mean value of the target population (D0). Among them, the D5 and D6 methods exhibit lower CV (%) values (marked as **). At the third stage of assessment, the D6 sampling design (involving the harvest of 6 random plants from each row in each plot) has a lower distance from the corresponding mean of the target population (D0 design). Therefore, Stratified Random Sampling (D6) can be considered as the most appropriate sampling method for measuring maize fresh weight.
Based on the information presented in Table 4 and Figure 5, it can be inferred that at the first stage of assessment of the sampling designs for efficient estimation of the average maize dry weight, the sampling methods D1 and D5–D9 can be considered as “competitive” methods to be employed (marked as *), since their corresponding 95% confidence intervals clearly include the mean value of the target population (D0). At the second stage of assessment, the D5 and D8 methods exhibit lower CV (%) values (marked as **). Among them, the D5 sampling design (involving the harvest of 19 random plants from each row in each plot) has a lower distance from the corresponding mean of the target population (D0 design). Therefore, Stratified Random Sampling (D5) can be considered as the most appropriate sampling method for measuring maize dry weight.
Based on the information presented in Table 5 and Figure 6, it can be inferred that at the first stage of assessment of the sampling designs for efficient estimation of the average maize ear weight, the sampling methods D2, D5, D6, D8, and D9 can be considered as “competitive” methods to be employed (marked as *), since their corresponding 95% confidence intervals clearly include the mean value of D0. Among them, the D2, D6, and D8 methods exhibit lower CV (%) values (marked as **). At the third stage of assessment, the D6 sampling design (involving the harvest of 6 random plants from each row in each plot) has a lower distance from the corresponding mean of the target population (D0). Therefore, Stratified Random Sampling (D6) can be considered as the most appropriate sampling method for measuring maize ear weight.
In Figure 7 and Figure 8, the differences (%) in maize plant yield component values from the corresponding mean value of the target population (D0) are depicted. The results suggest that Multistage Sampling shows the highest underestimation of plant characteristics values, Cluster Sampling demonstrates lower but still significant underestimation, while Random Sampling methods yield either underestimated or overestimated values, with Cluster Random Sampling exclusively providing overestimated values. In contrast, the Stratified Sampling method exhibits a lower difference (%) from the mean value of the target population (D0) for all the plant characteristics under consideration.
Figure 9 summarizes the findings of the current study and indicates which sampling method is more suitable for estimating each maize plant trait. Based on the proposed three-stage assessment ((a) whether the corresponding 95% confidence intervals clearly include the mean value of the target population (D0); (b) the corresponding CV values; and (c) the relative difference (%) between the sample mean values and the corresponding mean values of the target population (D0)), it can be inferred that, for maize plant height, almost all sampling methods can be employed, but Cluster Sampling appears to be the most appropriate. For fresh, dry, and ear weight, Stratified Sampling seems to be the most suitable.
Based on the coefficient of variation (CV) values presented in Table 2, it can be observed that, in general, the height of the maize plants exhibits acceptable variability (CV < 20%) irrespective of the sampling design. On the other hand, as shown in Table 3, Table 4 and Table 5, the other three crop parameters (fresh weight, dry weight, and total ear weight) exhibit high variability (CV > 37%) across all sampling designs.
It is also worth noting that the preliminary analyses conducted using the ANOVA method revealed that there were no statistically significant differences among the three experimental plots for each of the sample designs examined in this study and for all the maize crop yield components under consideration.
4. Discussion
Our analysis compares spatial sampling designs for estimating key maize traits, utilizing crop data from experimental plots. Typically, in experimental plots, all plants are harvested. However, for obtaining a quick and representative estimate of maize morphophysiological and yield traits in the crop field context, the most commonly employed sampling method is “Simple Random Sampling” [28,29]. In this case, estimates of crop traits are typically derived either from randomly selected plants [29] or from several randomly chosen small sample plots within the field [30], often utilizing random coordinates. To the best of our knowledge, no previous study has assessed which sampling design is most suitable for estimating maize crop traits in experimental plots.
Based on the results of the present study, Stratified Random Sampling (sampling design D5 with 330 plants and sampling design D6 with 108 plants) provides estimates for the four maize yield components without a significant loss of information or accuracy. The conventional practice of excluding the “guard” rows to mitigate the “margin effect” (sampling design D1 with 266 plants) results in a “reliable” estimate only for plant height. Stratified Random Sampling requires only about a quarter of the total number of plant measurements (sampling design D6), making it suitable for cases where crop losses occur. The remaining plants in the experimental plot can be utilized for further analysis of other plant characteristics, especially where some plants must be destroyed for that purpose, or for measurements at different stages of plant growth. The findings of this study strongly support the use of Stratified Random Sampling as an effective method for calculating maize yield estimates, even in the estimation of yields across large maize cultivation areas.
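The sampling fractions implied by the plant counts quoted above can be checked with a few lines of arithmetic. The sketch assumes the 450-plant total stated in Materials and Methods, and the variable names are hypothetical.

```python
TOTAL_PLANTS = 450  # three plots of 150 plants, as stated in Materials and Methods

# Sample sizes quoted in the Discussion for three of the designs.
reported_sizes = {"D1": 266, "D5": 330, "D6": 108}

fractions = {d: n / TOTAL_PLANTS for d, n in reported_sizes.items()}
for design, frac in fractions.items():
    print(f"{design}: {frac:.1%} of all plants")
```

Under this assumption, D6 uses 24% of the plants, which matches the "about a quarter" figure, while D1 and D5 use roughly 59% and 73%.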
4.1. Estimates of Maize Plant Traits as Metrics for Crops Production
Maize plant height can serve as an indicator of vegetative growth and potential maize yield [
31]. However, potential changes in maize height due to cultivar replacement have been observed, and these changes can vary by region. For instance, Ma and colleagues [
32] observed a slight increase in plant height and a decrease in both ear height and ear ratio among maize cultivars introduced in China from the 1950s to the 2010s.
The use of the commonly employed sampling method, “Simple Random Sampling”, resulted in overestimations (D7) and underestimations (D8) of plant height. Conversely, the “Cluster Sampling with systematic constraints” sampling method (D1) provided an estimate of mean plant height much closer to that of the target population (D0) (harvesting all plants). According to the findings of our study, “Cluster Sampling with systematic constraints” appears to be a suitable approach for effectively estimating maize plant height compared to other sampling methods that involve manual field harvesting. Furthermore, it is important to note that new methods have recently been developed to circumvent the need for manual crop harvesting, weighing, and recording, which can streamline large-scale, long-term measurements that would otherwise be labor-intensive and time-consuming. Moving from the context of experimental plots to large crop areas these methods can be used to estimate above-ground maize biomass using machine-learning approaches with UAV remote-sensing data [
33] or multi-year RADARSAT-2 polarimetric observables [
34]. It is worth noting that remote sensing can be considered the fastest and most cost-effective technical means for monitoring and estimating maize plant height, especially over large areas. However, among the manual harvesting sampling methods for estimating maize plant height in the field, “Cluster Sampling with systematic constraints” is recommended as the most suitable.
Estimates for maize fresh, dry, and ear weight are typically derived from either harvesting all the plants in the experimental plot or from simple random sampling, which involves averaging the measurements of traits from randomly selected plants [
28,
29]. Based on the results of our study, the commonly used method “Simple Random Sampling” exhibited the same pattern for fresh and dry maize weight as observed for plant height, with either overestimated (D7) or underestimated weights (D8). In contrast, both “Stratified Random Sampling” designs (D5, D6) provided estimates of mean plant fresh and dry weight that matched the D0 sampling design (harvesting all plants) more closely than “Simple Random Sampling” did. These findings suggest that, rather than harvesting all maize plants in experimental plots, we can adopt a more cost-effective approach while still obtaining accurate estimates for the crop traits.
4.2. Difficulties of Estimating Crop Metrics Based on Measured Crop Traits in the Field
In experimental plots, maize is typically planted in rows with a specific seeding density, maintaining a set distance between consecutive planting stations. This practice facilitates the implementation of various sampling methods for estimating maize crop traits more effectively. However, in the field, harvest yield estimation, especially in smallholder farms, can be complicated because maize planting often does not adhere to any specific rules regarding plant density or seed rates [
11]. In the case of mixed cropping systems in the fields, estimating plant density can be even more challenging, as it depends on the heterogeneous performance of crops within a given area [
16]. Farmer interviews revealed that yield estimation is typically a prediction made by farmers based on the previous year’s harvest or an expert’s assessment, which may include uncertainties and biases [
35].
Hence, maize yield production is typically estimated roughly at the time of harvest based on the number of bags harvested, considering either the fresh or dry weight of the maize [
11]. Another simple and rapid estimation method is the “test weight technique”, which involves counting the number of ears per planting station in a one-square-meter area (referred to as crop-cuts) and repeating this process 5–7 times within the plot to determine the mean value of the measurements [
11,
35]. Likewise, the number of kernel rows is measured in 20 to 25 randomly selected maize ears, and the mean value is used as a crop metric. In summary, the most used methods for estimating maize yield include crop-cuts, farmer estimates, expert assessments, and, in some cases, whole-plot harvest methods [
30,
35].
Our findings suggest that “Stratified Random Sampling” can provide more accurate estimates of mean plant traits compared to the commonly used “Simple Random Sampling” in experiments. Further research is needed to confirm whether these sampling methods yield similar results in field conditions.
5. Conclusions
The main findings of our study suggest that, based on a three-stage assessment ((a) whether the corresponding 95% confidence intervals clearly include the mean value of the target population (D0); (b) the corresponding CV values; and (c) the relative difference (%) between the sample mean values and the corresponding mean values of the target population (D0)), Stratified Random Sampling appears to be the most appropriate for all the maize crop yield components (plant height, fresh, dry, and ear weight).
These estimates are very close to those obtained when harvesting all plants (D0 sampling design). In particular, “Stratified Random Sampling” can be considered even more suitable as it requires fewer measurements and can be applied in experiments where several plants have failed to emerge. However, it is important to note that the choice of the most appropriate sampling method can vary depending on the objectives, specific requirements, plot size, environmental conditions, and available resources. This study establishes a foundation for developing a more effective and adaptable spatial sampling approach for estimating maize crop characteristics in experimental plots.