1. Introduction
Meeting global food demand is currently one of the most challenging problems faced by society. Indeed, as a consequence of expected population growth, the demand for crop production is estimated to increase by around 100% by 2050 when compared to 2005 levels [1]. This scenario forces society to develop agricultural and food systems able to proactively satisfy such demand while minimizing the environmental impact. In this sense, crop phenotyping constitutes a crucial tool for achieving this balance.
Indeed, deep knowledge about observable crop traits and the way plant genotypes express in relation to environmental factors constitutes relevant and valuable information for farmers [2]. Within this context, individual plant counting is a key factor, not only for crop phenotyping, but also as valuable information supporting farmers when planning breeding strategies and other agricultural tasks. Thus, the plant population determines the crop density, defined as the number of plants per cultivated hectare. This statistic is closely related to different aspects, such as the efficiency of water and fertilizer use, or pathogen susceptibility [
3]. In addition, it plays a key role when estimating crop yield in tree-based cultivation, and it helps farmers when designing watering and/or fertilization schemes [
4]. The importance of the plant population does not stop here, as it is a significant indicator when applying for public subsidies [
5], pricing plantations [
6], or assessing losses after extraordinary events such as fire damage, pest infestations or other natural disasters. However, traditional counting methods are usually based on in-field human visual inspections, so, as with other phenotyping activities [7,8], they imply tedious, time-consuming and error-prone tasks, especially in large-scale plantations [
3]. Due to these difficulties, there is a pressing need for the development of new techniques aimed at carrying out plant counting in an accurate, efficient and automated way.
Nowadays, Unmanned Aerial Vehicles (UAVs) have become popular as part of the remote sensing technologies incorporated into precision agriculture, and they are widely used in crop phenotyping research [
9,
10]. This is mostly due to the advantages they offer over traditional aerial imaging systems already tested for this application, such as those based on manned airplanes or satellites. When compared to them, UAV-based imaging implies lower operational costs, fewer weather constraints and the possibility of operating under cloudy conditions [
9,
11,
12,
13]. Furthermore, the current growth of the UAV and remote sensing equipment market makes this technology increasingly accessible and affordable. Hence, UAVs are promising tools within the scope of smart farming and precision agriculture, with potential uses in crop phenotyping tasks [
9,
14].
In fact, when focusing on plant detection and counting, a considerable amount of research on crop tree identification from UAV-based imagery can already be found. The acquired images are usually processed to generate representative data structures of the study sites, which are subsequently analysed in order to detect and count the plants. Malek et al. [
5] approached palm tree detection by analysing a set of candidates, previously computed using the scale-invariant feature transform (SIFT), with an extreme learning machine (ELM) classifier. Candidates categorised as trees were post-processed by means of a contour method based on level sets (LS) and local binary patterns (LBP) in order to identify the shapes of their crowns. In Miserque-Castillo et al. [
15], a framework for counting oil palms was developed, where a sliding window-based technique provided a set of candidates. After processing with LBP, they were classified by a logistic regression model. Primicerio et al. [
16] studied plant detection within vine rows. The segmentation of the plant mass was carried out on the basis of dynamic segmentation, Hough space clustering and total least squares regression. Once individual plants had been identified, a multi-logistic model for the detection of missing plants was applied. Jiang et al. [
17] introduced a GPU-accelerated scale-space filtering methodology for detecting papaya and lemon trees in UAV images. To that end, the initial captures were converted to the Lab colour space, mostly exploiting the information contained in the a channel (representative of the green-red colour axis) to differentiate the plants from the ground; a minimal sketch of this kind of colour-based discrimination is provided after this paragraph. Koc-San et al. [
18] undertook citrus tree location and counting from UAV multispectral imagery. To that end, they proposed a set of procedures based on sequential thresholding and the Hough transform. In the same vein, Csillik et al. [
19] focused on citrus crops, aiming to identify the trees by using convolutional neural networks (CNNs). In addition, they used the simple linear iterative clustering (SLIC) algorithm for classification refinement. CNNs were also used by Ampatzidis and Partel [
20] in order to detect citrus trees. Specifically, a YOLOv3-based object detection model was trained for this purpose. Furthermore, they implemented a normalised difference vegetation index (NDVI)-based image segmentation method for estimating the canopy area. Selim et al. [21] approached orange tree detection from high-resolution images by applying an object-based classification methodology, using a multi-resolution segmentation of the data derived from aerial imagery. Deep learning, in the form of CNNs, was also exploited by Aparna et al. [
4]. In this case, coconut palm tree detection was the aim. Initial captures were transformed into an HSV colour representation, and then binarized and cropped into sub-images, with which the CNN classifier was trained. In Kestur et al. [
22], an ELM methodology was proposed for detecting tree crowns from aerial images captured in the visible spectrum. Thus, the developed ELM spectral classifier was applied in order to segment the tree crown pixel areas from the rest of the image. The methodology was validated by studying banana, mango and coconut palm trees. Marques et al. [
23] focused on the detection of chestnut trees. They considered different kinds of sensors for acquiring aerial images. Thus, RGB and Colour Infrared (CIR) images were used in their research, where different segmentation techniques were explored in order to properly isolate the tree-belonging pixel regions and subsequently identify the trees.
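As a simple illustration of the colour-based plant-ground discrimination used in several of the works above (e.g., the Lab-based approach of Jiang et al. [17]), the following Python sketch converts an RGB capture to the Lab colour space and thresholds its a channel. The threshold value, the function name vegetation_mask_lab and the use of scikit-image are assumptions introduced here for illustration; they do not reproduce the processing pipeline of any of the cited studies.

```python
import numpy as np
from skimage import color, io

def vegetation_mask_lab(rgb_image, a_threshold=-5.0):
    """Illustrative sketch: separate vegetation from ground using the Lab a channel.

    In Lab, the a channel encodes the green-red axis (negative values lean
    towards green), so pixels whose a value falls below a_threshold (an
    assumed, scene-dependent cut-off) are kept as vegetation.
    """
    lab = color.rgb2lab(rgb_image)   # shape (H, W, 3): L, a, b channels
    a_channel = lab[..., 1]
    return a_channel < a_threshold   # boolean vegetation mask

# Hypothetical usage:
# rgb = io.imread("uav_capture.png")[..., :3]
# mask = vegetation_mask_lab(rgb)
# print(f"Vegetation pixels: {np.count_nonzero(mask)}")
```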
Regarding olive plantations, which constitute the case study considered throughout the experimentation developed here, several studies approaching olive tree phenotyping by means of UAV-based aerial imagery can be found. Thus, Díaz-Varela et al. [
24] attempted the estimation of the height and crown diameter of olive trees by means of structure-from-motion (SfM) image reconstruction and geographic object-based image analysis (GEOBIA). Along the same lines, Torres-Sánchez et al. [
25] also proposed a methodology for the estimation of different olive tree features. In particular, height, crown volume and canopy area were addressed. This was accomplished by generating digital surface models (DSMs) from aerial imagery and applying object-based image analysis (OBIA). This study was extended in [
26], where different flight altitudes and image overlap degrees were tested in order to optimise DSM generation in terms of computational cost. In Salamí et al. [
6], olive tree counting was approached by using a UAV equipped with a small embedded computer. This device was aimed at processing the captures on board and providing the end-user with nearly real-time plant count estimations via cloud services.
In this paper, a new methodology for the identification of crop trees located in intensive farming-based orchards by means of the analysis of aerial images is proposed. To that end, we start from a set of aerial captures acquired by a UAV equipped with a multispectral camera while flying over the land plot under study. These multispectral images are processed in order to yield a DSM, following standard image matching and photogrammetry techniques. The core of the novel methodology is an image analysis-based algorithm aimed at identifying the trees by exploiting the elevation information contained in this data structure. To that end, the DSM is converted into a greyscale image, where elevation values are represented as grey levels. This image is then transformed by means of mathematical morphology in order to segment the tree-belonging pixels from the ground through a statistical global thresholding-based binarization. Eventually, the resulting segmentation is analysed by an ad-hoc procedure to detect intra-row tree aggregations, consisting of studying the second central moments of the tree pixel regions. The whole methodology was tested in an intensive olive orchard, obtaining results that highlight its effectiveness as a fully automated solution for crop tree detection and counting, as well as its robustness against complex scenarios, since intra-row tree aggregations and strong ground elevation variability were present in the study plot.
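To make these processing steps more concrete, the following Python sketch outlines the kind of elevation-based segmentation and aggregation analysis just described, using scikit-image. It is a minimal sketch under assumed parameter values (the structuring-element radius, the 1 m minimum tree height, the minimum blob size, the typical crown area and the 1.5 elongation cut-off are all placeholders), not the actual implementation detailed in Section 2.3.

```python
import numpy as np
from skimage import measure, morphology

def segment_trees_from_dsm(dsm, h_min=1.0, ground_radius=25, min_pixels=50):
    """Illustrative sketch: segment tree-like objects from a DSM.

    dsm           : 2D array of elevations in metres (NaN where no data)
    h_min         : assumed minimum height above ground for a tree (metres)
    ground_radius : assumed structuring-element radius (pixels) used to
                    approximate the ground surface by greyscale morphology
    min_pixels    : assumed minimum blob size kept after thresholding
    """
    dsm = np.where(np.isnan(dsm), np.nanmin(dsm), dsm)
    # Top-hat transform: subtract a morphological estimate of the ground
    # surface so that only objects protruding above the terrain remain.
    height = morphology.white_tophat(dsm, morphology.disk(ground_radius))
    # Global threshold on height above ground, then remove small artefacts.
    mask = morphology.remove_small_objects(height >= h_min, min_size=min_pixels)
    labels = measure.label(mask)
    return labels, measure.regionprops(labels)

def estimate_tree_count(region, typical_crown_area):
    """Illustrative handling of intra-row aggregations: if a connected
    component is clearly elongated (judged from its second central moments,
    here through the derived major/minor axis lengths), estimate how many
    crowns of an assumed typical area it contains; otherwise count one tree."""
    elongation = region.major_axis_length / max(region.minor_axis_length, 1e-6)
    if elongation < 1.5:   # roughly isotropic blob: a single crown
        return 1
    return max(1, round(region.area / typical_crown_area))
```

The helper names and the area-based splitting rule above are hypothetical simplifications; the actual aggregation analysis based on second central moments is formalised in Section 2.3.4.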
The remainder of the manuscript is structured as follows:
Section 2 focuses on the experimental design. Thus,
Section 2.1 describes the characteristics of the olive orchard in which, as a case study, images were acquired for the purpose of testing the methodology.
Section 2.2 presents all the aspects related to how aerial image acquisition was performed. In
Section 2.3, the image analysis methodology for tree detection, counting and geolocation is developed, addressing the stages of image pre-processing (
Section 2.3.1), the generation of a DSM as a base data structure (
Section 2.3.2), and the image segmentation and analysis (
Section 2.3.3 and
Section 2.3.4, respectively). Then, in
Section 2.4, the set of metrics computed to assess the performance of the methodology is proposed.
Section 3 presents the results obtained, which are then discussed in
Section 4.
Section 5 concludes the manuscript, giving a brief summary of the main findings achieved and identifying aspects that might be approached in further investigations. Finally,
Appendix A formally defines all the morphological operators used throughout the developed image analysis methodology.
3. Results
According to the metrics proposed, the results provided by the presented methodology for crop tree detection, location and counting are presented in
Table 2. As can be observed, 99.92% of the tree proposals were correct, and 99.67% of the actual trees were found.
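As a rough consistency check, and assuming these two percentages are computed as precision (correct proposals over all proposals) and recall (detected trees over the 3919 ground-truth trees) respectively, the underlying absolute counts can be approximated as
$$\mathrm{TP} \approx 0.9967 \times 3919 \approx 3906, \qquad \mathrm{FN} = 3919 - \mathrm{TP} \approx 13, \qquad \mathrm{FP} \approx \mathrm{TP}\left(\tfrac{1}{0.9992} - 1\right) \approx 3,$$
which is consistent with the three false positive cases analysed below.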
Regarding the failures detected, and focusing on the false positives (FP) reported, each of them can be attributed to a different cause. One of them was caused by a car that was parked very close to the study site. Because of its height, it could not be discarded during image processing, nor filtered out when the image was cropped according to the specified region of interest. As a result, a very small residual connected component, corresponding to this vehicle, was inevitably considered when analysing the final binary image. A second false positive resulted from a tree with an anomalously damaged crown, which was detected by the algorithm as two different plants. Finally, a last false detection was obtained when processing a large connected component containing seven aggregated olive trees. Due to the morphology and disposition of the overlapping tree pixel regions, the number of plants contained was overestimated by one. The different issues related to the false positives detected during the assessment of the methodology are illustrated in
Figure 14.
With respect to the false negatives (FN), one of them was found to come from the absence of information in the DSM, probably due to not having enough matching points from different captures when reconstructing this part of the image. As a result, the elevation information provided by the DSM at the corresponding points was not significant enough to enable the discrimination of this plant (the phenomenon is illustrated in
Figure 15).
In this respect, it should be noted that the density and quality of the 3D point cloud used to generate the DSM are directly related to the overlap with which the aerial imagery is captured [
25]. As commented in
Section 2.2.2, the image acquisition flight and the multispectral camera setup were planned for the purpose of achieving a forward overlap of 85%. By increasing this overlap, results could potentially be improved. However, since 99.97% of the trees were properly reconstructed, i.e., 3918 out of 3919, it seems plausible to consider the proposed image capture setup as valid. Having discarded defects in the flight and image capture parametrisation, it is difficult to determine the reasons behind this issue, but it is probably related to problems when capturing the aerial images, either due to weather conditions that could occasionally affect the stability of the UAV, or due to problems with the operability of the camera. Meanwhile, the rest of the false negative cases detected were related to small trees, most of them in a growth stage, which did not reach the minimum height (1 m) required to be properly segmented from the background.
4. Discussion
Table 3 compares the results of the methodology presented in this paper to those of the main published works also aimed at automated crop tree detection in orchards. A first aspect to be highlighted is that the present work outperforms the other proposals, despite the fact that it was tested on a considerably larger plant population than most of the reported research. Consequently, this entailed a wider variability in terms of the individual characteristics of the trees and the way they are distributed throughout the land plot under study. It should also be underscored that, contrary to most of those works, this study considered challenging conditions related to overlapping intra-row tree crowns, an aspect with a particular impact on the accuracy with which the plant population can be estimated in intensive orchards.
Thus, focusing on the case of olive, the crop on which the proposed methodology was validated, tree counting based on aerial imagery was attempted in Salamí et al. [
6], obtaining a remarkable average precision of 99.84%. Nevertheless, plant detection was approached by using a circular template, imposing the prerequisite of only considering isolated trees, thus requiring that their crowns did not appear overlapped in the aerial captures. In contrast, the methodology presented here was able to deal with the individual location and counting of 385 trees forming 293 aggregated connected components. Only in the case shown in
Figure 14, the number of trees contained in such a component was not properly estimated. Moreover, the replicability of the methodology presented in [
6] is questionable, as tree segmentation was attempted by colour discrimination. Indeed, it is very probable that any kind of natural or artificial artefact with a colour similar to that of the olive tree canopies could generate false positives. In such cases, the precision of the colour segmentation approach, and consequently of the subsequent tree detection and counting, is compromised. A segmentation also based on pixel reflectance, although not only in the visible bands, followed by OBIA, was used in Torres-Sánchez et al. [
25]. Concretely, a multi-resolution segmentation was first performed using the DSM and the green and NIR bands, considering colour, shape, smoothness and compactness, for which threshold values were manually adjusted. The manual choice of such key segmentation parameters raises concerns about its replicability in different situations. Furthermore, the approach requires a subsequent OBIA step to filter the first segmentation results. Conversely, the methodology described here proposes an analytical solution to the segmentation problem, only making use of a single parameter (Equation (7)) in the segmentation step. In addition, this is an easily interpretable parameter, as it represents the minimum desired height, in metres, for the trees to be segmented. It is therefore better seen as a configuration parameter rather than a performance one. On a set of 135 olive trees, the study presented in Torres-Sánchez et al. [
25] yielded sensitivity values ranging from 0.945 to 0.969, without considering the case of overlapping tree crowns. Later, the same lead author and colleagues assessed the influence of image overlap on the quality of the resulting DSM [
26]. The methodology described in [
25] was slightly modified and tested on an indeterminate number of trees, reporting, in the best of the tested scenarios, a sensitivity of 0.97 in olive tree counting. The case of overlapping trees was not addressed either.
Beyond the olive case, in Malek et al. [
5], an overall precision of 0.9009 was achieved when detecting palm trees. They proposed a method based on training an ELM classifier on a set of key points, potentially representative of tree occurrences, extracted from the initial captures. Csillik et al. [
19] also made use of machine learning, specifically CNNs, for detecting citrus trees. Ampatzidis and Partel [
20] also focused their research on citrus orchards, likewise using CNN-based tree location. Although all these studies reported solid results, it should be noted that these kinds of machine learning solutions tend to be strongly linked to the visual features of the tree crowns with which they are trained. This fact makes their direct application to different kinds of crops difficult, as it implies the generation of new training sets and models. In contrast, the method proposed in this paper comprises an analytical solution, based on the morphological analysis and characterisation of the general features of trees within the framework of intensive cultivation, and thus it is not linked to a specific type of crop. In Selim et al. [
21], a method for detecting orange trees from aerial imagery was proposed. The problem was undertaken in this case by means of object-based image analysis, correctly detecting 87 out of the 105 trees visible in the orthomosaic of the study case. Nevertheless, as with other previously referenced research, difficulties were reported when dealing with overlapping tree crowns. In Kestur et al. [
22], tree detection was addressed on the basis of ELM-based spectral and spatial classification. Although promising results were reported for identifying trees belonging to different crops, it was not clearly specified how the training set was generated, which hinders the replicability of the methodology. Marques et al. [
23] proposed an effective method for detecting chestnut trees, clustering the plants by exploiting elevation data and vegetation index (VI) information. Regarding the latter, it should be noted that VI-based segmentations are strongly dependent on the spectral reflectivity features of the vegetation cover present in the study sites. Indeed, depending on its nature, the ground cover may be confused with the plants to be identified, thus potentially increasing the number of false positives yielded. This phenomenon may affect the generality of the proposed solution.
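To illustrate why such reflectance-based segmentations are sensitive to the ground cover, the following sketch computes a basic NDVI map from red and near-infrared reflectance bands and thresholds it; the 0.4 cut-off and the function name ndvi_mask are assumptions for illustration only, not parameters taken from any of the cited works.

```python
import numpy as np

def ndvi_mask(red, nir, threshold=0.4):
    """Illustrative NDVI-based vegetation mask.

    red, nir  : 2D reflectance arrays of the same shape
    threshold : assumed NDVI cut-off; green weeds or cover crops can exceed
                it just like tree canopies, producing false positives.
    """
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    ndvi = (nir - red) / np.maximum(nir + red, 1e-9)  # avoid division by zero
    return ndvi > threshold
```

Because any green cover raises the NDVI, a purely spectral mask cannot by itself distinguish tree canopies from weeds, which is consistent with the choice made here of relying on elevation information instead.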
5. Conclusions
This investigation was undertaken in order to design and evaluate a framework for the automated identification, geolocation and counting of crop trees in intensive cultivation areas by means of UAV-based aerial imagery, multispectral sensing and image analysis techniques. The reported results support the viability of the proposed methodology as a valuable tool for phenotyping tasks within the scope of precision agriculture.
After testing in an olive orchard with 3919 trees, 99.67% of the plants were correctly identified, outperforming the results reported in previously published work. Indeed, the algorithm designed for segmenting and analysing the data structure obtained from the aerial captures, based on morphological image processing principles and the statistical analysis of the moments of tree-corresponding pixel regions, showed remarkable performance in terms of tree discrimination, achieving very high detection rates. In addition, the solution also proved to be robust when dealing with multiple intra-row overlapping tree crowns. These findings should also be framed within the context of the complexity of the considered scenario, since the study plot was considerably larger than those used in most previous studies, and it presented a remarkable variability in terms of soil composition, elevation and weed coverage.
Future work will test the application of the presented methodology to other types of orchards. In addition, it would be interesting to assess the performance of the algorithms when dealing with different plant spacing patterns, in order to increase confidence in the generality of the proposed solution.