Article

Under-Canopy Drone 3D Surveys for Wild Fruit Hotspot Mapping

by Paweł Trybała 1,*, Luca Morelli 1, Fabio Remondino 1, Levi Farrand 2 and Micael S. Couceiro 3

1 3D Optical Metrology (3DOM) Unit, Bruno Kessler Foundation (FBK), 38123 Trento, Italy
2 Deep Forestry AB, 754 54 Uppsala, Sweden
3 Ingeniarius, Ltd., 4445-147 Alfena, Portugal
* Author to whom correspondence should be addressed.
Drones 2024, 8(10), 577; https://doi.org/10.3390/drones8100577
Submission received: 22 August 2024 / Revised: 19 September 2024 / Accepted: 1 October 2024 / Published: 12 October 2024
(This article belongs to the Section Drones in Agriculture and Forestry)

Abstract: Advances in mobile robotics and AI have significantly expanded their application across various domains and challenging conditions. In the past, their use was limited to safe, controlled, and highly structured settings, where simplifying assumptions and conditions allowed perception-based tasks to be solved effectively. Today, however, robotics and AI are moving into the wild, where human–robot collaboration and robust operation are essential. One of the most demanding scenarios involves deploying autonomous drones in GNSS-denied environments, such as dense forests. Despite the challenges, the potential to exploit natural resources in these settings underscores the importance of developing technologies that can operate in such conditions. In this study, we present a methodology that addresses the unique challenges of natural forest environments by integrating positioning methods, leveraging cameras, LiDARs, GNSS, and vision AI with drone technology for under-canopy wild berry mapping. To ensure practical utility for fruit harvesters, we generate intuitive heat maps of berry locations and provide users with a mobile app that supports interactive map visualization, real-time positioning, and path planning assistance. Our approach, tested in a Scandinavian forest, refines the identification of high-yield wild fruit locations using V-SLAM, demonstrating the feasibility and effectiveness of autonomous drones in these demanding applications.

1. Introduction

Northern European forests are rich in wild fruits, particularly various types of berries such as blueberries, lingonberries, and cloudberries (Figure 1). Not only are these berries a crucial and unique part of the local ecosystem, providing food for wildlife, but they also hold significant economic value. Despite the abundance of these natural resources, it is estimated that less than 10% of the total annual wild berry yield is harvested. This under-utilization is a result of several interrelated factors that impede efficient berry collection.
The primary challenge in harvesting these wild berries lies in the manual nature of the process. Berry searching and picking in natural forests is labor intensive and physically demanding, often requiring workers to navigate through uneven and challenging terrain for several hours. The thick underbrush, variable ground conditions, and the need to avoid damaging vegetation add to the complexity of the task. The work is further complicated by the short harvesting seasons, typically spanning only a few weeks, which creates a narrow window of opportunity to collect the berries at their peak ripeness.
Due to the labor-intensive nature of berry picking and the limited availability of local labor, the majority of this work is performed by seasonal foreign workers. These workers, often unfamiliar with the local language, culture, and forest practices, face additional challenges. Their lack of knowledge about the local flora and fauna, combined with the physical demands of the job, can lead to decreased productivity and increased safety risks. Navigating the dense forest areas without a comprehensive understanding of the landscape can result in accidents and injuries, and lower the efficiency of the harvest.
Moreover, the traditional methods of harvesting have remained largely unchanged for decades, relying heavily on manual labor with minimal technological intervention. Currently, there is little technological innovation in this field, and this dependence on outdated techniques contributes to the low efficiency and productivity of the berry harvesting process. As a result, significant economic opportunities are lost each year, with the unharvested berries left to decompose in the forest. This represents not only a direct economic loss but also a missed opportunity for value-added products such as jams, juices, and dietary supplements, which could benefit local businesses and communities. The berries are also a source of various bioactive compounds with health benefits, including antioxidants, vitamins, and polyphenols, which are in high demand in the nutraceutical and pharmaceutical industries. Thus, improving the efficiency of berry harvesting could open up new revenue streams and support the development of local industries focused on health and wellness products.
Environmental sustainability is another critical aspect of wild berry harvesting. Efficient and responsible harvesting practices can help maintain the ecological balance of the forests. Over-harvesting or damaging the vegetation can disrupt the habitat of numerous species and affect biodiversity. Therefore, there is a need for sustainable methods that ensure the long-term availability of these resources while minimizing the environmental impact. The ability to estimate and monitor the total yield in an area could contribute significantly to this goal, making it possible to avoid over-harvesting some areas and to balance the harvest across the region.
The integration of modern technologies, such as drones, or Unmanned Aerial Vehicles (UAVs) [1], as well as Artificial Intelligence (AI), offers a promising solution to these challenges. By introducing even partial physical aid into the berry picking process and providing precise mapping and navigation aids, these technologies can significantly enhance the efficiency and safety of harvesting operations. The increased use of UAVs across various fields demonstrates their versatility and effectiveness. While UAVs were originally used predominantly in outdoor environments with good availability of Global Navigation Satellite System (GNSS) signals for tasks such as 3D mapping, agricultural monitoring, construction surveying, and environmental conservation [2,3,4,5], the scope of UAV applications is rapidly expanding. Researchers are now extending UAV operations to scenarios requiring more sophisticated navigation solutions, such as indoor environments, underground sites, forests, and other GNSS-denied spaces. This progression highlights the growing potential for UAVs to revolutionize industries by providing innovative solutions to previously intractable problems.
The European Union’s FEROX project (https://ferox.fbk.eu; accessed on 8 October 2024) aims to leverage these technological advancements, together with AI methods, to improve the traditional berry harvesting process. The project seeks to (i) increase the wild fruit yield, (ii) reduce the physical strain on workers, and (iii) improve pickers’ well-being. The ultimate goal is to optimize the entire harvesting workflow, improving the working conditions of the berry pickers, increasing the efficiency and productivity of the harvesting process, and contributing to the economic development of local communities. One of the main means to achieve these targets is to provide detailed mobile maps for pickers in the field. Such support will not only inform them of the estimated spatial distribution of wild berries but also serve as basic navigational support for pickers walking through remote forest areas that are often unknown and unexplored to them. Finally, FEROX focuses on developing user-friendly tools and interfaces that can be easily adopted by seasonal workers, ensuring that the benefits of these technologies are accessible to all.
The goal of this study was to demonstrate and test several crucial components of the FEROX UAV-based methodology to perform berry mapping missions in a complex environment (a Finnish forest). The overall workflow of these components is illustrated in Figure 2. In particular, this research examined the suitability of the selected UAV platforms and SLAM-based data processing pipeline to perform the following:
(i) Sensor pose estimation for reliable data georeferencing in GNSS-degraded conditions (Section 4.1);
(ii) Forest 3D mapping for improving the knowledge of workers about forest geography (Section 4.2) and generating contents to support berry picking operations (Section 4.3).
Other FEROX tasks, such as AI-based berry detection or human–robot collaboration, are discussed in detail in separate studies [6,7,8,9].
Figure 2. Geodata processing workflow utilized in this study. The AI-based berry detection methodology is detailed in the work by Riz et al. [9].

2. Related Works

The agriculture and food manufacturing sectors are prime candidates for the digital transformation and integration of technological innovations into their standard operations. These industries benefit from a broad and consistent market, driven by the natural and continuous demand for food. The well-understood biological mechanisms underlying crop growth and the relatively predictable nature of agricultural cycles further support the integration of advanced technologies. These factors, combined with the need to reduce production costs and accelerate time-to-market, make automation highly attractive and viable for large-scale industrial adoption.
Indoor food production and processing facilities have long been at the forefront of automation due to their controlled environments, which allow for the precise management of variables such as temperature, humidity, and light [10]. This level of control has enabled the development of highly automated systems that ensure consistent and efficient production. However, replicating this level of technological maturity in outdoor agricultural settings presents a more complex challenge. The variability in outdoor conditions, including weather fluctuations, uneven terrain, and unpredictable biological factors, complicates the development of long-term, reliable automation solutions that can operate effectively in the field [11].
Despite these challenges, recent years have witnessed a growing number of applications of mobile robotics in outdoor agriculture [12]. Advances in sensor technology, machine learning, and robotics have paved the way for innovative solutions that address the complexities of outdoor farming environments. These technologies are beginning to demonstrate their potential to improve efficiency, reduce labor costs, and improve the sustainability of agricultural practices [13].
Wheeled vehicles have been the dominant robotic platform in outdoor agricultural applications [14]. However, the use of UAVs in the last decade has significantly expanded across numerous industries, including construction, urban planning, cultural heritage preservation, animal monitoring, mining, and geology [15,16,17,18]. Drones have been applied to solve numerous agricultural problems, including calculating plant phenotyping traits [19,20,21,22,23], yield estimation [24,25,26], and fruit localization and picking [27,28].
While UAVs have achieved a high level of autonomy in open areas with reliable GNSS-RTK (Real-Time Kinematic) positioning, navigating within complex environments, such as under the forest canopy, continues to be a challenging task. Mapping and navigation under the forest canopy have primarily relied on LiDAR sensors. Hyyppä et al. [29] presented an approach using commercial LiDAR-based simultaneous localization and mapping (SLAM) methods to extract valuable forest inventory data. This study was further expanded by Wang et al. [30], who incorporated a survey-grade terrestrial laser scanner into the methodology. Additionally, Tian et al. [31] utilized LiDAR sensors for collaborative SLAM, focusing on UAV fleet navigation in search-and-rescue operations. An efficient method of exploring forest areas using a LiDAR-equipped quadrotor was presented by Yao and Liang [32]. A similar setup was used in the research of Liang et al. [33], who performed UAV-based LiDAR measurements to estimate tree inventory parameters. The authors concluded that for complex forest stands, the quality of the trajectory estimation and mapping results from LiDAR SLAM drops significantly.
Besides LiDAR, visual sensors have also been investigated for under-canopy forest mapping or for flying UAVs in GNSS-degraded conditions. Processing is normally based on conventional Structure-from-Motion (SfM)/Photogrammetry or Visual SLAM (V-SLAM) methods [34,35]. Morelli et al. [36] proposed an extension of COLMAP [37] for real-time feature-based V-SLAM in complex environments. Krisanski et al. [38] conducted a study measuring tree diameters based on imagery captured by a small UAV. Similarly, Zhang et al. [39] examined the potential of 3D reconstruction from drone imagery for biomass measurements. Both studies used manually flown drones and processed the data with Agisoft Metashape [40]. Karjalainen et al. [41] carried out pioneering tests with an in-house-built vision-based autonomous drone system in a Finnish forest. The accuracy of the tree-related statistics, calculated from photogrammetric 3D reconstruction, was relatively high compared to a LiDAR SLAM reference. However, one of the eight test flights resulted in a drone crash, underlining the difficulty of achieving safe autonomous UAV operation in such conditions.
The increasing accessibility and availability of open source AI-based methods have significantly expanded the capabilities for extracting meaningful information from images. These methods, including object detection, semantic segmentation, and instance segmentation, have become essential tools in various fields. A particularly notable trend in this area is the emergence of foundation models, i.e., large, pre-trained models that have been trained on vast datasets. These models, which include popular frameworks like YOLO [42], Segment Anything Model (SAM) [43], CLIP [44], and DINO [45], offer powerful capabilities for image understanding, enabling them to be applied to new tasks with a high degree of success.
When task-specific performance improvements are needed, these foundation models can be fine-tuned using transfer learning, a process that adapts the pre-trained model to a new, more narrowly defined task. This approach drastically reduces the effort and the amount of data required compared to building a model from scratch. Fine-tuning leverages the extensive knowledge embedded in the foundation model, allowing faster deployment and often yielding satisfactory results with limited data [46]. This has made it possible to address complex image analysis problems in various applications with significantly reduced development time and resources. Numerous agricultural applications have been presented in the literature. Tian et al. [47] modified YOLOv3 to detect apples in an orchard at different stages of growth. Li et al. [48] adapted DINO to identify various plant diseases based on the images of their leaves. Balasundaram et al. [49] combined Grounding-DINO with SAM for plant health classification. Vision models have also been employed to estimate a novel vegetation index, an alternative to the well-established Normalized Difference Vegetation Index (NDVI), which does not require the acquisition of near-infrared imagery. Feuer et al. [50] used DETIC [51], a CLIP-derived detector, for zero-shot insect detection, which could find application in pest monitoring. Junos et al. [52] proposed modifying YOLO to detect fresh fruit bunches of oil palms with an automatic harvesting system. A review containing more applications of large vision models (LVMs) in this field can be found in [53].
In the context of outdoor agricultural and robotics applications, these advancements are particularly impactful. They enable more efficient and effective analysis of images captured by drones and other remote sensing techniques, facilitating a deeper understanding of the environments being studied. The integration of advanced computer vision techniques with UAV-based data collection significantly enhances our ability to address complex environmental challenges, leading to more informed decision making and improved operational efficiency in these fields.

3. Platforms and Data

3.1. UAV Platforms

In this study, we evaluated two distinct UAV platforms: a hexarotor developed by Ingeniarius and a quadrotor developed by Deep Forestry. Both platforms were designed to operate in environments with poor GNSS signal quality, such as dense forested areas, and are equipped with software compatible with the Robot Operating System (ROS) [54].
The hexarotor (Figure 3a) is a configurable solution focusing on visual data acquisition. It is equipped with two ZED X stereo cameras (nadir- and front-facing), an RTK-enabled Emlid Reach GNSS receiver, an inertial measurement unit (IMU), and a Pixhawk flight controller. Due to the high volume of image data generated, frames were captured at a reduced framerate (0.5 Hz for nadir images and 15 Hz for frontal stereo imagery) to ensure manageable data throughput and storage. As the platform is still in development, with its autonomous capabilities not yet fully reliable in such complex environments, the drone was manually controlled during data collection flights. A flight speed of approximately 1 m/s was maintained as the best balance between flight duration and data accuracy.
While this study did not address the real-time processing of the proposed methods, this is an envisaged natural future direction. Therefore, an NVIDIA Jetson Orin AGX (32 GB) running ROS was employed to orchestrate sensor operations and data logging and to meet the high computational demands posed by real-time image data processing for under-canopy operations. While the CPU handles general-purpose tasks, such as running the ROS middleware and basic navigation processes, the GPU’s 200 TOPS computational capacity is expected to be leveraged for advanced real-time AI-based tasks. The primary objective of deploying this platform, equipped with vision-focused sensors, is to capture nadir images for AI-based wild berry detection, together with a combination of V-SLAM and GNSS technologies for precise trajectory estimation, to finally create a georeferenced orthomosaic. This orthomosaic allows for the high-precision geographic localization of detected berries, significantly enhancing the accuracy and utility of the data compared to relying solely on weak GNSS positioning.
The quadrotor (Figure 3b) represents a robust solution for 3D spatial data acquisition in forested environments. It is outfitted with a multiline 3D LiDAR, an IMU, a GNSS receiver, and an onboard processing unit capable of executing proprietary Deep Forestry SLAM and navigation algorithms in real time. This configuration enables the quadrotor to generate coregistered 3D point clouds on the fly, which are used both for creating detailed spatial representations of the environment and for guiding the UAV safely and efficiently through the survey area. The quadrotor has demonstrated its capabilities to produce accurate 3D point clouds and forest inventory data in previous experimental studies [55,56]. In this research, the quadrotor was utilized as a black-box platform for the autonomous acquisition of dense 3D data within the study area, providing crucial spatial information to support and enhance berry picker operations.
In this study, the diverse datasets collected from the hexarotor and quadrotor platforms were integrated to create comprehensive, georeferenced semantic layers that support efficient and informed wild berry harvesting (Section 4.3). The high-resolution imagery acquired by the hexarotor enables the precise identification and localization of various berry species across the surveyed area, while the quadrotor’s dense LiDAR-derived 3D point clouds are utilized to construct accurate and high-resolution digital elevation models (DEMs) and extract critical terrain features, including spatial tree distribution patterns. These distinct data products are combined, resulting in a multilayered map that encapsulates essential environmental and resource information. This enriched spatial dataset can then be deployed through a user-friendly mobile application designed for berry pickers, providing real-time access to berry abundance hotspots, optimal navigation routes, and terrain conditions, thereby enhancing harvest efficiency, safety, and yield.

3.2. Test Area

Data collection was carried out during the FEROX fieldwork in June 2023 in Ilomantsi, located in eastern Finland, in North Karelia (Figure 4). This sparsely populated region is characterized by vast expanses of forest and rich biodiversity, making it an ideal location for wild berry habitats. The forests cover approximately 82% of the region’s area and consist of a mix of coniferous and deciduous trees [57]. These characteristics create a challenging yet suitable environment for testing advanced drone technologies for under-canopy berry mapping. The relative sparsity of the forests, compared to the denser natural forests of Central Europe, reduces the risk associated with autonomous drone operations. This enables the utilization of off-the-shelf UAVs for autonomous operations in partially GNSS-denied environments, including the solutions adopted in this study. Consequently, this study was able to focus more intensively on data post-processing and visualization, critical aspects for the advancement of under-canopy mapping technologies.
The geographical location of Ilomantsi creates unique environmental conditions that influence the growth and distribution of wild berries. The climate of the area is typically characterized by short, warm summers and long, cold winters, which affect the phenology and yield of wild berries [58]. During summer, extended daylight hours and relatively high humidity create favorable conditions for berry ripening. However, these same conditions also pose challenges for data collection, such as the need to manage varying light levels and potential interference from dynamic foliage.

3.3. Data Acquisition

In this research, two datasets were collected using the UAV platforms described in Section 3.1. The data acquisition took place during a single test session conducted in July 2023, within a forested area in Ilomantsi.
The hexarotor, focusing on visual data acquisition, collected a total of 5644 images using its front-facing stereo camera system (see an example in Figure 5) and an additional 632 frames from the nadir-facing ZED X camera rig. All images were captured at a resolution of 1920 × 1200 pixels. The synchronization of the image data with GNSS positions was achieved using ROS timestamps, which provided temporal alignment across the different data streams. The hexarotor’s flight path, as recorded by its RTK-enabled GNSS receiver, comprised 815 unique global positions, offering comprehensive geospatial coverage of the 0.4 ha test area.
The other under-canopy drone system, provided by Deep Forestry, was used in autonomous mode to perform a simple 3D forest mapping task. The UAV flight covered an area of approximately 40 × 100 m. The processing of the LiDAR data using the Deep Forestry SLAM method resulted in a final 3D point cloud containing ca. 11 million points, corresponding to a scanning density of 2860 points/m². A visualization of this raw 3D data, which serves as input for further processing, is presented in Figure 6. A summary of the collected data is presented in Table 1, detailing the type, size, and purpose of each dataset.

4. Methodology

4.1. Stereo Image Sequence Processing

For years, V-SLAM techniques have been a core technology for advanced mobile robotic applications, particularly in enabling real-time perception and navigation in dynamic environments. Numerous V-SLAM solutions have been developed for processing sequences of images captured by monocular or stereo cameras, often utilizing optional supplementary sensors, such as IMU. These approaches have been widely adopted, with many being available as open source packages, contributing to the robotics community.
Since the hexarotor used in this study was equipped with both a stereo camera and an IMU, two state-of-the-art V-SLAM frameworks, namely RTAB-Map [59] and ORB-SLAM3 [60], were also considered for UAV pose estimation. Unfortunately, both methods failed to maintain feature tracking for any substantial part of the stereo sequence. Consequently, these frameworks were excluded from further analysis.
Another challenge arises from the need to maintain a low flight altitude for the UAV, typically between 2 and 3 m, to effectively detect small objects in nadir images. Additionally, the frame rate of the camera must be limited due to the high data rate required for capturing high-resolution images. Combined, these factors make it impossible to directly align nadir images for trajectory estimation because of their low overlap and the lack of distinct and stable features on the forest floor.
To overcome these challenges, a dedicated SfM workflow was developed, as illustrated in Figure 7. This workflow begins with the application of the Deep Image Matching (DIM) tool [61] to extract local features with state-of-the-art approaches, such as SIFT [62], SuperPoint [63], and DISK [64]. SIFT is a blob detector, in which keypoints are identified as local minima and maxima in both scale and space on a difference-of-Gaussian pyramid. The SIFT descriptor is a 128-dimensional vector that encodes the gradient distribution within the patch surrounding each keypoint. SuperPoint and DISK are both end-to-end deep learning-based local features, trained jointly for the detection and description steps on large datasets that are challenging in terms of both illumination changes and critical viewing angles. SuperPoint is a corner detector composed of an encoder and two decoders, used to independently estimate a score map of keypoint locations and a dense matrix of descriptors. SuperPoint was initially trained on a synthetic dataset composed of elementary geometric shapes, with a final fine-tuning performed on real images. DISK feature extraction is based on U-Net [65] and introduced reinforcement learning into the training of deep learning-based local features. It usually finds more tie points than SIFT and SuperPoint. The DISK descriptor has 128 dimensions, the same as SIFT, whereas the SuperPoint descriptor is 256-dimensional.
For feature matching, LightGlue [66] is utilized for the deep learning-based extractors. It is a deep neural network trained to be fast and reliable in finding corresponding points between image pairs, with dedicated weights trained for SuperPoint and for DISK features to obtain the best performance for each. A nearest-neighbor matching strategy is employed for the SIFT features [67]. To optimize processing time, custom pairs for stereo matching are selected with a predefined overlap, leveraging the known sequence of image acquisition. These matched features are then input into COLMAP [37] for incremental SfM, which additionally utilizes the known rigid relative pose between the stereo cameras as a constraint in periodic bundle adjustments (BAs). Although the initial relative poses of the stereo cameras and the intrinsic camera parameters are assumed to be known a priori with high accuracy, they can be refined during the bundle adjustment to improve the final results.
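For illustration, a minimal sketch of the conventional SIFT baseline with nearest-neighbor matching is given below, using OpenCV; the deep learning extractors and matchers (SuperPoint, DISK, LightGlue) are run through the DIM tool and are not reproduced here. The file names and the ratio-test threshold are placeholder values.

import cv2

# Minimal SIFT + nearest-neighbor matching sketch (conventional baseline only).
# Image file names are placeholders; the deep learning pipelines are handled by DIM.
img1 = cv2.imread("frame_0001_left.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_0002_left.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, desc1 = sift.detectAndCompute(img1, None)   # keypoints and 128-D descriptors
kp2, desc2 = sift.detectAndCompute(img2, None)

# Nearest-neighbor matching with Lowe's ratio test to discard ambiguous matches
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(desc1, desc2, k=2)
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(f"{len(good)} tentative correspondences")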
Following image orientation, the photogrammetric outputs are georeferenced by calculating a Helmert transformation between the local reference system and the global geographical reference system provided by the onboard GNSS observations. This process involves custom scripting and the use of COLMAP [68]. The GNSS positions, thresholded using their self-reported accuracy to include only reasonably accurate points, are used as a reference. Since the GNSS data and images are not synchronized, nearest-neighbor matching of timestamps is performed to establish correspondences. A known offset between the sensors is applied to shift the GNSS positions from the receiver’s coordinate frame to the camera’s coordinate frame.
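The core of this georeferencing step can be sketched as follows, under simplified assumptions: image and GNSS timestamps are associated by nearest neighbor, and a Helmert (seven-parameter similarity) transformation is estimated with Umeyama’s closed-form solution. In the actual pipeline this step relies on COLMAP tools and custom scripts; all arrays below are synthetic placeholders.

import numpy as np

def umeyama(src, dst):
    # Closed-form similarity (Helmert) transform: dst ~ s * R @ src + t
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / src_c.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic placeholders for local SfM camera centres and accuracy-filtered GNSS positions
rng = np.random.default_rng(0)
cam_t = np.arange(0.0, 100.0, 2.0)                            # image timestamps (s)
cam_xyz = rng.normal(size=(len(cam_t), 3))                    # local camera centres
gnss_t = cam_t + rng.normal(0.0, 0.05, len(cam_t))            # GNSS timestamps (s)
gnss_xyz = 1.5 * cam_xyz + np.array([3.0e5, 7.0e6, 150.0])    # global positions

idx = np.abs(gnss_t[None, :] - cam_t[:, None]).argmin(axis=1)  # nearest-timestamp association
s, R, t = umeyama(cam_xyz, gnss_xyz[idx])
cam_global = (s * (R @ cam_xyz.T)).T + t                       # georeferenced camera centres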
The global trajectory derived from the front-facing camera rig is then converted into a format compatible with the evo library [69] for further quality analysis. Simultaneously, this trajectory is transformed into the reference frame of the nadir-facing camera rig, which is then processed in Agisoft Metashape [40]. Here, the initial nadir camera poses are refined. Subsequently, depth maps are estimated to generate a digital elevation model, finally leading to the creation of a georeferenced orthomosaic. The orthomosaic serves as the ultimate product of the workflow, providing a high-resolution, georeferenced image for identifying and mapping wild berry hotspots with high precision.

4.2. LiDAR Data Processing

In our application, the analysis of forestry terrain relies heavily on the dense 3D reconstructions generated by the Deep Forestry LiDAR SLAM system instead of the sparse image-based point cloud due to its higher completeness. The input data for later post-processing are already delivered as a single coregistered point cloud, obtained from the quadrotor system. The process described below aims to derive information important for humans traversing the forest to aid them in navigating and understanding the forest’s topography and terrain morphology.
The LiDAR data processing workflow is structured into several key steps. The first involves filtering noise out of the point cloud. The noise may originate from various sources, including the presence of dynamic objects, such as people, or inaccuracies caused by the laser beam scattering off thin obstacles, such as branches or leaves. To address this issue, well-established methods, such as statistical outlier removal (SOR) [70], are employed.
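As an example, such a statistical outlier removal step could be implemented with Open3D as sketched below; the file name and the filter parameters are illustrative assumptions, not the values used in the study.

import open3d as o3d

# Remove points whose mean distance to their k nearest neighbors deviates by more
# than std_ratio standard deviations from the global mean distance (SOR filtering).
pcd = o3d.io.read_point_cloud("forest_slam_cloud.ply")   # placeholder path
clean, inlier_idx = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
print(f"kept {len(inlier_idx)} of {len(pcd.points)} points")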
Following noise removal, the point cloud segmentation process begins. Initially, the points on the ground are extracted with a Cloth Simulation Filter (CSF) [71]. However, this process requires proper parameter tuning to balance the trade-off between avoiding the incorrect inclusion of connected objects, such as tree trunks, and achieving a high degree of completeness in ground reconstruction. Thus, to refine these results, a mesh surface of the ground is reconstructed, using a constrained Delaunay triangulation [72,73]. The triangulation is performed on the elevation values resampled and averaged on a regular grid. In this way, the impact of the initially incorrectly classified points on the final ground surface is reduced.
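A simplified sketch of this refinement idea is given below: the pre-segmented ground points are averaged on a regular grid and the non-empty cell centres are triangulated. Note that SciPy provides an unconstrained Delaunay triangulation, whereas the study uses a constrained variant; the cell size and the input array are placeholder assumptions.

import numpy as np
from scipy.spatial import Delaunay
from scipy.stats import binned_statistic_2d

# Placeholder CSF-classified ground points (x, y, z); real input comes from the CSF step
rng = np.random.default_rng(0)
ground_xyz = np.column_stack([
    rng.uniform(0.0, 40.0, 20000),
    rng.uniform(0.0, 100.0, 20000),
    rng.normal(0.0, 0.1, 20000),
])

cell = 0.5  # assumed grid cell size (m)
x, y, z = ground_xyz.T
xbins = np.arange(x.min(), x.max() + cell, cell)
ybins = np.arange(y.min(), y.max() + cell, cell)
zmean, xe, ye, _ = binned_statistic_2d(x, y, z, statistic="mean", bins=[xbins, ybins])

# Triangulate the centres of non-empty cells in the XY plane (2.5D ground mesh)
xc, yc = np.meshgrid(0.5 * (xe[:-1] + xe[1:]), 0.5 * (ye[:-1] + ye[1:]), indexing="ij")
valid = ~np.isnan(zmean)
grid_pts = np.column_stack([xc[valid], yc[valid], zmean[valid]])
tri = Delaunay(grid_pts[:, :2])  # vertices: grid_pts, faces: tri.simplices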
Once the ground surface has been accurately reconstructed, the original point cloud is divided into subsets: one representing points belonging to the ground and the other representing objects above it, i.e., trees and high bushes. For the latter set, a graph-based approach, Treeiso [74], is employed to identify and extract individual tree instances. In the context of this application, only the forest density is considered, rather than the detailed features of individual trees. Consequently, the point sets corresponding to each tree are reduced to their centroids in the horizontal plane, providing an approximate position for each tree. Finally, these tree centroids are used to generate a planar tree density map using a two-dimensional Gaussian kernel density estimation (KDE) [75]. This density map offers a clear and intuitive visual representation of the tree distribution in the surveyed area. This information can help pickers assess the relative difficulty of traversing various parts of the forest, enabling better path planning and navigation.
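A minimal sketch of such a density map, assuming placeholder tree centroids and an illustrative grid resolution, is shown below using SciPy’s Gaussian KDE.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
tree_xy = rng.uniform([0.0, 0.0], [40.0, 100.0], size=(258, 2))   # placeholder tree centroids

kde = gaussian_kde(tree_xy.T)                 # bandwidth from Scott's rule by default
xg, yg = np.mgrid[0:40:200j, 0:100:500j]      # evaluation grid over the survey area
density = kde(np.vstack([xg.ravel(), yg.ravel()])).reshape(xg.shape)
# 'density' can then be rendered as a heat map layer for the mobile app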

4.3. Data Visualization: The Picker App

The Picker App is a cornerstone in our efforts to revolutionize berry picking operations through technological integration. Developed with the Unity3D game engine, this app was crafted to deliver a seamless user experience, broad accessibility, and an intuitive interface that aligns with contemporary UI design principles. The use of Unity3D guarantees robust performance across different devices and operating systems, facilitating easy updates and system maintenance.
Central to the app’s functionality is its user-friendly interface, which simplifies access to advanced features. It allows pickers to view their real-time location and optimize routes—capabilities inspired by conventional mapping platforms, such as Google Maps. What sets the Picker App apart is its ability to dynamically adjust routes based on real-time berry yield data and terrain conditions, ensuring that pickers can maximize productivity by following the most efficient paths through varying forest landscapes.
For these features to be usable, however, one component is key: the multi-layered mapping interface. This core innovation integrates and visualizes the complex datasets generated from the stereo camera and LiDAR-based methodologies described in Section 4.1 and Section 4.2, respectively. This interface displays updated, georeferenced information on berry locations and other critical terrain attributes, such as elevation and forest density. By toggling between different informational layers, pickers can not only track berry distributions but also view the positions of other pickers in real time, fostering collaborative and efficient harvesting strategies.
The backend integration of the Picker App employs Flask for robust API development and ROS for drone coordination, ensuring seamless synchronization between user interactions and system responses. This sophisticated architecture allows the app to seamlessly integrate data from drone surveys into the interface, providing pickers with immediate updates as new data become available. Additionally, the Picker App includes an interactive yield-mapping capability that allows pickers to directly modify berry yield maps based on their field observations. By adding or removing data points, pickers contribute to the accuracy and currency of the maps, enhancing the overall utility of the collected data without constantly relying on the drones’ availability. This crowdsourcing feature not only improves the precision of the maps but also empowers pickers to actively participate in refining the collective data resources, which in turn supports more strategic and informed decision making in the field.
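A purely hypothetical sketch of such a backend is shown below: one Flask endpoint serves a pre-computed map layer and another accepts crowdsourced picker observations. The endpoint names and payload fields are assumptions made for illustration and do not describe the actual FEROX API.

from flask import Flask, jsonify, request

app = Flask(__name__)
berry_observations = []   # in-memory store; a real service would use a database

@app.get("/layers/<layer_name>")
def get_layer(layer_name):
    # Return a pre-computed, georeferenced layer, e.g. "berry_density" or "tree_density"
    return jsonify({"layer": layer_name, "url": f"/static/{layer_name}.geojson"})

@app.post("/observations")
def add_observation():
    # Pickers add or correct berry data points based on field observations
    obs = request.get_json()   # e.g. {"lat": ..., "lon": ..., "species": ..., "abundance": ...}
    berry_observations.append(obs)
    return jsonify({"stored": len(berry_observations)}), 201

if __name__ == "__main__":
    app.run(port=5000)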
The integration of data from both the stereo image sequence processing and the point cloud data processing into the Picker App translates complex environmental data into actionable insights for pickers. By merging these data streams, the app provides a comprehensive overview of the terrain and berry distributions, enabling pickers to plan and execute their tasks with unprecedented efficiency and precision. This integration highlights the app’s role not just as a navigational tool, but as a critical component of a broader system designed to optimize human–robot interaction and data-driven decision making in berry picking operations.

5. Results

5.1. Photogrammetric 3D Reconstruction

As explained in Section 4.1, the image sequences from the front and nadir stereo camera rigs of the hexarotor were processed separately with the proposed pipeline, as the other available state-of-the-art V-SLAM frameworks had failed. First, the front-facing stereo images are used to estimate the drone’s trajectory. The DIM tool is used to extract and match features from all images using three state-of-the-art approaches: SIFT with nearest-neighbor matching is used as a conventional baseline method, while SuperPoint and DISK are both used with LightGlue as representative deep learning-based image matching pipelines. The pre-trained models are used as-is, without any transfer learning.
The SfM reconstruction is performed in COLMAP. Due to the large dataset size, custom image pairs were defined as image matching candidates, using an overlap of five subsequent stereo pairs. The processing is split into 10 approximately equal-sized subsets, with an overlap of 10 stereo pairs. For each subset, pose estimation and sparse 3D reconstruction are obtained using COLMAP’s incremental SfM pipeline, followed by a global bundle adjustment enforcing the relative pose constraint of the stereo rig (known from the accurate factory calibration of the ZED X system).
The results of all three approaches are shown in Figure 8. In particular, the workflow based on SIFT was unable to maintain tracking throughout the sequence (Figure 8a). However, after passing a difficult part of the dataset (at approximately 25% of the flight time, where the drone rapidly changes orientation inside the forest), it was able to orient the remaining 4162 frames, which comprise almost 74% of the input images. The DISK-based trajectory was reported by COLMAP as fully and correctly oriented; however, even a qualitative examination of its shape and the sparse point cloud reveals issues in the results (Figure 8b). Although the initial part of the sparse reconstruction, at the forest edge, appears to be of high quality, inside the forest the 3D reconstruction became corrupted and produced unreliable pose estimates. Finally, the photogrammetric process based on SuperPoint (Figure 8c) produced qualitatively good results and oriented all frames, resembling the SIFT-based trajectory where the latter was successfully computed. A more in-depth analysis of the pose estimation results is presented in Section 6.
Given its complete trajectory, the SuperPoint-based reconstruction (Figure 8c) was selected as the input for the rest of the image processing pipeline.
The next step involved georeferencing the previously computed UAV trajectory. To do this, only the highest-quality GNSS positions (i.e., with a planar positioning accuracy below 1 m) are selected as a global reference. The GNSS timestamps are matched with those of the front-facing stereo images, yielding an average discrepancy of 0.09 s. Given the limited quality of the global positions and the low flight speed, such a low discrepancy value is insignificant for the transformation calculations.
Then, the COLMAP command line tools are used to calculate the transformation between the local coordinate system of the recovered UAV trajectory and the global (GNSS) reference system. However, these camera poses refer to the front-facing camera system and need to be converted into the reference frame of the nadir camera. Thus, the georeferenced trajectory of the front-facing stereo camera is transformed to the nadir camera system with custom Python scripting. This procedure required matching image timestamps from both stereo systems, which was carried out following the same approach as with the GNSS positions. This time, the mean error reached 0.06 s.
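The frame conversion itself amounts to composing each georeferenced front-camera pose with the fixed front-to-nadir extrinsic, as in the following sketch; both transforms below are placeholder values, not calibration results.

import numpy as np

def to_hom(R, t):
    # Build a 4x4 homogeneous transform from a rotation matrix and a translation vector
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Placeholder georeferenced pose of the front camera and fixed front-to-nadir extrinsic
T_world_front = to_hom(np.eye(3), np.array([3.0e5, 7.0e6, 150.0]))
T_front_nadir = to_hom(np.array([[1.0, 0.0, 0.0],
                                 [0.0, 0.0, 1.0],
                                 [0.0, -1.0, 0.0]]), np.array([0.0, -0.1, -0.2]))

T_world_nadir = T_world_front @ T_front_nadir   # nadir camera pose in the world frame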
Since several steps of matching corresponding data with nearest timestamps in the proposed methodology may introduce small errors, the nadir-facing camera poses are finally refined using Agisoft Metashape, with a pose uncertainty of 0.5 m and imposing the stereo constraint provided by the factory calibration of the ZED camera system. Then, a digital surface model (DSM) and an orthomosaic of the surveyed area are generated, as shown in Figure 9. The orthomosaic enables us to (i) derive global 3D coordinates of identified berries or (ii) directly use cropped tiles to train a deep learning berry detection model, provided that a sufficient orthomosaic resolution is reached.
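Once the orthomosaic and DSM are available, mapping a berry detection from pixel coordinates to global coordinates reduces to applying the raster geotransform and sampling the DSM, as in this sketch; the geotransform values, pixel coordinates, and DSM array are placeholders, and raster I/O (e.g., via GDAL) is omitted.

import numpy as np

origin_e, origin_n = 650000.0, 6950000.0   # placeholder upper-left corner (easting, northing)
px = 0.01                                  # placeholder ground sampling distance (m/pixel)
dsm = np.zeros((4000, 6000))               # placeholder DSM aligned with the orthomosaic

row, col = 1250, 3400                      # placeholder pixel location of a detected berry
easting = origin_e + (col + 0.5) * px
northing = origin_n - (row + 0.5) * px
elevation = dsm[row, col]
berry_xyz = (easting, northing, elevation)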

5.2. LiDAR Point Cloud Segmentation

The first step of processing the LiDAR data involves point cloud cleaning. After removing noise with a SOR filter, outliers clearly falling outside the area of interest and dynamic obstacles (i.e., a human supervisor following the UAV) were removed. The filtered point cloud contained ca. 9 million points (~80% of the initial size). After the initial CSF filtration, the pre-segmented ground points were used as input for the meshing procedure. The mean absolute error (MAE) and the root mean square error (RMSE), calculated based on the distances between the initial ground points and the generated mesh, were equal to 10 cm and 13 cm, respectively. These relatively high residuals were caused by elements of the natural forest floor, such as grass and low shrubs. The output ground surface model is visualized in Figure 10.
To obtain the high vegetation point cloud, signed distances from the ground surface were calculated. Points located at least three times further from the ground than the fitting RMSE (i.e., 40 cm) were assigned to the tree class and further analyzed with the Treeiso method. The algorithm detected 258 tree instances, shown in Figure 11. Their planar centroids were used to calculate the 2D KDE of the tree density distribution for the later visualization in the mobile app (Figure 12).

5.3. Mobile App Visualization

The data acquired during the first experimental session in Finland were used to prepare demonstrations of several components of the Picker App. As the wild fruit detection problem was not tackled directly in this study, we used example results (i.e., berry detections in the geolocalized images) from the study in [9] for visualization purposes. The resulting heatmap, obtained in the same way as for the tree density (Figure 12), is presented in Figure 13.
The berry density information is included in the map as a set of layers, split by species. OpenStreetMap open data were used as a basemap, including feature categories relevant for navigation, such as roads, forest areas, and water bodies. The approximate position and orientation of the user are determined using smartphone sensors. Simple path planning tools are available for the pickers, allowing the app to calculate and display the shortest or least-effort path to a selected high-berry-concentration region, taking into account the obstacles present in the map and the terrain type (sparse or dense forest, terrain slope, and so on), as sketched below. Figure 14 presents an example screenshot of the main app view in the smartphone emulator. The app will be extensively tested in the field by beta testers in future studies.
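As an illustration of the least-effort routing idea, the sketch below runs a plain Dijkstra search over a placeholder traversal-cost grid (which could, for instance, combine tree density and slope); the cost model and the actual routing logic of the app may differ.

import heapq
import numpy as np

rng = np.random.default_rng(0)
cost = 1.0 + rng.random((50, 50))   # placeholder per-cell traversal cost

def dijkstra(cost, start, goal):
    # Least-cost path on a 4-connected grid; returns the list of traversed cells
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = 0.0
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and d + cost[nr, nc] < dist[nr, nc]:
                dist[nr, nc] = d + cost[nr, nc]
                prev[(nr, nc)] = (r, c)
                heapq.heappush(pq, (dist[nr, nc], (nr, nc)))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    return [start] + path[::-1]

route = dijkstra(cost, start=(0, 0), goal=(49, 49))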

6. Discussion

Although the tests with the hexarotor, which carried the vision-based positioning system, were performed with the GNSS receiver working in RTK mode, poor under-canopy satellite visibility produced very few good-quality positioning observations. Over 58% of the data exceeded the 10 m planar positioning accuracy threshold, and only 5% of the positions were reported by the receiver to have a horizontal positioning accuracy below 1 m (Figure 15). These metrics further demonstrate that GNSS cannot be used as a reliable positioning solution for under-canopy UAV flights. Moreover, this poses a challenge for using such data as a reference for other pose estimation methods, as they cannot be treated as error-free ground truth. Thus, for our analyses, we report the differences between the estimated trajectories and the GNSS positions at different self-reported accuracy thresholds to show the full picture of all employed positioning techniques. In the following plots, all available GNSS positions were used as a reference.
The comparisons were carried out using the evo tool [69], designed for the qualitative and quantitative evaluation of the results of SLAM algorithms. All trajectories were co-aligned using Umeyama’s method [76]. Their projection on a horizontal plane is shown in Figure 16, and all coordinates are plotted separately in time in Figure 17. For the quantitative evaluation, the absolute pose error (APE) metric was calculated for different GNSS accuracy thresholds. The results are summarized in Table 2. Relative pose errors (RPEs) and rotational pose components were not analyzed, since the instability of low-quality GNSS solutions introduced substantial noise to subsequent relative poses in the reference data.
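To make the evaluation concrete, the sketch below recomputes translational APE statistics at different GNSS self-reported accuracy thresholds on synthetic, already co-aligned trajectories; the study itself relies on the evo tool for this analysis.

import numpy as np

rng = np.random.default_rng(0)
gnss_xyz = rng.normal(size=(800, 3))                    # placeholder reference positions
gnss_acc = rng.uniform(0.2, 15.0, 800)                  # placeholder self-reported accuracy (m)
slam_xyz = gnss_xyz + rng.normal(0.0, 0.5, (800, 3))    # placeholder co-aligned estimate

for thr in (1.0, 5.0, 10.0):
    keep = gnss_acc < thr
    err = np.linalg.norm(slam_xyz[keep] - gnss_xyz[keep], axis=1)   # per-pose translational APE
    print(f"acc < {thr:>4} m: n={keep.sum():4d}  RMSE={np.sqrt((err ** 2).mean()):.2f} m")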
The comparison between GNSS-derived trajectories and those estimated using V-SLAM techniques highlights the clear advantages of image-based trajectory estimation in forested environments. Both SIFT and SuperPoint demonstrate more consistent and stable trajectories, whereas the GNSS positioning suffers from frequent and significant pose jumps (Figure 16). This stability is critical for ensuring the UAV system’s navigational accuracy and reliability in challenging conditions.
Despite the challenges, both SfM solutions, using SIFT and SuperPoint, maintained good overall alignment with the global positioning trends indicated by the GNSS data. This is evident when examining the X and Y components of the trajectories (Figure 17), where both solutions (in the portion of the dataset where the SIFT-based orientation was successful) closely follow the GNSS-derived path. The advantage of the image-based solutions is particularly pronounced in the Z component, where the GNSS data exhibit large, abrupt jumps of up to 20 m.
However, challenging environments, such as forests, always pose a risk of failure for relative pose estimation techniques, such as V-SLAM. In this study, one of the deep learning-based solutions, namely DISK, diverged after correctly orienting the initial part of the dataset. Despite that, COLMAP kept reporting that the incremental SfM was proceeding successfully, which, if such a system were used for navigation, could result in a critical failure. Moreover, the comparison with global positioning at the highest precision threshold analyzed showed DISK to be the most accurate method, slightly ahead of SuperPoint and SIFT (Table 2). This can be misleading, since this assessment was performed only on a small subset of the trajectory and does not reflect the actual global accuracy, for which both of the other feature extractors substantially outperformed DISK. This underscores the importance of employing rigorous assessment methods to estimate the quality of pose estimation, especially for approaches using deep learning, in mobile robotics applications.

7. Conclusions

This study developed and tested a novel pipeline for processing UAV image and LiDAR data to create georeferenced wild fruit maps from under-canopy surveys. By employing a custom hexarotor equipped with stereo systems and GNSS positioning, we captured an experimental dataset in a Finnish forest during the summer season. Using these data, we demonstrated the capability of the processing pipeline to handle the complex scenario of forest environments in terms of reliable UAV pose estimation and the georeferencing of berry locations. The successful implementation of this pipeline in a challenging, real use case highlights its potential for broader application in similar natural environments.
A notable advancement in our methodology is the integration of deep learning-based feature detectors and matchers, which significantly enhanced the quality and reliability of the orientation of all images in the dataset. Traditional methods, such as SIFT, were only partially effective, not reconstructing the entire trajectory. In contrast, the SuperPoint-based workflow achieved a high degree of accuracy, compared to the weak GNSS reference, and reliably aligned all images in the dataset. However, the DISK-based workflow exhibited a significant drop in accuracy during the more complex part of the UAV flight. This discrepancy underscores the importance of developing methods to both improve and assess the reliability of pose estimation methods for UAVs operating in GNSS-denied (or GNSS-degraded) conditions, which will be pursued in our future studies.
An integral component of our approach is the development of a mobile application designed to provide pickers with detailed berry and forest maps. By combining UAV-based geodata with approximate user positioning and large-scale basemaps, the application offers both high-resolution, high-quality information and easily accessible, comprehensive data. This integration ensures that pickers can access accurate, up-to-date maps with minimal effort, enhancing their ability to harvest berry-rich areas efficiently, safely and with reduced physical effort. This aid is particularly beneficial for foreign seasonal workers who may be less familiar with local geography. In future studies, we plan to include tighter integration between the data collected with the drones and open geodatabases, such as upscaling and updating country-scale elevation models with LiDAR survey data.
Moreover, we recognize the importance of validating our workflow in different forest environments to assess its generalizability and robustness. This will involve field tests in varying terrains and climatic conditions, providing a comprehensive evaluation of the methodology’s performance. By addressing these aspects, we aim to refine our approach, ensuring that it can be effectively implemented in diverse forest ecosystems. End-user feedback will also play a vital role in the success of our solutions. Although our technologies are still under active development, the methodology for capturing user feedback has already been developed. This is detailed in a parallel study that outlines a structured approach for gathering user-centered data. In particular, we will gather insights from berry pickers and forestry experts through psychometric surveys, interviews, and in-the-field observations to capture usability and human factors influencing the integration of drones in forest environments [6]. This methodology ensures that, as the technology evolves, it remains aligned with the real-world needs and expectations of its users, enabling future validation under operational conditions.
In conclusion, the developed UAV data processing pipeline represents a significant advancement in the field of autonomous agricultural technologies. The demonstrated procedure for the precise georeferencing of wild fruit locations offers a transformative solution to the challenges of wild berry harvesting. In addition, enabling UAVs to rely on a vision-based system for autonomous navigation instead of LiDAR could potentially significantly reduce the cost of robotics solutions. The integration of AI-based detection models and continuous hardware improvements will further enhance the efficiency and applicability of the system. Our ongoing research and development efforts will focus on ensuring the reliability and scalability of this technology, ultimately contributing to more sustainable and productive forest management practices.

Author Contributions

Conceptualization, F.R., M.S.C. and L.F.; methodology, P.T., L.M. and F.R.; software, L.M. and P.T.; validation, P.T., F.R. and M.S.C.; formal analysis, P.T., L.M., and F.R.; investigation, P.T. and L.M.; resources, F.R., M.S.C. and L.F.; data curation, P.T. and L.F.; writing—original draft preparation, P.T., L.M., F.R. and M.S.C.; writing—review and editing, P.T., F.R., L.M., M.S.C. and L.F.; visualization, P.T.; supervision, F.R. and M.S.C.; project administration, F.R. and M.S.C.; funding acquisition, F.R., L.F. and M.S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been partly funded by the EU FEROX project (https://ferox.fbk.eu/, accessed on 8 October 2024). FEROX has received funding from the European Union’s Horizon Europe Framework Programme under Grant Agreement No. 101070440.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Granshaw, S.I. RPV, UAV, UAS, RPAS… or just drone? Photogramm. Rec. 2018, 33, 160–170. [Google Scholar] [CrossRef]
  2. Colomina, I.; Molina, P. Unmanned aerial systems for photogrammetry and remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2014, 92, 79–97. [Google Scholar] [CrossRef]
  3. Nex, F.; Remondino, F. UAV for 3D mapping applications: A review. Appl. Geomat. 2014, 6, 1–15. [Google Scholar] [CrossRef]
  4. Giordan, D.; Hayakawa, Y.; Nex, F.; Remondino, F.; Tarolli, P. The use of remotely piloted aircraft systems (RPASs) for natural hazards monitoring and management. Nat. Hazards Earth Syst. Sci. 2018, 18, 1079–1096. [Google Scholar] [CrossRef]
Figure 1. Unique Scandinavian berry species investigated in this research: (a) bilberries (dark blue) and lingonberries (red); (b) cloudberries (orange).
Figure 3. Drones used in this study for data collection: (a) Ingeniarius hexarotor; (b) Deep Forestry quadrotor.
Figure 4. Test area (red rectangle) in Ilomantsi, Finland (red circle).
Figure 5. Example input RGB frame from the front-facing stereo camera of the hexarotor.
Figure 6. The raw, coregistered point cloud produced by the quadrotor LiDAR SLAM system, with a zoomed-in view of a single tree.
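As a practical illustration, a raw LiDAR map exported in the .pcd format listed in Table 1 can be inspected and thinned before ground filtering and tree segmentation. The following is a minimal sketch using Open3D; the file name and filter parameters are hypothetical and would need tuning for the actual data.

```python
# Sketch: load and lightly clean a raw LiDAR SLAM point cloud (.pcd).
# File name, voxel size and outlier-filter parameters are hypothetical.
import open3d as o3d

cloud = o3d.io.read_point_cloud("quadrotor_map.pcd")
print(cloud)  # reports the number of points in the loaded cloud

# Thin the dense map and drop isolated outliers before further processing
down = cloud.voxel_down_sample(voxel_size=0.05)  # 5 cm voxel grid
clean, _ = down.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

o3d.visualization.draw_geometries([clean])
```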
Figure 7. Proposed workflow for processing data from a dual (front and nadir) stereo vision system.
Figure 8. Retrieved camera trajectories (in red) and sparse point clouds from COLMAP using (a) SIFT, (b) DISK, and (c) SuperPoint.
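For readers wishing to reproduce a comparable reconstruction, the sketch below runs the classical SIFT pipeline through COLMAP's Python bindings (pycolmap), assuming those bindings expose the standard pipeline entry points. Learned features such as SuperPoint and DISK typically require an external toolbox (e.g., deep-image-matching) to populate the COLMAP database and are not covered here. Paths are hypothetical, and exhaustive matching is used only for brevity; a sequential matcher is generally better suited to video frames.

```python
# Sketch: SIFT-based incremental reconstruction with pycolmap.
# Paths are hypothetical; exhaustive matching is shown for simplicity.
import pycolmap

database = "survey.db"   # COLMAP database to be created
images = "frames/"       # directory of extracted under-canopy frames
output = "sparse/"       # output folder for the sparse model

pycolmap.extract_features(database, images)        # SIFT keypoints + descriptors
pycolmap.match_exhaustive(database)                # pairwise feature matching
reconstructions = pycolmap.incremental_mapping(database, images, output)

for idx, rec in reconstructions.items():
    print(idx, rec.summary())                      # cameras, points, reprojection error
```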
Figure 9. Orthomosaic created using SuperPoint-based photogrammetric processing. The two close-up views show cropped full-resolution details with a cloudberry (left) and a mushroom (right).
Figure 10. Hillshaded ground mesh obtained from Delaunay triangulation of points filtered by CSF.
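The ground mesh in Figure 10 combines two standard steps: ground filtering by cloth simulation (CSF) and 2.5D Delaunay triangulation. The sketch below covers only the second step and assumes the CSF-classified ground points (obtained, e.g., with CloudCompare's CSF plugin or the cloth-simulation-filter Python package) have already been exported to a hypothetical ground_points.xyz file.

```python
# Sketch: build a ground mesh (DTM) by 2.5D Delaunay triangulation of
# ground-classified points; the input file name is hypothetical.
import numpy as np
from scipy.spatial import Delaunay

ground = np.loadtxt("ground_points.xyz")   # N x 3 array of (x, y, z) ground points
tri = Delaunay(ground[:, :2])              # triangulate in the horizontal plane

# Export as a simple OBJ mesh (OBJ uses 1-based vertex indices)
with open("ground_mesh.obj", "w") as f:
    for x, y, z in ground:
        f.write(f"v {x} {y} {z}\n")
    for a, b, c in tri.simplices:
        f.write(f"f {a + 1} {b + 1} {c + 1}\n")
```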
Figure 11. Results of tree instance segmentation on the LiDAR point cloud.
Figure 12. Tree centroids from the instance segmentation (marked as green circles) and the forest density heatmap used for visualization in the mobile app. Darker regions depict denser forest.
Figure 13. Example of a bilberry distribution heatmap generated from the detections in the georeferenced images. Dots represent image positions (scaled by the number of detections); darker regions depict higher spatial concentrations of fruits.
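A heatmap of this kind can be produced by kernel density estimation over the georeferenced image positions, weighted by the number of berries detected in each image. The sketch below uses SciPy's Gaussian KDE; the input file name and its column layout are hypothetical.

```python
# Sketch: berry hotspot heat map from georeferenced image positions,
# weighted by per-image detection counts (cf. Figure 13).
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

# Hypothetical CSV: one row per image with columns x, y, n_detections
data = np.loadtxt("bilberry_detections.csv", delimiter=",", skiprows=1)
xy, weights = data[:, :2].T, data[:, 2]

# Bandwidth via Scott's rule by default; bw_method="silverman" is also available
kde = gaussian_kde(xy, weights=weights)

# Evaluate the density on a regular grid covering the surveyed area
xg, yg = np.meshgrid(
    np.linspace(xy[0].min(), xy[0].max(), 300),
    np.linspace(xy[1].min(), xy[1].max(), 300),
)
density = kde(np.vstack([xg.ravel(), yg.ravel()])).reshape(xg.shape)

plt.imshow(density, origin="lower", cmap="Reds",
           extent=(xy[0].min(), xy[0].max(), xy[1].min(), xy[1].max()))
plt.scatter(xy[0], xy[1], s=2 + weights, c="k")  # image positions scaled by detections
plt.savefig("bilberry_heatmap.png", dpi=200)
```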
Figure 14. Visualization of the mobile app running on an Android phone emulator.
Figure 15. GNSS position accuracy as reported by the receiver during the under-canopy flight.
Figure 16. Planar plot of all obtained drone trajectories overlaid on the noisy GNSS-derived trajectory.
Figure 17. Comparison of XYZ components of drone trajectories over time, plotted alongside the GNSS-derived noisy trajectory.
Figure 17. Comparison of the XYZ components of the drone trajectories over time, plotted alongside the noisy GNSS-derived trajectory.
Table 1. Summary of the data collected with UAV platforms.

UAV System | Data Type                  | Size              | Sensor              | Format | System Purpose
Hexarotor  | Front-facing stereo images | 5644 images       | Zed X stereo camera | Rosbag | Wild berry detection in a global reference system
           | Nadir images               | 632 images        | Zed X stereo camera |        |
           | GNSS positions             | 815 positions     | Emlid Reach         |        |
Quadrotor  | 3D point cloud             | 11,443,116 points | Ouster OS0-32       | .pcd   | Forest modeling
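Because the hexarotor streams in Table 1 are stored as a rosbag, they can be unpacked with the standard ROS 1 Python API before photogrammetric processing and berry detection. In the sketch below, the bag file name and topic names are hypothetical and must be adapted to the actual recording.

```python
# Sketch: extract stereo frames and GNSS fixes from a hexarotor rosbag.
# Bag file name and topic names are hypothetical.
import os
import cv2
import rosbag
from cv_bridge import CvBridge

BAG_PATH = "hexarotor_survey.bag"    # hypothetical bag file
IMG_TOPIC = "/zedx/left/image_raw"   # hypothetical image topic
GNSS_TOPIC = "/emlid/fix"            # hypothetical sensor_msgs/NavSatFix topic

bridge = CvBridge()
os.makedirs("frames", exist_ok=True)

with rosbag.Bag(BAG_PATH) as bag, open("gnss.csv", "w") as gnss:
    gnss.write("stamp,lat,lon,alt\n")
    for topic, msg, t in bag.read_messages(topics=[IMG_TOPIC, GNSS_TOPIC]):
        if topic == IMG_TOPIC:
            # Convert the ROS image message to an OpenCV BGR array and save it
            frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
            cv2.imwrite(f"frames/{t.to_nsec()}.png", frame)
        else:
            gnss.write(f"{t.to_sec()},{msg.latitude},{msg.longitude},{msg.altitude}\n")
```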
Table 2. APE statistics for 3D reconstructions based on different feature extractors. Best results in bold.

Reference GNSS Positions | Feature Extractor | Mean (m) | Median (m)
All available            | SIFT              | 3.81     | 3.47
                         | DISK              | 16.10    | 13.44
                         | SuperPoint        | 3.09     | 2.69
Accuracy < 10 m          | SIFT              | 3.29     | 2.72
                         | DISK              | 14.06    | 10.29
                         | SuperPoint        | 2.57     | 2.24
Accuracy < 1 m           | SIFT              | 1.36     | 1.09
                         | DISK              | 1.18     | 0.89
                         | SuperPoint        | 1.25     | 1.01
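APE statistics such as those in Table 2 are typically obtained by rigidly aligning each reconstructed trajectory to the GNSS reference and measuring the remaining position errors. The sketch below illustrates the principle with a Umeyama least-squares alignment implemented in NumPy; in practice a dedicated tool such as evo can be used, and the trajectory file names here are hypothetical, assuming time-synchronized N x 3 position arrays.

```python
# Sketch: APE mean/median after Umeyama alignment of an estimated trajectory
# to a GNSS reference. Input files are hypothetical N x 3 (x, y, z) arrays
# with one row per synchronized timestamp.
import numpy as np

def umeyama_align(est, ref, with_scale=False):
    """Return R, t, s minimizing || ref - (s * R @ est + t) ||^2."""
    mu_e, mu_r = est.mean(0), ref.mean(0)
    E, Rf = est - mu_e, ref - mu_r
    cov = Rf.T @ E / est.shape[0]                   # cross-covariance matrix
    U, d, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:    # avoid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = np.trace(np.diag(d) @ S) / E.var(0).sum() if with_scale else 1.0
    t = mu_r - s * R @ mu_e
    return R, t, s

est = np.loadtxt("slam_trajectory.xyz")   # hypothetical estimated positions
ref = np.loadtxt("gnss_reference.xyz")    # hypothetical GNSS reference positions

R, t, s = umeyama_align(est, ref)
aligned = (s * (R @ est.T)).T + t
errors = np.linalg.norm(ref - aligned, axis=1)
print(f"APE mean:   {errors.mean():.2f} m")
print(f"APE median: {np.median(errors):.2f} m")
```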
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
