1. Introduction
The nutational movement of climbing plants refers to the rhythmic, circular, or nodding motion exhibited by certain plant parts, such as stems, tendrils, or growing tips, as they explore and interact with their environment during the process of climbing [1]. This movement is often associated with the search for support structures, such as poles, trellises, or other plants, which the climbing plant can use for stability and upward growth [2]. It occurs as a result of differential growth rates on different sides of the plant organ. As the plant grows, the cells on one side of the organ elongate more rapidly than those on the opposite side, causing the organ to bend or curve [3]. This bending or curving allows the plant to explore its surroundings and find a suitable support for climbing.
Climbing plants, such as the common bean (Phaseolus vulgaris L.), employ various mechanisms for climbing, including twining, where the plant winds around a support, and the use of tendrils, specialized structures that coil around objects for support [4].
The study of nutation movements in climbing plants is part of the broader field of plant tropisms, which involves the growth responses of plants to external stimuli such as light, gravity, and mechanical touch, first studied in the pioneering works of Charles Darwin [5]. This nutation movement allows climbing plants to optimize their growth in response to their environment, increasing their chances of successful climbing and their access to sunlight for photosynthesis. Moreover, it has been the inspiration for the development of new techniques and methods in robotics [6], computational intelligence [7,8], and other bioinspired innovations [9].
The current study of nutation and other plant movements requires the recording of time-lapse images of plants, due to the long time span over which the movements occur, and the use of artificial vision algorithms for the efficient analysis of the videos obtained. For example, Navarro et al. [10] developed a specific capture chamber to monitor the growth of plants at intervals from 30 s to hours or even days. The captured images were then analyzed with computer vision algorithms based on thresholding, morphological operators, and blob detection. From the parameters of the blobs, features such as area, length, compactness, and perimeter were extracted for further analysis. Stolarz et al. [11] developed software called the “Circumnutation Tracker” to analyze the circular movement of the plant tip in zenithal time-lapse videos. The X and Y positions of the tip are manually selected by the user, and the system includes useful tools for the visualization and characterization of the plant movement, automatically extracting circumnutation parameters.
More recently, Tenzer and Clifford [12] proposed a technique to analyze the growing movement of hydroponic plants using different neural networks on grayscale images. U-Net, LinkNet, FPN, PSPNet, and a 34-layer ResNet architecture were compared, achieving a maximum area under the curve (AUC) of over 0.92 on the validation set. Another recent method based on deep learning was presented by Mao et al. [13], who proposed a U-Net network architecture to segment the plant. This method is an improvement of their free software “Plant Tracer”, which analyzed plant movement using the classic computer vision techniques of object tracking and blob detection. Díaz-Galián et al. [14] presented a methodology to analyze the movement of plants using different reference points obtained from image analysis. This methodology included coordinate transformation, curve fitting, and statistical analysis, allowing a comparison between different plant species. Plant movement has also been analyzed using other types of sensors. For example, Geldhof et al. [15] developed a digital inertial measurement unit (IMU) sensor to measure the movement of leaves in real time, achieving a precision in pitch and roll angles below 0.5°. However, the exact 3D position of the leaves was not obtained.
The aim of this work is to develop a stereo system for the visual tracking of plant nutation movement, which enables obtaining the precise 3D location of the tip of the plant. This work was focused on the study of the common bean (Phaseolus vulgaris L.), since the ultimate purpose is to analyze the differences in movement between plants growing with and without a support pole. Previous work from the MINT Lab research group suggests that the dynamic patterns of plant nutation are influenced by the presence of a support to climb in their vicinity [2]. This fact has also been observed by other researchers. For example, Guerra et al. [16] analyzed the 3D kinematics of the movement of climbing plants, demonstrating that plants are not only able to perceive their environment but can also scale the movement of their tendrils depending on the size of the support.
The objective of the present study is not to analyze these kinematic changes under different environmental conditions but to develop computational tools for the accurate 3D measurement of plant tip movement that will be used in further studies. Consequently, for our analysis to be effective in a broader range of conditions, we obtained data on plants from different settings, i.e., with and without a support to climb onto. Since the proposed method is able to reconstruct the 3D position of the tip under both conditions, it offers a framework for future studies to analyze how plant nutation responds to different environments.
2. Materials and Methods
The proposed computer vision method to track the movement of the plant tip is shown in Figure 1. It consists of five main stages: calibration, data acquisition, plant tip detection in the top and side images, 3D position estimation, and visualization of the results.
In the following subsections, the characteristics of the experimental setup and the stereo imaging system are presented first. Then, a brief introduction to the mathematical foundations and artificial vision algorithms used is given. Finally, the procedure used to estimate the position of the plant tip at each of the moments captured in the time-lapse videos is described.
2.1. Experimental Setup
The experiments were carried out by the MINT Lab research group at the Scientific and Technical Research Area of the University of Murcia (Spain). These experiments were conducted under controlled conditions and consisted of two scenarios. In the first scenario, the climbing plant was placed near a pole, which served as a support for the plant to grip when it reached its position; in the second scenario, the plant was placed within a cabin with no pole in it. The pole was 90 cm in height and 1.8 cm in diameter and was placed at a distance of 30 cm from the plant center. Concerning the soil, a mixture of peat moss and perlite (70–30%) was used throughout all the experiments. Both cabins were equipped with white parabolic reflectors to provide symmetrical lighting. The temperature of the growth chamber was kept constant at 20 °C and the relative humidity at 85% ± 5%. It should be noted that, although this is a high relative humidity, plants are able to complete their life cycles successfully without harm, especially under controlled conditions in which heat stress can be prevented. An L16:D8 h photoperiod was provided via high-pressure sodium lamps (Lumatek pulse-start HPS lamp, 250 W; height: 150 cm; photon fluence rate: 430 ± 50 μmol m⁻² s⁻¹ at leaf level). During the 8 h of darkness, a dim, phototropically inactive green safelight (fluence rate under 5 μmol m⁻² s⁻¹) was activated, which was enough to take pictures in the dark using a high dynamic range (115 dB) image sensor with 4.2 μm pixels.
Figure 2 shows an example of these two scenarios, with and without a supporting pole. Specifically, the plant species used for this experiment was the common bean (Phaseolus vulgaris L.). Seeds of the “Garrafal Oro Vega” cultivar were provided by Semillas Ramiro Arnedo S.A., Spain (https://www.ramiroarnedo.com). “Garrafal Oro Vega” is a variety with abundant foliage, medium-sized leaves, and 18–20 cm long pods. The plants went through multiple stages from germination to the start of the recording. The seeds took 48 h to germinate. Upon germination, the young seedlings were transferred to coconut fiber growing pellets and kept on propagation trays for further development. Once the first true leaves had matured, which took around 8–12 days from the young seedling stage, healthy-looking, 25–30 cm tall plants were selected and transferred into the recording booths for video capture. The videos were recorded until the plant either grabbed the pole or its tip left the experimental area (i.e., the area visible to the cameras), which usually happened after 3 or 4 days.
It can be observed in Figure 2 that a dark color was used for the bottom part of the booth. The purpose of this dark cover was to increase the contrast with the plant contour, thus enhancing the performance of the algorithm for extracting the position of the plant tip in the time-lapse videos corresponding to the top view. We made sure that the white part remained considerably larger than the black part; the proportion of the lower black part was calibrated so that the plants did not show any shade avoidance response, which would alter the nutation movement.
Once the young seedlings were transferred to the coconut fiber growing pellets and kept on propagation trays for further development, 30 mL of water was initially added every day. Afterwards, when the plants started to mature, they were watered so that around 80% moisture was maintained throughout the growing stage. For recording purposes, the plants were transferred into big, black plastic pots. At the onset of the recording, 400 mL of water was added to each plant, which was found sufficient to keep the soil moist throughout the recording phase. No extra water was added during the four days of recording.
To estimate the nutational movement of the plant, a stereo pair of cameras was used, recording synchronously and in time-lapse. The target of the motion tracking was the plant tip. The configuration of the stereo vision system is shown in Figure 3. This imaging system consisted of two Brinno TLC200 Pro cameras (Brinno Ltd., Taipei, Taiwan) positioned at the top and side of the plant, forming a 90° angle. The technical specifications of the cameras are shown in Table 1. This top-side configuration of the cameras was chosen because it can offer high accuracy in detecting the tip of the plant, although it may not be the best choice for tracking other parts of the plant.
To synchronize the captured images, the recording was started simultaneously on both cameras, with each camera using its own internal clock. The images were captured at one-minute intervals, recording continuously 24 h a day for 3 to 4 days. Because each camera had its own clock, there could be a slight offset between the images of the two cameras in some recordings. To determine the number of frames corresponding to this offset, the light–dark transitions of the L16:D8 photoperiod were used, taking the zenithal images as a reference. During one of the lighting changes, the offset between the top and side view images was computed, measured as the number of frames of difference, and this number was used in the further analysis of the sequence.
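As an illustration of this synchronization step, the photoperiod transition can be located in each sequence from the jump in mean image brightness. The following is a minimal sketch, not the exact implementation used; the file layout and the jump threshold are assumptions:

```python
import glob
import cv2

def light_change_frame(frame_paths, jump=50.0):
    """Index of the first frame whose mean brightness jumps by more
    than `jump` grey levels with respect to the previous frame."""
    prev = None
    for i, path in enumerate(frame_paths):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        mean = float(gray.mean())
        if prev is not None and abs(mean - prev) > jump:
            return i
        prev = mean
    return None

# Hypothetical file layout: one PNG per time-lapse frame and view.
top_paths = sorted(glob.glob("top/*.png"))
side_paths = sorted(glob.glob("side/*.png"))

# Offset (in frames) between the two sequences at the same photoperiod
# transition; applied when pairing top and side frames.
offset = light_change_frame(top_paths) - light_change_frame(side_paths)
```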
2.2. Mathematical Methods
In general, a camera maps 3D points in space to 2D points on the image plane. The cameras used in the experiments were modeled using a pinhole camera model. In a pinhole model, a ray from a point in space passes through a fixed point called the projection center. The intersection of this ray with a chosen plane in space, the image plane, is the projection of the point on the image plane [17]. The projective transformation given by a pinhole camera model is as follows:

$$s \, m' = A \, [R \mid t] \, M'$$

where:

$s$ is an arbitrary scaling factor of the projective transformation.
$m' = [u, v, 1]^T$ is the 2D pixel on the image plane.
$M' = [X_w, Y_w, Z_w, 1]^T$ is the 3D point expressed in the world coordinate system.
$A$ is the matrix with the intrinsic parameters of the camera.
$R$ and $t$ are the rotation and translation, respectively, describing the coordinate change from the world to the camera coordinate system.

The matrix with the intrinsic parameters is composed of the focal lengths, $f_x$ and $f_y$, expressed in pixel units, and the principal point of the camera, $(c_x, c_y)$:

$$A = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
This matrix with the intrinsic parameters does not depend on the scene, so it can be reused once estimated, as long as the focal length remains fixed. On the other hand, the matrix $[R \mid t]$ contains the extrinsic parameters of the camera. This matrix performs the homogeneous transformation and represents the change of basis from the world coordinate system to the camera coordinate system, as follows:

$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

Combining these two matrices yields the projective transformation, which maps the 3D points in the world to the 2D points on the image plane in normalized camera coordinates.
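As an illustration, the projective transformation can be written compactly in a few lines of NumPy; the intrinsic and extrinsic values below are made up for the example:

```python
import numpy as np

# Intrinsic matrix A (illustrative values, in pixels).
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Extrinsic parameters: identity rotation, camera 1 m from the origin.
R = np.eye(3)
t = np.array([[0.0], [0.0], [1.0]])

# Projective transformation P = A [R | t].
P = A @ np.hstack([R, t])

# Project a 3D world point (homogeneous coordinates).
M = np.array([0.1, 0.2, 1.0, 1.0])
s_m = P @ M                  # s * [u, v, 1]
u, v = s_m[:2] / s_m[2]      # divide by the scale factor s
```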
The pinhole camera model does not consider the distortion caused by the lenses used in the cameras. To accurately represent a real camera, the camera model includes both radial and tangential lens distortion. Radial distortion occurs when light rays bend more at the edges of the lens than at the optical center. The radial distortion coefficients model this type of distortion as follows:

$$x_{distorted} = x \, (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$
$$y_{distorted} = y \, (1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$

where:

$(x, y)$ is the location of the pixel without distortion.
$k_1$, $k_2$, and $k_3$ are the radial distortion coefficients of the lens.
$r^2 = x^2 + y^2$.
On the other hand, tangential distortion occurs when the lenses and the image plane are not perfectly parallel. In this case, the tangential distortion coefficients model this type of distortion as follows:

$$x_{distorted} = x + [\,2 p_1 x y + p_2 (r^2 + 2 x^2)\,]$$
$$y_{distorted} = y + [\,p_1 (r^2 + 2 y^2) + 2 p_2 x y\,]$$

where:

$(x, y)$ is the location of the pixel without distortion.
$p_1$ and $p_2$ are the tangential distortion coefficients of the camera.
$r^2 = x^2 + y^2$.
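For illustration, both distortion models can be applied jointly to normalized image coordinates; a minimal sketch, with purely illustrative coefficient values:

```python
import numpy as np

def distort(x, y, k1, k2, k3, p1, p2):
    """Apply radial and tangential distortion to the normalized,
    undistorted coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# Example with illustrative coefficients.
x_d, y_d = distort(0.3, -0.2, k1=-0.25, k2=0.07, k3=0.0, p1=1e-3, p2=-5e-4)
```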
On this basis, the computer vision algorithms used in the implementation of the pipeline for tracking the plant nutation movement are described in the following subsections.
2.2.1. Perspective-n-Point (PnP)
The pose computation problem consists of solving for the rotation and translation of an object with respect to the camera, while minimizing the reprojection error for correspondences between the 3D points in the world and the 2D points on the image. Several methods have been proposed to estimate the pose of an object relative to the camera [18,19]. These methods can be used for calibration using a planar calibration object [20].
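In the OpenCV library used in this work, pose computation is available as cv2.solvePnP. The following minimal sketch recovers the pose of a planar 10 × 7 chessboard; the square size, intrinsics, and ground-truth pose are synthetic values used only to make the example self-contained (real detections would come from cv2.findChessboardCorners):

```python
import cv2
import numpy as np

# 3D points of a 10 x 7 chessboard in its own plane (Z = 0), with a
# hypothetical 8 cm square size, expressed in metres.
square = 0.08
obj_pts = np.zeros((10 * 7, 3), np.float32)
obj_pts[:, :2] = np.mgrid[0:10, 0:7].T.reshape(-1, 2) * square

# Illustrative intrinsics; no lens distortion in this synthetic example.
A = np.array([[800.0, 0, 320.0], [0, 800.0, 240.0], [0, 0, 1.0]])
dist = np.zeros(5)

# Synthesize the 2D detections from a known ground-truth pose; in the
# real pipeline they would come from corner detection in the images.
rvec_gt = np.array([[0.1], [-0.2], [0.05]])
tvec_gt = np.array([[0.1], [0.0], [1.5]])
img_pts, _ = cv2.projectPoints(obj_pts, rvec_gt, tvec_gt, A, dist)

# Recover the pose of the board relative to the camera.
ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, A, dist)
R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3 x 3 rotation matrix
```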
2.2.2. Calibration of a Stereo Pair System
The procedure outlined in the previous subsection enables obtaining the intrinsic parameters of the camera as well as the camera pose relative to the calibration object. If the calibration object is visible to both cameras, we have the pose $[R_1 \mid t_1]$ of the first camera relative to the calibration object and the pose $[R_2 \mid t_2]$ of the second camera. These two poses are related as follows:

$$R = R_2 R_1^T, \qquad t = t_2 - R \, t_1$$

Therefore, by using the previous equation, it is possible to obtain the pose $[R \mid t]$ of the first camera relative to the second camera. Once this relative pose is obtained, the essential matrix can be constructed as follows:

$$E = \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{bmatrix} R$$

where $(t_x, t_y, t_z)$ are the components of the translation vector $t = [t_x, t_y, t_z]^T$. From this, the fundamental matrix can be computed as follows:

$$F = A_2^{-T} \, E \, A_1^{-1}$$

where $A_1$ and $A_2$ are the matrices with the intrinsic parameters of the first and second cameras.
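These three relations translate directly into a few lines of NumPy; a minimal sketch, assuming the per-camera poses and intrinsic matrices have already been estimated:

```python
import numpy as np

def stereo_from_poses(R1, t1, R2, t2, A1, A2):
    """Relative pose, essential and fundamental matrices of a stereo
    pair from the poses of each camera w.r.t. the calibration object."""
    R = R2 @ R1.T                      # rotation from camera 1 to camera 2
    t = t2 - R @ t1                    # translation from camera 1 to camera 2
    tx, ty, tz = t.ravel()
    t_cross = np.array([[0, -tz, ty],  # skew-symmetric matrix [t]x
                        [tz, 0, -tx],
                        [-ty, tx, 0]])
    E = t_cross @ R                    # essential matrix
    F = np.linalg.inv(A2).T @ E @ np.linalg.inv(A1)  # fundamental matrix
    return R, t, E, F
```

In practice, OpenCV's cv2.stereoCalibrate performs this estimation jointly over all board views while optionally refining the intrinsics.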
2.2.3. Background Segmentation
The background segmentation problem involves detecting changes that occur in a sequence of images. Pixels in an image are classified either as part of the background or as part of an object of interest based on a background model. In real applications, depending on the characteristics of the problem, different methods have been proposed: models based on color thresholding, clustering, and fuzzy logic, and, more recently, neural network-based models [21]. The usual steps in background segmentation are:
The background model is initialized with some of the video frames.
Once the background model is initialized, each pixel in the image is classified based on the background model.
The background model is updated with information from the last processed image.
In the algorithms developed to extract the position of the plant’s tip in time-lapse images, the Gaussian mixture method was used. This method uses multiple Gaussian functions per pixel to model the background of the scene and takes into account that a pixel can have different states over time. The implementation of this method in the OpenCV library is based on Zivkovic [22].
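In OpenCV, this algorithm is available as cv2.createBackgroundSubtractorMOG2. The following minimal sketch shows how it could be applied to a time-lapse sequence; the file layout and parameter values are illustrative, not the ones used in our pipeline:

```python
import glob
import cv2

frame_paths = sorted(glob.glob("top/*.png"))  # hypothetical file layout

# Gaussian-mixture background model; parameter values are illustrative.
mog2 = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                          detectShadows=False)

for path in frame_paths:
    frame = cv2.imread(path)
    fg_mask = mog2.apply(frame)   # classifies each pixel and updates the model
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
```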
2.3. Motion Tracking and 3D Position Estimation
This section first describes the procedure used to calibrate the stereo system. Next, the steps in the process of extracting the position of the plant tip from the images captured by the stereo system cameras are outlined. Finally, it describes how the nutation motion of the plant was tracked based on the 2D position of the extracted plant tip from the images.
2.3.1. Camera Calibration
The stereo system calibration was carried out following the procedure described in the previous subsection. The calibration process was conducted in two phases. First, each of the stereo cameras was individually calibrated. A calibration object with a 10 × 7 chessboard pattern and dimensions of 59.4 × 84.1 cm was used for the calibration. These dimensions were chosen to cover a large part of the image plane, allowing the precise estimation of the distortion coefficients.
Once the individual calibration of the cameras was completed, the stereo system calibration was performed to estimate the projection matrices for each camera and the fundamental matrix. In this case, a calibration object with dimensions of 42.0 × 59.4 cm was used, visible in both cameras for a sufficient number of poses to carry out the stereo system calibration.
Table 2 shows the results of the calibration, measured in terms of the reprojection error, that is, the distance from the calibration pattern key points detected in the images to the corresponding world points projected onto the same images.
In Figure 4, the points used in the calibration of the stereo system can be observed, combining the positions from all the images used for calibration.
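For reference, the reprojection error can be computed by projecting the world points of the pattern with the estimated poses and measuring the pixel distance to the detected key points. A minimal sketch, not the exact evaluation code used:

```python
import cv2
import numpy as np

def mean_reprojection_error(obj_pts_list, img_pts_list, rvecs, tvecs, A, dist):
    """Mean distance (px) between detected corners and the projections
    of the corresponding world points, over all calibration images."""
    errors = []
    for obj, img, rvec, tvec in zip(obj_pts_list, img_pts_list, rvecs, tvecs):
        proj, _ = cv2.projectPoints(obj, rvec, tvec, A, dist)
        diff = img.reshape(-1, 2) - proj.reshape(-1, 2)
        errors.append(np.linalg.norm(diff, axis=1))
    return float(np.mean(np.concatenate(errors)))
```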
2.3.2. Detection of the Plant Tip in the Overhead View
The process consisted of two stages, as shown in Figure 5. First, for each frame, the n candidate points where the plant tip could be located were detected. Once the candidate points were obtained, a selection process for the position of the plant tip was performed for each frame, followed by interpolation for frames where no candidate points were detected. The steps of the first stage can be seen in Figure 6.
The following are the highlights of the algorithm used in the first stage of the process (a sketch of the lighting-change handling is given after this list):
The segmentation of the plant contour was performed using the mixture of Gaussians algorithm.
After each change in lighting, the process reset the background model to reduce the stabilization time.
To detect when a lighting change occurred, the process checked the area of the selected contours, and, if the area exceeded a threshold specified by the lighting_change_area parameter, it was assumed that a lighting change had occurred.
After a lighting change, the process did not check again for a lighting change until after the frame period specified by the lighting_change_est_time parameter to prevent the repeated detection of a lighting change in every frame after the background model had been reset.
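The following minimal sketch illustrates this lighting-change handling; it is not the exact implementation, and the file layout and the values given to the two parameters are assumptions:

```python
import glob
import cv2

LIGHTING_CHANGE_AREA = 50_000   # px^2; plays the role of lighting_change_area
LIGHTING_CHANGE_EST_TIME = 30   # frames; plays the role of lighting_change_est_time

mog2 = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
frames_since_reset = LIGHTING_CHANGE_EST_TIME

for path in sorted(glob.glob("top/*.png")):   # hypothetical file layout
    frame = cv2.imread(path)
    fg = mog2.apply(frame)
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    total_area = sum(cv2.contourArea(c) for c in contours)
    frames_since_reset += 1
    # A sudden, very large foreground area is taken as a lighting change:
    # the background model is reset and the check is suspended for a while.
    if (total_area > LIGHTING_CHANGE_AREA
            and frames_since_reset > LIGHTING_CHANGE_EST_TIME):
        mog2 = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
        frames_since_reset = 0
```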
The process for selecting the plant tip position among the candidate points followed these two criteria (a sketch implementing them is shown after this list):
If there was only one candidate point, it was selected.
If there was more than one candidate point, the point closest to the last point selected for previous frames was selected. In segments of frames where there was no candidate point, the position of the plant tip was estimated by performing linear interpolation using the plant tip position in the previous and subsequent frames of the segment.
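A minimal sketch of this selection and interpolation logic, assuming the candidate points per frame come from the previous detection stage:

```python
import numpy as np

def track_tip(candidates):
    """candidates: list with, for each frame, an (n, 2) array of candidate
    tip positions (possibly empty). Returns one (x, y) per frame."""
    tip = [None] * len(candidates)
    last = None
    for i, pts in enumerate(candidates):
        if len(pts) == 1:                        # single candidate: take it
            last = pts[0]
        elif len(pts) > 1 and last is not None:  # closest to the last position
            last = pts[np.argmin(np.linalg.norm(pts - last, axis=1))]
        elif len(pts) > 1:
            last = pts[0]
        else:
            last = None                          # no candidate in this frame
        tip[i] = last

    # Linearly interpolate the segments with no candidates (assumes at
    # least one frame has a detected tip).
    xs = np.array([np.nan if p is None else p[0] for p in tip])
    ys = np.array([np.nan if p is None else p[1] for p in tip])
    idx = np.arange(len(tip))
    for a in (xs, ys):
        nan = np.isnan(a)
        a[nan] = np.interp(idx[nan], idx[~nan], a[~nan])
    return np.stack([xs, ys], axis=1)
```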
2.3.3. Detection of the Plant Tip in the Lateral View
This process consisted of two stages, as in the case of the process used for extracting the position of the plant tip in the top view images. First, for each frame, the n candidate points where the plant tip could be located were detected. Once the candidate points were obtained, a selection process for the plant tip position was executed for each frame, followed by interpolation for frames where no candidate points were detected. The steps of the first stage are depicted in Figure 7.
The highlights of this algorithm used in the first stage of the process are:
The segmentation of the plant contour was performed using the mixture of Gaussians algorithm, just as for the segmentation of the plant contour in the top view.
For detecting changes in lighting, the same procedure used in processing images from the top view was followed.
There were cases where multiple fragments of the plant contour were obtained when applying the background model to the image. In these cases, the use of the epipolar line helped to identify the fragment where the plant tip was located (a sketch of this computation is given after this list).
Once the contours near the epipolar line were selected, a minimum path algorithm was used to select the potential position of the plant tip within the contour.
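For the epipolar test mentioned above, the line in the side image induced by the tip detected in the top image can be obtained with cv2.computeCorrespondEpilines; a minimal sketch (the helper below is hypothetical, not part of our pipeline):

```python
import cv2
import numpy as np

def epiline_distance(tip_top, F, point_side):
    """Distance from a point in the side image to the epipolar line
    induced by the tip detected in the top image."""
    pt = np.array(tip_top, np.float32).reshape(1, 1, 2)
    # Line a*x + b*y + c = 0 in the second image for a point in the first.
    a, b, c = cv2.computeCorrespondEpilines(pt, 1, F).ravel()
    x, y = point_side
    return abs(a * x + b * y + c) / np.hypot(a, b)
```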
The process for selecting the plant tip position among the candidate points followed the same criteria as indicated in the previous section.
2.3.4. Correcting Wrong Detections and Obtaining the 3D Positions of the Plant Tip
The algorithms described in the previous subsections were not able to detect the plant tip position correctly in all cases, as mentioned in each section. For this reason, a “Plant Tracker” tool was developed to allow the user to modify the automatically performed plant tip extractions. This tool consists of three screens, as shown in Figure 8. Once the user has completed the manual corrections, the application allows the extraction of a CSV file with the estimation of the 3D position for a selected range of frames.
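Once the tip has been located in both views and corrected where necessary, its 3D position can be recovered by triangulation with the projection matrices from the stereo calibration, for instance via cv2.triangulatePoints. A minimal sketch, not the exact implementation used:

```python
import cv2
import numpy as np

def tip_3d(P_top, P_side, tip_top, tip_side):
    """Triangulate the 3D tip position from its 2D locations in the two
    views. P_top and P_side are the 3 x 4 projection matrices obtained
    in the stereo calibration."""
    x1 = np.array(tip_top, np.float64).reshape(2, 1)
    x2 = np.array(tip_side, np.float64).reshape(2, 1)
    X_h = cv2.triangulatePoints(P_top, P_side, x1, x2)  # homogeneous 4 x 1
    return (X_h[:3] / X_h[3]).ravel()                   # Euclidean (X, Y, Z)
```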
4. Conclusions
The study of the nutation movement of plants has aroused great interest in the research community across different areas, from philosophy to computer science, and has been a great source of inspiration for new algorithms and bioinspired systems. This paper has presented the development of a stereo vision system for studying the movement of climbing plants. The system consists of a pair of RGB cameras that synchronously record a side and a top view of the plants in time-lapse. This study focused on the common bean as a typical climbing plant model. Subsequently, the images are analyzed with a computer vision algorithm that detects the tip of the plant and, using a previous calibration of the stereo pair, estimates its position in 3D coordinates. The method is able to extract the correct tip position in 86–98% of cases, depending on the video, with an average reprojection error below 4 px, which translates to an approximate 3D localization error of about 0.5 cm. The proposed method allows researchers to measure the nutation movements of plants precisely and robustly and to compare their behavior under different situations, such as the presence or absence of support structures for climbing.
In future work, it would be interesting to apply the latest deep learning methods to perform an accurate segmentation of the plants in the images, as well as a subsequent matching between the stereo pair images for a complete estimation of the 3D position of the plant. In this way, it would be possible to know not only the position of the tip of the plant but also that of other parts, such as the stems and leaves.