1. Introduction
Agriculture employs a significant share of the workforce all over the world, particularly in developing countries. The adoption of modern technologies, including Information and Communication Technologies (ICT), such as the Internet of Things (IoT), Artificial Intelligence (AI), Farm Management Information Systems (FMIS), wearable computers, and robotics, has led to the so-called “agriculture 4.0” [1,2]. However, despite the plethora of technological advances, which have arguably improved farmers’ living standards to a substantial degree, their safety is frequently underestimated. Epidemiological studies have identified several safety and health issues associated with agricultural occupations [3,4]. Among the non-fatal health issues, work-related musculoskeletal disorders (MSDs) have proved to be the most common [5,6].
The most common manual operations in agriculture include harvesting, weeding, digging, pruning, sorting, and load lifting and carrying, with the last task having received relatively little attention. This neglect is at odds with the gravity of the matter, as reported recently in [7]. According to the literature, during load carriage the energy expenditure increases as the load and carriage distance increase, while the risk of injury increases when the worker carries the load incorrectly [8]. Repetitive load carriage, usually combined with lifting, has been strongly related to knee injuries and cumulative stresses on the lumbar region that can result in lower back pain, which has reached epidemic proportions among farmers [9]. Overall, the physical symptoms associated with load carriage can include aches, soreness, and fatigue that, in turn, can noticeably reduce a farmer’s performance. Another important aspect is that pickers, in tree and other perennial high-value crops, spend a considerable amount of time carrying the filled crates out of the field, which can involve long distances. Alternatively, this task is shouldered by other farm workers, consequently increasing costs for the farm owners. Remarkably, approximately one-third of the total working time can be spent carrying crates, as stressed in [8].
Toward meeting the dual challenge of preventing musculoskeletal injuries among farmers and assuring their safety, a fleet of low-cost, lightweight robots with high load-carrying capacity can be adopted for transporting the crates from the picking site to the edge of the farmyard. Nevertheless, this is a multidisciplinary problem, as it entails issues from several scientific fields such as Human–Robot Interaction (HRI), IoT, economics, Machine Learning (ML), clinical science, safety, and ergonomics. As a consequence, each aspect should first be investigated separately before integrated efforts are made to accomplish a safe and economically viable solution. The present study focuses on a very important aspect of safe HRI, namely the activity recognition of workers, which can increase situation awareness. This concept constitutes an emerging field in HRI research for eliminating errors that usually appear in complex tasks. Most of the time, humans are situation aware and can thus reliably display competent performance. This capability is also highly required for autonomous vehicles, and ongoing research is being carried out in this direction [10,11]. If a task necessitates cooperation, the participants (human and robot in this case) should synchronize their actions. This represents a key challenge that should be addressed in HRI, as highlighted by Benos et al. [12], who examined safety and ergonomics during human–robot synergy in agriculture.
Human Activity Recognition (HAR) has received extensive attention as a result of the progress of advanced technologies such as ML [13] and IoT (e.g., Inertial Measurement Units (IMUs) and high-accuracy Global Positioning Systems (GPS)) [14]. In particular, sensor-based HAR has been applied in numerous areas including rehabilitation [15], sports [16], healthcare [17], and security monitoring [18]. HAR based on sensor data is considered more reliable than vision-based techniques, since the latter are affected by visual disturbances, such as lighting variability, and require fixed-site installations [19]. In contrast, wearable sensor-based techniques are suitable for real-time implementation, as they are easy to deploy, cost effective, and not site dependent [20]. Among the most commonly utilized sensors are accelerometers, magnetometers, and gyroscopes [21,22,23]. Smartphones, with their multi-sensor systems, are also gaining attention [24,25]. In general, multi-sensor data fusion has proved to be more reliable than using a single sensor, as possible information losses or the imprecision of one sensor can be compensated by the others [24]. Remarkably, a promising technology is the Digital Twin (DT), where multi-physics modeling can be integrated with data analytics. As a result of its ability to combine data processing tools (from the digital world) with data acquisition (from the physical world), the DT has been proposed as a means of better identifying high-risk scenarios to optimize risk assessment and, thus, workers’ safety [26,27].
Data preprocessing, comprising data collection, data fusion, outlier removal, and noise reduction, precedes all other processes involved in data-driven methodologies. These steps are required to bring the data into an optimal state, since they are derived from sensors that can produce irregularities. Depending on the nature of the problem and the ML algorithm selected to tackle it, the data need to be further processed and transformed into the appropriate shape for a series of mathematical or logical operations. In general, the purpose of an ML algorithm is to produce a model that fits the data in the best possible way so as to predict unknown examples with the highest accuracy. In the case of HAR, the aim of the ML algorithm is to learn the characteristic features of the signals collected from the on-body sensors in order to classify the correct activity for a particular timeframe. Afterwards, informative feature vectors are extracted to minimize the classification errors and computation time [28]. Finally, the classification phase serves to map the selected features to a set of activities by exploiting ML techniques. By implementing ML algorithms, models can be developed via iterative learning from the extracted features until they optimally model a process. ML has been extensively implemented in agriculture, offering valuable solutions to several tasks such as crop, livestock, water, and soil management [29], to mention but a few. As far as HAR is concerned, a plethora of ML models have been utilized, such as Hidden Markov Models [30], Support Vector Machines [31], K-Nearest Neighbors [32], Naive Bayes [33], Decision Trees [34], and Long Short-Term Memory (LSTM) [35]. Nonetheless, the literature regarding the use of ML for automated recognition based on wearable sensor data collected during agricultural operations is very limited. Indicative studies are those of Patil et al. [36] (use of accelerometers to detect digging, harvesting, and sowing) and Sharma et al. [37,38,39] (use of GPS, accelerometers, and microphone sensors to detect harvesting, weeding, bed making, and transplantation).
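The preprocessing steps described above (noise reduction, normalization, and segmentation into fixed-width windows for the classifier) can be sketched in a few lines of numpy. The filter length, window width, overlap, and sampling rate below are illustrative assumptions, not the parameters used in this study.

```python
import numpy as np

def preprocess(signal, window=200, step=100):
    """Minimal preprocessing sketch: smooth, normalize, and segment a
    multi-channel IMU signal. `signal` has shape (samples, channels);
    `window`/`step` are in samples (hypothetical values)."""
    # Noise reduction: simple 5-sample moving average per channel
    kernel = np.ones(5) / 5.0
    smoothed = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, signal)
    # Normalization: zero mean, unit variance per channel
    mu = smoothed.mean(axis=0)
    sigma = smoothed.std(axis=0) + 1e-8
    normed = (smoothed - mu) / sigma
    # Segmentation: overlapping fixed-width windows
    starts = range(0, len(normed) - window + 1, step)
    return np.stack([normed[s:s + window] for s in starts])

# Example: 10 s of synthetic 9-channel IMU data at an assumed 100 Hz
raw = np.random.randn(1000, 9)
windows = preprocess(raw)
print(windows.shape)  # (9, 200, 9): 9 overlapping windows
```

Each resulting window is one training or test example for the sequential classifier.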
The aim of the present study was to identify human activities related to a particular task, namely lifting a crate and placing it onto a robot suitable for agricultural operations, with the use of ML algorithms (LSTM) for sequential data classification. Since the agricultural environment is a dynamic ecosystem, susceptible to unforeseeable situations [40], other human sub-activities comprising this task were also investigated, including standing still as well as walking with and without the crate. Two common lightweight agricultural Unmanned Ground Vehicles (UGVs) were used: the Husky and Thorvald robots. To gather the data, 20 healthy participants took part in outdoor experimental sessions wearing five IMU sensors (with embedded tri-axial accelerometers, gyroscopes, and magnetometers) at different body positions. To the best of our knowledge, no similar study exists. By providing the activity “signatures” of the workers, this investigation has the potential to increase human awareness during HRI, thus contributing toward establishing an optimal ecosystem in terms of both cost savings and safety. Finally, the present dataset is made publicly available [41] for future examination by other researchers.
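For readers unfamiliar with how an LSTM consumes such sequential IMU data, the following numpy sketch implements a single LSTM cell and a softmax read-out over activity classes. The layer sizes, random weights, and class count are purely illustrative and do not reproduce the network trained in this study.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. x: input vector; h, c: hidden and cell
    states; W, U, b hold the stacked gate parameters in the order
    (input, forget, candidate, output), as in the standard formulation."""
    z = W @ x + U @ h + b                  # stacked pre-activations
    n = len(h)
    i = 1 / (1 + np.exp(-z[:n]))           # input gate
    f = 1 / (1 + np.exp(-z[n:2 * n]))      # forget gate
    g = np.tanh(z[2 * n:3 * n])            # candidate cell state
    o = 1 / (1 + np.exp(-z[3 * n:]))       # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def classify_sequence(seq, params, W_out, n_hidden):
    """Run one IMU window through the LSTM and map the final hidden
    state to activity-class probabilities via softmax."""
    W, U, b = params
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes = 9, 16, 6   # 9 IMU channels, 6 sub-activities (illustrative)
params = (rng.normal(size=(4 * n_hidden, n_in)) * 0.1,
          rng.normal(size=(4 * n_hidden, n_hidden)) * 0.1,
          np.zeros(4 * n_hidden))
W_out = rng.normal(size=(n_classes, n_hidden)) * 0.1
probs = classify_sequence(rng.normal(size=(200, n_in)), params, W_out, n_hidden)
print(probs)  # probabilities over the hypothetical activity classes
```

In practice, a deep-learning framework would learn these weights by backpropagation over labeled windows; this sketch only shows the forward pass that maps a window to a class distribution.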
4. Discussion and Main Conclusions
HAR is of major importance in the design of agricultural collaborative robotic systems, as they should be able to operate in dynamic and crowded farm environments, where almost nothing is structured. In addition, these collaborative systems are not confined to isolated cells, as is the case with conventional industrial robots. Toward optimizing the required activities, robots work in the same region concurrently with their “co-workers”, namely humans. “Cobots”, as these robots are usually called [69], can carry out either the same task or distinct tasks. The present envisioned application focuses on the latter scenario, where the robot can follow the workers during harvesting so that they can place the crate onto it. Afterwards, the robot can safely transfer the full crates out of the field. Besides providing safety and saving time, this cooperation can help prevent fatigue in agricultural workers, because the arduous task of carrying the crates over long distances is performed by robots based on human-aware planning. Apart from HRI, the results of this study are also applicable to conventional in-field operations, such as lifting crates and loading them onto platforms for transfer to storage.
The activity recognition of workers is closely related to an essential feature of HRI, usually referred to as “social-aware robot navigation” [70,71]. While autonomous navigation is restricted to obstacle avoidance and reaching the target destination [72,73,74], social navigation additionally takes into consideration factors associated with human naturalness, comfort, and sociability [75,76]. More specifically, naturalness relates to navigating human-like paths by adjusting the robot’s speed and its distance from farmers. Comfort also conveys the feeling of safety, whereas sociability has to do with making decisions pertaining to the robot’s movements by considering ethical and regional norms [71]. In a nutshell, HAR within agricultural human–robot ecosystems has great potential to assure socially acceptable, safe robot motion and provide farmers with free space to perform their activities unaffected by the simultaneous presence of robots, while the latter can approach them when required [12,77,78].
The present study focuses solely on HAR. To this end, data from 20 healthy participants carrying out a particular task were gathered by five wearable IMUs. This task included walking an unobstructed distance, lifting a crate (either empty or with a total mass of 20% of each participant’s body mass), and carrying it to the point of departure, where they had to place it onto an immovable UGV (either a Husky or a Thorvald). After carefully distinguishing the sub-activities comprising the above task, the obtained signals were properly preprocessed for use in the learning phase (training of the model) and the testing phase (evaluating the model’s performance and robustness) of an ML process (LSTM).
Overall, the problem of properly classifying stationary activities was challenging. The “Bending”, “Lifting”, and “Placing” activities were initially misclassified by a large margin. However, noise removal and normalization increased the overall performance of the trained model significantly. One of the factors that influenced the performance of the model was the width of the temporal window: when it was varied by more than one second, the performance of the overall model decreased substantially. The LSTM architecture provided the appropriate tools for the model to learn the features of the activity signals. Early experimentation with artificial neural networks (ANNs) and one-dimensional convolutional neural networks (CNNs) resulted in low performance of the trained model. Nevertheless, further investigation of more elaborate architectures combining the benefits of multiple methods, such as CNN-LSTM [77] or convolutional LSTM [78] networks, might be worth conducting. However, as is characteristic of data-driven approaches, the volume and variability of the data play a significant role in a model’s performance. That being stated, this study has shown that obtaining data from 20 subjects, each equipped with five IMU sensors, performing a few recordings, and fine-tuning a state-of-the-art LSTM network can yield a robust model that properly classifies all activities with an accuracy greater than 76%. Toward increasing the volume and variability of the data (and, thus, the overall accuracy), a study with a sample consisting of more participants covering a wider range of ages, physical strength, and anthropometric characteristics is in the immediate plans of the authors. Moreover, with the intention of providing real-world data, these experimental tests are planned to be performed in a real agricultural environment by workers at their own pace, under the complex conditions that they may face.
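The sensitivity to the temporal window width noted above partly reflects a simple trade-off: for a fixed recording length, wider windows capture more of each activity but yield fewer training examples. A minimal sketch of this relationship, assuming a hypothetical 100 Hz sampling rate and 50% overlap:

```python
def segment_count(n_samples, width_s, fs=100, overlap=0.5):
    """Number of training windows obtained for a given temporal window
    width (seconds); fs and overlap are illustrative assumptions."""
    width = int(width_s * fs)
    step = max(1, int(width * (1 - overlap)))
    return max(0, (n_samples - width) // step + 1)

# For a 60 s recording, halving/doubling a one-second window roughly
# doubles/halves the number of examples per recording.
for w in (0.5, 1.0, 2.0):
    print(w, segment_count(6000, w))  # 0.5 -> 239, 1.0 -> 119, 2.0 -> 59
```

This interacts with classification quality as well: windows much shorter than a sub-activity truncate its signature, while much longer ones mix adjacent sub-activities, which is consistent with the performance drop observed away from the one-second width.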
Additionally, it can be concluded that the gyroscope and the accelerometer can each be used independently for recognizing the specific sub-activities commonly performed in agricultural environments. However, their synergetic contribution can somewhat increase the overall performance. In contrast, the use of a magnetometer alone cannot lead to equally reliable results and should only be considered for supplementary use. The best performance was obtained when the data from all sensors were fused. Furthermore, the present methodology exhibited the highest precision for the sub-activity of walking without the crate. On the contrary, as anticipated, the sub-activity presenting the lowest precision was bending down to approach the crate, since it can be executed in several ways, depending on the participant. For example, it was observed that participants would often bend only their trunks (stooping), kneel without bending their trunks much, or simultaneously stoop and kneel to grasp the crate. This variability resulted from instructing the participants to carry out the task in their own way, reflecting our intention to increase the variability of the dataset so as to capture, as widely as possible, the different manners in which someone can perform the desired task.
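The per-sub-activity precision figures discussed here can be derived from a confusion matrix in the standard way (TP / (TP + FP) per predicted class). A minimal sketch with synthetic labels, purely to illustrate the computation:

```python
import numpy as np

def per_class_precision(y_true, y_pred, n_classes):
    """Per-activity precision from a confusion matrix: for each class,
    the fraction of its predictions that were correct. Classes never
    predicted get precision 0 by convention here."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1            # rows: true class, columns: predicted
    col_sums = cm.sum(axis=0)    # total predictions per class
    return np.divide(np.diag(cm), col_sums,
                     out=np.zeros(n_classes, float), where=col_sums > 0)

# Synthetic example with three hypothetical activity classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(per_class_precision(y_true, y_pred, 3))  # [0.5, 0.667, 1.0]
```

Per-class recall (TP / (TP + FN)) follows analogously from the row sums, and together the two quantify exactly the kind of class-dependent behavior (e.g., low precision for “Bending”) reported above.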
Obviously, assuring fluid and safe HRI in agriculture involves a plethora of different issues. However, each issue must be addressed separately, at a preliminary stage, before a viable solution is proposed. This study demonstrates a framework for both conducting direct field measurements and applying an ML approach to automatically and accurately identify the activities of workers, presenting the applied methodology in detail at each phase. Finally, the examined dataset is made publicly available, thus assuring research transparency while allowing experimental reuse and lowering the barriers for meta-studies.