1. Introduction
Information gathering and transmission have grown to the point where it is common to exchange information between participants in real time, anywhere in the world. However, this information must be processed promptly so that decisions can be adjusted dynamically. Smart forestry is one of the domains that is rapidly being transformed by technology for data gathering, processing, transport and analysis. Industrial IoT data processing poses some unique challenges when endeavouring to make stream processing a reliable solution. Two specific challenges that the IoT industry faces in data processing are (1) the retrieval and processing overheads for the huge amount of heterogeneous streaming data generated by a large number of IoT devices, and (2) missing and delayed events caused by the unpredictability of the data, which affects the rapid response time required in the IoT industry [
1]. In particular, these IoT devices produce data at a variable rate (e.g., unscheduled events) rather than at a fixed rate, which makes it harder for data stream processing platforms to cope with sudden increases in streaming rates [
2]. Taming this massive streaming data is a very challenging task. The true vision of IoT can only be realized if the underlying stream processing technologies can handle large amounts of data. Recognizing this need, a few stable and scalable stream processing platforms have recently emerged to facilitate building real-time IoT applications by processing large IoT data at scale, such as Apache Spark, Apache Flink, Apache Samza and Apache Storm [
3]. The rapid development of IoT has also resulted in the emergence of a few sophisticated query languages tailor-made for performing analytics over streaming data, such as Spark SQL, Flink Table API, KSQL, SamzaSQL, and StormSQL. All of these streaming query languages rely on the concept of windowing to convert continuous, infinite data streams into finite chunks sliced according to a pre-defined time interval (e.g., minutes, seconds, milliseconds) [
4].
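To make the windowing concept concrete, the following minimal Python sketch slices a timestamped stream into fixed, non-overlapping (tumbling) windows. It is an illustration only and is not tied to any of the engines named above; the function name `tumbling_windows`, the event layout `(timestamp_ms, value)` and the sample readings are assumptions made for this example.

```python
from collections import defaultdict


def tumbling_windows(events, window_ms):
    """Group (timestamp_ms, value) events into fixed, non-overlapping windows.

    Returns a dict that maps each window start time (ms) to the list of
    values whose timestamps fall inside [start, start + window_ms).
    """
    windows = defaultdict(list)
    for timestamp_ms, value in events:
        # Integer division assigns every event to exactly one window.
        window_start = (timestamp_ms // window_ms) * window_ms
        windows[window_start].append(value)
    return dict(sorted(windows.items()))


# Hypothetical temperature readings (timestamp in ms, degrees Celsius).
events = [(0, 21.5), (400, 21.7), (1100, 22.0), (1900, 35.2), (2050, 41.8)]
windows = tumbling_windows(events, window_ms=1000)
averages = {start: sum(vals) / len(vals) for start, vals in windows.items()}
```

Each finite window can then be aggregated independently (here, by averaging), which is what allows a continuous query to produce results over an unbounded stream.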
Our previous work studied the impact of changes in IoT data stream rates on query window configurations [
5]. Our evaluation results indicated that changes in stream rate and window size have a direct impact on the engines’ performance. For instance, consider a fire alarm detection query that processes two input data streams and observes their values at a fixed interval to detect any fire hazard. The input streams/sensors are configured so that they change their stream rate with the changing values (e.g., if the temperature rises beyond a certain threshold, the data observation rate is increased to ensure early detection of any fire event). A larger window size helps streaming query engines process high-velocity data, but it impedes the query engine’s performance because of large intermediate results or delayed events. A smaller window size can lead to low latency, but it misses events when stream rates are high. Therefore, a fixed window size will either impede the query engine’s performance due to a large number of intermediate results or miss fire events because the window is too small. One real use case of fire data monitoring is forest fire detection. Forest managers have gradually accepted IoT forest environmental monitoring technologies for taking forest inventory through remote IoT data collection [
6]. IoT-based forest fire monitoring faces the challenge of dynamic ecological changes (e.g., climate change), which make the streaming data unpredictable. In particular, the timeliness of the acquired IoT monitoring data should be considered so that real-time data are not affected by sudden factors.
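The effect of a rate-adaptive sensor on a fixed window can be illustrated with a small back-of-the-envelope sketch. The threshold and sampling intervals below are hypothetical values chosen for illustration, not parameters from our experiments.

```python
def sampling_interval_s(temperature_c, threshold_c=50.0, normal_s=10.0, alert_s=1.0):
    """Hypothetical sensor policy: sample faster once a threshold is exceeded."""
    return alert_s if temperature_c > threshold_c else normal_s


def events_per_window(temperature_c, window_s):
    """Number of readings one sensor contributes to a single fixed window."""
    return int(window_s / sampling_interval_s(temperature_c))


# With a fixed 60 s window, the per-window load grows tenfold when the
# sensor switches to its alert sampling rate.
normal_load = events_per_window(temperature_c=25.0, window_s=60.0)
alert_load = events_per_window(temperature_c=70.0, window_s=60.0)
```

Under these assumed settings, the same fixed window holds ten times as many events once the alert rate is active, which is the kind of intermediate-result growth that motivates adapting the window size to the stream rate.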
The Internet of Forest Things (IoFT) is an evolution of IoT that refers to smart devices distributed in the forest for monitoring, management, and fire detection and protection. A big picture of the forest fire situation is depicted in
Figure 1. As shown, the deployed IoFT is used to monitor weather conditions (temperature, humidity, CO (Carbon Monoxide), and CO
2 (Carbon Dioxide)). The sensory weather data are collected and consumed by Apache Kafka, a scalable message queuing system. The collected data (i.e., potential fire events) are then sent to the query engine running on top of a stream processing platform. The number of possible fire events varies drastically during the day, which can delay fire reporting to the forest department authority. Such a delay in turn postpones the alarms that prompt the authority to send its firefighters and drones to the burning forest. Therefore, the automation theme of Forestry 4.0 can enable the forest department authority to take appropriate, timely action. Applying adaptive window-based streaming data analysis to forest fire monitoring data can provide early warning mechanisms that reduce fire risks by supporting quick decision-making.
Sustainable forestry focuses on three impacts, i.e., economic (timber supply), ecological (biological and resilience), and social sustainability (multiple use of forests and non-timber products). The challenge in forest operations is finding optimal management strategies and predicting the environmental, economic, and social performance of various services, processes and products. The concept of sustainable forestry is based on sustainable development, which refers to maintaining biodiversity, capacity, productivity, vitality and the relevant economic, ecological and social impacts of operations [
7]. Furthermore, the authors of [
8] discussed the challenges of forest operations and the proposed Sustainability Impact Assessment (SIA) framework, which analyses social, economic and environmental impacts.
Figure 2 illustrates the importance of forest sustainability management for economic, social, and environmental impacts. Balancing these impacts is a challenge for improving sustainable forestry in terms of data availability and quality [
9,
10]. The authors of [
8] summarized the sustainability impacts of forest operations in detail. Based on the above study,
Figure 2 shows the effect and importance of using technical solutions to improve Forestry 4.0. Forestry 4.0 aims to bring the forest value chain to work within Industry 4.0 parameters (connectivity, security, productivity, and remote processing). The authors of [
11] discussed challenges regarding wireless network availability and connectivity in forests, in contrast to industries based on advanced IoFT, robotics, automation and autonomous technologies.
To the best of our knowledge, no dynamic stream processing system has been studied for forest fire management and detection under unpredictable changes in environmental conditions. A dynamic window-based selector is particularly relevant for the forest fire management and detection use case because of its ability to adapt to environmental changes in time. It can provide timely warning about fires by monitoring streaming IoT data in real time to detect dynamic environmental fire risks. To bring our research work to reality, we develop a data generator that produces IoFT-based forest fire streaming data with fluctuating rates, simulating a rapidly changing ecological environment within forestry areas. Our work thus presents an adaptive stream processing prototype for Forestry 4.0, i.e., IoFT-based forest fire detection that can automatically adapt the window size according to several parameters and, thus, improve the performance of the underlying IoFT.
1.1. Contribution
The main contributions of this paper can be summarized as follows:
We elaborate on the move from Industry 4.0 towards Forestry 4.0, which has been proposed as a research initiative in recent years. While most publications have focused mainly on the digital technologies, we focus on applying the automation theme by proposing dynamic stream query processing using IoFT for the forest fire detection use case.
We provide a flexible model that proposes the ideal window type and size depending on stream rates and application requirements. The proposed dynamic query window selector can monitor various external factors, including stream rate, resource constraints, and application requirements, to propose the optimal window size and type for a given query. It is also capable of detecting any inefficiencies and redeploying the optimal query.
We identify and perform a real-world use case of Forestry 4.0 (i.e., forest fire detection based on IoT data) to evaluate the dynamic window selector in a fire situation when the stream rate suddenly changes, i.e., when weather sensors start sending streaming data at a high rate.
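As a rough illustration of how such a selector could combine these factors, the sketch below derives a window configuration from the stream rate, a memory budget and a latency requirement. It is a simplified heuristic written for this example and is not the selector evaluated in this paper; the names `WindowConfig` and `select_window`, the parameters, and the rule for choosing the window type are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowConfig:
    kind: str      # "tumbling" or "sliding"
    size_s: float  # window length in seconds


def select_window(stream_rate_eps, max_buffered_events, max_latency_s,
                  min_size_s=1.0, high_rate_eps=50.0):
    """Pick a window size from stream rate and (hypothetical) constraints.

    stream_rate_eps     -- observed stream rate in events per second
    max_buffered_events -- resource constraint: events one window may hold
    max_latency_s       -- application requirement: longest acceptable window
    """
    if stream_rate_eps <= 0:
        raise ValueError("stream_rate_eps must be positive")
    # Largest window that still fits the memory budget at the current rate.
    size_by_memory = max_buffered_events / stream_rate_eps
    # Respect the latency requirement, but never go below the minimum size.
    size_s = max(min_size_s, min(max_latency_s, size_by_memory))
    # Assumed rule for illustration: use overlapping windows at high rates.
    kind = "sliding" if stream_rate_eps >= high_rate_eps else "tumbling"
    return WindowConfig(kind=kind, size_s=size_s)


calm = select_window(stream_rate_eps=2.0, max_buffered_events=1000, max_latency_s=60.0)
fire = select_window(stream_rate_eps=100.0, max_buffered_events=1000, max_latency_s=60.0)
```

In this sketch the window shrinks from 60 s to 10 s when the rate rises from 2 to 100 events per second; redeploying the query with the new configuration would be a separate step.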
1.2. Paper Organization
The remainder of this paper is organized as follows. Related work is introduced in
Section 2. The move from Industry 4.0 towards Forestry 4.0 and the forest fire detection use case are introduced in
Section 3 and
Section 4 respectively. The research methodology, including the data streaming pipeline and our proposed dynamic window-based selector are introduced in
Section 5. The experimental evaluation and discussion are presented in
Section 6 and
Section 7 respectively. Finally, conclusions and future work are presented in
Section 8.
2. Related Work
Traditionally, stream processing systems have managed the sudden changes in stream rate by elastically increasing available resources, or by discarding part of the data streams (i.e., load shedding) [
12,
13,
14]. Much work on the potential of stream processing has been carried out in recent years due to the growth in IoT. Our work tackles changes in stream rates and window-based stream processing, which are major challenges arising from the loosely coupled nature of IoT data streams. Research on forest environmental management and Industry 4.0, including forest fire detection, is also discussed. We divide the existing work into the following categories:
For the stream processing adaptation, Cervino et al. have proposed an adaptive cloud-based approach for provisioning virtual machines with respect to stream rate change [
15]. The proposed approach periodically estimates the number of virtual machines required to support the input stream data rate so that virtual machines are not overloaded and processing latency targets are met. Adaptive stream rates for smart grid applications on clouds have been studied to throttle the rate at which smart meters generate power events [
2].
Furthermore, one aspect of the fluctuating streams generated by various IoT devices is the problem of out-of-order events. Kun et al. have proposed a real-time query-matching algorithm that generates queries when the number of event types is large and the query length is long, minimizing the overhead and reducing the response time [
1]. The distributed Platform for Elastic Stream Processing (PESP) has been introduced to deal with changing rates of streaming data [
3]. The PESP platform operates stream processing engines cost-efficiently through the flexible adaptation of processing nodes.
With regard to query stream processing, STREAM, Stanford’s data stream management system, supports a large class of declarative continuous queries over continuous streams and traditional stored data sets [
16]. Das et al. have proposed a robust algorithm to automatically adapt the batch size based on the data ingestion rates, variations in available resources, and workload characteristics [
17]. Zhang et al. have leveraged adaptive batch sizing and block sizing to minimize the end-to-end latency of streaming systems without prior knowledge of the workload specification [
18]. The authors proposed a heuristic algorithm integrated with isotonic regression to automatically learn and adjust batch size and execution parallelism according to workloads and operating conditions.
There has been much work in the area of stream processing based on query windows. SABER, a window-based hybrid stream processing system, has been proposed to adapt the scheduling strategy across CPU and GPU as the share of queries increases [
19]. SPECTRE (SPECulaTive Runtime Environment) is a framework for speculative processing of multiple dependent windows in parallel [
20]. The SPECTRE framework has addressed the speculative processing concept to allow the execution of multiple versions of multiple windows using different event sets in parallel. It has provided a probabilistic model to process different window versions that have the highest probability to be correct.
Stream processing based on sliding windows has been extensively studied for different aspects such as aggregation and anomaly detection. For aggregation, the DABA algorithm has been proposed for incremental sliding window aggregation over stream data [
21]. Scotty, a window-based operator, has been proposed for aggregation and discretization [
22]. The key idea of the Scotty operator is to split the streams into non-overlapping slices and compute shared partial aggregates per slice while supporting out-of-order processing. For anomaly detection, a sliding window-based strategy has been used for detecting faults over high-dimensional streaming data [
23]. In particular, the authors have proposed the ABSAD approach to select fault-relevant subspaces and then detect faults online in streams with time-varying characteristics using a sliding window.
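The slicing idea behind shared partial aggregates can be sketched in a few lines of Python. This is a simplified batch illustration of the general technique, not the implementation of any of the cited systems; the function name `sliced_sliding_sums` and the sample events are assumptions, and the sketch requires the window length to be a multiple of the slide.

```python
from collections import defaultdict


def sliced_sliding_sums(events, window_ms, slide_ms):
    """Sliding-window sums built from non-overlapping slice aggregates.

    events is an iterable of (timestamp_ms, value); arrival order does not
    matter because each event only updates the partial sum of its slice.
    Returns {(window_start_ms, window_end_ms): sum}.
    """
    if window_ms % slide_ms != 0:
        raise ValueError("window_ms must be a multiple of slide_ms")
    slices_per_window = window_ms // slide_ms

    # One partial aggregate per slice; every event is processed only once.
    partial = defaultdict(float)
    for timestamp_ms, value in events:
        partial[timestamp_ms // slide_ms] += value
    if not partial:
        return {}

    results = {}
    for end_slice in range(min(partial), max(partial) + 1):
        start_slice = end_slice - slices_per_window + 1
        # Each window combines the shared partial sums of its slices.
        total = sum(partial.get(i, 0.0) for i in range(start_slice, end_slice + 1))
        results[(start_slice * slide_ms, (end_slice + 1) * slide_ms)] = total
    return results


# Hypothetical events, deliberately out of order.
sums = sliced_sliding_sums([(1200, 3.0), (0, 1.0), (2500, 4.0), (500, 2.0)],
                           window_ms=2000, slide_ms=1000)
```

Because overlapping windows reuse the same slice aggregates, the per-event work does not grow with the number of windows an event belongs to.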
For IoT forest environmental monitoring, researchers at Northeast Forestry University have developed a network-based intelligent platform that uses the ZIGBEE protocol to monitor forest environmental factors in a timely manner with new IoT technology [
6]. ZIGBEE-based networking technologies have the advantages of low power dissipation, low data rate, and high-capacity transportation, which make them more suitable for the design of the nodes of the forest environmental factor collection platform. In the context of forest fire management, existing research has been reviewed in terms of satellite systems, optical cameras, and wireless sensor network detection techniques [
24]. The author discussed several experimental research results and some commercial products to better explain fire detection techniques, and stated that each technique has its advantages and disadvantages in terms of efficiency, accuracy, versatility, and other key attributes. Haifeng et al. have proposed a fuzzy prediction algorithm implemented on a rechargeable wireless sensor network to assess fire risk and calculate the quantitative potential fire risk [
25]. The authors used weather variables, including temperature and humidity, as inputs to the proposed fuzzy prediction, based on 24-h monitoring of meteorological factors. They concluded that it is difficult to predict the occurrence of forest fires accurately.
In the field of IoT-based applications and fire management, Faisal et al. have designed a wireless sensor network using multiple sensors for early detection of house fires [
26]. The authors used the Global System for Mobile Communications (GSM) to avoid false alarms, and they tested their system by simulating a fire in a smart home using Fire Dynamics Simulator. For IoT-based forest fire detection, the work proposed in [
27] provides new improvements such as the use of innovative IoT technologies and data treatment focused on the prevention, detection, activation of alarms and management of operations for the extinction of fires. The authors have developed a system with secure communication configured for monitoring different environmental variables, including temperature, humidity, CO, CO
2 and wind speed. Furthermore, the authors of [
28] addressed the techniques used for reducing pollution such as CO
2 to improve smart real-world applications through the collaboration of drones and IoT frameworks. In addition, the authors of [
29] presented the collaboration of smart IoT devices and drones for improving emergency response.
Although considerable research has been carried out on dynamic stream processing, previous work has not comprehensively considered dynamic stream rates and query windowing using an open knowledge base to recommend the proper window configuration. To the best of our knowledge, there is no comprehensive work towards Forestry 4.0 that exploits the capabilities of streaming data platforms to manage forest fires under a rapidly changing ecological environment. We therefore propose an adaptive window selector that dynamically changes the window configuration to cope with fluctuating streams. We then define a real-world IoT-based use case, forest fire detection, as a dimension of Forestry 4.0, using the power of streaming technologies to provide timely warning about fires by monitoring streaming IoT data in real time to detect dynamic environmental fire risk.
Wood resources come from the forest, which makes forestry important economically, culturally, and ecologically. Based on Industry 4.0, the authors of [
30] introduced the concept of Forestry 4.0. They showed the implementation of Forestry 4.0 with multi-domain systems. The authors of [
31] discussed the advantages and limitations of using IoT for wood processing in industry. Many studies have discussed Industry 4.0 in several fields, including economics and business [
32], information sharing [
33], technologies and applications [
34], future trends [
35] and industry 4.0 in the wood industry [
36] (
https://awfsfair.org/2019/04/industry-4-0-in-the-wood-industry-beyond-the-buzz/). Moreover, many studies have connected Industry 4.0 and Forestry 4.0 with the help of advanced technologies such as IoT, artificial intelligence, robots, and vehicles [
37], industry 4.0 application to forestry [
30], wood processing [
31], digitization in wood supply [
38]. The authors of [
39] discussed Internet of improving the forest sustainability. Moreover, the authors of [
40] introduced a framework for Industry 4.0 from the perspective of the forest supply chain. The framework components included system intelligence, digital technologies, communication network infrastructure, and a collaborative forest supply chain.
Forest fires are a common hazard that destroys forests. As a result, the number of giant trees has been reduced drastically, leading to an unhealthy environment for human beings and animals. The authors of [
41] introduced IoT devices and the cloud to produce a forest fire alert when a fire is detected. Detecting fire is therefore necessary to avoid fire hazards in the forest, and it benefits from distributing IoT devices in forest areas. The authors of [
37] introduced optical remote sensing for early fire alert systems. The proposed system architecture included spaceborne, airborne, and terrestrial systems to detect fire with a high level of accuracy. Smart IoT devices are implemented in smart systems to measure CO
2 emissions of different forest fire sources [
42]. Furthermore, IoT devices are used for designing efficient forest fire detection [
43]. A summary of the related research work is presented in
Table 1.
3. Industry 4.0 towards Forestry 4.0
Industry 4.0 is considered the fourth industrial revolution, introducing a new paradigm of digital, autonomous, decentralized control for manufacturing systems. The concept of Industry 4.0 refers to smart manufacturing oriented towards digitization, collaboration and automation. The authors of [
48] identified the components of Industry 4.0 as Cyber–Physical Systems (CPS), the smart factory, IoT, and the Internet of Services (IoS). CPS refers to the fusion of the physical and virtual worlds in Industry 4.0, while IoT refers to the connectivity between the physical elements in Industry 4.0. The smart factory refers to all categories of smart physical components such as devices, robots, computers, cameras, and sensors. IoS refers to the processing and functions of all smart devices connected via IoT. Moreover, Industry 4.0 can bring social, economic, and environmental improvements [
48]. Furthermore, Industry 4.0 covers a broad range of technologies, processes, and systems mainly related to industry digitalization. In terms of data-related technologies, the main areas of Industry 4.0 are CPS, Industrial Internet of Things (IIoT), Cloud Solutions and Decentralized Services, and Big Data and Stream Processing technologies for processing large amounts of production data in real-time [
49].
The transfer of Industry 4.0 concepts and technologies to the forestry sector appears to be a promising way to optimize existing processes and to spawn new business models. The Forestry 4.0 concept is inspired by the Industry 4.0 concept, which plays a vital role in the next industrial revolution. The Internet of Forest Things, AI, automation, smart devices, Blockchain and digital twins will drastically change forestry for the better. These advanced and emerging technologies are used to solve the operational issues involved in creating a sustainable Forestry 4.0 (
https://www.woodbusiness.ca/final-cut-forestry-of-the-future-the-sustainable-revolution). Furthermore, combining emerging technologies for sustainability is an efficient way to move Forestry 4.0 towards Industry 4.0. Forestry 4.0 requires network performance and communication networks built on technologies such as IoFT, wireless sensor networks, IoT, big data, edge computing, drones, and cloud computing. These connect smart IoT devices, mobile devices, robots, objects, vehicles, drones, and machines, as shown in
Figure 3. The authors of [
40] have defined Forestry 4.0 as a paradigm of the forest industry (digitization, connectivity, harvesting, automation and transportation). It focuses on the digitization of end-to-end smart devices as well as customers. Thus, the Forestry 4.0 concept combines digital technologies, network connectivity, processing and operations, and collaboration.
Furthermore, the technical realization of Forestry 4.0 involves connecting wood resources, data sets, existing and new hardware and software components, and stakeholders into a novel Internet of Things, Services, and People in forestry [
30]. Based on the manufacturing industries’ experience with Industry 4.0, the Forestry 4.0 concept has been launched by FPInnovations (
https://web.fpinnovations.ca/) as an initiative for digitalization in the forest industry. The Forestry 4.0 initiative aims at enabling the upstream part of the forest value chain in Canada to fully leverage the agility and power of the fourth industrial revolution. The development of Forestry 4.0 delivers solutions for issues that affect the forest industry, including labour shortages, performance, forest connectivity, safety, environmental performance, sustainability and cost reduction. IoFT is based on big data gathering and exchange, real-time connection, and the assembly of technologies. The aim of the Internet of Forest Things is to implement communication among the smart devices distributed in the forest environment, enabling the Industry 4.0 implementation standard. Therefore, wide-range communication is required for device-to-device, robot-to-robot, vehicle-to-vehicle, vehicle- and robot-to-infrastructure, and human-to-device and human-to-machine links, for interconnecting heterogeneous devices, and for cellular-to-operations links via the Internet [
50,
51]. The focus of the IoFT is on maintaining connectivity links among all forest components (robots, smart devices, machines, vehicles, operations, cellular devices, etc.) in large forest areas. The authors of [
39] discussed smart devices and connectivity for digital Forestry 4.0 and monitoring applications with sustainability in mind. The Internet of Forest Things is the key enabler for exchanging real-time information between Forestry 4.0 operational components, the decision centre and Industry 4.0. Furthermore, applying the IoFT makes it possible to monitor forest environmental impacts in real time with intelligent platforms. The authors of [
50] showed many benefits of using the IoFT to improve Industry 4.0 through low data rates, high-capacity transportation, low power consumption and efficient data gathering. Moreover, the Internet of Trees is used for monitoring and early fire detection (
https://electronics360.globalspec.com/article/11399/internet-of-trees-early-detection-of-forest-fires/), and to fight climate change (
https://www.euronews.com/living/2020/07/29/internet-of-things-technology-is-being-used-to-help-trees-fight-climate-change).
Within the Forestry 4.0 concept, four research themes have been defined by their distinct functions: the real environment, IoFT, the next-generation fibre supply chain, and data analytics [
11], as shown in
Figure 3. For the real environment theme, forest supply chains need accurate information on the amount and quality of fibre available, the physical environment in which operations will be deployed, and the transformational outcomes of the various phases of harvesting systems. With regard to forestry data-related technologies, data are collected through remote sensing, satellites, drones or aircraft, imagery and LiDAR 3D point clouds, infrared cameras, high-resolution cameras, etc. For the Internet of Forest theme, the most significant challenge facing the forestry industries is communicating in remote areas, given the high cost of satellite communication. Therefore, the Internet of Forest, which refers to the connectivity of various machines, is used as a collaborative system based on real-time communication between machinery, infrastructure and digital devices to control operations, even remotely. For the next-generation fibre supply chain theme, advanced technologies will be required in harvesting systems to truly enable full Forestry 4.0 functionalities around connectivity, automation, and agility in response to upstream and downstream changes in the supply chain. For automation, the production chain must be updated using the latest technological developments, such as sensors, augmented reality devices, and more autonomous intelligent transportation systems (self-driving vehicles). For data analytics, the decision-making process in forest management must take into account the analysis of vast amounts of data (such as geographical or geological data or data on wildlife biology). Analysing forestry data helps to provide early information and warnings for risk analyses, accident statistics, the timber product supply chain, forest damage, forest fires, etc.
Forest fire is one of the risks that cause significant damage to the environment, which motivates us to identify and perform it as a real-world use case (i.e., forest fire detection based on IoFT data) as a dimension of Forestry 4.0 using the power of streaming technologies. In particular, IoFT streaming forestry data analysis can support and automate early warning systems that ensure protection against forest fires around the clock and that have replaced forest workers and volunteers on duty in watchtowers. A summarized comparison of industrial research in the forest fire detection domain is given in
Table 2.
Based on the above, we try to build Forestry 4.0 in several layers based on [
52].
Figure 4 illustrates the forestry layers, including the smart devices layer, network layer, data analysis layer and application layer. Each layer contains various devices, technologies, and techniques for building a smart Forestry 4.0 that is automated, digitalized and collaborative. In the forest layer, forest world devices are used for sensing and monitoring, along with forest robots and transportation robots. The network layer refers to advanced and emerging communication technologies, such as 5G and 6G, that can make the interaction between devices reliable and free of human intervention. Gathered data are processed in the data analysis layer. People and employers can monitor Forestry 4.0 in the application layer.
4. Forest Fire Detection Use Case
As a use case, IoFT aims to use different smart devices (RFID, sensors, cameras, etc.) to monitor the forest, detect fire, and measure forest parameters such as temperature, CO, and CO
2 without human intervention, as shown in
Figure 4. These devices send the gathered data to the central platform via advanced wireless communication technologies. Industry 4.0 staff or employers at the central platform can interact with smart IoFT devices, process received data, estimate forest growth, monitor tree health, and detect fire. Implementing Industry 4.0 technologies reduces data collection costs, improves sustainability, and makes it possible to monitor the utilization of forest resources and measure forest parameters. Therefore, decision-making can be real-time and easy, while forest growth prediction can be more accurate and reliable thanks to continuous measurement. These will play a vital role in improving the economic and environmental impact of Forestry 4.0.
Forest fires, also called wildfires, are among the greatest disasters in the world today. In 2018 alone, 8,767,492 acres burned, roughly equivalent to the combined area of 74 of the 75 largest cities in the United States. It is the sixth-highest total since modern historical records began in the mid-1900s, indicating that no state is entirely free from wildfire risk in the US. The CoreLogic Wildfire Risk Report for 2019 highlights that the total estimated reconstruction cost value for extreme-risk homes is more than
$221 billion, with California metropolitan areas dominating the top 15 risk regions (see
Figure 5). In late August 2019, for example, Brazil’s National Institute for Space Research said that the number of fires in the country (i.e., Amazon) largely set by humans had jumped 84% in 2019 over the same period in 2018 (
https://fortune.com/2019/08/25/causes-of-amazon-forest-fires/).
The year 2020 has been a year like no other due to COVID-19, which will change the world forever. As the scientists, researchers, the World Health Organization, and social communities fight the pandemic, another crisis is unfolding worldwide.
Figure 6 (
https://cleantechnica.com/2020/05/13/2020-fire-season-covid-19-not-a-match-made-in-heaven/) depicts the forest fire outlooks in 2020 for May, June, July, and August, highlighting the elevated risk in the Pacific Northwest, Northern California, and the Southwest throughout the summer.
Fundamentally, forest fires happen because of rapidly changing ecological conditions such as uncontrolled climate change, which leaves forests unable to recover from the devastating consequences in the long term. For example, climate change alters soil moisture and surface temperatures, making the soil water repellent [
55]. The enormous challenge for forest department authorities is that forests are usually remote, abandoned/unmanaged areas filled with trees and affected by dynamic environmental variables, e.g., temperature, humidity, CO, and CO
2.
This issue has been a research interest for many years; many very well-studied solutions are available out there to propose an effective way to minimize the damages caused by the fires. Early detection of forest fires is the most attractive trend for the market and research, making decision-makers take a fast appropriate reaction. There are several forest fire detection techniques, and monitoring systems employed by authorities, including human-based observation, satellite-based monitoring systems, optical camera-based monitoring systems and wireless sensor networks [
24]. The human observation is inefficient due to the error-prone. It provides an accurate forest fire prediction. As humans get fatigued by time, their forest fire prediction will be inaccurate due to less considering environmental impact, making it a non-reliable solution to reduce forest fire risks. The promoted satellite monitoring systems suffer from severe limitations failing speedy and effective control for forest areas. For example, the satellite systems may not be available for continuous-time to cover the full regions within forests such as gaps in time when the satellite is not within the field of view from certain regions or spots of the forest. On the other hand, the optical camera-based monitoring systems are costly in building towers and communication infrastructure in the forests’ remote areas. Furthermore, the optical camera-based monitoring systems may provide false alarms due to night vision and weather conditions such as wind-tossed trees and cloud shadows that affect camera performance.
Recently, wireless sensor networks have come to be considered the best available solution for forest fire detection. They can accurately provide, at any moment, all the required information about the factors that influence the environment. Wireless sensor networks are also easy to connect and deploy in broad and inaccessible forestry areas. Accordingly, researchers and industry have shifted to the IoT paradigm, which has been applied in various fields. For instance, the work proposed in [
27] provides improvements such as the use of innovative IoT technologies and data treatment focused on the prevention and detection of fires, the activation of alarms and the management of fire extinction operations. The authors developed a system with secure communication, configured to monitor different environmental variables including temperature, humidity, CO, CO
2 and wind speed. Lately, forestry companies have been leveraging the agility and power of the fourth industrial revolution (i.e., Industry 4.0) to move towards Forestry 4.0, using IoT sensors that send real-time streaming information for early wildfire detection.
Forest fires can happen due to climate change and cause significant environmental damage. We therefore identified forest fire detection as a use case that represents one of the dynamic environmental management challenges. Authorities use several detection and monitoring systems to detect a fire as quickly as possible, localize it precisely and issue early notifications. Because many IoT devices work together to detect forest fires, fire alarm detection technologies can support decision-making in a rapidly changing ecological environment. In particular, IoT streaming data technology allows managers to establish a set of early warning mechanisms for quick response and decision-making, while making full use of the data for environmental performance evaluation.
6. Experimental Setup and Evaluation
The experiments run a data stream pipeline that uses Apache Kafka version 2.11–2.2.0, Apache Flink version 1.7, and MongoDB to consume data from the data sources (i.e., a sensor data generator), execute the stream query, and store the reported data, respectively. The following subsections present the generation of the forest fire dataset for the use case, the description of the experiments, and a discussion of the results.
6.1. Forest Fire Based on Climate Change Dataset
We build a forest fire detection use case to continuously monitor climate change, i.e., weather conditions, in real time (see
Figure 1). At the technical level, Apache Kafka is used to consume weather streaming data from the sensors deployed within a forest. The collected data (i.e., potential fire events) is then sent to the query engine running on top of the stream processing platform (i.e., Apache Flink). The number of possible fire events varies drastically during the day. In normal weather conditions, the weather sensors send data at lower frequencies, and no action is taken by the forest department authority. In a forest fire situation, the weather sensors start to send data at higher frequencies, which causes event loss due to the delay in consuming and processing such massive streaming data. This delay in turn leads to late alarms, so the forest department authority dispatches its firefighters and drones to the burning forest late. The dynamic query window selector is therefore motivated by the delay issues caused by the static window configuration of stream query engines. The adaptive query window selector adapts the window size (i.e., increases or decreases it according to the fluctuating stream rates during the day). To the best of our knowledge, there is no open dataset for an IoT-based forest fire detection use case. The authors in [
27] developed a system with secure communication, configured to monitor different environmental variables including temperature, humidity, CO, CO
2 and wind speed. They took into account the Rule of 30, which considers zones characterized by temperatures above 30 °C and humidity values below 30% as risk areas for forest fires. The authors in [
25] considered how the probability of forest fire changes with temperature.
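The Rule of 30 described above can be expressed as a simple predicate. The following is a minimal sketch, assuming readings are given as temperature in °C and relative humidity in percent; the function name, field names and sample readings are hypothetical and are not taken from the cited work.

```python
def is_rule_of_30_risk(temperature_c: float, humidity_pct: float) -> bool:
    """Return True when a reading falls in the Rule-of-30 risk zone.

    Risk zone as stated in the text: temperature above 30 degrees C
    and humidity below 30%.
    """
    return temperature_c > 30.0 and humidity_pct < 30.0


# Hypothetical sample readings (illustrative values only).
readings = [
    {"node": 1, "temperature_c": 34.5, "humidity_pct": 22.0},
    {"node": 2, "temperature_c": 27.0, "humidity_pct": 45.0},
    {"node": 3, "temperature_c": 31.0, "humidity_pct": 30.0},
]

risky_nodes = [
    r["node"]
    for r in readings
    if is_rule_of_30_risk(r["temperature_c"], r["humidity_pct"])
]
print(risky_nodes)
```

Only node 1 satisfies both conditions here; node 3 is hot enough, but its humidity is not strictly below 30%.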
Based on the proposed work in [
25,
27], we prepare a dataset having five weather sensors including temperature, humidity, CO, CO
2, and wind speed. The dataset is generated using Rule 30 to bring it close to reality, for example by making the temperature rise around noon and in the late afternoon and fall after midnight [
27,
56] (
https://en.wikipedia.org/wiki/Rule_30), (
http://www.auburn.edu/academic/forestry_wildlife/fire/weather_elements.htm). However, the authors of [
57] discussed techniques and strategies for greening IoT by reducing pollution hazards. Furthermore, we use the standard European Forest Fire Information System, which classifies forest fire danger ratings into four categories: Green, Yellow, Orange, and Red (
http://effis.jrc.ec.europa.eu) to define alert types thresholds for each sensor (see
Figure 14). Our assumption for data generation is that each sensor remains within the thresholds defined for each alert type, e.g., when the temperature is below 30 °C, the humidity is always above 30%, and so on (see
Table 4 and
Figure 15).
In particular, we assume there are 100 nodes installed to produce weather sensor values (i.e., temperature, humidity, CO, and CO
2) depending on the alert type threshold. For example, for green, yellow, orange and red, there are 2, 6, 12, and 60 values per minute sent from sensors respectively (see
Table 5). The day is divided into different sections depending on the alert type; in total there are 8 h of green, 7 h of yellow, 6 h of orange and 3 h of red. The rationale behind this assumption is that fires are less likely to occur during the midnight and early morning hours, i.e., green alerts. In contrast, the noon hours have a high probability of fire occurrence, i.e., red alerts. A sample of the generated weather variable values is provided in
Table 6.
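The stream rates and day sections stated above determine the daily data volume. The following is a rough sketch of that arithmetic, under the assumption that the per-minute rates apply to each of the 100 nodes (the text does not say whether the rate is per node or per individual sensor); all variable names are illustrative.

```python
# Values per minute for each alert type, as stated in the text.
RATE_PER_MINUTE = {"green": 2, "yellow": 6, "orange": 12, "red": 60}
# Hours of the day assigned to each alert type, as stated in the text.
HOURS_PER_DAY = {"green": 8, "yellow": 7, "orange": 6, "red": 3}
NODES = 100  # assumption: the rate applies per node


def values_per_day(alert: str, nodes: int = NODES) -> int:
    """Number of values sent in one day while the given alert type is active."""
    return RATE_PER_MINUTE[alert] * 60 * HOURS_PER_DAY[alert] * nodes


daily = {alert: values_per_day(alert) for alert in RATE_PER_MINUTE}
total = sum(daily.values())
red_share = daily["red"] / total

print(daily)
print(total, round(red_share, 3))
```

Under this assumption, the 3 h of red alerts account for more than half of the daily volume, which illustrates why the red periods put the most pressure on the pipeline.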
High wind speed supplies more oxygen from the surrounding environment and therefore makes a fire grow at a fast rate. Wind speed, which probably has the most significant impact on a wildfire’s behavior, is also the most unpredictable factor. Because wind speed is unpredictable during the day, we could not generate a realistic wind speed stream using Rule 30. Instead, we generate it within a speed range between 1 and 20 km per hour [
58].
6.2. Experimental Setup
We run six experiments with static window sizes of 1, 5, 10, 15, 30, and 60 s using a tumbling window, and one experiment with a dynamic window size. The static window size experiments aim to show the impact of a static window and of stream rate changes on the reported alerts, missed alerts and latencies. Based on the latency setting, we then run the dynamic window size experiment to evaluate our proposed technique, which adapts the window size between 60 s and 1 s according to changes in the input stream rate. Finally, we show the benefit of query window adaptation in reducing reporting latency, and discuss the trade-off with the alerts lost during the adaptation process.
The conducted experiments execute a stream query that monitors alerts based on the sensors’ values and then reports an event, namely the alert type. In particular, the stream query joins two streams: the weather stream (temperature, humidity, CO, and CO2) and the wind speed stream. Wildfires are reported based not only on the pattern of weather sensor thresholds but also on wind speed, which gives early notification of fires, especially when the wind exceeds the top of its range, i.e., 20 km/h.
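To illustrate how a tumbling window join over the two streams behaves, the following plain-Python sketch joins weather and wind events that fall into the same window. It is not the Flink query used in the experiments; the join key (node identifier), the field names and the sample events are assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_join(weather, wind, window_s):
    """Join weather and wind events of the same node within the same tumbling window.

    Each event is a dict with a 'node' id and a timestamp 'ts' in seconds.
    A tumbling window of size window_s assigns an event to window floor(ts / window_s).
    """
    wind_by_key = defaultdict(list)
    for ev in wind:
        wind_by_key[(ev["node"], int(ev["ts"] // window_s))].append(ev)

    joined = []
    for ev in weather:
        key = (ev["node"], int(ev["ts"] // window_s))
        for w in wind_by_key.get(key, []):
            joined.append(
                {
                    "node": ev["node"],
                    "window": key[1],
                    "temperature": ev["temperature"],
                    "wind_kmh": w["wind_kmh"],
                }
            )
    return joined


# Hypothetical events (illustrative values only).
weather = [
    {"node": 1, "ts": 0.5, "temperature": 35.0},
    {"node": 1, "ts": 7.0, "temperature": 36.0},
]
wind = [{"node": 1, "ts": 4.0, "wind_kmh": 18.0}]

print(len(tumbling_window_join(weather, wind, window_s=5)))
print(len(tumbling_window_join(weather, wind, window_s=10)))
```

With a 5 s window only the first weather event shares a window with the wind event, while a 10 s window captures both, so a larger window yields more joined tuples from the same input.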
To keep the experiments tractable, we set the data generator to produce streaming data for 100 nodes within 24 min, following the predefined stream rates of the 24-h day.
According to the listed experiments and the forest fire detection use case, which concerns timely danger rating alerts, we measure the following metrics: (1) output stream rate, including the number of actual alerts, joined events/reported alerts, and loss/missed alerts for each alerting category, (2) timeline latencies, including queuing time, processing time, and end-to-end latency time (see
Figure 16), and (3) reduction of reporting time using the dynamic window selector. For the output stream rate, the actual alerts are the alerts expected from the join query given the stream rate; the reported alerts are the joined tuples produced by Flink; and the loss/missed alerts are the alerts that time out because of long waits in the Kafka queues. For the timeline latencies, the queuing time measures the time spent in Kafka over two intervals: the first is when the event is consumed from the Kafka source (i.e., weather sensors), and the second is from when the joined events/reported alerts are sent from Flink until they are reported (i.e., stored in MongoDB). The processing time measures the time from when the event is ingested into Flink until it is sent to Kafka again. The processing time also includes the Flink join processing time, which labels each event with the corresponding rating alert. The end-to-end latency time measures the time from when the event is generated until the event/alert is reported, e.g., sent to the forest department authority.
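The latency definitions above can be sketched in terms of four timestamps per alert. This is only an illustration: the timestamp names are hypothetical, and it assumes that the first queuing interval starts when the event is generated and ends when it is ingested into Flink.

```python
from dataclasses import dataclass


@dataclass
class AlertTimestamps:
    """Hypothetical timestamps (in seconds) recorded along the pipeline."""

    generated_at: float   # event produced by the sensor data generator
    flink_in_at: float    # event ingested into Flink from Kafka
    flink_out_at: float   # joined alert sent from Flink back to Kafka
    reported_at: float    # alert reported (stored in MongoDB)


def latencies(t: AlertTimestamps) -> dict:
    """Compute queuing, processing and end-to-end latency for one alert."""
    # Queuing time: both Kafka intervals, before and after Flink.
    queuing = (t.flink_in_at - t.generated_at) + (t.reported_at - t.flink_out_at)
    # Processing time: time spent inside Flink, including the join.
    processing = t.flink_out_at - t.flink_in_at
    # End-to-end latency: from generation until the alert is reported.
    end_to_end = t.reported_at - t.generated_at
    return {"queuing": queuing, "processing": processing, "end_to_end": end_to_end}


result = latencies(AlertTimestamps(0.0, 2.0, 3.5, 4.0))
print(result)
```

Under these assumptions the end-to-end latency is the sum of the queuing and processing times.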
6.3. Comparison of Different Static Window Sizes
The static window experiments are conducted with window sizes of 1, 5, 10, 15, 30, and 60 s. For each window size, we measure the number of actual alerts, joined events/reported alerts and loss events, as well as the end-to-end latency time incurred.
Figure 17 showcases the linear increase in reported alerts and the decrease in loss alerts for each alert type, relative to the actual alerts, across the different window sizes. The number of reported alerts converges towards the number of actual alerts as the window size increases, whereas the number of loss alerts shows the opposite behavior. We attribute this increase in reported alerts to the increasing window size: a large window can hold many ingested events to be joined within the window.
Similarly, loss alerts decrease as the window size increases because a small window joins the ingested events quickly, so some queued events exceed their timeout while waiting and cannot be joined in the next windows. A large window, in contrast, can hold a larger number of input events to be joined and then reported. The average end-to-end latency time gradually increases with the window size for each alert. For example, the green alert, which has the lowest stream rate, has the shortest latencies of all the alerts for every window size. In contrast, the red alert, which has the highest stream rate, has the longest latencies for every window size. The long latency of red alerts results from the higher stream rate combined with the larger window size.
6.4. Comparison of Static and Dynamic Window Sizes Results
Based on the experimental results with static window sizes, we chose the 60 s static window experiment for comparison with the dynamic window selection experiment, which adapts the window size between 1 and 60 s. This static experiment has the lowest number of loss alerts among the static window size experiments, making it the appropriate baseline for the dynamic window selector. Because the IoT-based forest fire monitoring use case is a latency-sensitive IoT application, the proposed dynamic window selector recommends the tumbling window type based on the built knowledge base. Accordingly, the window is initially configured as a tumbling window of 60 s. The dynamic window selector then slightly adapts the window size to maintain timely reported alerts under varying stream rates, while keeping the tumbling window type because of its superiority in latency reduction.
As can be seen in
Figure 18a,b, the proposed window selector achieves lower latencies than the static window size. In the streaming data pipeline, the latencies, including queuing time, processing time and end-to-end latency time, are significantly reduced compared to the static window size experiment by dynamically changing the window size in response to sudden stream rate changes. Mainly, the dynamic window selector gradually decreases the window size as the stream rate increases. A smaller window performs multiple short join intervals, so data is consumed from Kafka quickly and the queuing time decreases (see
Figure 18b).
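The idea of shrinking the window as the stream rate grows can be sketched as follows. This is not the proposed selector, which relies on a knowledge base; it is a simplified illustration that assumes a linear mapping from stream rate to window size, with the rate bounds taken from the lowest (green) and highest (red) per-minute rates of the dataset and the window bounds of 1 s and 60 s taken from the experiments. The function name and parameters are hypothetical.

```python
def select_window_size(
    rate_per_min: float,
    min_rate: float = 2.0,      # green alert rate (values per minute)
    max_rate: float = 60.0,     # red alert rate (values per minute)
    min_window_s: float = 1.0,
    max_window_s: float = 60.0,
) -> float:
    """Map a stream rate to a window size by linear interpolation.

    The lowest rate gets the largest window and the highest rate the
    smallest one; rates outside [min_rate, max_rate] are clamped.
    """
    rate = min(max(rate_per_min, min_rate), max_rate)
    fraction = (rate - min_rate) / (max_rate - min_rate)
    return max_window_s - fraction * (max_window_s - min_window_s)


# Window sizes for the green, yellow, orange and red stream rates.
sizes = [select_window_size(r) for r in (2, 6, 12, 60)]
print([round(s, 2) for s in sizes])
```

The mapping is monotonically decreasing, so each increase in stream rate yields a window that is no larger than the previous one.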
To relate the contribution to a realistic setting,
Figure 19a shows that our dynamic window selector can adapt the window size to time-varying input stream rates based on 24-h monitoring of weather conditions in forestry areas. For the afternoon hours (i.e., 12, 13, and 14 o’clock), which are the most latency-sensitive in terms of fire probability, our dynamic window selector achieves the highest reduction in reporting delay relative to the other hours of the day (see
Figure 19b). Furthermore,
Figure 19c shows the significant improvement in end-to-end (E2E) latency time relative to window size, which leads to faster alerting of the forest department authority so that it can protect the forests from fire spread and severe damage.
In
Figure 20, we analytically and experimentally summarize the performance gained from window size adaptation, showing its superiority over the static window size configuration for each danger rating alert. However, the results show a trade-off between the end-to-end latency improvements and the loss alert ratios. For example, red alerts have the highest loss among the alert types, 85% on average. Still, their reporting time (i.e., end-to-end latency time) is reduced by 74%. In particular, the proposed dynamic window selector deploys a query with a smaller window size to perform fast joins using a smaller number of ingested events and thus trigger red alerts early. Early notification based on fewer ingested events is preferable to deploying a query with a larger window size that performs late joins over delayed events with similar sensor values. Because the selected use case, IoT-based forest fire detection, is a latency-sensitive application, we believe that losing 85% of red alerts is still acceptable: a timely fire warning can be raised from the remaining 15% of the expected alerts while the notification time is reduced by 74%. Finally, our experimental results over the generated forest fire dataset confirm that the dynamic window selector significantly reduces the end-to-end latency time of the reported fire alerts by sending fast danger alerts, even though many alerts are missed during the adaptation process.
7. Discussion
To make the static window size experimental investigation realistic, we consider the weather variables temperature, humidity, CO, CO2 and wind, relying on IoT technology, which can achieve 24-h monitoring of weather conditions. From this investigation, we learn that a large window size is suitable for a low stream rate, which reports green alerts in normal weather conditions. A small window size, which reports alerts quickly, is suitable for a high stream rate, which corresponds to critical danger ratings (i.e., red alerts). In terms of forest fire detection, it is widely accepted that detecting forest fires accurately with static streaming window configurations is difficult. Consequently, the window size needs to adapt linearly to the increasing stream rate in order to keep delivering the critical alerts that indicate forest fires, even at the cost of a reasonable number of lost alerts. In particular, we need to deploy a suitable dynamic streaming window configuration that keeps up with the stream rates of weather data changes, so that potential fires in broad forestry areas can be avoided, especially in the summer season.
In summer times, which are usually called fire seasons, more than 80% of forest fires occurred either in the spring or in the autumn, with a slightly increasing distribution due to the drier weather conditions [
25]. In this work, the weather variables, including temperature, humidity, CO, CO
2, and wind, are monitored using IoT technology, which can achieve 24-h monitoring of weather conditions in forestry areas. The joined streaming results, which are the alerts, illustrate the relationship between the weather alert thresholds and the wind speed. Based on the findings from our proposed dynamic window-based selection system, the reported alerts are very time-sensitive when the wind speed is high. In particular, in the autumn, with its drier weather conditions, orange alerts should be treated as red alerts at high wind speeds, which strongly indicate the danger of fires igniting. Consequently, our proposed dynamic window selection system can inform the forest fire authorities about forest fires that depend on different weather conditions, even outside the fire seasons.
7.1. Challenges
In this paper, we present a novel dynamic query window-based processing system that recommends the optimal window size and type based on the dynamic context of IoFT applications. With respect to improving the environmental sustainability of Forestry 4.0, real-time network connectivity of IoFT devices and real-time decision making will allow forest resources to be continuously managed and monitored. Managing Forestry 4.0 will become easier as resource damage and waste are reduced, while harvesting will be coordinated to minimize gas emissions. Autonomous IoFT devices, vehicles and robots will improve safety and the working environment. More complex queries with complex operations, e.g., joins over multiple streams, are non-trivial to handle and are left for future work. Furthermore, the reporting delay caused by missed and delayed events could be reduced by identifying the optimal amount of resources needed to satisfy the required processing delay under a specific stream rate change [
59]. Other adaptive streaming techniques, such as adaptive load shedding for windowed stream joins [
13] and dynamic batch sizing [
17], can be integrated with our proposed dynamic windowing based on stream rate change for industrial applications. Stream processing platforms should produce results on the fly to support alerting applications that process new data at the speed at which it is generated. Event streams are potentially unbounded, unpredictable sequences of records that represent events or changes in real time. Consequently, in-memory stream processing platforms must be both fast and scalable to handle billions of tuples every second. Memory consumption is one of the challenging scalability aspects that affect stream processing performance, especially for alerting applications, i.e., those publishing notifications to subscribers. Existing research on adaptive windowing lacks solutions that consider memory consumption together with the variable stream rates generated by event sources (e.g., applications and sensors) and critical destinations such as an alerting system. In our proposed system, changing the window size can ease the trade-offs with memory consumption. For example, assigning small window sizes to high stream rates helps avoid data drift problems (i.e., memory crash) and meet analytics requirements such as latency constraints. In future work, we plan to address the bottleneck and performance problems caused by memory consumption during streaming data processing [
60].
7.2. Opportunities
We believe that the proposed system has taken a number of steps in this direction, and that future work will bring us much closer to comprehensive solutions for variable-size windowing under IoT-based stream rate change. Moreover, the proposed system is able to manage sudden large changes in stream rate and thus establish a set of early warning mechanisms for quick response in latency-critical applications. Gathering data and producing accurate fire detection alerts in Forestry 4.0 requires distributing a massive number of smart devices and applying efficient processing techniques. Real-time fire detection and alert management is a critical issue for implementing Forestry 4.0 applications. This issue should therefore be given high priority in order to improve the handling of real situations and to support the required remedies in real time. Furthermore, several kinds of distributed smart devices could be used to monitor forest fires. Heterogeneous smart devices gather different data, so applying a multi-model approach to the gathered data is also a critical issue for forest fire detection and management.
Transmitting gathered data over long distances is a critical issue for improving Forestry 4.0 monitoring. Processing the gathered data locally should therefore be considered in order to assess the forest situation in real time. As another solution, drone technology may be considered as an edge station for gathering data efficiently at close distance to, and in line of sight of, the smart devices [
61,
62]. Furthermore, drones can process data close to the smart devices and take action based on an algorithm implemented using deep learning and stored on the drone. In this case, drones can produce fire detection alerts in forests. In addition, there is an opportunity to use edge technology to improve the economy and manage wood production (i.e., drone edge intelligence for gathering data from IoFT devices).