1. Introduction
Weather plays an important role in people’s lives. Weather monitoring enables data analysis and forecasting that provide useful weather information [1]. Because many factors affect weather changes, predicting the weather accurately is challenging [2]. In terms of system operations and processing technologies, the existing systems for weather monitoring and prediction can be described from two perspectives: system architecture and information processing.
From the system architecture perspective, weather monitoring stations can be static or mobile. Using the information provided by fixed meteorological stations, some simulation software applies numerical simulation to estimate the temperature in each grid [3]. The precision of the estimate for each grid is proportional to the number of weather stations distributed over the city. In Lim et al. [4], the National Weather Sensor Grid (NWSG) system is designed to monitor weather information in real time over distributed areas in a city, where the weather stations are set up in schools. In Sutar [5], a system is developed to monitor weather parameters such as temperature, humidity, and light intensity. However, mobility issues and communication protocols are not considered.
A mobile weather station is installed on a vehicle that drives continuously within a specific area, collecting data and sending them to different receivers via wired or wireless technologies, which leads to a better balance of coverage than static observatories. In Foina et al. [6], a city bus serves as a mobile weather station that collects data along the path traveled by the vehicle. The system has three levels of interaction: the device integrated into the buses, the terminal computer, and the system computer. Although the PeWeMos system [6] claims to monitor fine-grained weather details and changes within an area and to provide this information in a timely manner, the interpretation of the sensed weather data and the cooperation of the buses, bus stops, and passengers for weather monitoring are not addressed. Hellweg et al. [7] use floating car data for road weather forecasts, aiming to increase the resolution of the weather observation network and the forecast model. Their preliminary results show that bias correction and quality control of the raw signals are key issues in enabling safe autonomous driving. Considering the impact of communication channels on information quality, the impact of weather on the performance of a radio link has been studied in [8], which analyzes the correlation between several weather variables and the behavior of control frames in an outdoor wireless local area network. Based on the bus information management system, our previous work [9] combines the advantages of local information processing and bus mobility and proposes a real-time weather monitoring system, which includes a weather monitoring subsystem and a management subsystem between buses and stations.
Figure 1 shows the system model, including signal, control, and communication components.
From the information processing perspective, forecasting is a very important analysis topic, and machine learning enables systems to learn and improve from experience [10,11]. Moreover, with machine learning, data analysis and prediction can be achieved without understanding the underlying physical processes (e.g., by applying past data to predict future data [12]). Readers may refer to [13] for a full discussion.
In the literature, many prediction models for rainfall and weather forecasting have been proposed. For instance, Parashar [14] proposes a system for monitoring and reporting weather conditions so that users can be notified in advance and take appropriate measures to reduce possible damage. An Arduino Mega is used with several weather sensors to display the sensed values on an LCD screen, and machine learning is applied to train a model whose prediction results are published on a website. The system mainly monitors the weather conditions and predicts the average, maximum, and minimum temperature of the next day; it does not provide more detailed information such as the weather conditions (e.g., temperature, humidity, and air pressure) for each hour.
Singh et al. [15] develop a low-cost, portable weather prediction system that can be used in remote areas, with data analysis and machine learning algorithms to predict weather conditions. The system architecture uses a Raspberry Pi as the main component, with temperature, humidity, and barometric pressure sensors providing the sensed values; a random forest classification model is then trained on these values to predict whether it will rain. Note that although the system hardware in Singh et al. [15] is similar to that of the proposed weather monitoring and forecasting system, the system in Singh et al. [15] only describes the probability of precipitation. In Varghese et al. [16], data are collected with a Raspberry Pi and weather sensors, and linear regression machine learning models are trained and used for prediction, with evaluation via the mean absolute error and the median absolute error.
Instead of considering only the information processing perspective, this work adopts the system architecture and information processing perspectives simultaneously. Building on the system architecture in Chen et al. [9], a pair of bus stops and a bus, the gateway, and the server work as a group to dynamically operate the control system and the communication system. This work extends that system by applying machine learning algorithms to the collected data to provide weather monitoring and forecasting. Note that, given basic meteorological elements such as pressure, temperature, and humidity, this work focuses on predicting the temperature, humidity, and pressure for the next 24 h under mild weather changes. For the forecast of severe weather, in order to accelerate the training process and improve the predictive accuracy, Zhou et al. [17] state that the predictors should contain major environmental conditions, which include meteorological elements such as pressure, temperature, geopotential height, humidity, and wind, as well as a number of convective physical parameters (i.e., requiring additional and advanced sensor equipment) to build the prediction system.
The major contributions and features of this work are: (1) the proposal of a novel real-time weather monitoring and prediction system based on basic meteorological elements; (2) the development of an information processing scheme that increases management efficiency via a bus information system; and (3) the construction of machine learning models that analyze the trend of weather changes and predict the weather for the next 24 h.
Table 1 compares the existing and proposed systems and shows that, besides temperature prediction, the proposed system is able to provide a forecast of basic meteorological elements (e.g., temperature, humidity, pressure) for one-day weather prediction via the trained models.
The rest of the paper is organized as follows. Section 2 depicts the system architecture, including information processing, data transmission/reception processes, the system components, and the implementation of the system. Section 3 presents the machine learning models and input data formats. Section 4 describes the experimental results of each processing block and compares the performance of different prediction models. Finally, Section 5 summarizes this research.
3. Input Data Format
In order to examine the prediction performance, the measurement data are processed into different formats (i.e., G1, G2, G3, and G4), as shown in Table 4. Let $In_{D}^{T}$ denote the sensed value at a given time of the day, where In represents the input data, D indicates the present day, and T represents the time in 24 h format. Denote the day before as D-1. For instance, $In_{D}^{23}$ represents 23:00 of the present day and $In_{D-1}^{22}$ represents 22:00 of the previous day. Each type of input data format is represented by a timeline, where each data entry contains three values: temperature t, humidity h, and pressure p. Assuming that the weather data at 24:00 (i.e., the red part of Figure 5) are to be predicted, the G1 format is used to evaluate adjacent time periods: the input data (the gray parts of Figure 5) contain four entries from 21:00 to 24:00 of the previous day and the entry from 23:00 of the present day. The rationale for the G1 format is to explore the data characteristics over short adjacent time periods. For the G2, G3, and G4 formats, we investigate the impact on the prediction performance of data formats covering a moderate time period (e.g., G2 for the past 12 h) and a long time period (e.g., G3 for the past 24 h and G4 for the past 48 h).
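As an illustration of how such input formats could be assembled, the following Python sketch builds G1-style samples from an hourly series of (t, h, p) readings. The names `FORMAT_OFFSETS` and `build_samples` are hypothetical, and the contiguous windows used for G2 to G4 are an assumption based on the description above; Table 4 remains the authoritative definition of the formats.

```python
import numpy as np

# Hour offsets before the target hour. G1 follows the text: 21:00-24:00 of
# the previous day (27, 26, 25, 24 h back) plus 23:00 of the present day (1 h back).
# The contiguous G2-G4 windows are an assumption, not taken from Table 4.
FORMAT_OFFSETS = {
    "G1": [27, 26, 25, 24, 1],
    "G2": list(range(12, 0, -1)),
    "G3": list(range(24, 0, -1)),
    "G4": list(range(48, 0, -1)),
}

def build_samples(series, offsets):
    """Turn an hourly (n_hours, 3) array of (t, h, p) into model samples.

    Returns X with shape (n_samples, len(offsets), 3) and the targets y
    with shape (n_samples, 3), one sample per predictable hour.
    """
    series = np.asarray(series, dtype=float)
    first_target = max(offsets)  # earliest hour with a full history
    target_idx = np.arange(first_target, len(series))
    X = np.stack([series[target_idx - o] for o in offsets], axis=1)
    y = series[target_idx]
    return X, y
```

With 100 hourly readings, `build_samples(series, FORMAT_OFFSETS["G1"])` would yield 73 samples of five entries each, since the first 27 hours lack a complete history.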
Learning Model Architecture
This subsection describes the learning models used in this work: the long short-term memory (LSTM) model and the multilayer perceptron (MLP) model. The LSTM model [21,22] uses three gates (an input gate, an output gate, and a forget gate) to adjust previously stored data, which mitigates the vanishing gradient problem of the recurrent neural network (RNN). The forget gate decides which information is discarded from the cell state. The input gate determines how much new information is added to the cell state. The output gate determines, based on the cell state, which values are output. The LSTM combines the three gates to protect and control information; therefore, the LSTM performs better than the RNN on tasks that require long-term memory. In Figure 6, the upper horizontal line is the cell state, and messages pass selectively through the three gates: the forget gate determines which messages are kept in the cell, the input gate decides how many new messages are added to the cell state, and the output gate finally decides the output message.
This work applies the LSTM with a forget gate. In Figure 6, $c_t$ and $h_t$ represent the cell state and the output value for the current moment, and $c_{t-1}$ and $h_{t-1}$ represent the cell state and the output value of the previous moment. Denote $f_t$, $i_t$, $o_t$, and $\tilde{c}_t$ as the forget gate’s activation vector, the input/update gate’s activation vector, the output gate’s activation vector, and the cell input activation vector, respectively. Let $W$ and $b$ be a weight matrix and a bias vector parameter, respectively, which need to be learned during training. Let $\sigma$ and $\tanh$ be the sigmoid function and the hyperbolic tangent (Tanh) function, respectively. In the following, $[h_{t-1}, x_t]$ denotes the concatenation of the two vectors and $\odot$ denotes the element-wise product.
The first step is to decide what information to throw away from the cell state via a sigmoid layer called the forget gate layer. It looks at $h_{t-1}$ and the input vector $x_t$, and outputs a number between 0 and 1 for each number in the cell state $c_{t-1}$. Note that a 1 represents “completely keep this” while a 0 represents “completely get rid of this”. Thus, the forget gate’s activation vector is given by
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right).$$
The next step is to decide what new information to store in the cell state. The input gate layer and the Tanh layer are applied to create an update to the state:
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right), \qquad \tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right).$$
Then, the new cell state $c_t$ is updated by
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t.$$
Finally, based on the cell state, we need to decide what to output. First, we run a sigmoid layer which decides what parts of the cell state are output. Then, we put the cell state through $\tanh$ and multiply it by the output of the sigmoid gate, which yields
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(c_t).$$
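To make the gate sequence concrete, a single LSTM time step can be sketched in NumPy as follows. This is a minimal illustration of the standard LSTM with a forget gate, not the implementation used in this work (which relies on Keras); the function name `lstm_step` and the dictionary layout of the weights are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step with a forget gate.

    W maps each of "f", "i", "c", "o" to a (hidden, hidden + n_in) matrix
    and b maps the same keys to (hidden,) bias vectors (hypothetical layout).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input/update gate
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate cell input
    c_t = f_t * c_prev + i_t * c_hat         # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # new output value
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the step simply halves the previous cell state, which is a convenient sanity check.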
An MLP [23] model consists of at least three layers of nodes (an input layer, a hidden layer, and an output layer). In the MLP model, some neurons use nonlinear activation functions to simulate the frequency of action potentials, or firing, of biological neurons. Since MLPs are fully connected, each node in one layer connects with a certain weight to every node in the following layer. After each data item is processed, the perceptron learns by adjusting the connection weights according to the error between the output and the expected result.
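The fully connected structure can be illustrated with a short NumPy forward pass. This is a sketch under assumptions: the layer sizes, the initialization scale, and the names `init_mlp` and `mlp_forward` are hypothetical, the hidden layers use a rectified linear activation, and the output layer is left linear for regression.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0, scale=0.1):
    """Create (weights, bias) pairs with normally distributed weights."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, scale, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def mlp_forward(x, params):
    """Propagate x through every layer; each node feeds all nodes of the next layer."""
    a = np.asarray(x, dtype=float)
    for k, (W, b) in enumerate(params):
        z = a @ W + b
        is_output = k == len(params) - 1
        a = z if is_output else np.maximum(z, 0.0)  # ReLU on hidden layers only
    return a
```

For example, `mlp_forward(np.ones((2, 15)), init_mlp([15, 8, 3]))` maps two flattened five-entry samples (5 x 3 values) to two (t, h, p) predictions; training would then adjust the weights to reduce the output error.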
The LSTM and MLP model architectures are implemented with TensorFlow and Keras for model training. The LSTM lookback parameter is set to 5. The Adam optimization algorithm is applied to train the network. The loss value is evaluated via the root mean square error (RMSE). The activation functions are the Tanh and scaled exponential linear unit (SELU) functions. The units and activation functions of each layer are summarized in Table 5. The LSTM layer can be matched to the data by adjusting the number of units, the input dimensions, and the activation functions. The time distributed dense layer applies the dense layer to each time step of the sequence. The dense layer is a fully connected layer that applies an activation function to its neurons. For the MLP, the weights are initialized with a normal distribution, and the activation function is the rectified linear unit (ReLU) function.
Table 6 summarizes the units and activation functions at each layer of the MLP model, where the flatten layer flattens the high-dimensional matrix into a two-dimensional matrix, retaining the first dimension and multiplying the sizes of the remaining dimensions to obtain the second dimension of the matrix. During training, the loss value and accuracy of each training run indicate whether the model is overfitting, so that the model can be adjusted accordingly. By testing different combinations of parameter values and layers, the model suitable for the data is finally found.
The weather data collected by the sensors are processed according to the above data processing steps and formats. The proposed model focuses on predicting the weather conditions for the coming day, namely temperature, humidity, and air pressure. That is, when a prediction is made at 0:00, the predicted temperature, humidity, and pressure values are obtained for the next 24 h (i.e., a weather prediction from 1:00 to 24:00). Accordingly, to perform an updated 24 h weather prediction at 1:00 (i.e., a weather prediction from 2:00 to 1:00 of the next day), the weather data collected by the sensors at 1:00 are added to the original dataset to form a new input dataset for the sequential prediction of the next 24 h. Finally, the system accuracy is evaluated by comparing the predicted weather data with the measured values via the root mean square error (RMSE), mean absolute error (MAE), and percentage error, as depicted in (1)–(3). With the trained model described above, weather prediction can thus be achieved.
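The three evaluation metrics can be computed as in the following sketch. The RMSE and MAE follow their standard definitions; the percentage error is taken here as the mean absolute percentage error, which is an assumption, since the exact form of Equations (1)–(3) is not reproduced in this text. The function names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def percentage_error(y_true, y_pred):
    """Mean absolute percentage error (assumed form); measured values must be nonzero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))
```

Each metric would typically be computed separately for temperature, humidity, and pressure, because the three quantities have different units and value ranges.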
The overall prediction model training process divides the processed data into a training set, a validation set, and a test set, and then performs model training. After the model evaluation is completed, the system adjusts the parameters according to the evaluation results and continues training until the final prediction model is obtained.
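A simple way to perform such a division for time-ordered weather data is a chronological split, sketched below. The split fractions and the function name `chronological_split` are assumptions for illustration; the proportions used in this work are not stated here.

```python
def chronological_split(samples, train_frac=0.7, val_frac=0.15):
    """Split time-ordered samples into training, validation, and test parts.

    The order is preserved (no shuffling) so that the validation and test
    parts always lie later in time than the training part.
    """
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must leave a non-empty share for the test set")
    n = len(samples)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test
```

Keeping the split chronological avoids evaluating the model on hours that precede its training data, which would give an overly optimistic estimate for a forecasting task.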
After the data are processed, the trained prediction model is used to make a prediction. Once a prediction is completed, the predicted values are added to the dataset to form a new dataset, and the next prediction is performed; this is repeated until the final result is obtained, which completes the prediction task.
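The iterative procedure described above can be sketched as a recursive forecast loop. Here `predict_next` is a hypothetical stand-in for the trained model (any callable that maps the most recent window to the next value), and the default lookback of 5 mirrors the LSTM lookback setting; the function name `recursive_forecast` is likewise illustrative.

```python
def recursive_forecast(history, predict_next, steps=24, lookback=5):
    """Predict `steps` future values, feeding each prediction back as input."""
    if len(history) < lookback:
        raise ValueError("history is shorter than the lookback window")
    window = list(history[-lookback:])
    forecasts = []
    for _ in range(steps):
        nxt = predict_next(window)       # one-step-ahead prediction
        forecasts.append(nxt)
        window = window[1:] + [nxt]      # slide the window over the new value
    return forecasts
```

For instance, `recursive_forecast(readings, model_fn)` would return 24 hourly predictions. Because each step consumes earlier predictions, errors can accumulate toward the end of the horizon.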
5. Conclusions
This paper presents a real-time weather monitoring and prediction system based on bus information management, combined with information processing and machine learning to complete the communication and analysis of information between buses, stations, and sensors. The proposed system contains four core components: (1) information management, (2) interactive bus stop, (3) machine learning prediction model, and (4) weather information platform. The website shows weather information via a dynamic chart. In addition to the current temperature, humidity, air pressure, rainfall, UV, and PM 2.5, the system provides a forecast of temperature, humidity, and air pressure for the next 24 h.
Although the proposed system achieves effective weather monitoring and information management, misalignment between the predicted and measured values may occur under significant weather changes, which is the major challenge to overcome. In future work, in addition to optimizing the system operation, we plan to refine the prediction system by considering the deployment of nodes based on bus routes, improving the learning models, including more physical parameters, exploring the effects of forecast and measurement errors on the forecasting models, reanalyzing the dataset (e.g., performing data revisions), and applying multiple data sources [26] and information processing technologies, which may achieve better prediction accuracy.