1. Introduction
To cope with increasingly fierce market competition, manufacturers have transformed their policies by providing customers with customized products and services, responding quickly to diversified needs, reducing competitive uncertainty, and delivering satisfactory service. Through such a transformation, manufacturers expect to maintain or even increase their sales despite potentially growing inventory pressure. As green production and the circular economy have gradually become a consensus between production and sales, manufacturers have tried to address the above challenges and turn them into a positive force for resolving market uncertainty and effectively managing the inventory of their existing production models.
During the last two decades, significant research work has been reported in the literature. This work has demonstrated that demand forecasting is one of the main tools for evaluating and maintaining the market and has become the cornerstone of companies’ decision-making strategies [1,2,3,4,5,6,7,8,9,10,11,12]. In practice, demand forecasting includes at least two parts, namely, production forecasting and inventory control. Production forecasting relates to actual sales, and time series analysis has become one of the best solutions for sales forecasting. The use of time series models to assist business decision-making has proven successful in many sectors and industries, such as energy consumption forecasting in the petrochemical industry [13,14], station expansion and capacity growth forecasting in bus systems [15], and economic and financial growth forecasting [16]. The current literature on inventory control is often associated with supply chain management research [3,7,8,11]. Different production modes, such as make-to-stock (MTS), make-to-order (MTO), assembly-to-order (ATO), and build-to-forecast (BTF), apart from providing downstream customers with different levels of customized services, enable manufacturers to verify their ability to balance revenue generation and inventory control.
From a practical viewpoint, companies require both accuracy and sustainability from predictive models. There is usually a difference between observed and forecast results, so the predictions may not necessarily be acceptable (model accuracy) or reasonable (model sustainability). Conducting a verification process (accepted and certified by the company) before selecting a predictive model is therefore very important for responding effectively to this difference. For example, the sales records of commercial activities inevitably contain no-order situations (for a single day or for consecutive days), not only during holidays but also on normal working days. In sales forecasting, however, subjective judgments (which treat the no-order situation as occasional and rare) often dominate the objective sales results, and a forecast of zero sales is considered unacceptable. Such judgments may also affect the data preprocessing methods applied before modeling. For example, in terms of data storage, records of no orders are usually stored as blanks. In the subsequent analysis, these blank records are classified as missing data, and the original series is regarded as incomplete.
In the currently available literature on time series analysis with missing data, it has generally been assumed that the missing data occur at random, i.e., missing at random (MAR) or missing completely at random (MCAR). Moreover, different data imputation methods have been discussed and compared, and predictive modeling has been performed on the imputed data series [17,18,19,20,21,22,23,24,25,26]. The zero-forecasting method, which is used to predict a rare event or an intermittent demand [27], has also been reported. Since a rare event is treated as a particular case within a business activity and the result is always predicted as zero, this method is not as popular as methods based on statistical learning (such as the autoregressive integrated moving average (ARIMA)) or machine learning (such as the long short-term memory (LSTM)). In the research conducted on intermittent demand forecasts, the missing data are preprocessed either by combining adjacent time periods [8], by defining the missing value as noise and then smoothing it out [7,21], or by using min-max normalization to revise the data [9]. The transformed data series is then processed using a typical time series analysis. Generally, the studies reported in the literature seek the best solution for obtaining better model accuracy. In contrast to model accuracy, model sustainability requires the long-term data background to be observed and understood both before and after modeling. In actual business activities, companies may accept the additional costs caused by inaccurate forecasting results. For example, overproduction increases the inventory cost (high forecast but low actual demand), and underproduction increases the labor cost because of the overtime work required (low forecast but high actual demand). Therefore, companies are concerned with taking appropriate action in the shortest possible time to correct the discrepancies caused by forecasting. Moreover, the nature of the data is a key issue.
This article aims to determine appropriate methods for dealing with missing data by establishing an accurate and sustainable forecasting model on the basis of a specific sales data background, and to provide a business reference. To this end, a set of real sales data on plastic injection tray products is empirically investigated. The products are the outer trays of consumer electronics chips; MTS and MTO are the company’s existing production modes. The dataset contains many blank records. In this article, filling the blank records with zero values is proposed. Time series forecasting is then performed on the recovered series. The results are compared with those obtained from the mean imputation method applied to different forecasters, including Naive forecasting, the ARIMA, and the LSTM. Finally, managerial insights are proposed.
The rest of this article is organized as follows. The literature review is presented in Section 2. The materials and methods used are described in Section 3, and the numerical results are presented in Section 4. Section 5 provides the discussion and conclusion.
3. Materials and Methods
In this article, the data used for the empirical study were obtained from a plastic injection product manufacturing company. Its main business line is to provide downstream firms, which produce electronic chips (such as SIM cards, ICs, smart cards, and flash memory), with packaging boxes (also called trays) of various specifications. In recent years, the themes of green production and the circular economy have been widely discussed and advocated. Recycled used trays are cleaned and reused, or remelted and reinjected to make products with different specifications. These practices have been adopted by manufacturers to deal with market competition and are also environmentally friendly.
However, the above practices have changed the previous supply and demand model. For example, the cycle at which customers place orders is no longer fixed, and the range of new product specifications continues to increase, while the frequency and quantity of demand for a single product or on a single order form may decrease. These revised business activities also increase the number of zero-sale records on typical working days. To deal with these new operating conditions, companies must renew their service models, which requires more comprehensive scientific management.
In this article, the flowchart presented in Figure 1 was designed to include data acquisition, data preprocessing, modeling, evaluation, and deployment. Two tasks were arranged in the data preprocessing phase: data transformation and missing data processing. The original data were initially transformed from a daily to a weekly format. Then, both the zero-filling and mean imputation methods were used to fill in values in the transformed series. Subsequently, the models designed by combining the missing data processing methods (zero-filling and mean imputation) with the forecasters in the modeling phase were applied to the filled series. The MAPE and MASE were used as indicators for model evaluation. A set of unused data (extracted from the transformed series) was explicitly reserved for model deployment to validate that the selected model was sustainable.
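As a minimal sketch of the preprocessing steps described above (the file and column names are assumptions, not the company’s actual schema), the daily order records can be aggregated into a weekly series in which weeks without orders appear as missing values:

```python
import pandas as pd

# Hypothetical daily order records with "date" and "quantity" columns.
orders = pd.read_csv("orders.csv", parse_dates=["date"])
daily = orders.groupby("date")["quantity"].sum()

# Transform from a daily to a weekly format; weeks containing no orders
# become missing values (NaN) and are handled in Section 3.1.2.
weekly = daily.resample("W").sum(min_count=1)
```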
3.1. Data Preprocessing
Because of commercial confidentiality, customer information and product prices were removed from the data used in this article in advance. The raw data used were the order records of plastic injection products collected from 1 January 2017 to 30 September 2019. To validate the analysis results, the data were filtered, and the top 10 products of the 2018 fiscal year were selected. Two assumptions were made in the analysis: first, the unit price was treated as identical regardless of differences in specifications; second, there was assumed to be no substitution effect between products.
3.1.1. Definitions of and Equations for RMS and MGR
The relative market share (RMS) and the market growth rate (MGR) are significant indicators for creating a Boston Consulting Group (BCG) Matrix (first introduced by Dr. Bruce D. Henderson of the Boston Consulting Group in 1970) [57,58]. The BCG Matrix is also called the Product Portfolio Matrix. A typical 2 × 2 matrix is used to position a firm’s competitiveness or a brand product in the local or global market. From a practical point of view, the term RMS relates to cash generation and cash usage performance.
Relative market share (RMS). The RMS is used to evaluate how far an owned product is from its leading competitor in the market. This indicator represents the competitiveness and completeness of a company’s products or brands. High competitiveness leads to obvious and immediate high profits (cash) for a company. However, if the company’s profit depends heavily on a single product or a few products, different business problems may arise once demand changes. A company with a high market share can gradually expand, boost the growth of other products, and establish a complete commercial strategic value and market position. The RMS equation is given as follows:
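A form consistent with the description above and below (assumed here, since the printed equation is not reproduced) is
$$\mathrm{RMS} = \frac{\text{sales of the focal product (or brand)}}{\text{sales of the leading product (or brand) in the market}},$$
so that the market leader itself attains the maximum value of 1.00 (100%).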
The RMS is always a positive value, and its maximum value is 100% (or 1.00). Furthermore, the midpoint value (0.50) is adopted to indicate the market share status of the product or brand. For example, an RMS greater than or equal to 0.50 indicates a high (market) share, whereas an RMS less than 0.50 indicates an average (market) share.
Market growth rate (MGR). The MGR is used to measure the degree to which a firm’s or brand’s capital gain grows or declines year on year. This indicator represents the degree of change (increase or decrease) in the sales performance or the market share in a specific time (typically a year). The MGR equation is given as follows:
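A form consistent with the description below (assumed here, since the printed equation is not reproduced) is
$$\mathrm{MGR} = \frac{\text{sales in the current fiscal year} - \text{sales in the previous fiscal year}}{\text{sales in the previous fiscal year}} \times 100\%.$$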
The MGR can reach a very high positive or negative value. If there is no sales record in the previous fiscal year, it cannot be calculated. A 10% annual rate is usually used to assess whether the growth is significant or not. A growth rate of more than 10% indicates a high growth, whereas a growth rate of less than 10% indicates a slow and moderate growth. Accordingly, an annual (negative) growth rate lower than −10% indicates a high decline, whereas a (negative) growth rate higher than −10% indicates a moderate decline.
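As an illustration of how these two indicators and their thresholds can be applied (the formulas follow the assumed forms above; the thresholds are those stated in the text), a small helper might look like this:

```python
def relative_market_share(own_sales: float, leader_sales: float) -> float:
    """RMS under the assumed definition: 1.00 (100%) for the market leader."""
    return own_sales / leader_sales

def market_growth_rate(current_sales: float, previous_sales: float) -> float:
    """Year-on-year growth in percent; undefined if previous_sales is zero."""
    return (current_sales - previous_sales) / previous_sales * 100.0

def classify(rms: float, mgr: float) -> str:
    """Combine the 0.50 share threshold with the +/-10% growth threshold."""
    share = "high share" if rms >= 0.50 else "average share"
    if mgr > 10.0:
        growth = "high growth"
    elif mgr < -10.0:
        growth = "high decline"
    else:
        growth = "moderate"
    return f"{share}, {growth}"
```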
3.1.2. Missing Data Processing Methods
In this article, the mean imputation and the zero-filling methods were proposed for processing missing data.
Mean imputation. As its name indicates, in this method, the missing data are replaced by the mean value calculated from the other valid data of the variable in which the missing data are located. The advantage of this method is that the calculation is simple. Its disadvantage is that both the mean and standard deviation indicators increase after imputation.
Zero-filling. This method is also an imputation-type method, but the missing data are replaced by zeros. From the time series perspective, this method reflects the fact that the data of a specific event (for example, sales orders) were not collected at a specific timestamp (for example, during weekends or national holidays). In addition to retaining the nature of the data, the imputed series is also complete.
Compared with the mean imputation, the zero-filling method can overcome the disadvantage of the mean value becoming large. Its disadvantages are a larger variance and the inability to calculate the MAPE.
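An illustrative sketch of the two treatments on a toy weekly series (the values are invented, not the company’s data):

```python
import numpy as np
import pandas as pd

# A toy weekly series in which weeks without orders were stored as blanks (NaN).
weekly = pd.Series([120, np.nan, 80, np.nan, np.nan, 150, 95],
                   index=pd.date_range("2019-01-06", periods=7, freq="W"))

zero_filled = weekly.fillna(0)               # zero-filling: keeps the "no order" meaning
mean_imputed = weekly.fillna(weekly.mean())  # mean imputation: fills with the mean of valid data
```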
3.1.3. Data Split into Training, Test, and Validation Sets
The filled series (weekly format) are split into training, test, and validation sets.
Table 1 summarizes the definition of each set. The ratio between these sets is 8:2:1.
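A minimal chronological split, assuming the 8:2:1 ratio is applied directly to the length of the filled weekly series (the exact date windows defined in Table 1 are not reproduced here):

```python
# "weekly_filled" stands for either the zero-filled or the mean-imputed weekly series.
n = len(weekly_filled)
n_train = round(n * 8 / 11)
n_test = round(n * 2 / 11)

train = weekly_filled.iloc[:n_train]
test = weekly_filled.iloc[n_train:n_train + n_test]
validation = weekly_filled.iloc[n_train + n_test:]   # held out for model deployment
```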
3.2. Forecaster
In this section, the Naive forecasting, the ARIMA, and the LSTM methods are introduced. These three forecasters were selected on the basis of their specific characteristics. The Naive forecasting method is one of the most frequently used tools by companies. It is quick and easy to use, but its later forecasts are significantly affected by its earlier ones, especially when some impacts and uncertainties are not immediately observed (for example, when an earlier forecast has failed and many later forecasts become worse). The ARIMA method provides complex but delicate parameter settings. In this model, autoregression and moving average models are integrated, whether or not the series is stationary. The ARIMA is also a data-driven model; it can switch to the ARMA, AR, MA, or even seasonal ARIMA (SARIMA) model, depending on the data characteristics (trend, cycle, seasonality, and more). An effective ARIMA model requires the series data to be complete, but missing data are frequently encountered in time series analysis. The LSTM is based on RNNs and can address the issue of missing data either by doing nothing (directly ignoring the missing data) or by accepting a specific imputation (single or multiple). Overfitting may also occur after a large number of training iterations.
3.2.1. Naive Forecasting
The Naive forecasting model [50] is the most straightforward time series approach and one of the most frequently used tools by companies. By definition, the last observation of the series is the forecast of the following data point. This is described by the following equation:
$$\hat{y}_{t+1} = y_t,$$
where $y_t$ is the observation at time $t$ and $\hat{y}_{t+1}$ is the forecast at time $t+1$. This approach works remarkably well for many economic and financial time series [50].
3.2.2. ARIMA
The ARIMA [50,52] model is one of the most widely used approaches in time series forecasting. In this approach, the autoregression and moving average models are integrated. The approach also considers series stationarity and the selection of a series transformation. The augmented Dickey–Fuller (ADF) test and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test [59] are valuable tools for detecting series stationarity. Initially, the ADF test is used to check whether the series is trend stationary. The KPSS test is then used to check whether the series is only difference stationary, even when the ADF test indicates stationarity. Sometimes, modeling may still be complicated because of the presence of white noise and cyclic behavior (no trend, no seasonality). To deal with this problem, data transformations, such as differencing and other methods (for example, smoothing and shifting), can be used.
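A sketch of these two stationarity checks with statsmodels (here, weekly_filled stands for the filled weekly series from Section 3.1):

```python
from statsmodels.tsa.stattools import adfuller, kpss

adf_stat, adf_p, *_ = adfuller(weekly_filled)
kpss_stat, kpss_p, *_ = kpss(weekly_filled, regression="c", nlags="auto")

# For the ADF test, a small p-value argues against a unit root;
# for the KPSS test, a small p-value argues against stationarity.
print(f"ADF p-value:  {adf_p:.3f}")
print(f"KPSS p-value: {kpss_p:.3f}")
```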
Three main parameters are required to configure the ARIMA model: p refers to the AR model, d denotes the integration steps, and q refers to the MA model. For a stationary series, autoregression is modeled using a linear combination of a variable’s past values. In other words, the term autoregression means a regression of the variable against itself. An autoregression model of order p is expressed as AR(p), which is formed as follows:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t,$$
where $\varepsilon_t$ is the white noise and $y_t$ is the forecast value using its lagged values as the predictors.
A moving average model of order q is expressed as MA(q), which is a regression-like model and is formed as follows:
$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q},$$
where $\varepsilon_t$ is the white noise and $y_t$ is the weighted moving average of the past few forecast errors (lagged errors). Generally, by combining differencing with an autoregression and a moving average model, a nonseasonal ARIMA model is obtained. This is expressed as ARIMA(p, d, q), which is formed as follows:
$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t,$$
where $y'_t$ is a forecast of the transformed series after differencing, and the predictors include the lagged values of $y'_t$ and the relative lagged errors. The parameters p and q represent the order of the autoregression and the moving average model, respectively, and d denotes the steps of differencing conducted (if necessary) to integrate these two models.
The basic steps to build the ARIMA model involve first conducting the ADF and KPSS tests and then checking whether the transformed series (after differencing) is stationary. The next step is to determine the best combination of p and q. The final step is to confirm that the residuals are white noise following the normal distribution.
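A sketch of these steps with statsmodels, using the training split from Section 3.1.3 (the order (1, 1, 1) is an arbitrary placeholder, not the order selected in this study):

```python
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(train, order=(1, 1, 1))        # (p, d, q)
fitted = model.fit()

print(fitted.summary())                      # inspect coefficients and residual diagnostics
forecast = fitted.forecast(steps=len(test))  # one forecast per test-set week
```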
3.2.3. LSTM
The LSTM is one of the most popular predictive models used in recent years. It was first introduced by S. Hochreiter and J. Schmidhuber in 1997 [53]. The LSTM prototype comes from the RNN, which is a class of neural network models. By configuring memory feedback during the learning process, an RNN can overcome the feedforward learning constraint that exists in convolutional neural networks (CNNs) and thus reduce the bias caused by overlearning. Based on the RNN, the LSTM can solve other complicated problems that arise when different background factors are added during the learning process, for example, how to set the feedback position when the occurrence of events is no longer in a fixed order, or whether the learning memory should be retained or dropped when the time interval is inconsistent. The usual case is the following: in a fixed order of events (for example, first event A, followed by B, C, D, and E), either C or D could be in a feedback loop, but what if B or E is missing?
The LSTM includes an input gate, an output gate, a forget gate, and a cell. The information enters the cell through the gates and exits as numbers (between 0 and 1). A zero means that all the information has been completely dropped out, whereas a one means that all the information has been completely retained.
Figure 2 shows a diagram of an LSTM with a single cell.
From the input $x_t$ to the output $h_t$, four functions are executed in a single cell. These functions are divided into three steps and then are integrated (by applying the addition and multiplication operations) step by step, until the output is ready to be produced.
In Step 1, a function, $f_t$, is used to determine the information to be dropped out of the cell state. This function is a typical sigmoid function involving the input $x_t$, the output $h_{t-1}$ of the previous cell, the weight matrix $W_f$, and the bias $b_f$. The output is a number between 0 and 1, which is then multiplied with the cell state $C_{t-1}$ of the previous cell and moves forward. $f_t$ is described as follows:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right).$$
Step 2 includes two functions ($i_t$ and $\tilde{C}_t$), which represent the retained information and a new candidate vector, respectively. This vector is created by the hyperbolic tangent function (tanh). This step is used to decide the new information to be stored in the cell state. This information is then added to the output generated in Step 1 and moves forward. $i_t$, $\tilde{C}_t$, and $C_t$ are given as follows:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right),$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right),$$
$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t.$$
Step 3 is used to decide the information that exits the cell. The output function $o_t$ is executed by a sigmoid function involving the input $x_t$, the output of the previous cell $h_{t-1}$, the weight matrix $W_o$, and the bias $b_o$. $o_t$ is described as follows:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right).$$
The last updated cell state $C_t$ (obtained from Step 2) is first passed through tanh (i.e., values between −1 and 1 are produced). It is then multiplied with the output function $o_t$ to generate the final output $h_t$ of the cell as follows:
$$h_t = o_t \ast \tanh\left(C_t\right).$$
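A minimal univariate LSTM forecaster in Keras (a sketch; the window length, layer size, and number of epochs are illustrative assumptions, not the configuration tuned in this study):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, window=8):
    """Turn a 1-D array into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.asarray(X)[..., np.newaxis], np.asarray(y)

values = train.to_numpy(dtype="float32")   # training split from Section 3.1.3
X_train, y_train = make_windows(values)

model = Sequential([LSTM(32, input_shape=(X_train.shape[1], 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=100, verbose=0)
```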
3.3. Model Performance Indicator
By combining the missing data processing methods (Section 3.1.2) and the proposed forecasters (Section 3.2.1, Section 3.2.2 and Section 3.2.3), a total of six combined models are obtained: Naive forecasting + zero-filling, Naive forecasting + mean imputation, ARIMA + zero-filling, ARIMA + mean imputation, LSTM + zero-filling, and LSTM + mean imputation. For each of the selected products, the six combined models are implemented, and model performance is evaluated based on the produced indicators. Because the original series includes missing data and the zero-filling method proposed in this article is applied, the MAPE is defined as the first indicator for evaluating and filtering the most appropriate combined model. The second indicator used for further filtering is the MASE, an indicator that can handle zero counts. These two indicators are used to filter the models and to help evaluate whether the selected models are reliable.
3.3.1. MAPE
By definition, the MAPE is a type of loss function [60]. Similar to other indicators (including the MSE and RMSE), the MAPE is widely used for model accuracy evaluation. This indicator transforms the initial deviation into an absolute value form and imposes a heavier penalty on the positive errors usually caused by overestimation. The MAPE equation is given below:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{A_i - F_i}{A_i}\right| \times 100\%,$$
where $A_i$ and $F_i$ are the actual value and the forecast value, respectively, of the i-th data point, and $n$ is the length (or the number of forecasts) in a given period. A small MAPE value means that, on average, the selected model provides relatively accurate results. The two drawbacks of the MAPE are as follows: (1) its inability to handle zero values [61] and (2) the asymmetry problem due to large numbers [55]. The MAPE cannot be calculated if an actual value corresponds to missing data. Even so, it is relatively effective to use zero values to replace the missing data, since the calculated MAPE indicator is always positive. Moreover, it imposes a relatively heavy penalty for positive errors caused, for example, by overestimation [62].
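A direct implementation of this definition (a sketch) makes the zero-value drawback visible:

```python
import numpy as np

def mape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs((actual - forecast) / actual)) * 100.0

print(mape([100, 80, 120], [90, 85, 110]))   # a finite percentage error
print(mape([100, 0, 120], [90, 5, 110]))     # inf: a zero actual value breaks the MAPE
```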
3.3.2. MASE
The MASE was first proposed by Hyndman and Koehler in 2005 [61]. It is a scale-independent indicator for measuring forecasting accuracy. Because of its scale-independent characteristic, the MASE handles zero values directly and imposes an equal-weight penalty on both the positive errors (caused, for example, by overestimation) and the negative errors (caused, for example, by underestimation). Generally, the MASE overcomes the significant drawbacks of the MAPE. For nonseasonal time series data, the MASE is calculated using the following equation:
$$\mathrm{MASE} = \frac{\frac{1}{h}\sum_{i=1}^{h}\left|A_i - F_i\right|}{\frac{1}{n-1}\sum_{t=2}^{n}\left|Y_t - Y_{t-1}\right|}.$$
The numerator is the mean absolute error of the $h$ forecasts, and the denominator is the mean absolute error of the one-step Naive forecasting on a training set with $n$ data points $Y_1, \ldots, Y_n$. If the series contains seasonal factors, the period $m$ of the training set is redefined. A MASE value of less than one means that the proposed model produces smaller errors than the one-step Naive forecasting [37]. In other words, a MASE value greater than one means that the performance of the proposed model is worse than that of the Naive forecasting.
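A corresponding sketch, scaling the forecast errors by the in-sample MAE of the one-step Naive forecast on the training set:

```python
import numpy as np

def mase(actual, forecast, train):
    actual, forecast, train = (np.asarray(a, float) for a in (actual, forecast, train))
    naive_mae = np.mean(np.abs(np.diff(train)))              # denominator
    return np.mean(np.abs(actual - forecast)) / naive_mae    # scaled forecast error

# Works with zero counts in the actual values; here MASE is about 0.33 (< 1).
print(mase(actual=[0, 40, 60], forecast=[5, 35, 65], train=[50, 60, 40, 55]))
```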
3.3.3. Within-Mean Difference
To effectively illustrate the achieved performance of the selected models on the validation set, a specific indicator named within-mean difference (WD) is introduced in this article. The WD is the percentage difference (%) between the forecast and the actual values, when it is applied to the validation set. The WD formula is given as follows:
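One form consistent with the name and with the interpretation below (an assumption, since the printed formula is not reproduced here) is
$$\mathrm{WD} = \frac{\bar{F} - \bar{A}}{\bar{A}} \times 100\%,$$
where $\bar{F}$ and $\bar{A}$ are the mean forecast value and the mean actual value over the validation set.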
A WD value close to zero indicates that the difference between the forecast and the actual values is small, which means that the selected model performs well. A positive WD value indicates that the selected model overestimates, whereas a negative WD value indicates that the trained model underfits (i.e., underestimates). When the WD exceeds 100% or drops below −100%, it is recommended not to use the model further because of its poor performance.
3.4. Research Questions
Through the empirical analysis, this article aims to answer the following two research questions:
RQ1: Which combination of the missing data processing methods (deletion, mean imputation, and zero-filling) and the forecasters (ARIMA, LSTM, and Naive forecasting) achieves the best performance (MAPE, MASE) for specific products?
RQ2: Which missing data processing method is most recommended for each individual forecaster?
5. Discussion
The results presented in Section 4.3 have answered the two research questions set in Section 3.4.
Table 6 presents the high share and high growth products. It also reveals that, regardless of the forecaster with which it was combined, the zero-filling method achieved the best performance in model evaluation and deployment in almost all cases. Among the individual forecasters, the ARIMA model achieved the best performance, followed by the LSTM and the Naive forecasting models.
Table 7 presents the high share but high decline (in the last fiscal year) products. Again, the zero-filling method performed better than the mean imputation method regardless of the forecaster used (i.e., the Naive forecasting, ARIMA, or LSTM models). Among the individual forecasters, the Naive forecasting model achieved the best performance, followed by the ARIMA and the LSTM models.
Based on the above results, it is evident that the zero-filling method is the most suitable for high market share products. The ARIMA and Naive forecasting models can also be used, depending on whether the product grew significantly or declined seriously in the previous year. A high market share represents a relatively stable cash flow for companies that provide many diversified and customized products; whether sales grow or recede, this poses a great challenge for, and has an impact on, cash management. In a BCG matrix, the possible roles are the Star and the Cash cow. For long-term development, companies must prioritize the changes in demand for products with a high market share.
Table 8 presents the average share and high growth products. The ARIMA model performed better than the LSTM model, followed by the Naive forecasting model, regardless of the missing data processing method. However, regardless of the forecaster used, it was difficult to decide whether the zero-filling or the mean imputation method is better. Based on a smaller WD, the Naive forecasting + mean imputation model is suggested.
Table 9 presents the average share but high decline products. The LSTM + mean imputation model is suggested in this case.
The products in Table 8 and Table 9 share two common features: both have an average market share, and both use the mean imputation to deal with the missing data. The selection between the Naive forecasting and LSTM models depends on the growth or recession of the last fiscal year. In contrast with the previous two products (BGA 8 × 13 mm and TSOP II 54/86P), the TQFP 14 × 14 × 1.4 and TSOP II 54/86 135′C products are significantly affected by environmental factors. For example, high growth indicates potential, but an average share means that the expected potential cannot be fully confirmed and accepted. Conversely, high decline means that the product is possibly out of fashion but still survives, because the average-share feature means it can still bring in cash. In a BCG matrix, the possible roles are the Problem child and the Dog.
Table 10 presents a general summary of the previous results.
6. Conclusions
The primary objective of this article was to prove that a dedicated time series model can provide accuracy and sustainability for the sales forecasting of a specific product. Part of the empirical study results achieved this objective. For example, the ARIMA + zero-filling model can predict high share and high growth products. Although there was an underestimation of approximately 17%, this gap could effectively be filled by a correction strategy in real production. The second objective was to prove that a practical observation of the data background helps in selecting the appropriate method for processing the missing data. Four specific products with different backgrounds consistently proved that the zero-filling method achieves the best modeling and deployment performance, regardless of the forecaster it is combined with. By applying the same modeling process, apart from the average share and high growth products (for which no other products were matched), 6 of the top 10 best-selling products led to the same conclusion (see Appendix A). The case company has recognized the case analysis results. Thus, it can be further confirmed that the two propositions of this article can be applied to a company’s hot-selling products. They can also be used as a managerial reference for other companies with similar data backgrounds.
In this article, an empirical case of univariate analysis was presented, and the actual case problems related to sales forecasting performance in plastic tray manufacturing were successfully addressed. By adding other background factors, the contents of the analysis could be enriched to reduce modeling uncertainty and to provide more accurate results. From a practical point of view, when a specific model can be applied to other products with a similar background and produce effective forecast results, this model can be defined as a guide model. The integration of the BCG matrix and guide models would enhance the efficiency of multi-item demand forecasting decision making. Moreover, the establishment of the forecasting framework will allow the application of multi-item forecasts. In future research work, the guide models could be identified and tested for batch processing and multiple product management. The advantages of such an approach include a reduced calculation time, a reduced amount of substitution by other internal products, and even the ability to make a direct purchase list for customers. Companies should conscientiously master the production and stock management of specific products, continually provide excellent service regarding fulfillment and shipment, and carefully evaluate the potential of other products.