In this section, we will evaluate the performance of the TransTLA model in home appliance sales forecasting from various perspectives. First, we perform parameter selection experiments to identify the optimal values for key model parameters. Then, we implement ablation experiments to analyze the contribution of each model component to the overall performance. Finally, we compare the model with other forecasting methods.
5.1. Data Description and Preprocessing
Our study aims to refine the precision of home appliance sales forecasting via TL. In the experiment, we employ real sales data of home appliance products from the Chinese market. Rich historical sales data from outlets in large cities are selected as the source domain, while small towns or newly established outlets with lower market maturity and scarce data are chosen as the target domain. Specifically, we select two geographically close cities, City A and Town B in the Yangtze River Delta region, as the subjects of our study to ensure a certain degree of similarity in features between the source and target domains, such as geographical location, climate, and weather conditions. This similarity helps the model to better adapt to the characteristics of the target domain.
The source domain dataset covers 1260 days of air conditioner sales data from City A from 2020 to 2023, while the target domain dataset includes 405 days of air conditioner sales data from Town B from 2022 to 2023. These data provide a solid foundation for subsequent experiments, enabling us to conduct in-depth analysis and forecasting of home appliance sales patterns.
During the data preprocessing phase, the collected source and target domain datasets are first cleaned and filtered to remove outliers and noise. Missing values are filled in using mean imputation, which helps maintain data integrity and reduces the potential impact of missing data on model training.
We apply min–max normalization to standardize the data across variables, as shown below:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ and $x'$ represent the original data and normalized data, respectively, with $x_{\min}$ and $x_{\max}$ being the minimum and maximum values in the original sequence.
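For illustration, a minimal Python sketch of this normalization step is given below; the function and variable names are ours, not from the original pipeline:

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a series to [0, 1] via min-max normalization."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# Example: normalize a short daily sales series.
sales = np.array([120.0, 95.0, 143.0, 110.0, 158.0])
normalized = min_max_normalize(sales)  # values now lie in [0, 1]
```

At inference time, predictions would be mapped back to the original scale by inverting this transform with the same minimum and maximum values.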
To visually present the experimental datasets and analyze the sales changes over time more accurately and meticulously for the two domains, we normalize and align the sales data of the source and target domains on the time axis, as shown in
Figure 7. It can be seen that the sales trends of both domains exhibit a degree of similarity, suggesting that, despite variations in data volume and market maturity, there exist shared patterns in sales fluctuations. This commonality offers robust empirical evidence supporting the domain adaptation training of the TransTLA model.
The dataset is partitioned into three distinct subsets: training (70%), validation (10%), and testing (20%). This division is strategically designed to address the limited availability of the target domain samples by assigning a larger training dataset, ensuring appropriate data support for the model’s training, optimization, and assessment. To model the sequences, a sliding time window technique is employed, with a fixed window size, such as 7 days, to capture consecutive time intervals. This approach allows for overlapping windows, enabling the model to effectively handle the sequential and evolving nature of sales data.
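A hedged sketch of the sliding-window construction described above is shown next; the 7-day window and one-day target follow the text, while the function name and the synthetic series are illustrative:

```python
import numpy as np

def make_windows(series: np.ndarray, window: int = 7, horizon: int = 1):
    """Slice a 1-D series into overlapping (input, target) pairs.

    Each input spans `window` consecutive days, the target spans the
    following `horizon` day(s), and the window slides forward by one day,
    producing overlapping samples.
    """
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window:start + window + horizon])
    return np.asarray(X), np.asarray(y)

# Example on a synthetic 405-day series (the target domain length).
series = np.random.rand(405)
X, y = make_windows(series, window=7, horizon=1)
# X.shape == (398, 7), y.shape == (398, 1)
```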
5.4. Parameter Selection Experiments and Results
In the parameter selection experiment, we aim to determine the optimal parameter settings for the TransTLA model to optimize performance on home appliance sales forecasting. We focus on the impact of weight parameters in the composite loss function and the effect of the varying historical window sizes.
The composite loss function (see Equation (20)) includes source domain loss, target domain loss, and domain adaptation loss, with hyperparameters α and β balancing their contributions.
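Based on this description (and on how α and β behave in the experiments below), the composite loss can be sketched as follows; the MSE regression losses and the placeholder domain term are assumptions, since Equation (20) is not reproduced here:

```python
import torch.nn.functional as F

def domain_adaptation_loss(feat_src, feat_tgt):
    """Placeholder domain discrepancy: squared distance between mean
    feature vectors (the paper's actual term is defined in Equation (20))."""
    return (feat_src.mean(dim=0) - feat_tgt.mean(dim=0)).pow(2).sum()

def composite_loss(pred_src, y_src, pred_tgt, y_tgt,
                   feat_src, feat_tgt, alpha=0.5, beta=0.1):
    """alpha balances the source vs. target regression losses;
    beta scales the domain adaptation term, as described in the text."""
    loss_src = F.mse_loss(pred_src, y_src)   # source domain regression loss
    loss_tgt = F.mse_loss(pred_tgt, y_tgt)   # target domain regression loss
    loss_da = domain_adaptation_loss(feat_src, feat_tgt)
    return alpha * loss_src + (1 - alpha) * loss_tgt + beta * loss_da
```

With alpha = 0, the source regression term vanishes; with alpha = 1, only the source regression and domain terms drive backpropagation, matching the boundary cases discussed below.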
α is a critical hyperparameter that balances the weights between the source domain regression loss and the target domain regression loss. We explore α values from 0 to 1 with steps of 0.1 to determine its effect. Specifically, when α equals 0, the influence of the source domain regression loss is not considered; only the target domain regression loss and domain adaptation loss are used in the backpropagation. When α is set to 1, it indicates that the total loss consists entirely of the source domain regression loss and domain adaptation loss. For each value of α, the model is trained with a consistent structure and parameters as described in
Section 5.3. The historical backtracking window length is set to 7 days, and β is set to 0.1. We use a sliding window to predict the sales volume for the next day. The evaluation metrics for sales volume prediction under different proportions of source domain regression loss are presented in
Table 3, with comparative curves for the MAE and RMSE indicators illustrated in
Figure 8.
Both
Table 3 and
Figure 8 indicate that the model performs well when the value of α is in the middle range, specifically at 0.5 or 0.6, with a small difference between them. As the value of α moves away from the middle, the predictive performance declines, suggesting that equally considering the regression losses of both the source and target domains leads to a more balanced effect, which helps the model maintain a better performance when transferred to the target domain. Therefore, 0.5 is chosen as the optimal parameter setting for α.
It can be observed that, when α = 0 or α = 1, the model’s performance is not ideal, indicating that relying solely on target domain data or source domain data is not the best strategy. A moderate amount of source domain information helps improve the predictive performance in the target domain task, but overreliance on source domain information may introduce noise or inadaptability, thereby reducing the model’s performance.
Next, we explore the impact of the hyperparameter β on the performance of the TransTLA model. β regulates the impact of the domain adaptation loss, which is crucial for minimizing the differences in feature distributions across source and target domains, thereby enhancing the model’s performance on the target domain. Experiments vary β from 0 to 1 in 0.1 increments, with other parameters held constant. At β = 0, the model disregards domain adaptation loss, focusing solely on regression losses. With α set at 0.5 and a 7-day backtracking window, the model forecasts the next day’s sales. The results under different domain adaptation loss ratios are detailed in
Table 4, with the MAE and RMSE curves compared in
Figure 9.
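Although the exact domain adaptation loss is given by Equation (20) and not restated here, a common choice for aligning feature distributions across domains is the maximum mean discrepancy (MMD); a hedged Gaussian-kernel sketch follows, which is not necessarily the exact term used by TransTLA:

```python
import torch

def gaussian_mmd(feat_src: torch.Tensor, feat_tgt: torch.Tensor,
                 sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between two feature batches under an RBF kernel."""
    def kernel(a, b):
        sq_dist = torch.cdist(a, b) ** 2      # pairwise squared distances
        return torch.exp(-sq_dist / (2 * sigma ** 2))
    return (kernel(feat_src, feat_src).mean()
            + kernel(feat_tgt, feat_tgt).mean()
            - 2 * kernel(feat_src, feat_tgt).mean())
```

Scaling such a term by β = 0.1 in the composite loss corresponds to the sweet spot identified above.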
The analysis of the data and comparison curves from
Table 4 and
Figure 9 reveals that an initial increase in β from 0 to 0.1 enhances the model performance. However, beyond this point, further increments in β result in performance fluctuations rather than a steady improvement. A β that is too high could cause the model to prioritize reducing domain discrepancies over optimizing the regression task, potentially degrading the overall performance. This suggests that there is a sweet spot for balancing regression loss and domain adaptation loss, which is essential for model tuning. Notably, when β is set to 0.1, the model achieves the best results in forecasting home appliance sales. Consequently, this paper adopts β = 0.1 for all subsequent experiments.
The size of the historical backtracking window determines the length of historical information considered by the model when making predictions. Choosing the appropriate window size is crucial for the model to capture long-term dependencies in time series and to improve the predictive performance. To quantitatively analyze the impact of different backtracking window sizes, the size is varied from 4 to 12 days, generating corresponding datasets for training and evaluation. With α = 0.5 and β = 0.1, the model predicts the sales volume for the next day. The predictive efficacy is evaluated, with the results and comparison curves presented in
Table 5 and
Figure 10, respectively.
The experimental results from
Table 5 and
Figure 10 indicate that the model’s predictive performance follows an increasing and then decreasing trend as the historical backtracking window expands. Specifically, when the window size is less than 7 days, the model’s predictive accuracy improves as the window grows, likely because a small window cannot capture long-term dependencies in the time series, limiting its predictive power. However, when the window size exceeds 7 days, the performance begins to decline, possibly because excessive historical information introduces noise and irrelevant data, which reduces its relevance to current sales trends and interferes with the model’s predictions. Additionally, given the limited number of target domain samples in this paper, a larger backtracking window theoretically provides more context per sample but in practice further reduces the number of available training windows. This not only decreases the model’s training efficiency but also, due to data scarcity, may prevent the model from being adequately trained, ultimately affecting its performance and generalization capability. With a backtracking window of 7 days, the model achieves the best predictive performance, outperforming other window sizes across all evaluation metrics. In summary, a 7-day historical backtracking period is used for subsequent experiments.
In the real-world home appliance market, accurate sales forecasting is crucial for strategic planning, inventory optimization, and flexible sales strategies. Consequently, it is necessary to further evaluate the performance of the proposed TransTLA model in multi-step forecasting. We employ a 7-day historical sliding window to generate predictions for the next 1 to 4 days, which are then compared to actual sales data. For instance, to predict the next 3 days, the model uses real data from the previous 7 days to forecast the sales for the next 3 days at once. Subsequently, the sliding window moves forward by 3 days, and the process is repeated.
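The rolling protocol just described can be sketched as follows; `model.predict` is an assumed interface returning `horizon` values, not the paper’s actual API:

```python
import numpy as np

def rolling_forecast(model, history: np.ndarray,
                     window: int = 7, horizon: int = 3) -> np.ndarray:
    """Forecast `horizon` days at once from the last `window` days of real
    data, then slide the window forward by `horizon` days and repeat."""
    preds = []
    pos = window
    while pos + horizon <= len(history):
        x = history[pos - window:pos]    # previous 7 days of real data
        preds.append(model.predict(x))   # next `horizon` days in one shot
        pos += horizon                   # advance by the forecast horizon
    return np.concatenate(preds)
```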
Figure 11 presents the home appliance sales forecast results of the TransTLA model on the test set under different prediction steps (1 to 4 days ahead). The blue lines represent the actual sales volumes, while the orange lines indicate the predicted values. The model demonstrates high accuracy and sensitivity for short-term forecasts of 1 and 2 days, closely tracking actual sales. As the prediction horizon extends, slight discrepancies emerge between the forecasted and actual sales volumes. This is expected due to the increased uncertainty associated with longer-term predictions. However, the model still captures the overall long-term sales trends effectively, especially during significant upward or downward trends, showing strong trend detection and robustness.
5.5. Ablation Experiments and Results
To comprehensively evaluate the contribution of each component in the TransTLA model to the overall performance of home appliance sales forecasting, this paper conducts a series of ablation experiments. The essence of ablation experiments is to remove or modify key components of the model gradually, compare the performance of the complete model with the ablated models in forecasting tasks, and thus deeply understand and quantify the specific impact of these components on the overall model performance.
The experiments focus on four key components, i.e., the TCN layers, the LSTM layers, the attention mechanism layers, and the TL mechanism, as well as various combinations of these components. Notably, when TL is not used, the model is trained and evaluated using only the target domain dataset. Models with the prefix “Trans” use the TL strategy, i.e., they are trained with both source and target domain data and evaluated on the target domain test set. The ablation experiments in this study cover forecast horizons of 1 to 4 days to fully analyze the impact of each component on the model’s forecasting ability. Through these ablation experiments, we can not only identify the model components that are crucial for the forecasting performance but also determine whether the role of these components differs significantly across prediction horizons. To address the inherent stochasticity in DL models due to random weight initialization and optimization, we conducted 10 separate evaluations for each model.
Table 6 lists the average evaluation metrics of the models under different ablation configurations, with the standard deviation indicated in brackets.
Figure 12 visually displays how the MAE metric of different models changes with the time span. Additionally, to statistically confirm the significant differences in the performance of the models, we performed independent
t-tests, considering a significance level of
p < 0.05.
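For reference, such a check over the 10 repeated runs can be done with a standard two-sample t-test, e.g., via SciPy; the per-run MAE values below are illustrative, not the paper’s results:

```python
from scipy import stats

# MAE from 10 repeated runs of two models (illustrative numbers only).
mae_model_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 12.3, 11.7, 12.0, 12.1]
mae_model_b = [13.0, 12.8, 13.3, 12.9, 13.1, 12.7, 13.2, 13.0, 12.9, 13.1]

t_stat, p_value = stats.ttest_ind(mae_model_a, mae_model_b)
significant = p_value < 0.05  # significance level used in the paper
```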
From
Figure 12, one can see that the predictive accuracy of all models declines as the forecast horizon increases, highlighting the complexity of long-term forecasting. This decline is due to escalating uncertainty and the diminishing relevance of historical data to future sales over time.
Among single-model structures, TCN and LSTM perform relatively well at a forecast step of one but deteriorate rapidly as the step increases. In particular, LSTM’s R2 becomes negative at step 4, indicating that its predictions are less effective than a simple mean forecast and bear little relevance to the actual sales data. Thus, LSTM cannot effectively capture future trends. In contrast, TCN retains some predictive power at step 4.
Furthermore, the TCN-LSTM model shows some advantages at steps 1 and 2 but performs poorly at steps 3 and 4, indicating limitations in long-term forecasting. However, introducing the attention mechanism improves the performance of the TCN-Attention, LSTM-Attention, and TCN-LSTM-Attention models, which achieve better results across single and multiple forecast steps. This demonstrates that the attention mechanism helps the model focus on key information in the time series and confirms the advantage of hybrid structures in capturing complex temporal features.
The introduction of the TL mechanism enhances the model’s predictive performance. This improvement can be attributed to the fact that leveraging source domain data effectively deepens the model’s understanding of the target domain. Comparisons clearly show that the proposed TransTLA model outperforms the other ablation models at most forecast steps, with the percentage MAE improvement displayed in
Table 7. At a one-day forecast step, the TransTLA model reduces the MAE by 6.66% compared to the TCN model and by 8.55% compared to the LSTM model, with a 5.25% improvement over the non-TL TCN-LSTM-Attention model. When predicting four days ahead, the TransTLA model reduces the MAE by 6.13% compared to the TCN model, by 36.31% compared to the LSTM model, and by 10.64% compared to the TCN-LSTM-Attention model. This indicates that combining attention and TL mechanisms with TCN and LSTM better leverages their respective strengths for more accurate sales forecasting. The TransTLA model effectively captures both short-term and long-term dependencies in the time series, focuses on key time steps through the attention mechanism, and learns useful knowledge from the source domain through TL, thereby improving the forecast accuracy.
In summary, the ablation study reveals the key role of the TCN layers, LSTM layers, the attention mechanism, and TL in improving the accuracy of home appliance sales forecasting. The TransTLA model integrates these components to achieve optimal performance across different forecast time spans.
5.6. Comparative Experiments and Results
To further validate the effectiveness and superiority of the TransTLA model for home appliance sales forecasting, comparative experiments are conducted against several established time series forecasting models. In addition to the proposed TransTLA model, the comparison includes the following models:
ARIMA: Autoregressive Integrated Moving Average, a classic and commonly used statistical method that predicts by capturing linear dependencies in time series data.
XGBoost: Extreme Gradient Boosting, an advanced and scalable ML method that enhances model performance by combining the predictions of multiple weak learners and using gradient descent to minimize errors.
SVR: Support Vector Regression, an ML method that extends Support Vector Machines (SVMs) to regression tasks, aiming to find a function within a margin of error while handling nonlinear relationships using kernel functions.
RNN: A type of neural network designed for sequence data, characterized by maintaining states in the hidden layers to influence subsequent outputs with previous information. In the experiment, a two-layer RNN stack processes the input, followed by two fully connected layers to produce the forecast.
CNN: A DL model initially used in image recognition but also effective at extracting local features of time series through one-dimensional convolution. The CNN structure employs one-dimensional convolution, average pooling, and a ReLU activation function, stacked twice, followed by two fully connected layers for the output (see the sketch after this list).
CNN-LSTM [18]: A combined model that first extracts local features with CNN layers and then captures temporal dependencies with LSTM layers to enhance the forecasting performance.
CNN-LSTM-Attention: A model that further combines the advantages of CNN, LSTM, and attention mechanisms to capture complex patterns and dependencies in time series data through layered abstraction and precise information filtering.
TransCNN-LSTM-Attention: A model that incorporates TL strategies into CNN-LSTM-Attention to explore the impact of TL on sales forecasting performance.
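To make the baseline descriptions concrete, a minimal PyTorch sketch of the CNN comparison model is given below; the layer pattern (two conv/avg-pool/ReLU stages plus two fully connected layers) follows the CNN item above, while channel widths and kernel sizes are assumptions:

```python
import torch.nn as nn

class CNNBaseline(nn.Module):
    """1-D conv + average pooling + ReLU, stacked twice, then two fully
    connected layers; widths and kernel sizes are illustrative."""
    def __init__(self, window: int = 7, hidden: int = 32, horizon: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=3, padding=1),
            nn.AvgPool1d(2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.AvgPool1d(2), nn.ReLU(),
        )
        flat = hidden * ((window // 2) // 2)  # length after two halving pools
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flat, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon),
        )

    def forward(self, x):  # x: (batch, 1, window)
        return self.head(self.features(x))
```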
Experiments to predict sales volume for 1, 2, 3, and 4 days in advance are designed for each model. The results of various evaluation metrics in the comparative experiment are shown in
Table 8, and
Figure 13 displays the MAE comparison of different models across various forecast horizons.
The experimental results from
Table 8 and
Figure 13 demonstrate that, as the forecast horizon lengthens, the predictive performance of all models declines due to the increasing uncertainty associated with making longer-term predictions. Overall, ML- and DL-based models outperform the traditional statistical ARIMA model. Across all evaluation metrics and forecast steps, RNN, CNN, and their variants consistently perform significantly better than ARIMA. Particularly at the forecast steps of 3 and 4 days, ARIMA’s R2 values drop to −0.292 and −0.723, respectively, indicating that its predictions fit the data worse than a simple mean forecast. This suggests that traditional statistical models struggle to effectively capture the nonlinear trends and complex patterns in the time series data of home appliance sales.
For ML-based methods, XGBoost and SVR show nuanced differences in their predictive performance over 1–4 day forecast horizons. XGBoost starts strong with an R2 of 0.652 at the 1-day step but drops significantly by the 4-day step (R2 of 0.105). Conversely, SVR maintains a more consistent performance, demonstrating a stronger capacity to handle extended forecasts. While both models’ accuracy declines over longer horizons, SVR consistently outperforms XGBoost across most metrics and steps, indicating better accuracy and robustness in handling the complexities of nonlinear time series data. Nevertheless, these ML models still underperform compared to DL models, particularly for longer forecasts.
Further analysis of DL models reveals that combined models perform better than individual RNN and CNN base models. RNN shows better performance in short-term forecasts (e.g., 1-day step) but a significant drop in performance for long-term forecasts (e.g., 3 and 4 days). This may be due to challenges in capturing long-term dependencies in time series data and susceptibility to the vanishing gradient problem. CNN exhibits a relatively stable performance across various forecast steps but falls short of the combined models in overall predictive accuracy, indicating that convolutional operations alone may not be sufficient to fully explore the temporal characteristics of time series data.
Combined models like CNN-LSTM and CNN-LSTM-Attention achieve superior predictive performance, with CNN-LSTM-Attention showing a high R2 value (0.797) at a 1-day step and maintaining a relatively stable performance even at a 4-day step. This indicates that integrating convolutional operations with LSTM or attention mechanisms can better model time series data, capturing both local and global features, thereby enhancing the predictive accuracy.
The TransCNN-LSTM-Attention model, which incorporates TL, demonstrates exceptional performance in both short-term and long-term forecasting tasks, outperforming CNN-LSTM-Attention in most evaluation metrics and forecast horizons and ranking just below the TransTLA model overall. This shows that TL can further enhance sales forecasting performance by leveraging knowledge from related domains.
Notably, the TransTLA model proposed in this paper exhibits the best overall predictive performance across all forecast horizons.
Table 9 presents the percentage MAE improvement of the TransTLA model over some comparative models. When predicting 1-day sales, its MAE is reduced by 9.83% compared to the CNN-LSTM-Attention model and by 9.29% compared to the second-best TransCNN-LSTM-Attention model. Especially in multi-step forecasting, its evaluation metrics are better than other comparative models. For instance, at a 3-day forecast step, the MAE is reduced by 12.12% compared to the CNN-LSTM-Attention model and by 10.53% compared to the TransCNN-LSTM-Attention model. This indicates that the components of the TransTLA model, such as TCN and TL, can more effectively mine intrinsic patterns and trends from time series data, further validating its high efficiency and robustness in multi-step forecasting tasks.
In addition to the superior predictive performance, it is also crucial to consider the computational complexity of these models.
Table 10 presents the number of floating-point operations (FLOPs), multiply–accumulate operations (MACs), and parameters (Params) for different models used in the experiments, based on the settings described in
Section 5.3.
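Complexity figures of this kind can be obtained with off-the-shelf profilers; a hedged sketch using the `thop` package is shown below (the tool actually used in the paper is not stated, and `CNNBaseline` refers to the earlier illustrative sketch):

```python
import torch
from thop import profile  # pip install thop; tool choice is an assumption

model = CNNBaseline(window=7)   # illustrative model from the earlier sketch
dummy = torch.randn(1, 1, 7)    # (batch, channels, window)
macs, params = profile(model, inputs=(dummy,))
flops = 2 * macs                # common convention: one MAC = two FLOPs
print(f"MACs: {macs:.0f}, FLOPs: {flops:.0f}, Params: {params:.0f}")
```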
From
Table 10, we can observe that the TransTLA model has higher FLOPs and MACs compared to most other models, indicating greater computational complexity. However, this complexity is justified by its significantly improved predictive performance. When comparing with simpler models like ARIMA, XGBoost, and SVR, the complexity of DL models including TransTLA is considerably higher. Traditional models such as ARIMA and SVR generally have lower computational complexity due to their simpler structure and fewer parameters. XGBoost, while more complex than ARIMA and SVR, still does not reach the level of complexity seen in DL models. However, these traditional models often fall short in capturing the intricate patterns and dependencies in time series data, especially for multi-step forecasting tasks, where the TransTLA model excels.
It is worth noting that adding TL to a model leaves the number of parameters unchanged but roughly doubles the computational cost, because data from both the source and target domains pass through the model during training.
In summary, the comparative experimental results clearly show the performance differences of various model structures in the task of home appliance sales forecasting. The TransTLA model’s overall performance is significantly better than other classic and combined models, demonstrating excellent time series forecasting capabilities. This not only validates the effectiveness of its design but also provides a powerful tool for sales forecasting in the home appliance industry.