
Synergistic Non-Intrusive Load Monitoring: Dual-Model Training and Inference for Improved Load Disaggregation Prediction

Department of Informatics, Technische Universität Clausthal, 38678 Clausthal-Zellerfeld, Germany
Author to whom correspondence should be addressed.
Energies 2025, 18(3), 608;
Submission received: 7 January 2025 / Revised: 20 January 2025 / Accepted: 23 January 2025 / Published: 28 January 2025
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)


Load disaggregation is the process of identifying an individual appliance’s power demand within aggregate electrical load data. Virtually all recently proposed disaggregation methods are based on neural networks, thanks to their superior performance. Their achievable accuracy is, however, often limited by the quality of the data that have been used to train the underlying neural networks. In particular, if electrical appliances exhibit vastly differing temporal schedules and operational modes, their heterogeneous power consumption data can poison the disaggregation models and thus lead to a degraded overall accuracy. This paper presents a novel load disaggregation approach that relies on the use of multiple disaggregation models. We use a clustering method to partition training data by their power consumption patterns, allowing our approach to train separate models for appliance data exhibiting similar electricity consumption patterns. During disaggregation, our scheme selects the model that is closest to the appliance’s power consumption pattern, thereby leveraging the most fitting model for each specific instance. We demonstrate how our method surpasses the accuracy of the individual models designated for each cluster, while simultaneously improving upon the baseline model’s performance. When applied to the widely used ECO dataset, our approach achieves an average improvement of 13% in disaggregation accuracy over the use of a single baseline model.

1. Introduction

Load disaggregation, also known as non-intrusive load monitoring (NILM), has emerged as a pivotal technology in the realm of energy management. It enables the disaggregation of a building’s total electricity consumption into appliance-level usage data [1,2]. The integration of NILM with Internet of Things (IoT) and Home Energy Management Systems (HEMSs) has further expanded its utility and application scope [3]. Moreover, NILM’s applications cover various domains, ranging from residential settings [4] to industrial environments [5] and beyond. For example, Athanasiadis et al. have recently proposed a holistic, multi-objective HEMS that leverages NILM insights to optimize electric and thermal loads. Their approach jointly minimizes energy costs, CO2 emissions, and user discomfort, thereby highlighting the benefits of integrating accurate load disaggregation techniques into energy management systems [6]. Despite these diverse applications, NILM faces challenges concerning the generalizability of its methods across different settings. Each domain presents unique challenges in terms of appliance types, usage patterns, and environmental conditions, which impact the effectiveness of NILM techniques [7]. Additionally, data integrity threats, such as the injection of false data, can hamper data-driven NILM. Wang et al. recently introduced a spatio-temporal detection approach for malicious power system data [8], underscoring the need for robust anomaly handling. Such threats can restrict the effectiveness of clustering-based data partitioning and thus adversely affect disaggregation accuracy.
Recent advances in domain adaptation and federated learning (FL) have shown promise in addressing generalization challenges in NILM. For instance, PerFedNILM [9] has introduced personalized FL, training models tailored to individual clients, thereby reducing the impact of unique power consumption patterns on performance. Similarly, source-free domain adaptation methods, such as those proposed in [10], utilize self-supervised learning to adapt models to new environments without requiring access to the original source data. Additionally, synthetic-to-real domain adaptation techniques have been proposed to bridge the gap between training data and real-world application [11]. In a similar vein, HeartDIS [12] leverages both real and synthetic data to develop personalized disaggregation models, further demonstrating how synthetic data can mitigate data scarcity while improving model robustness. These methodologies inspire a need for NILM approaches that are inherently adaptable and responsive to diverse data distributions.
New possibilities for enhancing the effectiveness of NILM have also opened up as a result of the evolution of data analytics and machine learning [13]. Clustering methods, particularly those based on Euclidean distance, are popular due to their simplicity and computational efficiency [14], and they are explainable by design. This paper leverages these advantages of clustering electricity consumption data in order to make an informed selection of the best-suited disaggregation model. In order to choose the right neural network model, we analyze the input data in time windows. We define a window as the finite-length time series of electrical power consumption data that is used as the input data for load disaggregation. Although these clustering methods, as highlighted by Wen et al. in [15], might not detect all types of shape similarities effectively, their simplicity and speed make them particularly suitable for applications in which a similarity measure must be computed efficiently for each window of consumption data.
This paper introduces a methodology that capitalizes on clustering-based segmentation, specialized model training, and a dynamic selection mechanism to address the diverse range of appliance consumption patterns found in real-world settings. The key contributions of our method are as follows:
  • Rather than relying on a single generic model, we employ multiple specialized models, trained on clusters of time series windows with similar shapes and usage behaviors.
  • We introduce an on-the-fly model selection strategy at inference time. This way, our framework dynamically identifies the most suitable model for each input window.
  • We evaluate the performance of our multi-model NILM method on a publicly available dataset and demonstrate that our clustering-driven solution outperforms baseline models that rely on just a single disaggregation model.
  • Notably, the computational overhead of clustering and model selection remains manageable in practice, preserving the real-time capabilities often demanded by NILM systems.
Our evaluation in Section 4 confirms that these contributions collectively achieve superior performance over existing NILM methods while maintaining computational efficiency.

2. Related Work

The field of NILM has been an area of active research for several decades, evolving significantly since its inception. The concept was first introduced in 1985, outlining the potential of disaggregating total electrical load into its constituent appliances without individual monitoring [1,2]. This seminal work laid the foundation for subsequent research in the field. With the advent of artificial intelligence and machine learning, NILM research shifted towards more sophisticated techniques. Studies on NILM have employed various machine learning algorithms, including support vector machines [16] and neural networks [17], to improve disaggregation accuracy. These methods, although more effective than their predecessors, often require extensive training data and computational resources.
Cluster-based training approaches have demonstrated potential in some machine learning applications: Kang et al. explored the use of cluster-based sampling for selecting an initial training set in active learning for text classification. Their method involved dividing unlabeled examples into clusters and selecting the most representative example from each cluster, which could include model examples acting as cluster centroids [18]. This approach not only expedited the learning process but also achieved higher accuracy more quickly, as compared to traditional random sampling methods. Similarly, Czarnowski proposed a cluster-based instance selection approach, where the training data are grouped into clusters prior to the learning process. This method, executed by a team of agents, resulted in a more efficient instance reduction tool, contributing to higher-quality machine classification [19]. Both studies underline the efficacy of cluster-based approaches in managing large datasets and improving learning outcomes. Drawing inspiration from these methodologies, our research applies a novel cluster-based training approach in the context of NILM, aiming to enhance the predictive performance by tailoring the learning process into distinct consumption patterns identified through clustering.
The application of clustering methods in NILM has been explored in various contexts, reflecting the versatility and potential of these techniques in enhancing NILM’s performance. Desai et al. utilized unsupervised clustering to evaluate the performance of NILM algorithms, demonstrating the utility of clustering in understanding and improving NILM techniques [20]. This approach underscored the importance of clustering in discerning operational states of devices. In the domain of forecasting, spectral clustering has been integrated with NILM for residential power forecasting in [21]. The introduced method leveraged the disaggregated appliance power profiles for predicting future power consumption, highlighting the potential of clustering in enhancing the predictive capabilities of NILM systems. Furthermore, Lin et al. employed fuzzy clustering, specifically Fuzzy C-Means, to identify the statuses of individual appliances in NILM systems [22]. By utilizing the inherent fuzziness in appliance operation states, a more nuanced understanding of appliance behaviors was achieved, leading to improved disaggregation accuracy. In unsupervised settings, hierarchical clustering has been utilized to group appliances based on their consumption characteristics to facilitate the training process of event-based NILM systems [23]. The primary focus of these studies on clustering was on enhancing specific aspects of NILM, such as performance evaluation and appliance state identification. However, to the best of our knowledge, none have attempted to apply input window clustering in the way proposed in our study.
In summary, clustering methods have been integrated into NILM pipelines for various purposes, ranging from performance evaluation [20] and forecasting [21] to fuzzy classification [22]. Yet, these approaches typically focus on identifying appliance states or grouping loads into broad operational clusters. In contrast, our work leverages a distance-based window clustering of the aggregated input data. A fine-grained, repetitive signal shape detection is used to subsequently train multiple specialized disaggregation models, each aligned with different clusters of consumption windows. The most suitable model is then dynamically selected at inference time. By focusing on window-level clustering rather than grouping labeled appliance states, our approach is able to better capture diverse operational modes and thereby improve disaggregation accuracy. This perspective uniquely complements prior NILM methods by targeting shape similarity in aggregate load segments, rather than clustering purely at the appliance-usage or state-transition level.

3. Methodology and Framework

Our contribution is motivated by the observation that existing works (such as [13,17,24,25]) have consistently shown an appliance-dependent disaggregation performance. Many heating and cooling devices cycle between operational states of fixed power consumption and duration, which renders their detection in aggregate data comparatively simple. Conversely, for appliances with varying power consumption values during operation, and especially those strongly under-represented in training data, reported accuracy levels are consistently lower. With the power consumption levels of the latter type of appliances being much more diverse, our conjecture is that a single disaggregation model is insufficient for their disaggregation. We hence describe our approach of using a multi-model disaggregation concept (as shown in Figure 1), alongside the required preliminaries, as follows. Aligned with the methodology used in existing NILM research, we use the individual power demand of one particular appliance in conjunction with the building’s total power demand to train NILM models. However, while existing NILM methods strive to optimize the architecture and parameterization of one single model, such that the disaggregation outcome for the appliance under consideration yields the best performance, we train multiple NILM models for the same appliance.

3.1. Baseline Model

Among the various machine learning models applied to NILM, one of the most promising is the sequence-to-point (seq2point) model. Introduced by Zhang et al., this model utilizes convolutional neural networks (CNNs) to process windows of aggregated power consumption data, predicting the operation of specific appliances at the center of each window [26]. This approach has gained considerable attention in the NILM community for its robust disaggregation results, as highlighted in [13]. Advancements in this model include the integration of attention [27], enabling the model to focus on the most relevant features of the input data and thereby further improving its performance in NILM tasks. Given its prominence and demonstrated efficacy, the seq2point model serves as an ideal basis for our study, providing a benchmark against which the performance of our proposed methodology can be evaluated.

3.2. Matrix Profile for Time Series Analysis

The matrix profile is a novel method for identifying motifs, discords, and shapelets within time series data [28,29]. This tool facilitates the efficient computation of all-pairs similarity joins in time series data, a process which involves matching each subsequence within the dataset with its most similar counterpart based on a chosen similarity measure. This tool provides a detailed analysis of the underlying patterns and anomalies in the data by comparing all possible pairs of windows extracted from the dataset. We utilize the matrix profile to compute the Euclidean distance for all pairs of windows in the dataset, thereby uncovering the intricate structure of the time series data through a comprehensive similarity analysis.
It is important to note that the computational complexity of calculating the Euclidean distance for all pairs using a brute-force approach is $O(m \cdot n^2)$ for a sequence of length $n$ and a window size $m$. However, using the STOMP algorithm and exploiting GPU-accelerated matrix profile computation, this complexity can be reduced to $O(n^2)$. This improvement is achieved by reducing the dependency on the window size $m$ to $O(1)$, making the algorithm’s performance independent of the subsequence length [29,30,31].
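To make the cost of the all-pairs comparison concrete, the following is a minimal brute-force sketch in Python. It is neither the STOMP algorithm nor the implementation used in this study; the function names and the exclusion of overlapping windows are assumptions made purely for illustration. Each window is compared against all other windows at a cost of m operations per comparison, which is where the quadratic dependence on n and the linear dependence on m originate.

```python
import math

def sliding_windows(series, m):
    """Return all length-m windows of `series`, using a stride of one sample."""
    return [series[i:i + m] for i in range(len(series) - m + 1)]

def euclidean(a, b):
    """Plain (non-normalized) Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_profile(series, m):
    """For every window, return the distance to its nearest other window.

    Neighbours whose start indices are closer than m are skipped so that a
    window is not trivially matched with a shifted copy of itself. This
    exclusion zone is an assumption of the sketch.
    """
    windows = sliding_windows(series, m)
    profile = []
    for i, w_i in enumerate(windows):
        best = math.inf
        for j, w_j in enumerate(windows):
            if abs(i - j) >= m:  # ignore trivial, overlapping matches
                best = min(best, euclidean(w_i, w_j))
        profile.append(best)
    return profile

# Toy aggregate signal: the on/off pattern repeats, the spike occurs only once.
signal = [0, 0, 5, 5, 0, 0, 5, 5, 0, 0, 9, 0]
profile = brute_force_profile(signal, m=4)
```

In this toy signal, windows that belong to the repeating pattern find an identical counterpart, whereas the window containing the single spike ends up with the largest nearest-neighbour distance.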

3.3. Clustering

Each window in the dataset is assigned to a cluster based on the shortest Euclidean distance to previous windows in the dataset. By analyzing the distribution of these distances across all windows in the aggregate data, we establish a threshold value $\theta$ that determines which model a window is used to train and which model is used for inference during testing. This threshold is computed by identifying the histogram bin with the highest frequency of distances and taking the average of that bin’s limits, effectively capturing the central range of common distances. In cases where all windows fall below or above $\theta$, the approach dynamically adjusts $\theta$ to balance the data split between clusters, ensuring that at least 10% of the windows are assigned to each cluster. This adjustment ensures that both models receive sufficient training data, even in bimodal or skewed scenarios.
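The threshold selection described above can be sketched in a few lines of Python. The text does not specify how the threshold is adjusted when the split is too uneven, so the quantile clamping in `enforce_min_share` is an assumption, as are all function names. Windows with a distance below the threshold are taken to form the first cluster and all others the second.

```python
def histogram(values, bins):
    """Equal-width histogram; returns (counts, edges) with len(edges) == bins + 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes into the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges

def mode_bin_threshold(distances, bins=100):
    """Midpoint of the most frequently populated histogram bin."""
    counts, edges = histogram(distances, bins)
    peak = counts.index(max(counts))
    return (edges[peak] + edges[peak + 1]) / 2

def enforce_min_share(theta, distances, min_share=0.10):
    """Clamp theta between two quantiles so that neither side of the split is
    (nearly) empty. Hypothetical rule; ties between equal distances can still
    leave a cluster slightly smaller than the requested share."""
    ordered = sorted(distances)
    n = len(ordered)
    k = round(min_share * n)
    lower, upper = ordered[k], ordered[n - k]
    return min(max(theta, lower), upper)

theta = mode_bin_threshold([1, 1, 1, 1, 2, 9], bins=4)
```

For the six toy distances, the most populated bin spans the range from 1 to 3, so the resulting threshold is its midpoint.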
To generalize this clustering approach, let $W = \{w_1, w_2, \ldots, w_N\}$ denote the set of all windows in the dataset, where each window $w_i$ is a time series segment of length $T$. For the seq2point model, the windows are produced by sliding a fixed-length window over the time series, with a stride size of one sample. The pairwise Euclidean distance between two windows $w_i$ and $w_j$ is defined as follows:
$$d(w_i, w_j) = \sqrt{\sum_{t=1}^{T} \left( w_i[t] - w_j[t] \right)^2},$$
where $w_i[t]$ denotes the power consumption at time $t$ in window $w_i$. This distance can be computed efficiently using the matrix profile algorithm, as described in Section 3.2.
Let $C = \{C_1, C_2, \ldots, C_K\}$ be the sorted set of clusters $1, \ldots, K$, where each cluster $C_k \subseteq W$. Assign each window $w_i$ to the cluster $C_k$ based on the following condition:
$$C_k = \left\{ w_i \;\middle|\; \min_{w_j \in Y_i} d(w_i, w_j) \in [\theta_{k-1}, \ldots, \theta_k] \right\},$$
where $\theta_k$ is the threshold value for cluster $k$, $\theta_0 := 0$, $\theta_K := \infty$, and $Y_i$ is the set of all windows prior to $w_i$ within one year during inference and the full training dataset during training.
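As an illustration of the assignment rule, the sketch below computes each window's distance to its closest predecessor and maps that distance to a cluster index. It is a simplified reading of the definition rather than the authors' code: the function names are hypothetical, a distance that equals a threshold is assigned to the higher cluster (a tie-breaking assumption), and the first window, which has no predecessor, receives an infinite distance.

```python
import bisect
import math

def nearest_prior_distance(windows, i):
    """Euclidean distance from windows[i] to its closest predecessor (inf if none)."""
    return min((math.dist(windows[i], w) for w in windows[:i]), default=math.inf)

def assign_cluster(distance, thresholds):
    """Map a distance to a 1-based cluster index k.

    `thresholds` holds the inner boundaries theta_1 < ... < theta_{K-1};
    theta_0 = 0 and theta_K = infinity are implicit.
    """
    return bisect.bisect_right(thresholds, distance) + 1

# Four toy windows of length three and a single threshold, i.e., K = 2.
windows = [[0, 0, 0], [3, 4, 0], [0, 0, 1], [3, 4, 1]]
thresholds = [2.0]
labels = [assign_cluster(nearest_prior_distance(windows, i), thresholds)
          for i in range(len(windows))]
```

The first two windows have no close predecessor and land in the second cluster, while the last two closely resemble an earlier window and land in the first.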

Determining the Number of Clusters K and Thresholds θ k

The number of clusters $K$ is determined by analyzing the distribution of the shortest Euclidean distances between all windows in the mains data from the training dataset. This is performed to identify the number of modes $\mu$ in the data. In practice, we construct a histogram of these distances using 100 bins, spanning the minimum to maximum distance values. We then identify the bins that correspond to local maxima as the modes of the distribution. We choose the number of clusters to be $K = \mu + 1$, meaning that each mode will correspond to one threshold boundary. To compute the actual threshold values separating the clusters ($\theta_k$), we examine all $\mu$ modes in the histogram and take the midpoints of their boundaries. Thus, we arrive at natural cut-off points $\theta_k$ for each cluster $C_k$ that represent the upper distance limit up to which a data window is considered to be a member of $C_k$. In other words, if the shortest distance for window $w_i$ lies below $\theta_1$, $w_i$ is assigned to the first cluster $C_1$. If it lies between $\theta_1$ and $\theta_2$, it is assigned to $C_2$, and so on; distances above the last threshold $\theta_{K-1}$ fall into cluster $C_K$.
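The mode-based choice of the cluster count and thresholds can be sketched as follows. The definition of a local maximum is not spelled out in the text; this sketch assumes a bin counts as a mode when it is strictly higher than both neighbours, so plateaus of equal bins are not detected and a noisy histogram may need smoothing first. All function names are hypothetical.

```python
def histogram(values, bins=100):
    """Equal-width histogram; returns (counts, edges) with len(edges) == bins + 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts, [lo + i * width for i in range(bins + 1)]

def local_maxima(counts):
    """Indices of bins strictly higher than both neighbours (borders compare to 0)."""
    padded = [0] + list(counts) + [0]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]

def thresholds_from_modes(distances, bins=100):
    """Return (K, thetas): one threshold per mode bin midpoint, K = modes + 1."""
    counts, edges = histogram(distances, bins)
    modes = local_maxima(counts)
    thetas = [(edges[i] + edges[i + 1]) / 2 for i in modes]
    return len(modes) + 1, thetas

# Toy distances with two clearly separated groups, using four bins for readability.
K, thetas = thresholds_from_modes([1] * 5 + [2] + [8] * 4 + [9], bins=4)
```

With two populated groups at the extremes of the toy histogram, two modes are found, which yields two thresholds and three clusters under the rule stated above.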
It is also possible to choose K based on the appliance type if the disaggregation algorithm is appliance-specific, which is a common case in NILM research, as it is for the seq2point model. Still, we opted for a single K for all appliances in our study to simplify the evaluation and to demonstrate the general applicability of our approach.

3.4. Model Training and Inference

During training, model $m_k$ corresponding to the cluster $C_k$ is trained using the windows $w_i \in C_k$. For inference, the shortest Euclidean distance between the current window $w_i$ and the previous windows within the last year, $Y_i$, is computed and compared to each threshold $\theta_k$; the model $m_k$ whose distance range contains this value is then selected for disaggregation. The one-year limit is imposed to maintain temporal consistency by avoiding seasonal influences. Additionally, this restriction ensures fixed computational demand, independent of dataset size, resulting in consistent inference times.
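A streaming version of this selection step might look like the sketch below. It assumes one window per sample (stride one) at the 10 s sampling interval used in the evaluation, so a bounded buffer of one year's worth of windows stands in for $Y_i$. The class name, the linear scan over the history, and the rule that a distance equal to a threshold selects the higher model are assumptions of the sketch.

```python
import bisect
import math
from collections import deque

# Number of windows seen in one year at one window per 10 s sample.
WINDOWS_PER_YEAR = 365 * 24 * 60 * 60 // 10

class ModelSelector:
    """Pick the index k of the model m_k for each incoming window."""

    def __init__(self, thresholds, max_history=WINDOWS_PER_YEAR):
        self.thresholds = thresholds               # theta_1 < ... < theta_{K-1}
        self.history = deque(maxlen=max_history)   # oldest windows drop out automatically

    def select(self, window):
        # Shortest distance to any remembered window (inf while the history is empty).
        nearest = min((math.dist(window, past) for past in self.history),
                      default=math.inf)
        self.history.append(tuple(window))
        return bisect.bisect_right(self.thresholds, nearest) + 1

selector = ModelSelector(thresholds=[2.0])
stream = [[0, 0, 0], [3, 4, 0], [0, 0, 1], [3, 4, 1]]
picks = [selector.select(w) for w in stream]
```

Because the buffer has a fixed maximum length, the cost of each selection is bounded regardless of how long the system has been running, which mirrors the motivation for the one-year limit.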
An example of this clustered training approach is illustrated in Figure 2 with $K = 2$. In this scenario, the window $w_1$ has the shortest Euclidean distance ($d$) to $w_3$, and $w_2$ has the shortest Euclidean distance to $w_4$. For a chosen threshold distance $\theta$ such that $d(w_1, w_3) > \theta > d(w_2, w_4)$, the second and fourth windows fall below the threshold and are used to train the first model, while the first and third windows exceed it and are used to train the second model.

4. Results and Discussion

This section presents the outcomes of the proposed framework, comparing the effectiveness of individual cluster-based models against the baseline and the integrated dual-model approach.

4.1. Evaluation Setup

In our evaluation, we apply the following boundary conditions:

4.1.1. Dataset and Pre-Processing

We consider the first house from the ECO dataset [32], where the data span a period of 245 days. The pre-processing steps include normalizing the data by removing the mean and dividing by the standard deviation. We focus on appliances with the highest mean consumption: Freezer, Hair Dryer, Washing Machine, and Fridge. Data resampling is performed using forward filling to align the dataset, resampled to one sample every 10 s.
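The two pre-processing steps can be illustrated with a small pure-Python sketch. It is not the pipeline used in the study: the function names are hypothetical, timestamps are assumed to be integer seconds, and the population standard deviation is used for normalization, which is one of several reasonable choices.

```python
import statistics

def resample_ffill(samples, step=10):
    """Resample (timestamp_seconds, watts) pairs onto a regular grid.

    Each grid point takes the most recent reading at or before it (forward fill).
    """
    samples = sorted(samples)
    start, end = samples[0][0], samples[-1][0]
    out, idx = [], 0
    for t in range(start, end + 1, step):
        while idx + 1 < len(samples) and samples[idx + 1][0] <= t:
            idx += 1
        out.append(samples[idx][1])
    return out

def standardize(values):
    """Remove the mean and divide by the standard deviation (must be non-zero)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Irregularly timed toy readings, resampled to one value every 10 s.
raw = [(0, 100.0), (7, 120.0), (25, 80.0), (30, 80.0)]
grid = resample_ffill(raw, step=10)
z = standardize(grid)
```

In a real deployment, the mean and standard deviation would typically be computed on the training portion only and reused for the test data; the text does not state which convention was followed.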

4.1.2. Model Parameters

The seq2point model is trained with a window size and sequence length of 59 samples (i.e., an effective window length of roughly ten minutes at the 10 s sampling interval), corresponding to an input size tailored to the model. The training batch size is set to 512 samples, and models are trained for 20 epochs using the Adam optimizer with a learning rate of 0.0001. The dataset is split into 80% for training and 20% for testing.
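To show how the stated window length translates into training examples, the sketch below builds seq2point input and target pairs: each 59-sample mains window is paired with the appliance reading at the window's midpoint. The network itself is omitted, the function names are hypothetical, and the chronological 80/20 split is an assumption, since the text does not state whether the split is chronological or shuffled.

```python
def seq2point_pairs(mains, appliance, window=59):
    """Pair every mains window (stride one) with the appliance value at its midpoint."""
    half = window // 2
    return [(mains[i:i + window], appliance[i + half])
            for i in range(len(mains) - window + 1)]

def chronological_split(items, train_share=0.8):
    """Use the first part of the sequence for training and the rest for testing."""
    cut = int(len(items) * train_share)
    return items[:cut], items[cut:]

# Toy series: 100 samples of mains data and a made-up appliance trace.
mains = list(range(100))
appliance = [2 * x for x in mains]
pairs = seq2point_pairs(mains, appliance)
train, test = chronological_split(pairs)
```

A series of 100 samples yields 42 windows of length 59, and the first target is the appliance reading at index 29, the centre of the first window.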

4.1.3. Clustering

As can be observed in Figure 3, which shows the histograms of the pairwise window distances in the training data, three out of four appliances follow a bimodal distribution, whereas the aggregate mains and the freezer can likely also be modeled using a unimodal distribution. Consequently, in accordance with Section 3.3, we choose $K = 2$ and employ two distinct seq2point models in order to accommodate the characteristics of all considered appliances, tailored to different clusters of consumption patterns. For appliances modeled unimodally, where clustering may not provide a clear advantage, the dual-model approach would reduce to using a single baseline model for training and inference. Preliminary results indicate that, for such cases, the performance of Synergistic NILM aligns closely with that of the baseline model. However, this approach primarily benefits datasets with distinct multimodal consumption patterns.
We use the Euclidean distance as the metric when determining the closest cluster for a given time window. In other words, time windows of 59 samples in length (cf. Section 4.1.2) are compared to the cluster centers, and the model corresponding to the closest cluster is used for disaggregation. To further substantiate the clustering based on Euclidean distance, a quantile analysis is performed. The washing machine serves to illustrate this analysis, with results depicted in Figure 4. By observing the mean power consumption within each distance quantile, distinct patterns emerge, reinforcing the premise that different clusters exhibit varying characteristics. This underpins our strategy of using separate models for each cluster to capture the nuances in consumption behavior more accurately.
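A quantile analysis of this kind can be sketched as follows. The sketch assumes that each window has one nearest-neighbour distance and one associated power value (for example, its mean appliance power); this pairing, the function name, and the toy numbers are illustrative assumptions and do not reproduce the values behind Figure 4.

```python
import statistics

def mean_power_by_quantile(distances, powers, q=4):
    """Sort windows by distance, cut them into q equal-sized groups,
    and return the mean power of each group (Q1 first)."""
    order = sorted(range(len(distances)), key=distances.__getitem__)
    n = len(order)
    means = []
    for k in range(q):
        members = order[k * n // q:(k + 1) * n // q]
        means.append(statistics.fmean([powers[i] for i in members]))
    return means

# Made-up values: windows with small distances carry high power in this toy case.
distances = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
powers = [800, 10, 700, 20, 600, 30, 500, 40]
quartile_means = mean_power_by_quantile(distances, powers)
```

The function requires at least q windows, since an empty quantile has no defined mean.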

4.1.4. Inference Process

For inference, we compute the distance of a testing window against the closest historical sample within one year in our dataset. This approach involves identifying the minimum distance to any previously recorded sample, ensuring that our model selection for disaggregation is based on the most relevant historical consumption pattern.

4.2. Disaggregation Results

Table 1 summarizes the performance of the different models. The "Baseline" row represents the performance of the seq2point model when trained on the entire dataset and tested against the test set. Models $m_1$ and $m_2$ are the seq2point models trained on the first and second clusters of the dataset, respectively, and also tested on the full test set. Model $m_c$ denotes the composite model wherein the proposed framework is applied during inference, selecting the most suitable model for each testing window based on the Euclidean distance to the closest historical sample. The ratios for model utilization in $m_c$ were derived from the proportion of windows in each cluster during training. This ensures that $m_c$ leverages the most appropriate model based on the distribution of patterns observed in the training dataset.
As expected, the cluster-specific models perform comparably or slightly worse than the baseline, which is trained on the full dataset. However, the integrated model $m_c$ outperforms the baseline across all appliances. Model $m_1$ performs slightly worse than the baseline. However, it is noteworthy that model $m_2$ achieves performance on par with the baseline, highlighting its effectiveness in certain scenarios. This suggests that the data cluster represented by $m_2$ aligns closely with the overall distribution of the dataset used for the baseline model.
The relatively high error observed for the washing machine across all models, particularly for $m_1$, can be attributed to the high consumption, sporadic usage, and irregular operational cycle of this appliance. The underperformance of model $m_1$ for the hair dryer is due to the short operation cycles, which are challenging for the model to learn and predict accurately. In contrast, both the baseline model and model $m_2$ perform better with this appliance, likely because they benefit from learning longer cycles. This distinction underscores the importance of considering the nature of appliance usage in NILM modeling.

4.3. Analysis of Loss Curves

The training and validation loss curves for the hair dryer, depicted in Figure 5, indicate that all models reach a plateau in performance improvement after approximately half the epochs. This suggests a saturation point beyond which traditional training offers diminishing returns. Typically, this problem requires more data to overcome, which is not always feasible. In contrast, our proposed framework, by segmenting the dataset and specializing models for each segment, effectively extracts additional performance gains from each model.

4.4. Sample Predictions

A closer examination of the freezer’s operational cycle predictions in Figure 6 reveals the nuanced advantages of each model. The baseline model initially provides the closest predictions to the ground truth but diverges later in the window. The cluster models $m_1$ and $m_2$ exhibit their strengths at different phases of the cycle, with $m_2$ performing better at the start and $m_1$ aligning more closely towards the end. The $m_c$ model judiciously switches between $m_1$ and $m_2$ based on historical data proximity, resulting in the most accurate predictions throughout the cycle.

4.5. Impact of Limited Training Data

To assess how our approach performs under data scarcity, we have conducted an additional experiment in which we used only 25% of the ECO dataset for training (the first 25% of the time series) and the remainder for testing. Table 2 reports the resulting prediction error for each appliance. As before, the Baseline row corresponds to the seq2point model trained on this reduced dataset, while $m_1$ and $m_2$ denote the two cluster-specific models, and $m_c$ is our integrated multi-model approach.
Despite having significantly fewer training samples, $m_c$ still achieves better or comparable performance to the single Baseline model across all four appliances. In particular, $m_c$ achieves substantial error reductions for the hair dryer and washing machine. This result demonstrates that clustering-based specialization can remain advantageous even when data are limited, presumably because different operating modes or outlier segments are separated into different clusters. Nevertheless, the degradation in performance of the individual models $m_1$ and $m_2$ (e.g., for the washing machine) indicates that clustering alone does not entirely solve the problem of inadequate data coverage. Still, the ensemble-like nature of our integrated approach helps offset these individual deficiencies, showing resilience against limited or less diverse training data.

4.6. Trade-Offs and Feasibility

The proposed dual-model training technique introduces a trade-off in terms of memory consumption, as it necessitates storing multiple models instead of one. However, given the relatively modest size of the seq2point models in regard to the chosen input window size, this increase in memory requirement is not substantial. Specifically, loading a model into the GPU memory requires approximately 449 MiB, a figure that, while notable, does not significantly impact the overall system’s efficiency given the available GPU memory capacities. This increase in memory requirement is outweighed by the performance benefits. During inference, there is additional computational overhead as the system must calculate the shortest Euclidean distance to all samples within the historical data to select the appropriate model. Given the 10 s sampling rate, this computation was feasible and allows for online inference without delay. A practical analysis of this overhead is discussed in Section 4.9. For larger datasets, we would limit the historical data comparison to the last year to maintain computational efficiency and the feasibility of real-time inference as indicated in Section 3.4.
It should be noted that, while Figure 2 illustrates clustering on a daily basis for clarity, the methodology scales to longer time frames by retaining only representative samples for the previous year. These representatives are computed by clustering historical windows and retaining their centroids, significantly reducing the computational overhead.

4.7. Implications for Model Training

The quantile analysis, as visualized in Figure 4, illustrates that the characteristics of consumption data vary significantly across the different distance quantiles. For instance, the mean power values decrease markedly from Q1 to Q4, suggesting that the consumption patterns are distinctly different in each quantile. This variability supports our approach of training separate models for different segments of the data, as it allows each model to specialize in a specific behavior pattern, potentially leading to more accurate NILM predictions. Consequently, this analysis provides a data-driven rationale for our dual-model training approach, where each model is optimized for a subset of the data characterized by its Euclidean distance.

4.8. Scalability Analysis

As discussed in Section 3.4, our method trains multiple models for a single appliance on distinct data clusters and selects the most suitable model for inference based on the shortest Euclidean distance to historically observed windows. When increasing the size of the training dataset (e.g., from more sites or additional data collection), the computational requirements to train neural networks naturally grow as with any NILM approach, yet this is a one-time offline cost. Since our clustering-based approach partitions the dataset into subsets (one per cluster), each model $m_k$ is ultimately trained on fewer samples than a single comprehensive model would be. Hence, the total number of training iterations is comparable to a baseline approach that uses all data for one model, albeit with an additional overhead to determine the cluster assignments.
A greater number of appliances (i.e., if we repeat the same multi-model concept for many devices) may require more total model parameters to be stored in memory. In our experiments (cf. Section 4), each seq2point model requires around 449 MiB of GPU memory. Loading two or three models is, hence, typically feasible on commodity hardware. If additional memory constraints arise, strategies such as on-demand loading of models from disk or model pruning and distillation can be applied to reduce memory usage.

4.9. Real-Time Feasibility

The principal runtime overhead incurred by our approach stems from the need to determine the closest historical window for each newly arriving window during inference. Since we limit the historical data lookup to the last year, the time needed to compute the Euclidean distance to all samples is capped. Furthermore, in practical NILM deployments, sampling intervals vary widely. In order to empirically quantify the inference overhead on low-power devices, we conducted an experiment on a Raspberry Pi 4 with 4 GB RAM, varying the sampling duration from 1 s to 20 s, and measuring how long the distance computation plus the model inference step took in each scenario.
Figure 7 illustrates the combined time for distance computation and inference at different sampling intervals. The system can keep pace with real-time data at sampling intervals of 3 s and longer; at shorter intervals, the distance computation and inference steps occasionally exceed the available time window. In our evaluation setting, the average inference time is approximately 0.102 s, which comfortably fits within the sampling period and thus meets real-time requirements. Notably, the overhead of loading two models into memory during inference was negligible compared to the distance computation.
These findings indicate that our approach can be deployed in real time on resource-limited hardware, provided that the sampling interval and the scope of historical data retrieval are chosen judiciously. In domains where extremely fine granularity (e.g., sub-second sampling) is required, more aggressive optimization is needed (e.g., GPU acceleration, approximate nearest-neighbor search structures, or further limiting the historical lookup window). Nevertheless, for typical NILM scenarios, our framework remains computationally feasible for online deployment on edge devices.
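The real-time criterion used above (total computation time must stay below the sampling interval) and the capped lookup window can be sketched as follows. This is only an illustration under stated assumptions: one new window is assumed to arrive per sample, a year is taken as 365 days, and all function names are hypothetical.

```python
import time
from collections import deque

SECONDS_PER_YEAR = 365 * 24 * 3600


def history_capacity(sampling_interval_s, lookback_s=SECONDS_PER_YEAR):
    """Number of windows to retain, assuming one new window per sample."""
    return int(lookback_s // sampling_interval_s)


def make_history(sampling_interval_s, lookback_s=SECONDS_PER_YEAR):
    """Bounded buffer: appending beyond capacity drops the oldest window."""
    return deque(maxlen=history_capacity(sampling_interval_s, lookback_s))


def timed(step, *args):
    """Run step(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = step(*args)
    return result, time.perf_counter() - start


def deadline_misses(step_times_s, sampling_interval_s):
    """Count steps whose computation took longer than one sampling interval."""
    return sum(1 for t in step_times_s if t > sampling_interval_s)


def keeps_pace(step_times_s, sampling_interval_s):
    """True if every step finished within the sampling interval."""
    return deadline_misses(step_times_s, sampling_interval_s) == 0
```

Under the one-window-per-sample assumption, a one-year lookback at a 3 s sampling interval corresponds to 10,512,000 stored windows, which shows why shortening the lookback or using an approximate nearest-neighbor structure becomes attractive at finer sampling granularity.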

5. Conclusions

Our study introduces a novel NILM framework that significantly enhances predictive accuracy by employing a dual-model training strategy. This approach, which leverages normalized Euclidean distance for model selection, offers a promising solution to the challenge of model saturation, particularly in scenarios with limited data. The empirical evidence demonstrates that, while models trained on segmented data may not always surpass a comprehensive baseline in terms of their performance, an integrated model consistently achieves superior accuracy across various appliances.
The dual-model approach not only improves prediction accuracy but also underscores the importance of adaptability in NILM applications. On the ECO dataset, it achieves an average accuracy improvement of 13% over a single baseline model, which indicates the framework's potential for broader application. This study lays the groundwork for further advancements in NILM technology, paving the way for more efficient and adaptable energy management solutions.

Author Contributions

Conceptualization, M.B. and A.R.; methodology, M.B.; software, M.B.; validation, M.B.; formal analysis, M.B.; investigation, M.B. and A.R.; resources, A.R.; data curation, M.B.; writing—original draft preparation, M.B.; writing—review and editing, A.R.; visualization, M.B.; supervision, A.R.; project administration, A.R.; funding acquisition, A.R. All authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.

Data Availability Statement

Data used in this study have been sourced from the ECO [32] dataset.


Acknowledgments

We acknowledge support by the Open Access Publishing Fund of Clausthal University of Technology.

Conflicts of Interest

The authors declare no conflicts of interest.


Abbreviations

The following abbreviations are used in this manuscript:
NILM: Non-Intrusive Load Monitoring
ECO: Electricity Consumption and Occupancy
IoT: Internet of Things
HEMSs: Home Energy Management Systems
FL: Federated Learning
CNN: Convolutional Neural Network


References

1. Hart, G.W. Prototype Nonintrusive Appliance Load Monitor; Technical Report; MIT Energy Laboratory and Electric Power Research Institute: Cambridge, MA, USA, 1985.
2. Hart, G. Nonintrusive appliance load monitoring. Proc. IEEE 1992, 80, 1870–1891.
3. Ruano, A.; Hernandez, A.; Ureña, J.; Ruano, M.; Garcia, J. NILM Techniques for Intelligent Home Energy Management and Ambient Assisted Living: A Review. Energies 2019, 12, 2203.
4. Garcia, F.D.; Souza, W.A.; Diniz, I.S.; Marafão, F.P. NILM-based approach for energy efficiency assessment of household appliances. Energy Inform. 2020, 3, 10.
5. Holmegaard, E.; Baun Kjaergaard, M. NILM in an Industrial Setting: A Load Characterization and Algorithm Evaluation. In Proceedings of the 2016 IEEE International Conference on Smart Computing (SMARTCOMP), St. Louis, MO, USA, 18–20 May 2016; pp. 1–8.
6. Athanasiadis, C.L.; Papadopoulos, T.A.; Kryonidis, G.C.; Doukas, D.I. A Holistic and Personalized Home Energy Management System With Non-Intrusive Load Monitoring. IEEE Trans. Consum. Electron. 2024, 70, 6725–6737.
7. Rafiq, H.; Manandhar, P.; Rodriguez-Ubinas, E.; Ahmed Qureshi, O.; Palpanas, T. A review of current methods and challenges of advanced deep learning-based non-intrusive load monitoring (NILM) in residential context. Energy Build. 2024, 305, 113890.
8. Wang, X.; Hu, M.; Luo, X.; Guan, X. A detection model for false data injection attacks in smart grids based on graph spatial features using temporal convolutional neural networks. Electr. Power Syst. Res. 2025, 238, 111126.
9. Pan, Z.; Wang, H.; Li, C.; Wang, H.; Zhao, J. Perfednilm: A practical personalized federated learning-based non-intrusive load monitoring. Ind. Artif. Intell. 2024, 2, 4.
10. Zhong, F.; Shan, Z.; Si, G.; Liu, A.; Zhao, G.; Li, B. Source-Free Domain Adaptation with Self-Supervised Learning for Nonintrusive Load Monitoring. IEEE Trans. Instrum. Meas. 2024, 73, 2534813.
11. Hao, P.; Zhu, L.; Yan, Z.; Huang, Y.; Lei, Y.; Wen, H. Synthetic-to-Real Domain Adaptation for Nonintrusive Load Monitoring via Reconstruction-Based Transfer Learning. IEEE Trans. Instrum. Meas. 2024, 73, 1–13.
12. Dimitriadis, I.; Virtsionis Gkalinikis, N.; Gkiouzelis, N.; Vakali, A.; Athanasiadis, C.; Baslis, C. HeartDIS: A Generalizable End-to-End Energy Disaggregation Pipeline. Energies 2023, 16, 5115.
13. Huber, P.; Calatroni, A.; Rumsch, A.; Paice, A. Review on Deep Neural Networks Applied to Low-Frequency NILM. Energies 2021, 14, 2390.
14. Teichgraeber, H.; Brandt, A.R. Clustering methods to find representative periods for the optimization of energy systems: An initial framework and comparison. Appl. Energy 2019, 239, 1283–1293.
15. Wen, L.; Zhou, K.; Yang, S. A shape-based clustering method for pattern recognition of residential electricity consumption. J. Clean. Prod. 2019, 212, 475–488.
16. Figueiredo, M.; de Almeida, A.; Ribeiro, B. Home electrical signal disaggregation for non-intrusive load monitoring (NILM) systems. Neurocomputing 2012, 96, 66–73.
17. Kelly, J.; Knottenbelt, W. Neural NILM: Deep Neural Networks Applied to Energy Disaggregation. In Proceedings of the 2nd ACM International Conference on Embedded Systems for Energy-Efficient Built Environments (BuildSys ’15), New York, NY, USA, 4–5 November 2015; pp. 55–64.
18. Kang, J.; Ryu, K.R.; Kwon, H.C. Using Cluster-Based Sampling to Select Initial Training Set for Active Learning in Text Classification. In Proceedings of the Advances in Knowledge Discovery and Data Mining, Sydney, Australia, 26–28 May 2004; Dai, H., Srikant, R., Zhang, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 384–388.
19. Czarnowski, I. Cluster-based instance selection for machine classification. Knowl. Inf. Syst. 2012, 30, 113–133.
20. Desai, S.; Alhadad, R.; Mahmood, A.; Chilamkurti, N.; Rho, S. Multi-State Energy Classifier to Evaluate the Performance of the NILM Algorithm. Sensors 2019, 19, 5236.
21. Dinesh, C.; Makonin, S.; Bajić, I.V. Residential Power Forecasting Based on Affinity Aggregation Spectral Clustering. IEEE Access 2020, 8, 99431–99444.
22. Lin, Y.H.; Tsai, M.S.; Chen, C.S. Applications of fuzzy classification with fuzzy c-means clustering and optimization strategies for load identification in NILM systems. In Proceedings of the 2011 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2011), Taipei, Taiwan, 27–30 June 2011; pp. 859–866.
23. Jazizadeh, F.; Becerik-Gerber, B.; Berges, M.; Soibelman, L. Unsupervised Clustering of Residential Electricity Consumption Measurements for Facilitated User-Centric Non-Intrusive Load Monitoring. In Computing in Civil and Building Engineering (2014); ASCE: Reston, VA, USA, 2014; pp. 1869–1876.
24. Reinhardt, A.; Klemenjak, C. How does Load Disaggregation Performance Depend on Data Characteristics? Insights from a Benchmarking Study. In Proceedings of the Eleventh ACM International Conference on Future Energy Systems (e-Energy ’20), Virtual, 22–26 June 2020; pp. 167–177.
25. Angelis, G.F.; Timplalexis, C.; Krinidis, S.; Ioannidis, D.; Tzovaras, D. NILM applications: Literature review of learning approaches, recent developments and challenges. Energy Build. 2022, 261, 111951.
26. Zhang, C.; Zhong, M.; Wang, Z.; Goddard, N.; Sutton, C. Sequence-to-Point Learning With Neural Networks for Non-Intrusive Load Monitoring. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
27. Yang, M.; Li, X.; Liu, Y. Sequence to Point Learning Based on an Attention Neural Network for Nonintrusive Load Decomposition. Electronics 2021, 10, 1657.
28. Yeh, C.C.M.; Zhu, Y.; Ulanova, L.; Begum, N.; Ding, Y.; Dau, H.A.; Silva, D.F.; Mueen, A.; Keogh, E. Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 1317–1322.
29. Zhu, Y.; Zimmerman, Z.; Senobari, N.S.; Yeh, C.C.M.; Funning, G.; Mueen, A.; Brisk, P.; Keogh, E. Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 739–748.
30. Zimmerman, Z.; Kamgar, K.; Senobari, N.S.; Crites, B.; Funning, G.; Brisk, P.; Keogh, E. Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond. In Proceedings of the ACM Symposium on Cloud Computing (SoCC ’19), Santa Cruz, CA, USA, 20–23 November 2019; pp. 74–86.
31. Law, S.M. STUMPY: A Powerful and Scalable Python Library for Time Series Data Mining. J. Open Source Softw. 2019, 4, 1504.
32. Beckel, C.; Kleiminger, W.; Cicchetti, R.; Staake, T.; Santini, S. The ECO data set and the performance of non-intrusive load monitoring algorithms. In Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys ’14), Memphis, TN, USA, 5–6 November 2014; pp. 80–89.
Figure 1. Comparison of the standard NILM data flow vs. the proposed flow that includes a model selection step.
Figure 2. Illustrative example of bimodal clustered training, depicting just four windows for improved visual clarity (in practice, a sliding window across the entire time series would be used): $w_2$ and $w_4$ are used to train the first model, while $w_1$ and $w_3$ are used to train the second model.
Figure 3. Logarithmically scaled histograms of the shortest normalized Euclidean distances for total household consumption (mains) and the four individual appliances: Freezer, Hair Dryer, Washing Machine, and Fridge. The marker in the first histogram indicates the threshold value θ for clustered training.
Figure 4. Mean power consumption of the washing machine across the four quantiles of the shortest normalized Euclidean distances. Each line corresponds to a quantile, with the legend indicating the overall mean (μ) consumption value for all windows within that quantile.
Figure 5. Loss curves for the baseline model and the two cluster models of the hair dryer.
Figure 6. Sample predictions for the freezer from the test data for all models.
Figure 7. Computation time (seconds) on a Raspberry Pi 4 for distance calculation (to all windows from the last year) plus model inference, as a function of the sample duration. The dashed line indicates the threshold where real-time operation is achievable (i.e., total computation time stays below the sampling interval).
Table 1. Model performance by appliance as the mean absolute error (MAE) in watts, with percentages indicating the proportion of prediction windows in which $m_c$ utilized $m_1$ or $m_2$. Bold values indicate the best (lowest) MAE for each appliance.

| Model | Freezer | Hair Dryer | Washing Machine | Fridge |
|---|---|---|---|---|
| $m_1$ | 30.23 (36%) | 101.91 (59%) | 153.09 (17%) | 44.01 (32%) |
| $m_2$ | 28.09 (64%) | 57.95 (41%) | 139.77 (83%) | 41.75 (68%) |
| $m_c$ | **25.97** | **45.36** | **122.01** | **38.74** |
Table 2. Disaggregation performance (MAE in watts) when using only the first 25% of the ECO dataset for training and the remainder for testing. Best results for each appliance are in bold.

| Model | Freezer | Hair Dryer | Washing Machine | Fridge |
|---|---|---|---|---|
| $m_1$ | **33.31** | 87.81 | 203.66 | 44.24 |
| $m_2$ | 33.66 | 77.43 | 200.26 | 48.41 |
| $m_c$ | 33.32 | **62.29** | **155.45** | **41.34** |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bouchur, M.; Reinhardt, A. Synergistic Non-Intrusive Load Monitoring: Dual-Model Training and Inference for Improved Load Disaggregation Prediction. Energies 2025, 18, 608.
