3.1. Demand Shaping
User demand shaping uses external energy storage devices (such as a large rechargeable battery (RB) [
7,
35,
36,
37,
38,
39,
40,
41,
42], renewable energy system (RES) [
43,
44,
45,
46]), or load shifting [
47,
48,
49] to distort the real power consumption curves. The RB and RES method can be treated as a noise-adding approach at the physical layer, as the original power demand is distorted, the utility cannot infer sensitive information from the smart meter data. An RB system contains a smart meter, a battery, and an energy management unit (EMU), the EMU controls battery to implement optimal energy management policy (EMP), with the injection of power from the RB
, the mismatching between the power supplied by the grid
and consumers’ power demand
provide privacy guarantee to consumers. The works conclude that the larger the battery capacity size, the better privacy can be guaranteed. However, the RB is a finite capacity energy storage device with capacity ranges from 2 kWh to 20 kWh [
50], therefore there exists a lower and upper bound (
and
) to limit the performance of the mechanism. The optimal EMP, such as best-effort (BE) algorithm [
7], water -filling algorithm [
51], Q -Learning algorithm [
1], non-intrusive load-leveling (NILL) algorithm [
52] are introduced to optimize the charging/discharging process, these algorithms control the battery either hide, smooth, or obfuscate the load signature [
7]. NILL algorithms are designed to blind the NILM [
52], instead of only one target load, the NILL has two states, a steady-state and recovery state if the battery capacity cannot enable the load to maintain steady-state, the load is switched to the recovery state. A privacy-versus-cost trade-off strategy considering the TOU tariff is proposed by Giaconi et.al in 2017 [
53]. Instead of a constant load target, a piecewise load target referring to the current TOU price is generated, the cost of the electricity is minimized, and the consumers can also sell extra energy to the grid to reduce the cost further.
RES utilizes rooftop PV, small wind turbine, and even Electric Vehicle (EV) [
54] to replace the conventional battery. To overcome the difficulty to roll -out expensive RES and RB facilities, Reference [
55] proposed a multiuser shared RSE strategy that enables serval users to share one RES and one EMU. The EMU control the RES to allocate the energy from the RES to each user. In this case, the target of the system is to minimize the overall privacy loss of all users rather than an individual user. EV is another scheme to reduce the reliance of the RB [
54] since the charging period is almost overlapping with the peak load, it can mask other appliance signatures. However, the EV can only be used when the consumers are at home, the consumers are still under real-time surveillance since the adversary would obtain information when the residents leave their home.
To summarize, in RB/RES methods, researchers view the identification information of the load curve as the variation of the load measurements of two neighboring measure points
. The ideal situation for the grid curve is a constant value
which will not reveal any sensitive features of the demand, the modified load curve
is then compared to
, the more similarities between these two curves, better privacy can be guaranteed. To quantify the privacy loss, Mean Squared-Error (MSE) [
53], Mutual Information (MI), Fisher Information (FI) [
35], KL divergence [
7], Empirical MI [
37] are adopted in related works. However, user demand shaping also has drawbacks: Firstly, extra energy storage systems and renewable energy sources are required to implement the demand shaping strategy; these devices are prohibitively expensive and can be difficult to roll-out, the batteries need to be renewed frequently. Secondly, the energy storage system blinds demand response, which is one of the most important functions of the smart grid.
As the drawbacks of RB/RES methods are obvious, another demand shaping method named load shifting is proposed to replace the RB/RES techniques. This method hides sensitive information by shifting the controllable loads [
47,
48,
49]. The loads can be divided into uncontrollable loads (e.g., lighting, microvan, kettle) and controllable loads (e.g., heating, ventilation, and air conditioning (HVAC) systems, EV, dishwasher, washing machine). The operation time and model of the controllable loads can be scheduled by consumers to prevent occupancy detection. In [
49], combined heat and privacy (CHPr) are proposed, thermal energy storage such as electric water heater is adopted to mask occupancy. Compared with the RB approach, CHPr neither requires expensive devices nor increase electricity cost. There are several limitations of the load shifting technique, firstly, some of the controllable loads have limited operation modes and cannot be interrupted; secondly, there are restrictions for the thermal loads to store energy.
3.2. Data Manipulation
Different from the demand shaping approach, data manipulation aims to modify the smart meter data before sending it to the utility. Data aggregation, data obfuscation, data down-sampling, and anonymization all belong to this category.
Data obfuscation, which is also called data distortion, tries to add noise to the original smart meter data to cover the real power consumption [
56,
57,
58,
59,
60]. Like demand shaping technique, data obfuscation also reduces the privacy loss by distorting the smart meter data, but on the network layer. Noises such as Gaussian noise [
56,
60], Laplace noise [
56], gamma noise [
57] are added into the original smart meter data to distort the load curve. These noise-adding mechanisms follow normal distributions with mean μ equals to zero, hence the noise would cancel out if enough readings are added up together, P. Barbosa, et al. [
59] conclude that these probability distributions would not influence the relationship between the utility and privacy, so all distributions can achieve similar performance in protecting privacy. Moreover, to guarantee the billing correctness, serval schemes are proposed: Reference [
56] proposes a power consumption distribution reconstruction methods by adding another Gaussian distribution into the data, but the method does not quantify how much noise should be added to recover the original curve; Reference [
59] sends a filtered profile to the utility rather than masked profile, then result shows that the error of the overall power consumption is reduced in this way. However, they also find that the error during different periods (peak period, off-peak period) is significantly different, which provides new challenge. In summary, although the data distortion scheme shows efficient performance in reducing privacy loss, there are serval problems which should be discussed in future studies: (1) The TOU tariff is unavailable. Although the noise would be zero-mean, but the multiplier for TOU pricing is not. Hence the sum of TOU bills would be influenced. (2) Although from the signal processing and information-theoretic viewpoint that a zero-mean noise would not influence the result, we should notice that the power system is operating on a real-time basis. The power system operator manages the grid with the real-time data sent from the smart meter, even a minor error between the ground truth with the distorted data could result in serious faults, even the collapse of the whole system.
Data aggregation reduces the privacy loss by constructing aggregators to collecting the data from a few smart meters together, so the utility is unable to detect the electricity events in a single house [
8,
22,
60,
61,
62]. The data aggregation technique is divided into aggregation with trusted third parties (TTP) [
60] and aggregation without TTP [
8,
22]. J.-M Bohli, et al. [
60] propose data aggregation with TTP, the data aggregator (DA) operated by the TTP is responsible for gathering the data from neighbouring smart meters and then sending the aggregated data to ES. At the end of every month, the DA also generates energy consumption of individual consumer for billing purpose. However, there are several concerns about involving TTP. Above all, a TTP could try to infer the personal information, so the TTP itself may bring extra privacy risks to the system. Secondly, with the increasing numbers of smart meters being installed, it is unrealistic for the TTP to build enough DA to satisfy the demand, and the maintenance and development budget would be unaffordable to EP and NO.
References [
8,
22,
61,
62,
63] introduces data aggregation mechanisms without TTP. Instead, encryption techniques such as HE, MPC [
8,
22,
62,
63] are employed. Both HE and MPC encrypt personal smart meter data before sending it to the utility/TP. However, differently from conventional encryption techniques, HE and MPC enable TPs to manipulate the data without knowing the detail of it. F. Li, et al. [
8] and R. Lu, et al. [
22] independently proposed an aggregation method with HE separately. By encrypting smart meter data, the DA can implement aggregation without knowing the data details. In this way, there are no concerns that the TTP may infer sensitive information without permission. However, the drawbacks of data aggregation technology are twofold. Firstly, after aggregating, it is impossible for the utility to obtain the power usage information of an individual consumer. Secondly, complex encryption would cause high computational overhead. MPC requires low computing ability but involves several servers to deal with the data [
63]. In MPC, each server holds a part of the input data and they cannot infer the whole information. MPC has been successfully adopted in smart metering services such as TOU billing. However, complex value-added services, such as load forecasting and online energy disaggregation, require an advanced cloud server to implement these algorithms. So, the availability of MPC in these services should be discussed. The privacy boundary of aggregation size is also investigated in T.N. Buescher, et al. work [
61]. They investigated the aggregation size referring to a privacy metric named ‘privacy game’. Referring to the data-driven evaluation, a conclusion is made that even a DA with over 100 houses can still reveal private information. But the privacy measure they adopt is abstract and just simply measures the difference between the individual load curve and the aggregated curve, a more detailed privacy measure should be proposed to reflect whether advanced algorithms (such as NILM) can infer personal information from the aggregated data.
References [
56,
57,
58] combines data aggregation with noise-adding technique together, to enable differential privacy to the aggregated data. Differential privacy is employed as privacy guarantee, the concept of differential privacy is through adding noise to a largescale dataset, any two neighboring datasets (only one data in these two datasets is different) should be indistinguishable [
64]. In other aggregation mechanisms,
N smart meters are aggregated at first, then a distributed Laplacian Perturbation Algorithm (DLPA) is applied to the aggregated data. By adjusting the parameters
and
, then we can say (
-differential privacy is achieved (
is the parameter to show the strength of privacy guarantee, and the
is the failure probability, the closer
and
to 0, the better privacy can guarantee). The security and privacy performance are analysed in [
56], two denoising filter attacks, the linear mean (LM) filter, and the non-local mean (NLM) filter are employed to evaluate the original. The results convince that attackers cannot infer the original load curve from the distorted one.
Data Anonymization mechanism [
12,
65,
66] reduces privacy loss by replacing the real smart meter identification with pseudonyms. C. Efthymiou and G. Kalogridis proposed a data anonymization method with a TTP escrow in 2010 [
12]. They suggested that two IDs are attached to each smart meter, LFID for sending attributable low frequency and HFID for sending anonymous high-frequency data, while the HFIDs are kept by a TTP, making it unknown to the utility. The low-frequency data are used for billing purposes while the high-frequency information is for network management. However, the workload of the TTP is high, and the development costs increase since all anonymous IDs are processed here. Moreover, with the introduction of the TTP escrow, the privacy risks are not eliminated but just shift from the utility to TTP.
The down-sampling method is a naive approach that aims to reduce sensitive information by reducing the interval resolution of the metered data [
13,
33,
66]. However, like other methods, functions such as demand response and TOU billing would be sacrificed. Moreover, value-add services that require high-resolution data are unavailable as well. To quantify the privacy loss with different interval data, G. Eibl and D. Engel adopt NILM as adversary to the extract of personal information. They apply an edge detection NILM to smart meter data and examine the performance of 15 appliances via F-score values and the proportion of appliances. They conclude that 15-min interval data already protect most appliances. We would like to have an in-depth research based on the research by implanting more powerful NILM algorithm (such as deep learning based NILM) since deep learning has shown distinctive ability to extract features than conventional approaches.
To sum up, solutions either require the installation of expensive devices (rechargeable battery or RES) or employ complex and high computing algorithms (data distortion and data aggregation). Moreover, some schemes introduce TTP into the smart metering system, which just moves the privacy risk from one party (ES) to another one (TTP). Most importantly, unlike other communication networks, the physical connections of the electricity grid already aggregate load consumptions at feeder level or substation level without privacy concerns, the construction of the data aggregator is superfluous. And no existing solution emphasizes the availability of value-added services, which is the vital functionality the smart meter brings to consumers. Comparing the two solutions listed above, the proposed scheme is simpler and more efficient: The proposed scheme is based on existing physical facilities (the smart meter, private platform, distribution substation) and does not require any extra RES or high computation encryption. In the proposed scheme, the smart meter only communicates with the private platform (PC or smartphone) inside the house via Home Area Network (HAN). A multi-channel smart metering system enables the private platform to communicate with other stakeholders (e.g., ES, TP) with different data granularity, which takes the advantages of both data aggregation mechanism to enable grid operation and management and data down sampling mechanism to provide accurate TOU bills. Furthermore, a privacy preserving NILM algorithm is designed to enable value-added services.