1. Introduction
An electrical power system (EPS) encompasses subsystems such as generation, transmission, distribution, the market, and users. An EPS is among the most complex systems built by mankind; as the costs of communication systems and sensors have fallen rapidly, millions of sensors have been installed in all of its areas. The electricity consumption of EPSs is managed by measurement and control devices with suitable levels of communication, resilience, reliability, and safety, and with the capacity to adjust to loads that can vary frequently. A smart grid manages all interactions between physical and computational components and systems that operate in parallel. For the system's management, not only must electrical variables be considered but also commercial models, economic opportunities, technologies, and regulatory policies, resulting in a new smart electrical network based on distributed systems, such as energy management systems (EMS), demand response management systems (DRMS), advanced distribution management systems (ADMS), advanced metering infrastructures (AMI), distributed energy resource management systems (DERMS), etc. [1]. These high-penetration monitoring systems, depending on the number of measurement points and the sampling rate, can generate zettabytes (ZB) of data that must be processed, transmitted, and stored. By the end of 2022, the amount of data generated in China by the IoT was estimated to reach 10 ZB [2].
Related Work
This section presents the state-of-the-art in big data generated in electrical systems, together with the most important methodologies used for the compression of electrical signals in power quality management:
Signal compression is classified into two approaches according to data loss: lossy compression and lossless compression. In many works, both are combined to improve compression ratios. Lossless compression is based on Delta, Run-Length, Lempel–Ziv–Welch, and Huffman encoders. The compression ratios obtained are lower than those obtained with lossy compression, which eliminates noisy or redundant data [3,4]. For lossy compression, the most widely used techniques are orthogonal transforms such as the Discrete Cosine Transform (DCT), the Discrete Fourier Transform (DFT), and the Wavelet Transform (WT). The signals are decomposed according to their frequencies, and a threshold determines the information retained for analysis [5].
In the research presented in [5], the authors applied a wavelet transform and windowing to remove repeated signal segments. Among the electrical signals analyzed were flicker, sag, and swell, and the compression ratio was close to 800:1.
In the research shown in [3], the authors presented a compression method for electrical signals based on the Fourier transform, its components, and an adaptive threshold obtained through mathematical morphology. The compression ratio obtained was 33.33:1.
The authors in [6] studied the big data architecture developed for the electrical power system. The paper proposed the use of technologies to manage big data (cloud computing, the Internet of Things, mobile internet, artificial intelligence, and blockchain), introducing power big data as a new asset of Chinese electricity companies.
In research on an electrical power emergency warning mechanism based on meteorological big data [7], the authors presented protection mechanisms against natural disasters, implemented as early detection systems that use measurement equipment and generate large amounts of data.
The researchers in references [8,9] described the structure of a metering system and the working principles of smart meters, presenting a big data analysis along with data collection and data security.
In the work presented in [10] on a reliability evaluation of electric power communication networks based on big data, the authors presented a communication network designed to transmit large amounts of information while complying with reliability standards.
The authors in paper [11] used greedy algorithms such as Orthogonal Matching Pursuit (OMP), which builds a dictionary of orthogonal bases that represent the original signal sparsely. The best results were obtained with a maximum compression ratio of 14:1.
In reference [12], the authors addressed the compression of power quality signals such as harmonics, flicker, and transients. The compression obtained using the DTCWT was 84% for voltage sags, 88% for voltage swells, 83% for flicker, 69% for transients, and 20% for harmonics.
In reference [13], the researchers implemented improvements in the classification and compression of power quality signals. The main results were improvements in segmentation and classification, with compression ratios of 25:1 and a performance of 56% compared to traditional compression techniques.
The authors in reference [14] presented a method for restoring signals with information losses in power transmission lines using compressed sensing techniques such as Basis Pursuit, Matching Pursuit, and Orthogonal Matching Pursuit. Signals were restored using a dictionary containing the most relevant information about them. These techniques allowed signals with 30% of samples randomly lost to be recovered in times between 1 and 10 s.
In reference [15], the authors proposed a signal compression technique based on the Regularization Sparsity Adaptive Matching Pursuit (RCoSaMP) algorithm using measurements from power quality loggers. The stored data size was about 9.25 MB and, after compression, 16.22 kB; using a new detector, a 72:1 compression ratio was obtained.
The researchers in reference [16] analyzed the big data that must be transmitted by communication equipment to HMI control centers. The authors proposed a lossy compression technique called FF0; the compression results ranged from 65% to 99%.
The authors in reference [17] presented a technique for denoising and lossy compression of signals using the wavelet transform, with Shannon entropy used to calculate the basis of a signal. The results were close to 91% compression with high RTE, NMSE, and COR rates.
The authors in reference [18] proposed a signal compression method applied to electrical power quality using genetic algorithms and neural networks. The signals were obtained from digital fault recorders (DFRs) and digital protection units, and the best compression ratio obtained was 24:1.
In reference [19], the researchers proposed an electrical signal compression method based on the detection of anomalies in hierarchical blocks of samples and subsamples, applying a simple iterated delta filter and a streaming data pipeline. The best result obtained was 45.58:1.
The authors in references [20,21] proposed a lossy compression method for electrical signals that uses a Fourier transform to find the harmonic components in each cycle of a signal, builds a matrix of the spectral variations of the signals, and eliminates the repeated components of the signal matrix. The best result reported was 10,780:1. This result is inconsistent with the rest of the literature review, considering that the lossy compression is applied based on thresholds on the frequencies obtained by the Fourier transform. Furthermore, the Fourier transform presents problems with time-varying signals, since it does not provide the windowing that wavelets do. For this reason, the best result considered in this research was 570:1.
This article is organized as follows. Section 2 presents the formulation of the problem. Section 3 presents the simulation and the results. Section 4 analyzes the results of the model. Finally, Section 5 presents the conclusions of the research.
2. Problem Formulation
As shown in the above summary of the state-of-the-art in big data, there is a clear trend towards installing measurement equipment at all levels of electrical systems, from generation to the consumer, generating enormous amounts of information. The electrical signals are sampled at high frequencies and stored in vectors whose domain is expressed in terms of time, frequency, or power and whose range is expressed in terms of the values of voltage, current, etc. The resolution of the signal depends on the quantization bits of the analogue-to-digital converters (ADCs), forming vectors of discrete samples.
The inclusion of numerous metering devices in the residential, commercial, and industrial sectors generates reliable electrical networks with a low environmental impact and with financial benefits, supporting informed decisions based on a real-time electricity market.
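As a simple illustration, such a vector can be built with the sampling setup later used in Section 3 (the 60 Hz fundamental and the 170 V peak are assumptions made for this example):

% Illustrative sampled voltage vector: 200 kHz for 0.33 s, as in Section 3.
% The 60 Hz fundamental and 170 V peak amplitude are assumed for the example.
fs = 200e3;                   % sampling frequency in Hz
t  = (0:1/fs:0.33 - 1/fs)';   % time vector with 66,000 samples
x  = 170 * sin(2*pi*60*t);    % one phase of a voltage waveform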
2.1. Lossy Compression
The answer to the problem of the use of lossy compression was proposed based on the representation of signals on their orthogonal bases, such that $f \in L^{p}(\mathbb{R})$ for $0 < p < \infty$. In numerical analysis, wavelets are used as a tool for solving partial differential equations and for applying linear operators to arbitrary functions, with applications in sound engineering, image processing, and signal processing.
An orthonormal wavelet basis for $L^{2}(\mathbb{R})$ is a family of functions:

$$\psi_{j,n}(t) = \frac{1}{\sqrt{2^{j}}}\,\psi\!\left(\frac{t - 2^{j} n}{2^{j}}\right), \quad (j,n) \in \mathbb{Z}^{2}$$

By translation and dilation of the mother wavelet $\psi$, any function $f \in L^{2}(\mathbb{R})$ can be expressed as a function of the wavelet $\psi_{j,n}$:

$$f = \sum_{j \in \mathbb{Z}} \sum_{n \in \mathbb{Z}} \langle f, \psi_{j,n} \rangle\, \psi_{j,n}$$

Maintaining the $L^{2}(\mathbb{R})$ equality, the wavelet coefficients are calculated by the scalar products:

$$\langle f, \psi_{j,n} \rangle = \int_{-\infty}^{+\infty} f(t)\, \psi_{j,n}^{*}(t)\, dt$$

The biorthogonal wavelet for $f \in L^{2}(\mathbb{R})$ can be represented as:

$$f = \sum_{j \in \mathbb{Z}} \sum_{n \in \mathbb{Z}} \langle f, \tilde{\psi}_{j,n} \rangle\, \psi_{j,n}$$

Let $\mathcal{D} = \{\phi_{\gamma}\}_{\gamma \in \Gamma}$ be a dictionary of vectors having unit $\ell^{2}$-norm, i.e., $\|\phi_{\gamma}\| = 1$ for all $\gamma \in \Gamma$. This dictionary is supposed to be complete, which means that it includes $K$ linearly independent vectors that define a basis of the signal space $\mathbb{R}^{K}$.
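As an illustration, the following minimal MATLAB sketch decomposes a signal on the biorthogonal basis used later in Algorithm 1 (a column signal vector x and the Wavelet Toolbox are assumed):

% Six-level biorthogonal wavelet decomposition of a signal x
% ('bior1.1' basis, as in Algorithm 1; Wavelet Toolbox assumed).
level = 6;
[C, L] = wavedec(x, level, 'bior1.1');  % coefficients and bookkeeping vector
A = appcoef(C, L, 'bior1.1', level);    % approximation coefficients at level 6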
2.2. Compressive Sampling Matching Pursuit (CoSaMP)
For vectors that represent sparse and compressible signals in $\mathbb{R}^{N}$, the $\ell_{0}$ "quasinorm" is defined as:

$$\|x\|_{0} = |\mathrm{supp}(x)| = \#\{\, j : x_{j} \neq 0 \,\}$$

A signal $x$ is $s$-sparse when $\|x\|_{0} \leq s$. Real signals can be transformed into compressible signals, which means that their entries decay rapidly when sorted by magnitude. As a result, compressible signals closely approximate sparse signals, for example, on orthonormal bases such as a Fourier or wavelet basis.
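As a small illustration (the coefficient vector c is made up for the example), the quasinorm and an s-sparse approximation can be computed as follows:

% l0 "quasinorm" and s-sparse approximation of a coefficient vector.
c = [0; 2.1; 0; 0; -0.7; 0; 0.05; 0; 1.3; 0];  % made-up example vector
l0 = nnz(c);                          % ||c||_0: number of nonzero entries
s = 2;                                % target sparsity
[~, ord] = sort(abs(c), 'descend');   % order entries by magnitude
cs = zeros(size(c));
cs(ord(1:s)) = c(ord(1:s));           % keep only the s largest entries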
CoSaMP is a greedy pursuit algorithm that integrates ideas from combinatorial algorithms to ensure speed and to provide accurate error bounds. The most important information of the signals is stored in the discrete vector $s$, whose values are the weights of the atoms. The reconstruction of the signal is carried out with an alternative technique to Nyquist sampling, since the vector $s$ is a representation of the original signal with sparse data. To reconstruct the original signal $x$, a vector $s$ is required that is the sparse representation of the original signal, multiplied by a dictionary matrix $\Phi$, whose columns are the orthogonal bases of the original signal. The number of rows of the dictionary is much smaller than the number of columns, fulfilling $M \ll N$, plus an approximation error $e$. Finally, $x = \Phi s + e$, where $\|s\|_{0} \leq k$.
For a given precision parameter $\eta$, the CoSaMP algorithm produces a $2s$-sparse approximation $a$ that satisfies:

$$\|x - a\|_{2} \leq C \cdot \max\!\left\{ \eta,\; \frac{1}{\sqrt{s}}\,\|x - x_{s/2}\|_{1} + \|e\|_{2} \right\}$$

where $x_{s/2}$ is a best $(s/2)$-sparse approximation to $x$. The running time is $O(L \cdot \log(\|x\|_{2}/\eta))$, where $L$ bounds the cost of a matrix–vector multiplication with $\Phi$. The working storage is $O(N)$.
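The paper does not give an explicit listing of the solver, so the following is a minimal MATLAB sketch of CoSaMP after Needell and Tropp; the function name and the stopping parameters maxIter and tol are illustrative, not the authors' exact implementation:

function s = cosamp(Phi, y, k, maxIter, tol)
% Minimal CoSaMP sketch: proxy identification, support merging,
% least squares, and pruning, as described above.
%   Phi : M-by-N sensing/dictionary matrix
%   y   : M-by-1 measurement vector
%   k   : target sparsity (number of atoms)
    N = size(Phi, 2);
    s = zeros(N, 1);                        % current sparse estimate
    r = y;                                  % current residual
    for it = 1:maxIter
        proxy = Phi' * r;                   % form the signal proxy
        [~, idx] = maxk(abs(proxy), 2*k);   % 2k largest proxy entries
        T = union(find(s), idx);            % merge with current support
        b = zeros(N, 1);
        b(T) = Phi(:, T) \ y;               % least squares on merged support
        [~, top] = maxk(abs(b), k);         % prune to the k largest entries
        s = zeros(N, 1);
        s(top) = b(top);
        r = y - Phi * s;                    % update the residual
        if norm(r) <= tol * norm(y)         % halt when the residual is small
            break
        end
    end
end

For example, given measurements y = Phi*x of a 3-sparse vector x with Phi = randn(20, 52), the call s = cosamp(Phi, y, 3, 50, 1e-6) recovers an estimate of x.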
2.3. Reconstruction Quality Metrics
The compression process has metrics that allow analyzing the quality of the processed signals, taking the reconstructed signal $\hat{x}$ and comparing it with the original signal $x$. The two vectors must have the same dimension, that is, $x, \hat{x} \in \mathbb{R}^{N}$, so that they have the same number of samples representing the values of the signal in discrete time. The equations that allow determining the quality of a signal are presented next:

Normalized mean-squared error (NMSE): the result between the original signal and the reconstructed signal must be as close to 0 as possible. The equation is presented below:

$$\mathrm{NMSE} = \frac{\|x - \hat{x}\|_{2}^{2}}{\|x\|_{2}^{2}}$$

Correlation (COR): the result between the original signal and the reconstructed signal must be as close to 1 as possible, where the operator "$\cdot$" is the inner product of the vectors. The equation is shown as follows:

$$\mathrm{COR} = \frac{x \cdot \hat{x}}{\|x\|_{2}\,\|\hat{x}\|_{2}}$$

Percentage of retained energy (RTE): the result between the original and the reconstructed signal should be as close to 100% as possible. The equation is shown as follows:

$$\mathrm{RTE} = 100 \cdot \frac{\|\hat{x}\|_{2}^{2}}{\|x\|_{2}^{2}}$$
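Assuming x (original) and xr (reconstructed) are column vectors of equal length, the three metrics translate directly into MATLAB:

% Quality metrics between the original signal x and the reconstruction xr.
nmse = norm(x - xr)^2 / norm(x)^2;         % NMSE: best when close to 0
cor  = dot(x, xr) / (norm(x) * norm(xr));  % COR: best when close to 1
rte  = 100 * norm(xr)^2 / norm(x)^2;       % RTE: best when close to 100%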
2.4. Proposed Algorithms
This research proposed three algorithms. Algorithm 1 was developed to extract signal characteristics such as the zero-crossing index, the maximum amplitude, and the number of samples per period, thus allowing any type of signal with different sampling rates or amplitudes to be compressed. The first step is the acquisition of data from any measurement device. The data can contain different signals, such as the voltage and current measurements of three phases at the beginning and end of a transmission line along with time, for a total of 13 signals that are arranged in a matrix called Electrical Signal "ES".
For Algorithm 1 to be executed dynamically regardless of the number of signals, the number of rows and columns of the matrix is calculated, and the values are stored in Row Electrical Signal "RES" and Column Electrical Signal "CES". The second step is the extraction of signal characteristics such as the zero-crossing indices, whose values are stored in Start Zero-Crossing "SZC" and Final Zero-Crossing "FZC". Identifying the zero-crossing indices allows the algorithm to quantify the number of samples per cycle "NS" and to determine the maximum amplitude of each signal "ML" and its location index "IL".
The third step is to determine the orthogonal bases of each signal using the six-level biorthogonal wavelet transform. "C" contains the wavelet decomposition, and "L" contains the number of coefficients per level. The approximation coefficients at level N are then calculated using the wavelet decomposition structure [C,L]; the result is the compressed signal "CS".
It must be taken into consideration that the execution time of Matching Pursuit depends on the number of samples of the original signal, and its time complexity in big O notation grows accordingly. It is for this reason that the signal is first compressed by applying a wavelet transform before applying compressed sensing, obtaining low compression and reconstruction times. The fourth step separates the time vector from the signals by taking the first and last time values along with the size of the vector and storing them in the Compression Time "CT".
Next, the size of the new Compressed Data CD matrix that contains only the signal samples is calculated, storing the values in the Row Compressed Data “RCD” and Column Compressed Data “CCD”. The fifth step is the same as that performed in the second step. The sixth step creates an identity matrix of size NS, which is the number of samples per cycle, and stores it in the Identity Matrix “ID”, and then creates the transposed Discrete Cosine Transform matrix of size NS and stores it in the variable Psi. With these two matrices, the Phi sensing matrix is formed. Next, values are assigned to certain parameters that are required by Matching Pursuit.
Algorithm 1 Feature extraction and orthogonal bases
1: Step 1: Acquire data from a database file
2: ES = load('filename')
3: [RES, CES] = size(ES)
4: Step 2: Feature extraction
5: for i = 1:CES
6:   SZC(i) = ES(:,i) < 0 and ES(:+1,i) > 0
7:   FZC(i) = ES(:,i) > 0 and ES(:+1,i) < 0
8:   NS(i) = SZC(i) - FZC(i)
9: end for
10: for i = 1:CES
11:   [ML(i), IL(i)] = max(ES(SZC(i):FZC(i), i))
12: end for
13: Step 3: Orthogonal bases
14: for i = 1:CES
15:   [C, L] = wavedec(ES(:,i), level, 'bior1.1');
16:   CS(:,i) = appcoef(C, L, 'bior1.1');
17: end for
18: Step 4: Split the time vector from the data
19: CT = [ES(1,1) ES(end,1) RES]
20: CD = CS(:,2:end)
21: [RCD, CCD] = size(CD)
22: Step 5: Feature extraction
23: for i = 1:CCD
24:   SZC(i) = CD(:,i) < 0 and CD(:+1,i) > 0
25:   FZC(i) = CD(:,i) > 0 and CD(:+1,i) < 0
26:   NS(i) = SZC(i) - FZC(i)
27: end for
28: for i = 1:CCD
29:   [ML(i), IL(i)] = max(CD(SZC(i):FZC(i), i))
30: end for
31: Step 6: Compressed sensing
32: ID = eye(NS)
33: Psi = dctmtx(NS)'
34: Phi = [ID Psi]
35: k = 4
36: SV = CompressiveSamplingMatchingPursuit(Phi, k)
37: CSMP = SV(CD)
38: Data.CT = CT
39: Data.CSMP = CSMP
For example, the minimum number of atoms (variable k) necessary to carry out an acceptable reconstruction is three, compared to the fifty-two samples per cycle of the signal. Next, the Compressive Sampling Matching Pursuit algorithm is instantiated with the Phi matrix and the value of k and is stored in the solver variable SV. The signal compression is performed by entering each cycle of the signal into the SV solver, and the result is a sparse vector of size 1 × 52, in which only one value is present and the remaining fifty-one are zeros; these vectors are stored in the Compressed Signal Matching Pursuit CSMP variable. Finally, in order not to store an array of zeros, the values obtained, together with their position indices and the time, are stored in an array of arrays, which is a structure called Data.
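A possible sketch of this storage step is shown below; the field names pos and vals are illustrative (Algorithm 1 stores the equivalent CSMP result directly):

% Store only the nonzero atoms plus the compressed time information,
% instead of a vector that is mostly zeros (illustrative field names).
[pos, ~, vals] = find(CSMP(:));  % positions and values of nonzero atoms
Data.CT   = CT;                  % [start time, end time, sample count]
Data.pos  = pos;                 % atom positions within each cycle
Data.vals = vals;                % atom amplitudes
save('Data.mat', 'Data')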
Algorithm 2 presents the steps that must be performed to reconstruct the signal. The first step is the acquisition of the data from the Data structure. It must be taken into account that the only values stored in CSMP are the few atom values and the indices that represent their locations in the matrix of zeros in each period of the signal. The second step is the signal reconstruction.
The reconstruction of each period is conducted by multiplying CSMP by the Phi matrix, thus obtaining the CS signal. If the application requires the same amount of data as the original signal, ES, interpolation must be performed, thus obtaining a reconstructed signal with the same size as the original signal, which is stored in Signal Recovery SR. The CT values are the start time, end time, and number of samples of the original signal, which allow the creation of a time vector equal to the original one. Finally, the signal quality metrics NMSE, COR, and RTE are calculated between the original signal and the reconstructed one.
Algorithm 2 Reconstruction
1: Step 1: Data acquisition
2: load Data
3: CT = Data.CT
4: CSMP = Data.CSMP
5: Step 2: Reconstruction
6: CS = Phi * CSMP
7: SR = interpft(CS, CT(3));
8: TR = linspace(CT(1), CT(2), CT(3))
9: NMSE
10: COR
11: RTE
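Putting both algorithms together for a single cycle, a hedged end-to-end sketch (reusing the illustrative cosamp function from Section 2.2 and an assumed NS-by-1 cycle vector named cycle; dctmtx requires the Image Processing Toolbox) could look as follows:

% End-to-end sketch for one 52-sample cycle (illustrative only).
NS   = 52;                                   % samples per compressed cycle
Phi  = [eye(NS) dctmtx(NS)'];                % sensing dictionary (Algorithm 1)
s    = cosamp(Phi, cycle, 4, 50, 1e-6);      % k = 4 atoms; cycle is NS-by-1
rec  = Phi * s;                              % reconstructed cycle (Algorithm 2)
nmse = norm(cycle - rec)^2 / norm(cycle)^2;  % quality check from Section 2.3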
3. Results
The power quality variables analyzed were swell, sag, flicker, triphasic fault, and steady state. For the measurement of very fast transients, such as flicker or atmospheric discharges of short duration, impulses between 50 ns and 1 ms must be captured. The established sampling frequency was 200 kHz for 0.33 s, generating 66,000 samples per phase. The matrix formed for each electrical disturbance had a size of 66,000 × 4 and contained the time along with the measurements of the three phases R, S, and T; its disk size was 1,889,931 bytes. The equipment on which the entire signal compression process was carried out was a laptop with an Intel(R) Xeon(R) E-2176M CPU @ 2.70 GHz and 64 GB of RAM; the GPU of the graphics card was used for parallel processing.
Figure 1 shows, in the upper-left part, different types of electrical faults, where the signals are represented in the time domain and displayed by cycles. The number of samples for each signal was 52 per cycle. The remaining panels show the results of the proposed algorithms as the index k, the number of atoms in the signal, is modified: the upper-right part presents the reconstruction with k = 1, the lower-left part with k = 5, and the lower-right part with k = 10. In each case, the indicated minimum number of atoms was sufficient to reconstruct the original signal with a certain degree of error.
The results obtained are presented below. Figure 2, Figure 3, and Figure 4 show the compression results for the steady-state, swell, flicker, triphasic fault, and sag signals. The index k was varied from 1 to 10. When k = 1, the optimization process calculated the single most representative atom; as the index k increased, the number of atoms increased in steps of one. It can be seen that as the index k increased, the RTE, NMSE, and COR metrics improved, the compression times (TC) and restoration times were reduced due to the shorter optimization process, and the final file size increased while the compression ratio decreased.
Figure 2 shows the statistical results of the 50 simulations performed using different power quality signals and by varying the number of atoms. The retained energy percentage reached optimal levels from k = 4, as shown in the results, with an RTE between 98.2% and 99.7%.
Figure 3 shows the similarity degree of the statistical results between the original and reconstructed signals. The correlation of the 50 simulations carried out is presented, and the optimal levels were achieved from k = 5, as shown in the results, with a COR between 99.6% and 99.8%.
Figure 4 shows the statistical results of the degree of the normalized mean-square error between the original signal and the reconstructed signal. The NMSE of the 50 simulations carried out is presented, and the optimal levels were achieved from k = 4, as shown in the results, with an NMSE between 0.0028% and 0.017%.
Table 1 shows the average compression and reconstruction times, the final size of the signal, the compression percentage, and the average compression ratio of the different power quality signals, with the number of atoms ranging from one to ten.
5. Conclusions
This research presented a methodology that used compressed sensing techniques and contributed to the processes of measurement, transmission, processing, and storage of information, thanks to the high compression levels achieved. The algorithms proposed in this research compressed electrical power quality signals with ratios of up to 2216:1, although at that level the quality indicators were poor. To improve the indicators, we recommend using a minimum value of k = 3; even so, all the results achieved by other researchers up to the year 2022 were surpassed.
To reconstruct a signal that contained fifty-two samples per cycle, a minimum of one atom was required. The atom was the most representative value of the signal and was obtained by applying the Compressive Sampling Matching Pursuit technique. Each signal contained different values and positions of the atoms. Atom positions depend on changes that might happen in the signal under analysis.
Finally, the compression level is inversely proportional to the RTE, NMSE, and COR signal quality indicators. As can be seen in the results, at the highest compression, with k = 1 and a compression ratio of 2216:1, the correlation was 70.00%; on the other hand, with k = 3 and a lower compression ratio of 1024:1, the correlation was 99.07%.
As future work, it is proposed to use the algorithms presented in this investigation with windowing, combining lossy and lossless compression to further improve the RTE, NMSE, and COR compression indicators.