1. Introduction
Direction of Arrival (DOA) estimation is an array signal processing application found in radars, navigation, military devices, and medical appliances [
1]. Most recently, it has been applied in wireless communication systems such as Bluetooth to localize devices in indoor environments where satellite-based radio-navigation systems fail to do so. In 2015, Bluetooth released its positioning technology based on the Received Signal Strength (RSS) [
2]. In that past technology, a Bluetooth Low Energy (BLE) tag transmitted radio frequency (RF) signals to receivers. The receivers could estimate the positions of such tags from RSS measurements. The positions were not precisely estimated, in fact, the tag could be anywhere inside a circular zone if trilateration was used, which is a typical RSS-based positioning method. According to Bluetooth itself, such technology has an accuracy of a few meters (1–10 m) [
3], and even if more complex algorithms are employed, there are limitations on the achievable positioning accuracy. In some cases, such accuracy of a few meters is enough, but some occasions need higher accuracy, for example, machine navigation for autonomous mobile robots, drones, industrial automation, or navigation systems that guide people in indoor environments. Higher accuracy is also desirable in asset tracking, where factories track material workflow and hospitals track equipment location.
Most recently, Bluetooth has released a new capability that makes it possible to estimate DOAs [
4], unlocking high accuracy as DOA-based positioning is reported to attain sub-meter-level position accuracy [
3,
5,
6,
7]. In the direction of arrival version of that new capability, receivers have an array of antennas to make it possible to estimate the azimuth and elevation angles coming from a signal emitted from a BLE tag. By applying, for example, the triangulation method, which employs the cited angles for each receiver and their positions, it is possible to estimate the location of BLE tags. Besides using for DOA estimation, BLE is broadly used for wireless communication of the Internet of Things (IoT). In the IoT terminology, its networks are composed of many nodes that are small battery-powered resource-constrained embedded systems. Particularly, in IoT networks that deploy DOA-based positioning systems, some nodes, called anchor nodes, have an array of antennas that execute DOA methods to estimate the azimuth and elevation of RF signals emitted from BLE tags which are also nodes.
Figure 1 depicts an IoT mesh network with the cited positioning systems.
In IoT networks, it is important to note that nodes are often constrained embedded systems. These systems typically consist of three distinct subsystems [
8], as illustrated in
Figure 2. The microcontroller or computing subsystem is responsible for controlling the node’s functionalities, and runs instructions on a low-power processor that commonly operates at a few megahertz. It also has flash memory and RAM whose sizes are in order of kilobytes. Flash memory is a non-volatile memory that stores the program’s instructions and constant data values, whereas RAM is a volatile memory that is used as data storage for the program while it is running. Nodes usually have a simple real-time operating system that executes all functionalities, also known as tasks, concurrently. The sensor subsystem collects data from natural phenomena in the form of analog signals via sensor readings and transforms them into digital signals using an Analog-to-Digital Converter (ADC). Finally, the communication subsystem is made up of a transceiver and normally a co-processor device that is responsible for transmitting and receiving data.
While Bluetooth specifies a protocol for transmitting a constant tone for DOA purposes, the development of direction of arrival algorithms is left to implementation. However, the implementation of DOA methods in IoT devices presents a significant challenge as these devices are often battery-powered and resource-constrained embedded systems with limited computational resources as previously mentioned. In contrast, DOA methods consist of complex numerical algorithms that are resource-intensive and time-consuming, potentially leading to rapid battery depletion, unacceptable execution times, and resource starvation. Moreover, IoT devices typically run a simple real-time operating system that executes small tasks concurrently, such as data acquisition from sensors and communication with other devices. The execution of DOA methods in such a multi-threaded environment is even more challenging, and computational resource management needs to be carefully thought out.Therefore, developing DOA algorithms for IoT devices demands an innovative approach that considers the balance between resource constraints and DOA accuracy, without compromising battery life or real-time system performance.
2. Literature Review
Typically, research about DOA does not take into account the implementation point of view sufficiently. In many cases, research is carried out using multi-paradigm programming languages such as MATLAB, which already have a range of pre-made general-purpose numerical functions, resulting in little motivation to develop tailor-made numerical algorithms for DOA methods. However, when implementing these methods in constrained embedded systems, numerical algorithms must be developed from scratch, as C programming language libraries offer very limited support. Additionally, widely-used linear algebra libraries such as LAPACK [
9] and Armadillo [
10] are not suitable for use in constrained embedded systems. Although the Common Microcontroller Software Interface Standard (CMSIS) DSP Software Library is designed for these systems, it lacks some numerical algorithms used in DOA methods. Furthermore, DOA estimations are typically used in radars or large antenna arrays that employ much more powerful processors than those in constrained embedded systems. While accuracy is usually the primary performance criterion of interest, energy consumption, and memory usage are also crucial considerations for battery-powered IoT devices. Therefore, DOA methods should not occupy significant amounts of memory, blocking other IoT tasks. Additionally, complex numerical algorithms require considerable computation, which affects the execution time, an essential performance criterion for real-time applications.
The array of antenna used in this research is the L-shaped array. The L-shaped array has an interesting propriety since it is composed of two orthogonal Uniform Linear Arrays (ULAs). One-dimensional (1D) DOA methods can be applied separately for two ULAs to estimate the azimuth and elevation angles. Other shapes of planar arrays, such as Uniform Rectangular Arrays (URAs) and Uniform Rectangular Frame Arrays (URFAs), rely on two-dimensional (2D) DOA methods which are more complicated than their 1D versions. Moreover, well-known fast algorithms were specifically developed for ULAs. Additionally, some exploit the Vandermonde structure in the signal model found in such an array to speed up their computations. Namely, Root Multiple Signal Classification (Root MUSIC) [
11], Estimation of Signal Parameters via Rotational Invariant Techniques (ESPRIT) [
12], Fourier Domain MUSIC [
13], and Rank-Reduction Method (RARE) [
14]. Most recently, many modified versions of the cited methods have been devised, claiming to be a better version in a certain way. Nevertheless, ESPRIT was later designed for URAs and Uniform Circular Arrays (UCAs) [
15], whereas Fourier Domain MUSIC can be applied to non-uniform linear arrays as well.
Among many DOA methods, Multiple Signal Classification (MUSIC) [
16], invented in the 80s, is an important algorithm that has been extensively tested in simulation and the real world for many decades as well as comprehensively studied, culminating in some well-known modified versions. Notably, the standard MUSIC detects DOAs by searching for peaks in the spatial spectrum. It also can be extended to find azimuth and elevation angles by searching for peaks in a 2D spatial spectrum of the planar array of antennas. However, that 2D search is a prohibitively expensive operation which motivated the development of Reduced-Dimension (R-D) MUSIC [
17] that exploits the structure of an L-shaped array of antennas to do that search in two 1D spatial spectra, one for each ULA. Knowing Root-MUSIC is a search-free method that exploits the Vandermonde structure of ULAs to apply a root-finding method that substantially reduces its execution time, the Reduced-Dimension (R-D) Root MUSIC came naturally as a modified version that executes Root MUSIC two times, one execution for each ULA.
Research about the BLE direction finding is still scarce as it is a recent feature released in 2017. In [
18], research was conducted involving real-world experiments using that BLE feature in an indoor environment focused on array calibrations using two ULAs and one URA. It reported that mutual coupling had a minimal influence or even had no clear improvement on the accuracy of a variation of MUSIC. However, Carrier Frequency Offset (CFO) compensation substantially impacted it. In [
5], four 4 × 4 URAs were employed in an indoor environment of 25 m × 15 m wide composed of pillars, walls, human movements, tables, chairs, devices, and lamps. In total, 8 eight distinct positions were estimated, and 32 different estimations were evaluated using BLE direction finding and the MUSIC method. The average error was
m, attaining a good result concerning the sub-1-meter accuracy purpose.
Papers about real-world implementations are less common than about in a simulation. In [
19], a modified version of MUSIC was developed in a Digital Signal Processor (DSP) for underwater acoustic sources. Notably, it employed a Reduced-Order Root-MUSIC method that reduces the polynomial order of Root MUSIC to speed up computations. In [
20,
21], researchers conducted a small-scale experiment using a ULA with four elements and one single transmitter. The array of antennas was connected to the NI PXI platform, and an antenna transmitter was connected to another PX platform. After running a single-source MUSIC and a Total Least Squares ESPRIT in LabView NI hardware to estimate two different DOAs, they concluded both methods could be used in real-time applications. Moreover, the development of 2D MUSIC for an L-shaped antenna array to estimate the azimuth and elevation angles based on parallel computing was successfully devised for a Digital Signal Processor (DSP) [
22]. More specifically, researchers parallelized the eigenvalue decomposition to construct the signal or noise subspace and the peak searching method to find DOAs by exhaustive search. Despite the parallelization, the execution time of parallel MUSIC was
ms, which can be seen as slow considering that the experiment used a DSP of 1 GHz.
Most of the research did not focus on the implementation aspect regarding the optimization of the method to be adapted to constrained embedded systems. Except for one cited paper, the rest did not evaluate other important performance criteria such as execution time, energy consumption, and memory footprint. This research takes a step further by adapting the R-D Root MUSIC for a Radio Frequency (RF) switch based on Bluetooth specification. Additionally, the adapted method applies unitary transformations to avoid complex arithmetic, and a real-valued Eigenvalue Decomposition Method (EVD) is employed to find the roots of complex-valued polynomials. The novel method also exploits the radio communication system design of Bluetooth to speed up its execution based on BLE receivers’ capability of estimating one single DOA only.
Notation: by defining the complex number , the operators and return the real and imaginary part of z, respectively. The argument of a complex number is an operator that returns the phase angle () in the interval while is the complex conjugate. The complex absolute operator is defined as . The operator denotes a diagonal matrix formed by an input vector. The Hermitian transpose is which applies the transpose and complex conjugate on a matrix. The operator is the floor function, that is, it outputs the greatest integer less than or equal to the input which is a real number. The Hadamard product is an operator that takes two matrices, for example, and of the same dimension, and outputs another matrix whose elements are given by . is an identity matrix while is a zero column vector both of size n.
3. Mathematical Model for L-Shaped Uniform Array
Figure 3 shows the structure of the L-shaped uniform array. It is composed of two orthogonal uniform linear arrays of
M antennas in the X-Y plane in which the distance between two adjacent antennas is
. All antennas are assumed to be identical, isotropic, and omnidirectional. Suppose there are
d (
) independent far-field narrowband stationary signals,
such that
, incident on the array plane at 2D angle
in which
is the azimuth and
is the elevation angle. Let us also assume the signals propagate in an AWGN channel with linear and isotropic transmission medium. DOA methods compute the broadside angle, which is the signal direction measured relative to the line perpendicular to the array. Since that array is composed of two ULAs, there are two broadside angles,
, and
, as shown in
Figure 3, which correspond to the x-axis and y-axis ULAs, respectively. From geometric properties, the relation between
and
with azimuth and elevation are shown in Equations (
1) [
23],
Assume that the L-shaped array is not subject to nonidealities such as mutual coupling and cross-polarization effect. Additionally, consider that all antennas are identical and have omnidirectional gain functions, i.e.,
. The model of the signal samples received by x-axis and y-axis arrays at a timestamp
t can be expressed as Equations (
2) and (
3) [
24],
where
is the array observation at timestamp
t of the x-axis ULA which is a vector of the signal samples for each individual antenna in the x-axis ULA, such that
corresponds to a single signal sample received from the antenna
i at timestamp
t. Likewise, for
in the case of y-axis ULA. Moreover,
is a vector of signals of
d sources,
are the additive white Gaussian noise (AWGN) and
are the ideal steering matrices of the x-axis array and y-axis array, respectively, as shown in Equations (
4) and (
5),
where the ideal array responses are defined in Equations (
6) and (
7),
in which
and
, and
c is the speed of light,
is the carrier frequency.
Moreover, in our analyses, it would be useful in some cases to consider the model signal samples for the whole L-shaped array instead of two separate ULAs. Let us define
as the array observation of the L-shaped array at timestamp
t which is a vector of the signal samples for each individual antenna in the L-shaped array, such that
corresponds a single signal sample received from the antenna
i at timestamp
t. The signal samples of an L-shaped array are composed of
such that
, the common antenna
for both ULAs in addition to
such that
. The model of the signal samples received by the L-shaped array at a timestamp
t can be expressed as Equation (
8),
where
is the additive white Gaussian noise (AWGN) and
is the ideal steering matrix of the L-shaped array, respectively, shown in Equation (
9),
if each element of the ideal L-shaped array response,
, is represented in Equation (
10),
where
and
, hence the L-shaped array response can be expressed in a compact form in accordance with Equation (
11),
5. RF Switch Model
Theoretically, all antennas within an array should sample the signal at the same time at each antenna port. However, this would require each antenna to have its own RF front-end, which includes components such as analog-to-digital converters, filters, mixers, and low-noise amplifiers. Unfortunately, incorporating such analog components for each antenna would lead to increased power consumption, physical size, and higher costs for constrained embedded IoT devices. To address these challenges, it is more appropriate for the array to have a single RF front-end and an RF switch, enabling each antenna to utilize the RF front-end at different times. The Bluetooth protocol considers this radio architecture for its direction-finding capability, and in this research, we opted to utilize the switching protocol outlined in the Bluetooth 5.1 specification [
4] with an L-shaped array.
Bluetooth utilizes Gaussian Frequency-Shift Keying (GFSK) where 0s and 1s are modulated into different frequency shifts [
26]. The two frequencies are equal to the central frequency (
) in addition to a frequency deviation (
). To comply with the theoretical assumption, the signal should be stationary, that is, a signal with a constant time-frequency is desirable. Thus, the transmitter (signal source) sends a Constant Tone Extension (CTE), composed of a continuous series of digital ones, so the frequency remains the same during the IQ sampling. Notably, if the signal were non-stationary, DOA methods would be more complex [
27].
During the CTE, the first 4 μs is the guard period. The reference period, which takes 8 μs, is the beginning of the IQ sampling operation but only one antenna of the receiver (anchor node) carries out the sampling operation. Only one single IQ sample is performed per 1 μs, totaling eight IQ samples. Afterward, a series of sampling and switching operations begin, which is referred to switch-sample period. The switch and sample slot last 1 μs or 2 μs; in this research we consider 1 μs time slot. For every switch slot, the RF switch device switches from one antenna to another, so that another antenna can acquire a single IQ sample during the next sample slot. Since this research considers a 1 μs time slot, there are 74 sample slots in the switch-sample period.
Figure 4 depicts all operations during the CTE.
Bluetooth low energy signals can be considered narrowband when using 1 MHz bandwidth in indoor scenarios where the typical delay spread is between 20 ns and 60 ns [
28]. Therefore, BLE satisfies the narrowband premise in
Section 3, if we take into account all the cited assumptions as well, the mathematical model is the same as Equations (
2) and (
3) in addition to the phase shift due to the RF switch. Without AWGN, the phase shift between two consecutive samples in the reference period was reported to be about
–
[
29]. These numbers double for two consecutive samples in the switch-sample period. As a result, it is imperative to develop a phase compensation to make the DOA method work properly. As previously mentioned, the transmitter sends a continuous signal representing digital ones which is the CTE where its carrier frequency
is between 2402 MHz and 2480 MHz depending on the used channel. The narrowband incoming signal can be expressed in a complex format in Equation (
27) [
18],
where
t is the time, and
is called the complex envelope. However, that frequency in order of gigahertz is too high for the ADC, so the receiver RF front-end performs complex downconversion also known as quadrature demodulation. That operation outputs the in-phase and quadrature (IQ) components of
in the baseband in such a way that the central frequency (
) corresponds to a DC [
29]. As a result, the IQ components can be expressed in Equation (
28),
where
A is the amplitude,
is the initial phase,
= 250 kHz is the frequency deviation considering LE 1M physical layer [
4], and
is the carrier frequency offset (CFO) which is in order of 10 kHz [
29]. Without loss of generality, let us consider that IQ samples from the reference period correspond to the first antenna of the L-shaped array. From Equation (
28), the phase shift between two consecutive samples of the reference period without considering AWGN is evidenced by Equation (
29),
where
t = 1 μs and
. In other words, it is possible to estimate the phase shift over a 1 μs time period by using the samples of the reference period. However, we are interested in
since we can use it to estimate the phase shift over other time periods. Assuming the carrier frequency offset is constant during the CTE, we can estimate
by calculating the average of the phase difference between two consecutive IQ samples of the reference period, as shown in Equation (
30),
By adopting the Round Robin switch pattern, each antenna from the L-shaped array carries out IQ sampling sequentially, as shown in
Figure 5. Note that the switch pattern begins in the last reference period slot. As a result, the L-shaped array samples 75 IQ samples in total, 74 samples from the switch sample period, and 1 sample from the last reference period slot. Moreover, the array observation, in this case, is defined as one single sequence of the Round Robin pattern, that is, when all antennas in the array complete the IQ sampling. Observe that antenna
k performs an IQ sampling
μs after the array observation starts. It means that the phase shift is
without considering AGWN as indicated by Equation (
31),
where
. We can generalize the observation of Equation (
31). As a result, let
be an array observation of a Round Robin sequence that begins at timestamp
t, the array
relates to
by the phase shift matrix due to the RF switch labeled as
in accordance with Equation (
32),
such that,
where
and the RF switch phase shift matrix is a diagonal matrix defined in Equation (
33),
DOA methods such as MUSIC can estimate multiple DOAs during their execution, so radar applications sending sounding signals and measuring when their own signal is received from different reflections can take full advantage of that capability by identifying multiple copies of their own reflected signal. However, in IoT radio communication systems where anchors are employed to locate multiple tags, this is not possible in practice with low-cost single receiver anchor nodes operating at a single RF channel at a given time such as in Bluetooth receivers [
30]. That is, if more than one BLE tag sends a signal to an anchor node at the same time and frequency resources, the signal to interference and noise ratio would be too low for that radio receiver to detect transmission reliably.
For example, a receiver could not be able to decode both transmitter IDs of the transmitters reliably as each transmission would interfere with the other. Therefore, in this scenario, DOA methods can only estimate a single DOA only per execution. As result, the L-shaped array observation model of one sequence of the Round Robin pattern is defined in Equation (
34),
Note,
is a scalar that represents one single signal in opposition to the vector
found in Equations (
2) and (
3). Moreover, the number of array observations depends on the number of antennas, that is,
. Notably, 75 is the number of IQ samples and
is the number of antennas in the L-shaped array, such that
. Observe that, if 75 is not divisible by
, the last
IQ samples are not used.
MUSIC was devised considering that all antennas perform IQ sampling at the same time, which is not the case for Bluetooth specification. Thus, without a phase compensation, the accuracy of Unitary R-D Root MUSIC is totally compromised, as shown experimentally in
Section 7. The DOA method receives
N array observations as the input shown in Equation (
35),
From Equation (
32), we know that
. Thus, the DOA method needs to apply the RF switch compensation matrix (
) into the N array observation matrix (
) as expressed in Equation (
36),
where
Note that in the real world, the equality in Equation (
36) is an approximation, since the RF switch compensation matrix (
) takes an estimation of
calculated in Equation (
30). Moreover, the Unitary R-D Root MUSIC needs to obtain matrices
and
, which are the
N array observations from x-axis and y-axis ULA, respectively, as defined in step 1
Section 4. To do that, observe that
is equal to the first
M rows of
, and
is equal to the last
M rows of
. As a result, step 1 needs an additional operation to obtain matrices
and
prior to covariance calculations.
6. Modified Unitary R-D Root MUSIC
The implemented solution optimizes the Unitary R-D Root MUSIC by exploiting Bluetooth’s radio communication system design where only a single BLE tag transmits a signal at a time, as discussed in
Section 5. It means that the signal subspaces,
and
, is a column vector. As a result, the implemented solution can void applying the time-consuming EVD and instead apply the Power Method, which is a much simpler algorithm. Experimentally, we found the Power Method converge mostly in four iterations only in our solution. Moreover, the computation of the RF switch compensation matrix is performed in a linear time complexity instead of executing the inverse of a matrix with a cubic time complexity. The implemented solution utilizes a finding-root method based on EVD that does not require computing complex arithmetic despite the polynomial having complex coefficients and roots. However, the implemented solution employs the ideal array response. The real array response must be empirically found and plugged into the implemented solution. Refer to [
18,
31,
32] to know how to compute the real array response.
The objective of the optimization is to reduce the memory consumption and execution time of Unitary R-D Root MUSIC to attain satisfactory portability to run in constrained embedded systems. Thus, the algorithms were implemented from scratch in the C99 programming language, except the inverse of sine, the inverse of a tangent, and squared root, which are functions from
math.h. The tailor-made numerical methods include the Power Method, an EVD, which is an adaptation of [
33] that consists of the Shifted QR Algorithm, the Balancing technique, Hessenberg decomposition, and auxiliary linear algebra algorithms such as the norm of a vector and multiplication of a matrix with a vector. Since one of the objectives of the implemented solution is to attain a minimal memory footprint as much as possible, it does not use the
complex.h library from C programming language. Instead, it has a data structure for complex numbers with two variables representing the real and imaginary parts and functions for complex multiplication, addition, division, and conjugation. Notably, the implemented solution only employs
math.h and
stdint.h libraries, reassuring its minimal computational resources consumption goal and portability.
Due to the switching protocol of Bluetooth 5.1, more operations are required in step 1 of
Section 4. That is, the method collects samples from the reference period and
N array observations (
) by performing the Round Robin switch pattern. Subsequently, the implemented solution calculates
from Equation (
30) using the samples from the reference period. Afterward, it applies the RF switch phase compensation (Equation (
36)) to estimate the matrix
. Then, it separates the IQ samples of the x-axis ULA from the y-axis one. More specifically,
is composed of the first
M rows of the estimated
, while
is the last
M rows. Finally, it calculates the two covariance matrices,
and
from Equations (
19) and (
20).
The first and simpler optimization concerns Equation (
19). As discussed in [
34], the matrix
is big for constrained embedded devices since it contains many IQ samples that are complex numbers. In fact, if the implemented solution uses all the IQ samples,
will occupy 600 bytes considering the single-precision floating point. By performing the sample covariance matrix, the code may have to store a temporary matrix
as well, which would double the memory consumption. The implemented solution does not store
. To analyze how it is possible, let us define
as a matrix that stores
. Normally, the standard way to multiply two matrices, particularly
, is evidenced by Equation (
37),
Since
, then
; therefore, Equation (
37) could be written as indicated by Equation (
38),
Moreover, since
is Hermitian, which means
, thus the solution applies matrix multiplication only on its upper triangular part; therefore, Equation (
38) is remodeled as the set of relations in (
39),
From Equation (
39), the implemented solution estimates the covariance matrix using the same matrix
twice by applying an element-wise conjugate transpose operation, thus it does not need to store
. Furthermore, it only computes the upper triangular part of
. To sum up, that approach saves execution time by half and memory usage in the order of
. That is an important improvement, since calculating the sample covariance matrix is the second most time-consuming operation, as shown in
Section 7. The same optimization is carried out in Equation (
20) for the y-axis ULA.
Another minor optimization concerns the computation of the RF switch compensation matrix, which requires the inverse matrix calculation of
defined in Equation (
33). The Gauss–Jordan elimination is a popular algorithm to calculate the inverse of a matrix whose complex is
[
35]. However, since
is a diagonal matrix in which the generic form of its elements is known, we can avoid applying the time-consuming inverse matrix calculation. From complex arithmetic, we know that
, thus, the inverse of
is in line with Equation (
40),
Since
, then
for
. By defining,
, the inverse of
can be redefined in a more compact form as evidenced by Equation (
41),
From the redefinition in Equation (
41), we can derive the Relation (
42),
Thus, to calculate
, the implemented solution only needs to set
, compute
, and apply Equation (
42) successively for
to take advantage of the previous computation. As a result, the implemented solution does not need to compute each element of
explicitly, that is
, which may require computing the finite Maclaurin series
times to calculate all of
elements, except the first, which is 1. Notably, the finite Maclaurin series is a well-known method to evaluate complex numbers by computers, as illustrated in Equation (
43),
where
n is the number of elements. Moreover, to reduce memory consumption, the implemented solution just needs to store the diagonal of
as a column vector and compute the element-wise (Hadamard product) of that vector with
defined in Equation (
32). More specifically, let
be the cited column vector, since
the implemented solution calculates
by applying the Hadamard product of
in each element of
as shown in Equation (
44),
replacing Equation (
36). Particularly, the computation in Equation (
44) is faster than (
36), since the RF switch compensation is a matrix in Equation (
36), whereas in Equation (
44) it is a vector
. Moreover, Algorithm 1 calculates the RF switch compensation (
) whose complexity is
.
Algorithm 1: computation of the RF switch compensation () |
|
We can compute the two covariance noise subspaces (
and
) indirectly via their respective signal subspace (
and
) by Equations (
45) and (
46) [
36],
Remember that the implemented solution only estimates one single DOA, which means,
. Thus, the signal subspace is composed of only one eigenvector, which means we can apply the Power Method [
37] that only estimates one eigenvector, which is the signal subspace, as proved in the next paragraph. The noise subspace is composed of
eigenvectors corresponding to the smallest eigenvalues. Therefore, if we calculate the covariance noise subspace directly, we would apply an EVD method that computes all eigenvectors and eigenvalues. In addition, computing all of them requires a very complicated and time-consuming algorithm in addition to more memory footprint.
Notably, the complexity of the QR Algorithm, a typical method for EVD, is
per iteration [
38], not to mention the Hessenberg decomposition that could be performed before the QR Algorithm, and an algorithm to create the noise subspace from the probable unsorted pairs of eigenvalue-eigenvector. However, since they could be unsorted, to construct a noise subspace a reasonable approach seems to apply a sorting algorithm that could have an average complexity between
to
[
39]. While the Power Method has a complexity of
per iteration, it calculates the signal subspace, requires very simple computations, and experimentally we found it converges mostly in four iterations only in our solution.
Figure 6 depicts the algorithm overview of the covariance noise subspace computation. The left one computes the covariance directly while its fastest version (the right one) calculates the covariance indirectly via signal subspace.
To guarantee the Power Method will converge, the matrix must be diagonalizable, there must exist only one eigenvalue with the greatest absolute value, and it must be a real number [
40]. For example, considering
to be eigenvalues of a diagonalizable matrix, if
then the cited matrix satisfies the convergence requirements. The matrix
is a real covariance matrix, thus it is symmetric [
41]; therefore, it is diagonalizable, and its all eigenvalues are real numbers [
42].
The Power Method outputs the eigenvector and its associated eigenvalue, which is the greatest in magnitude. Here, we prove that the greatest eigenvalue in magnitude is the one of the signal subspace. Note that
is a real covariance matrix, hence, it is positive semi-definite [
41], which means all eigenvalues are non-negative. The line-of-sight (LOS) component of the received signal that constitutes the eigenvalue of the signal subspace is greater than the eigenvalues of the noise subspace [
24], and since they are all non-negative, it is not possible to have eigenvalues of the noise subspace equal to or greater than the one of signal subspace in magnitude. Therefore, the eigenvalue of the signal subspace is the greatest in magnitude, hence its corresponding eigenvector is the signal subspace. The same analysis goes to
.
The implemented Power Method (Algorithm 2) does not compute the eigenvalue since the solution only needs the eigenvector. We considered and . We carried out thousands of experiments and we verified that in most cases the Algorithm 2 takes 4 to 5 iterations to converge, and 30 iterations are much more than enough in all experimental instances, hence for we assume the algorithm fails to compute the signal subspace.
Finding all roots of a polynomial is a difficult computational task that requires time-consuming methods, and to aggravate the problem, the polynomial in question has complex coefficients with complex roots. Classical algorithms that operate directly on the polynomial function, such as the Newton–Raphson method, Secant method, and Brent’s method, may be hard to work in practice. They only estimate one single root, some are guaranteed to converge if only certain conditions are satisfied and might not work on complex-valued polynomials, and the initial point must be chosen wisely since it could impact their convergence [
43,
44]. Although they can be extended to estimate multiple roots, they are highly sensitive to computing error since they operate directly on the polynomial function. That is the reason practical computer eigenvalue solvers, such as in MATLAB [
45], hardly resemble these root-finding algorithms [
46]. Instead, they apply EVD on the polynomial’s companion matrix to find the roots. Since such a matrix is non-symmetric, the implemented solution estimates the roots of polynomial
(or
) defined in Equation (
22) (or (
23)) by applying the Shifted QR Algorithm in which its implementation is an adaptation of the algorithm found in [
33]. The Shifted QR Algorithm is a well-known method that performs exceptionally well in practice and works even on non-symmetric matrices. It is an EVD method and an improved version of the standard QR Algorithm by including the shifting technique for rapid convergence. This algorithm calculates the eigenvalues of the companion matrix of
(or
), which are its roots. To simplify, let us consider a polynomial of the form
, where it could be either
or
.
Algorithm 2: Power Method |
|
The companion matrix of the polynomial
is defined in Equation (
47) [
43],
where
. That is, the companion matrix by definition relates to a polynomial in which its highest degree coefficient is one (
), that is the reason we have to divide all coefficients by
as indicated by Equation (
48),
The matrix
is in a complex domain, but the implemented solution executes the Shifted QR Algorithm for real-valued matrices only. To circumvent this problem, that algorithm solves the complex EVD via equivalent real formulation by converting the complex-valued companion matrix into a real-valued one. That is, the
complex eigenvalue problem in Equation (
49),
can be reformulated as
real matrix problem in accordance with Equation (
50) [
47],
However, the eigenvalue (
) could still be a complex number since the matrix in problem (
50) is non-symmetric ([
33], pp. 486–487). That would apparently require complex arithmetic, but the implemented Shifted QR Algorithm circumvents it. Refer to [
33] for more detail. The eigenvalue decomposition of
in the reformulated problem (Equation (
50)) outputs two times the number of eigenvalues of
and the complex-valued ones come in pairs of
since
is a real matrix [
46,
48]. The implemented solution can easily detect which one,
or
, is the eigenvalue of
by verifying which is the root of the polynomial
. Moreover, the EVD of
will increase the computations since its size is two times greater than
, but since the number of antennas (
M) is small for an ULA in IoT devices, we can afford this small overburden, especially because the elimination of complex arithmetic could partly or even totally compensate this computational increment.
However, before executing the Shifted QR Algorithm the implemented solution applies the balancing technique, and afterward, it reduces the matrix to Hessenberg form. Both algorithms are an adaptation of [
33] for the implemented solution. The balancing technique is a method to reduce the rounding error sensitivity of eigenvalues during the execution of EVD. Hessenberg decomposition transforms a matrix into a simpler one (Hessenberg form) to speed up the execution time of the Shifted QR Algorithms.
Algorithm 3 outputs
eigenvalues in which
are the roots of the polynomial
, and the other
are not roots but the complex conjugate of the cited eigenvalues, as explained previously. Since
, ideally the implemented solution needs to find only one single root closest to the unit circle and inside it. However, due to AWGN, that root does not need to be inside the unit circle, in mathematical terms,
Algorithm 3: Polynomial Finding Roots Method |
Input: The polynomial in which or . Output: The roots of and their conjugate, . 1 Define the companion matrix of as described in Equation ( 47). 2 Solve the complex EVD via equivalent real formulation as explained previously. Therefore, let’s define a real matrix from , that is,
3 Apply the Balancing technique in to reduce the rounding errors sensitivity of eigenvalues. 4 Reduce the balanced to Hessenberg form to speed up the execution of Shifted QR Algorithm. 5 Apply the Shifted QR Algorithm to the Hessenberg form of the balanced to get its eigenvalues, . |
The implemented solution applies Algorithm 4 to solve the optimization problem (
51). However, instead of applying the complex absolute operator (
), the algorithm uses its squared value (
), to avoid calling the square root method in every iteration of line three. The square root method usually is an iterative algorithm and it needs to converge to a point. However, modern processors have a built-in circuit for square roots. Despite that, by avoiding that operation the execution time of Algorithm 4 is slightly decreased. In lines 2–6, observe that the algorithm finds the eigenvalue (
), which is the closest to the unit circle. However, its conjugate also is the closest to the unit circle since
. Thus, in lines 7–9, the algorithm finds which one is the root of the polynomial
. That is, if the eigenvalue
is also a root, then
which means
, otherwise, its conjugate is the root. In that way, the algorithm does not need to check if the eigenvalue is the polynomial root in every iteration (lines 2–6) which saves execution time.
Algorithm 4: Find the eigenvalue which is the root closest to the unit circle |
|