1. Introduction
As spectrum resources become increasingly constrained, visible light offers a sustainable alternative that realizes “lighting as communication” through low-power line-of-sight links. This approach has the potential to substantially reduce the energy demand and spectrum requirements of 6G networks [1]. Visible Light Communication (VLC) uses Light Emitting Diodes (LEDs) as transmitters, providing illumination and high-speed wireless communication at the same time. Its advantages include immunity to electromagnetic interference, strong security and resistance to eavesdropping, environmental sustainability, a small carbon footprint, low cost, and unrestricted access to spectrum resources. These properties make VLC a promising candidate for indoor coverage in 6G networks [2,3,4]. Orthogonal Frequency Division Multiplexing (OFDM), widely proven in radio-frequency (RF) systems for its strong resistance to inter-symbol interference and its efficient use of ultra-high frequency bands, has been extended to Optical Orthogonal Frequency Division Multiplexing (O-OFDM). O-OFDM is attracting growing attention, particularly in VLC, because it offers higher spectral efficiency and system capacity than single-carrier modulation [5]. VLC systems rely on intensity modulation (IM), which modulates the instantaneous luminous intensity emitted by the LEDs, together with direct detection (DD) by photodiodes (PDs). Consequently, all signals in the VLC channel must be real-valued and strictly non-negative. Direct Current (DC)-biased optical OFDM (DCO-OFDM) meets this requirement by imposing Hermitian symmetry, which yields a real-valued signal, and adding a DC bias, which makes the signal non-negative. Because it combines IM and DD in a cost-effective architecture, DCO-OFDM has been widely adopted in VLC systems [6].
Although DCO-OFDM-based indoor VLC systems can provide high-speed data transfer while serving as conduits for both lighting and location data, their performance depends on the accuracy of light transmission, which makes them susceptible to indoor environmental variables such as fluctuations in light intensity and signal attenuation. The modulated signal is inevitably degraded by LED nonlinearity and by the dispersion of light propagation. This degradation impairs the orthogonality between subcarriers, reducing system performance and challenging existing transceiver designs [7]. Accurate modeling of channel state information and a thorough understanding of channel characteristics are therefore essential for optimizing transmission efficiency in VLC links. Many studies have investigated VLC link performance under fading channels and proposed equalization techniques to mitigate fading and multipath interference [8].
Channel estimation (CE), a cornerstone of channel equalization at the receiver, traditionally relies on methods such as Least Squares (LS) and Minimum Mean Square Error (MMSE). However, LS is susceptible to noise amplification, and MMSE is hindered by its implementation complexity [9]. Recent work has turned to deep learning (DL), exploiting its data learning, recognition, and prediction capabilities to acquire channel state information. DL is a powerful tool for the difficult communication problems posed by novel or demanding channels [10,11]. In VLC, DL has been applied in OFDM-based receivers to equalize the received data [12,13,14,15,16,17]. Deep Neural Networks (DNNs) can learn and analyze channel characteristics, which makes them plausible solutions for CE. Notably, DNNs have been used to replace traditional channel estimation methods, reducing reliance on pilot symbols and saving spectrum resources [10]. Similar approaches using DNNs [13,14], Convolutional Neural Networks (CNNs) [15], Recurrent Neural Networks (RNNs) [16,17], and other neural networks (NNs) have proven effective as nonlinear equalizers that counteract nonlinear impairments in diverse scenarios. However, existing studies have mostly required the NN input to be converted from complex vectors, obtained after the Discrete Fourier Transform (DFT) or fast Fourier transform (FFT), into real vectors. Moreover, the symbol classifier requires a label preprocessing step that maps the M-ary quadrature amplitude modulation (M-QAM) constellation points to fixed labels in advance.
This study introduces a receiver scheme based on neural network waveform equalization. The network learns to process the received signal samples by exploiting the channel state information, thereby improving the reliability and efficiency of signal transmission. The approach departs from previous receiver designs in that the neural network processes the data before the FFT. A key property is that the input and expected output of the neural network have the same form as those of a conventional receiver, so the scheme can be integrated directly into OFDM systems without additional processing steps. Accordingly, this investigation focuses on the channel model and the channel state information of the VLC link, and on how to use this information to optimize the performance of the DCO-OFDM system.
In this paper, we review two existing reception methods and propose a DL-based direct reception method that outperforms both in the DCO-OFDM-VLC system. (1) Traditional receiver: pilot-based LS and MMSE CE. (2) Typical DL-based receiver: the nonlinear processing power of DL replaces the traditional CE and equalization signal processing after the FFT. (3) DL-based direct receiver: DL processes the received signal itself and directly learns the distortion that nonlinear devices and the channel impose on the signal, which reduces the complex data processing required by existing schemes and distinguishes this method from (2). To obtain the better-performing receiver, we compare two DL receivers, BiLSTM and BiGRU. Performance was evaluated using the bit error rate (BER) [18]. The feasibility of the direct receiver was verified in a commercial LED-based single-light 4QAM DCO-OFDM-VLC system with a transmission distance of 3 m by varying the transmission data rate and the received optical power of the experimental setup.
The rest of this article is structured as follows: Section 2 describes the VLC system based on DCO-OFDM and the reception methods. In Section 3, we propose a BiGRU-based receiver, while experimental validation and result analyses are presented in Section 4. Finally, we conclude the article in Section 5.
2. VLC System
The architecture of the VLC system based on DCO-OFDM is shown in Figure 1. Equalization in existing DL-based OFDM receivers mainly uses channel modeling methods, where the trained model replaces the CE and equalization parts.
In contrast, the BiGRU NN-based direct receiver proposed in this study processes the oscilloscope-sampled signal directly to recover the transmitted OFDM-modulated signal. The sampling frequency is set to twice the transmitted signal frequency by adjusting the storage depth, and the channel features carried by the cyclic prefix and the Hermitian-symmetric parts of the signal are also fully utilized. To focus on the receiver, forward error correction coding was not used. Note that the direct sampling rate of the oscilloscope is much greater than the data bit rate, so the signal must be sampled again before the FFT; that is, model 2 is trained directly on the oversampled data.
After the DC bias was added, the non-negative signal $x(n)$ was transmitted over the optical path by the LED, captured directly by the PD at the receiver side, and digitized by analog-to-digital conversion to give the received signal $y(n)$. The BiGRU processed $y(n)$ directly to obtain an estimate $\hat{x}(n)$ of $x(n)$, from which the cyclic prefix (CP) was then removed. Without considering the CP, the discrete time-domain OFDM signal $x(n)$ can be expressed as

$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\, e^{j 2\pi k n / N}, \tag{1}$$

where $X(k)$ represents the complex constellation symbol carried by the $k$-th subcarrier, multiplexed using the inverse fast Fourier transform (IFFT); $n = 0, 1, \ldots, N-1$; $N$ is the number of points in the FFT; and $k$ is the subcarrier index. The received signal $y(n)$ can be expressed as

$$y(n) = x(n) \circledast h(n) + w(n), \tag{2}$$

where $\circledast$ denotes circular convolution, $h(n)$ represents the channel impulse response, and $w(n)$ represents the additive white Gaussian noise (AWGN). After the FFT, the signal on each subcarrier can be written as

$$Y(k) = X(k)\,H(k) + W(k), \tag{3}$$

where $H(k)$ represents the channel frequency response and $W(k)$ represents the AWGN in the frequency domain.
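The per-subcarrier relation between the transmitted and received spectra follows from the circular convolution property of the DFT, which holds once the cyclic prefix has absorbed the channel memory. The following minimal sketch checks this numerically for the noise-free case. The 3-tap channel, the symbol length of 16, and the function name `circular_convolve` are illustrative assumptions, not values from the experiment.

```python
import numpy as np

def circular_convolve(x, h):
    """Circular convolution of x with a (shorter) impulse response h."""
    n = len(x)
    h_padded = np.zeros(n, dtype=complex)
    h_padded[: len(h)] = h
    # Direct definition: y[m] = sum_l h[l] * x[(m - l) mod N]
    return np.array([
        sum(h_padded[l] * x[(m - l) % n] for l in range(n))
        for m in range(n)
    ])

rng = np.random.default_rng(0)
N = 16                               # hypothetical symbol length (CP already removed)
x = rng.standard_normal(N)           # one real time-domain OFDM symbol
h = np.array([1.0, 0.5, 0.2])        # hypothetical 3-tap channel impulse response
y = circular_convolve(x, h)          # noise-free received samples

X = np.fft.fft(x)
Y = np.fft.fft(y)
H = np.fft.fft(h, N)                 # channel frequency response on N subcarriers

# Each subcarrier is scaled by a single complex gain: Y(k) = X(k) H(k)
max_error = np.max(np.abs(Y - X * H))
```

Because each subcarrier sees only one complex gain, a one-tap equalizer per subcarrier suffices once $H(k)$ has been estimated.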
Conventional reception relies on a CE method, such as LS or MMSE, to enable channel equalization. The LS method is a parameter estimation that minimizes the squared difference between the product $H_p(k)X_p(k)$ and the received value $Y_p(k)$ at the pilot position $p$. The loss value $J$ is expressed as

$$J = \left\lVert Y - X\hat{H} \right\rVert^{2}, \tag{4}$$

where $Y$ denotes the received signal vector, $X$ is the transmit signal vector (arranged as a diagonal matrix), and $\hat{H}$ is the estimated value of the channel parameters. The final LS estimation is as follows:

$$\hat{H}_{\mathrm{LS}} = X^{-1} Y. \tag{5}$$

The MMSE estimation is based on the LS estimation with the addition of the weighting matrix $W_{\mathrm{MMSE}}$:

$$\hat{H}_{\mathrm{MMSE}} = W_{\mathrm{MMSE}}\, \hat{H}_{\mathrm{LS}}, \tag{6}$$

where $W_{\mathrm{MMSE}}$ is built from $R_{HH}$, the autocorrelation matrix of the channel matrix $H$, which is obtained from the LS estimation. The final MMSE estimation is

$$\hat{H}_{\mathrm{MMSE}} = R_{HH}\left(R_{HH} + \frac{\sigma_n^{2}}{\sigma_x^{2}}\, I\right)^{-1} \hat{H}_{\mathrm{LS}}, \tag{7}$$

where $\sigma_n^{2}$ is the noise variance, $\sigma_x^{2}$ is the transmit signal vector variance, and $I$ denotes the unit matrix.
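To make the difference between the two estimators concrete, the sketch below implements the standard textbook LS and linear MMSE estimators and compares their mean squared error on a simulated channel. It is an illustration under stated assumptions, not the authors' implementation: the 4-tap exponential power delay profile, the 32 all-pilot subcarriers, the noise variance, and the function names `ls_estimate` and `mmse_estimate` are hypothetical. For simplicity the autocorrelation matrix is computed from the assumed channel statistics, whereas the text derives it from the LS estimate.

```python
import numpy as np

def ls_estimate(Y, X):
    """LS estimate: X^{-1} Y for a diagonal pilot matrix, i.e. element-wise division."""
    return Y / X

def mmse_estimate(H_ls, R_hh, noise_var, signal_var):
    """Linear MMSE estimate: R_hh (R_hh + (noise_var / signal_var) I)^{-1} H_ls."""
    n = len(H_ls)
    W = R_hh @ np.linalg.inv(R_hh + (noise_var / signal_var) * np.eye(n))
    return W @ H_ls

rng = np.random.default_rng(1)
N, L = 32, 4                          # hypothetical: 32 pilot subcarriers, 4 channel taps
noise_var, signal_var = 0.1, 1.0      # hypothetical: 10 dB SNR, unit-power pilots
p = np.exp(-np.arange(L))
p /= p.sum()                          # assumed exponential power delay profile
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)
R_hh = F @ np.diag(p) @ F.conj().T    # channel autocorrelation in the frequency domain

trials = 200
mse_ls = mse_mmse = 0.0
for _ in range(trials):
    h = np.sqrt(p / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
    H = F @ h                                              # true frequency response
    X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)  # 4QAM pilots
    W_noise = np.sqrt(noise_var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    Y = X * H + W_noise
    H_ls = ls_estimate(Y, X)
    H_mmse = mmse_estimate(H_ls, R_hh, noise_var, signal_var)
    mse_ls += np.mean(np.abs(H_ls - H) ** 2) / trials
    mse_mmse += np.mean(np.abs(H_mmse - H) ** 2) / trials
```

In this setup the LS error stays near the noise variance, since LS passes the noise straight through, while the MMSE weighting suppresses noise outside the channel's correlation structure at the cost of a matrix inversion.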
3. Principles of the BiGRU-Based Receiver
3.1. BiGRU
Unlike in image-based classification and detection, the present and past samples of sequential data are interrelated [16]. An RNN retains information about a data sequence in its hidden states, so it can process sequential data efficiently and learn from earlier data during processing [19]. Long short-term memory (LSTM) and gated recurrent unit (GRU) networks improve on the basic RNN by alleviating the severe vanishing and exploding gradient problems; GRU is a simplified variant of LSTM [20].
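Figure 2 shows the detailed structures of the GRU and LSTM units. Unlike LSTM, GRU combines the forget and input gates into a single update gate. A GRU unit comprises a reset gate $r_t$ and an update gate $z_t$ [21]. Under the control of these two gates, the output $h_t$, determined by the current input $x_t$ and the previous state $h_{t-1}$, is calculated as (8)–(11):

$$r_t = \sigma\left(x_t W_r + h_{t-1} U_r + b_r\right), \tag{8}$$

$$z_t = \sigma\left(x_t W_z + h_{t-1} U_z + b_z\right), \tag{9}$$

$$\tilde{h}_t = \tanh\left(x_t W_h + (r_t \odot h_{t-1})\, U_h + b_h\right), \tag{10}$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t, \tag{11}$$

where $W$ and $U$ denote the weight matrices applied to the input and to the previous state, respectively; $b$ denotes the bias term; $\sigma$ is the sigmoid function; $\tilde{h}_t$ is the candidate state; and $\odot$ is the Hadamard product (element-wise multiplication of vectors).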
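A single GRU time step can be written in a few lines of NumPy, which makes the role of the two gates explicit. This is a minimal sketch of the common GRU formulation, assuming row-vector inputs, one bias per gate, and the convention that the update gate weights the candidate state; other references swap the roles of $z_t$ and $1 - z_t$. The parameter names and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step with row-vector convention: x_t is (F,), h_prev is (H,)."""
    r_t = sigmoid(x_t @ p["W_r"] + h_prev @ p["U_r"] + p["b_r"])   # reset gate
    z_t = sigmoid(x_t @ p["W_z"] + h_prev @ p["U_z"] + p["b_z"])   # update gate
    # The reset gate controls how much of the previous state enters the candidate
    h_cand = np.tanh(x_t @ p["W_h"] + (r_t * h_prev) @ p["U_h"] + p["b_h"])
    # The update gate interpolates between the previous state and the candidate
    return (1.0 - z_t) * h_prev + z_t * h_cand

rng = np.random.default_rng(0)
F, H = 1, 4                                  # hypothetical feature and hidden sizes
params = {}
for gate in "rzh":
    params[f"W_{gate}"] = rng.standard_normal((F, H))   # input weights, F x H
    params[f"U_{gate}"] = rng.standard_normal((H, H))   # recurrent weights, H x H
    params[f"b_{gate}"] = np.zeros(H)                   # bias, 1 x H

h_t = np.zeros(H)
for sample in [0.2, -0.1, 0.4]:              # a short, made-up input sequence
    h_t = gru_step(np.array([sample]), h_t, params)
```

Because the new state is a convex combination of the previous state and a `tanh` output, every component of `h_t` stays strictly inside (-1, 1) when the initial state is zero.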
For the received OFDM signal, the current data are affected by both the preceding and the following parts of the sequence. This study therefore adopted a bidirectional structure to learn better from the preceding and following data. Figure 3 shows the BiGRU model structure and the proposed architecture of the BiGRU network for implementing the receiver. The output of the BiGRU model is determined by the states of two GRUs that process the sequence in opposite directions [22].
3.2. BiGRU-Based Receiver
Our proposed reception scheme was based on BiGRU. The received DCO-OFDM signal sequence is denoted as $r = [r_1, r_2, \ldots, r_T]$, where the vector $r_i$ ($i = 1, 2, \ldots, T$) corresponds to the $i$-th element of the received signal. The corresponding prediction sequence is denoted as $y = [y_1, y_2, \ldots, y_T]$, where the vector $y_i$ ($i = 1, 2, \ldots, T$) is the prediction for the $i$-th received element.
The first layer was the input layer. For each symbol $r_i$ ($i = 1, 2, \ldots, T$), the current symbol $r_i$ was combined with its $k$ preceding and $k$ following symbols to form $x(i) = [r_{i-k}, \ldots, r_i, \ldots, r_{i+k}]$, which was used as the input sequence of the BiGRU network. The second layer was the BiGRU model layer, whose number of recurrent time steps was set to $2k + 1$. The hidden state $h_t$ carries symbol information between the time steps. The output of the BiGRU model layer was fed to the fully connected layers. To handle the nonlinearity effectively, two fully connected layers were used, the second of which contained as many nodes as the number of classes of the transmitted signal. As this study focuses on the intensity of the signal after DC removal, the tanh function was selected for the output layer, and the output was the predicted value corresponding to the current symbol $r_i$. In this way, the predicted value for each symbol $r_i$ ($i = 1, 2, \ldots, T$) was obtained.
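The construction of the input sequences, in which each received sample is wrapped with its k predecessors and k successors, can be sketched as follows. The text does not say how the first and last k samples are handled, so zero padding at both ends is an assumption here, as is the function name `make_windows`.

```python
import numpy as np

def make_windows(r, k):
    """Return a (T, 2k+1) array whose i-th row is [r[i-k], ..., r[i], ..., r[i+k]]."""
    r = np.asarray(r, dtype=float)
    padded = np.pad(r, k)                      # zero padding at both ends (assumption)
    # Row i of idx holds the indices i, i+1, ..., i+2k into the padded sequence
    idx = np.arange(len(r))[:, None] + np.arange(2 * k + 1)[None, :]
    return padded[idx]

r = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # toy received sequence
k = 1
windows = make_windows(r, k)                   # shape (5, 3)
```

The center column of `windows` reproduces the original sequence, and each row has the 2k + 1 time steps expected by the recurrent layer. For a recurrent layer that expects a feature axis, the array can be expanded with `windows[..., None]`.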
3.3. Complexity Analysis
The complexity of the algorithm was analyzed and compared with a bidirectional LSTM (BiLSTM) NN-based receiver. The number of parameters and multiplications required for each symbol was noted during the analysis.
The parameters of the proposed BiGRU network comprise those of the BiGRU layer and those of the linear layers; two fully connected layers were used in this study (Figure 3). The GRU unit (Figure 2) contains three weight matrices and three bias vectors for the input $x_t$, and three weight matrices and three bias vectors for the previous state $h_{t-1}$. The size of the weight matrix of the input $x_t$ is determined by the size of the input features and that of the hidden state $h_t$. If the input features of $x_t$ have size $1 \times F$ and the hidden state $h_t$ has size $1 \times H$, then the weight matrix of the input $x_t$ has size $F \times H$. Similarly, both the input and the output of the previous state $h_{t-1}$ have the size of the hidden state; thus, its weight matrix has size $H \times H$, and each bias vector has size $1 \times H$. Therefore, the number of parameters for a single-layer GRU can be calculated as

$$P_{\mathrm{GRU}} = 3\left(FH + H^{2} + 2H\right).$$
The number of parameters for a linear layer can be calculated as
The number of parameters for the BiGRU network can be calculated as follows:
The number of multiplications per symbol depends on the BiGRU and linear layers. Letting the length of the input sequence be L, the number of multiplications for the BiGRU layer can be calculated as [13]
The linear layer was
, so the number of multiplications for the BiGRU network can be calculated as
In contrast, the LSTM unit contains four weight matrices and four bias vectors for the input $x_t$, and four weight matrices and four bias vectors for the previous state $h_{t-1}$ [23]. The number of parameters for the BiLSTM network can be calculated as
The number of multiplications for the BiLSTM network can be calculated as
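The counts described in this subsection can be tallied programmatically. The sketch below is a rough illustration, assuming one input weight matrix, one recurrent weight matrix, and two bias vectors per gate (three gates for GRU, four for LSTM), two directions, and two fully connected layers fed by the concatenated forward and backward states. The layer sizes are hypothetical, and the multiplication count includes only matrix-vector products, so the figures will not match the reported ones exactly.

```python
def recurrent_params(F, H, gates):
    """Parameters of one recurrent layer: per gate, F*H + H*H weights and 2*H biases."""
    return gates * (F * H + H * H + 2 * H)

def linear_params(n_in, n_out):
    """Parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

def network_params(F, H, gates, fc_sizes):
    """Bidirectional recurrent layer followed by fully connected layers."""
    total = 2 * recurrent_params(F, H, gates)      # forward and backward directions
    n_in = 2 * H                                   # concatenated hidden states
    for n_out in fc_sizes:
        total += linear_params(n_in, n_out)
        n_in = n_out
    return total

def recurrent_mults(F, H, gates, seq_len):
    """Matrix-vector multiplications of a bidirectional layer over seq_len steps."""
    return 2 * seq_len * gates * (F * H + H * H)

# Hypothetical sizes: 1 input feature, 32 hidden units, FC layers of 16 and 4 nodes
F, H, fc_sizes, seq_len = 1, 32, (16, 4), 5
bigru_total = network_params(F, H, gates=3, fc_sizes=fc_sizes)
bilstm_total = network_params(F, H, gates=4, fc_sizes=fc_sizes)
param_saving = 1 - bigru_total / bilstm_total
mult_saving = 1 - recurrent_mults(F, H, 3, seq_len) / recurrent_mults(F, H, 4, seq_len)
```

The recurrent part of a GRU is exactly one quarter smaller than that of an LSTM of the same size. Since the fully connected layers are identical in both networks, the saving for the whole network is somewhat below 25%.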
4. Results and Discussion
Based on the theoretical study of the OFDM-based VLC system in Section 2, the experimental platform shown in Figure 4 was built. The transmission performance of 4QAM DCO-OFDM over a VLC link using a commercial LED was studied, and the feasibility of using an equalizer with a deep learning structure for digital compensation of the channel was verified. OFDM signals were generated in MATLAB and transmitted through an arbitrary waveform generator (AWG). The length of $X(k)$ was set to 64, corresponding to an FFT length N_fft of 2 × 64 + 2 = 130, a cyclic prefix N_cp of 1/4 of the $X(k)$ length (16 samples), and a full OFDM symbol length of N_fft + N_cp = 146. The bit rate of the symbols emitted by the AWG was set to 100 Mbps, the bias voltage of the LED was set to 6 V, and the drive voltage of the signal was set to 1 V. At the receiving end, a commercial PD placed 3 m from the LED received the light signal and performed the photoelectric conversion. The signal was then acquired by an oscilloscope and demodulated in MATLAB. The BERs of the three receiving methods described in Section 2 were compared under different signal-to-noise ratios (SNRs), and no forward error correction coding was used in the experiment. Pilots were used only by the traditional receiving methods, for which the number of pilots was varied. The BiGRU network was constructed, trained, and evaluated in TensorFlow 2.6.2. The model used the cross-entropy loss as the loss function and was optimized with the Adam optimizer (learning rate 0.005). The dataset comprised 250 OFDM symbols, i.e., 36,500 samples (250 × 146), divided into training data (80%) and test data (20%). Other related parameters of the experimental platform are listed in Table 1.
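The frame dimensions quoted above can be reproduced with a short sketch that builds one 4QAM DCO-OFDM symbol: 64 data subcarriers are extended by Hermitian symmetry to a 130-point spectrum, transformed to a real waveform, prefixed with 16 samples, and shifted by a DC bias. The experiment generated its signals in MATLAB, so this Python version is only illustrative. The bias of three standard deviations, the clipping at zero, and the function name `dco_ofdm_symbol` are assumptions, since the text gives only the analog bias and drive voltages.

```python
import numpy as np

N = 64                  # number of data subcarriers, the length of X(k)
N_FFT = 2 * N + 2       # 130-point IFFT
N_CP = N // 4           # 16-sample cyclic prefix

def dco_ofdm_symbol(X, bias_factor=3.0):
    """Build one real, non-negative DCO-OFDM symbol from N complex QAM symbols."""
    assert len(X) == N
    # Hermitian-symmetric spectrum: zero DC bin, data, zero Nyquist bin, mirrored conjugates
    spectrum = np.concatenate(([0], X, [0], np.conj(X[::-1])))
    x = np.fft.ifft(spectrum)
    assert np.max(np.abs(x.imag)) < 1e-9       # symmetry makes the waveform real
    x = x.real
    x_cp = np.concatenate((x[-N_CP:], x))      # prepend the cyclic prefix
    bias = bias_factor * np.std(x)             # hypothetical DC bias level
    return np.clip(x_cp + bias, 0.0, None)     # clip residual negative peaks

rng = np.random.default_rng(0)
X = rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)   # 4QAM symbols
frame = dco_ofdm_symbol(X)
samples_in_dataset = 250 * len(frame)
```

With these settings each symbol is 146 samples long, and 250 symbols give 36,500 samples.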
To verify the ability of the BiLSTM and BiGRU algorithms to equalize visible-light channels, the waveform received at an SNR of 16 dB was compared with the waveforms equalized by the two algorithms, as shown in Figure 5. Because of interference from the actual channel and nonlinear devices, the information-bearing features of the directly received waveform are severely damaged. The proposed equalizing receiver corrects this interference, and its output waveform is similar to the original transmitted waveform, which shows that the proposed algorithm can equalize visible-light channels. To evaluate the proposed method further, the Kullback–Leibler (KL) divergence between the original transmitted signal and the signals before and after equalization was also estimated [24]. KL divergence is often used to assess the similarity of two distributions; a smaller value indicates that the two distributions are closer. It is calculated as

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log\frac{P(x)}{Q(x)},$$

where $P(x)$ is the distribution of the actual transmitted signal and $Q(x)$ is the distribution of each received signal under comparison. The KL divergence results are shown in Figure 5b. The KL divergences between the original transmitted signal and the signals recovered by BiLSTM and BiGRU are close to each other, at about 1/5 of the KL divergence between the directly received signal and the original signal. The proposed deep learning networks thus reduce the KL value; that is, the distribution of the recovered signal is closer to that of the original transmitted signal, which further shows that both the BiLSTM and BiGRU algorithms have a strong ability to equalize visible-light channel interference. In addition, the similar KL values of the two algorithms indicate that BiGRU achieves an equalization effect similar to that of BiLSTM.
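The KL comparison can be illustrated with a histogram-based estimate. The text does not state how the signal distributions were estimated, so the shared 50-bin histogram, the small smoothing constant, and the synthetic signals below are all assumptions chosen only to show the mechanics. The function name `kl_divergence` is likewise hypothetical.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=50, eps=1e-12):
    """Estimate D_KL(P || Q) from samples using histograms on a shared support."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p_hist, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q_hist, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    # Smooth with eps so empty bins do not produce log(0) or division by zero
    p = p_hist / p_hist.sum() + eps
    q = q_hist / q_hist.sum() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
tx = rng.standard_normal(20000)                              # stand-in transmitted signal
rx_raw = np.tanh(tx) + 0.3 * rng.standard_normal(tx.size)    # synthetic distorted reception
rx_eq = tx + 0.05 * rng.standard_normal(tx.size)             # synthetic equalized output

kl_raw = kl_divergence(tx, rx_raw)
kl_eq = kl_divergence(tx, rx_eq)
kl_same = kl_divergence(tx, tx)
```

In this synthetic setup the equalized signal yields a much smaller divergence than the distorted one, and the divergence of a signal from itself is zero, which mirrors how the measure is used in Figure 5b.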
Figure 6 shows the recovery of the received OFDM signal by BiGRU and BiLSTM. At an SNR of 16 dB, the waveforms of the signals processed by the two receivers are compared with the transmitted OFDM signal. Both algorithms recover the original transmitted signal well. The mean squared error between the recovered signal and the original transmitted signal was 0.000142 for BiGRU and 0.000179 for BiLSTM. The results verify that, by learning long-term dependencies, both algorithms can learn the characteristics of the channel and equalize the received signal.
Table 2 compares the practical complexity of processing OFDM signals with the BiGRU and BiLSTM architectures. All computations were conducted on the same computing platform. Note that the nonlinear equalizer is trained offline and its parameters remain fixed during equalization, so a smaller parameter count reduces testing time and computational overhead. According to the experimental results, both algorithms exhibit comparable channel learning and equalization capabilities. However, the parameter count of the BiGRU model is 19.69% lower than that of the BiLSTM model, and BiGRU requires 22.34% fewer multiplications per symbol. This difference is attributable to the simpler structure of the GRU unit compared with the LSTM unit. These empirical results agree with the structural analysis of the two algorithms in Section 3.
Furthermore, the BERs of the three reception methods described in Section 2 were compared under varying SNRs. Importantly, all DL-based reception methods use no pilots (the number of pilots is set to 0). The results are shown in Figure 7. Figure 7a shows the high BER of the conventional receiver based on LS and MMSE channel estimation when no pilot signals are available: without pilots, the channel information is insufficient and the channel characteristics cannot be identified. Figure 7c shows a marginal improvement in the BER of LS and MMSE over Figure 7b, but increasing the pilot count from 16 to 32 does not yield a significant gain. This pattern suggests a threshold beyond which additional pilots no longer appreciably improve system performance. Figure 7c also shows that the MMSE estimator-based receiver performs well once 32 pilots are added, reaching parity with the typical DL-based receiver, which underscores the importance of pilot signals for this receiver type.
Overall, Figure 7 establishes the performance ranking: the traditional LS channel estimation-based receiver performs worst, while the direct receiver surpasses both the typical DL-based receiver and the traditional multi-pilot receivers. Across the SNR range of 0 dB to 20 dB, the direct receiver consistently outperforms its counterparts, especially beyond an SNR of 8 dB. The BER of the MMSE receiver with 32 pilots closely approximates that of the typical DL-based receiver without pilots, yet it remains higher than the BER of the pilot-less direct receiver presented in this study. Current research focuses predominantly on DL-based receivers in which the network model processes post-FFT data; these achieve good error rate performance even without pilot signals, even surpassing the MMSE estimator with 32 pilots. However, the proposed receiver attains the best error rate performance, also without pilot signals. This outcome can be attributed to the ability of both the BiLSTM and BiGRU models to mitigate signal nonlinearity to a certain extent. These deep learning models can be pre-trained offline and fine-tuned quickly without excessive computation, whereas deploying MMSE estimators requires additional time and resources. In terms of BER, training the deep learning model before the FFT in the direct receiver configuration to learn the channel characteristics outperforms the typical post-equalization strategy, in which the model is trained after the FFT. This advantage stems from using the oversampled data acquired by the oscilloscope directly in the pre-FFT model, which improves learning and gives the model a better grasp of the channel features.