1. Introduction
During geophysical exploration, the recorded data are contaminated with random noise that obscures the signal and leads to misinterpretation. The scientific community uses a number of mathematical models to attenuate both high-frequency and low-frequency noise, each with its own benefits and shortcomings. The main sources of random noise during seismic data recording are instruments, wind motion, environmental waves, etc., while surface waves, direct waves, and ghost waves are considered coherent noise [1]. Based on the nature of the contaminating noise, numerous attenuation approaches have been proposed and deployed.
Some traditional noise reduction methods are based on filtering: deconvolution denoising [2], f–x domain predictive filtering [3], Wiener filtering [4], and Kalman filtering were designed to smooth signals in the frequency domain. Other approaches use transform domains, such as the Fourier transform and wavelet-transform-based applications for seismic denoising [5,6,7,8,9], the curvelet transform [10], the contourlet transform [11], and the shearlet transform [12]. Spectral-decomposition-based methods such as empirical mode decomposition (EMD) [13], variational mode decomposition (VMD) [14,15,16], and geometrical mode decomposition (GMD) [17,18] contribute to removing noise from one- and two-dimensional seismic data.
These days, many seismic exploration and analysis techniques are designed based on artificial intelligence [19,20]. The convolutional neural network (CNN), a deep learning architecture, shows strong ability in computer vision and image/signal processing, and the feature learning efficiency achieved by CNNs on images is highly remarkable [21]. Successfully learning the mapping between noise-free and contaminated data during training helps to restore the original signal. Therefore, CNN-based seismic noise attenuation [22] is becoming much more significant. Image denoising and inpainting with deep neural networks [23], hyperspectral denoising via adversarial learning [24,25,26,27], Gaussian noise removal [28], and especially CNN-based seismic data denoising approaches [29] play a crucial role. CNN-based image-denoising methods have been quite successful; however, they have some limitations. These models typically have a fixed architecture and hence cannot adapt well to various noise levels or types, and they may struggle with noise patterns they were not explicitly trained on. Training and running deep CNN models [30] can be computationally intensive, making them less suitable for real-time applications or resource-constrained devices. Over-smoothing can also cause a loss of fine details and textures while reducing noise [31,32,33]; this trade-off between noise reduction and detail preservation can be challenging to balance. Existing methods can introduce new artifacts or errors into the denoised image, especially when dealing with highly noisy data. Since the convolution kernel is content independent, it cannot represent and restore different data regions [34,35,36,37]. Additionally, the kernel is a small patch, which extracts local features but neglects global information. Another architecture, the U-Net [38,39], is utilized for biomedical image processing and other image-processing tasks. A residual network [40] is used to address the degradation problem, and a dense network [41,42] reuses the feature map from each layer as input within the network, which enables more precise feature extraction. Such networks are well suited to seismic data [43,44], as they are computationally efficient, trainable with small data sets, and trainable end to end.
In this paper, we propose the dense and residual (DARE) U-Net, a variation of the traditional U-Net architecture designed to improve seismic data denoising performance. In the dense U-Net, each layer is connected to every other layer in a feed-forward manner: the output of each layer is fed as input to all subsequent layers. This dense connectivity helps information flow across different levels of abstraction, allowing for better feature reuse and gradient propagation. Additionally, we implement local residual connections between layers within the encoder, as in a residual network, which allows earlier layers to connect directly with deeper layers. Allowing the network to utilize both filtered and unfiltered input makes the flow of information from preceding to succeeding layers more efficient, and bypassing certain layers when needed promotes efficient information flow from input to output. It also enables the reuse of features learned in earlier layers, which is beneficial for tasks where low-level features are relevant throughout the network. These combined connections allow the model to learn dense–residual functions, capturing the difference between the input and the desired output. This simplifies the learning process, especially in deeper networks, by improving gradient flow, enabling better feature reuse, simplifying the optimization of deep architectures, and focusing on learning the residual details.
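To make the dense connectivity concrete, the following is a minimal Keras sketch (our illustration, not the network's actual code; the layer count `n_layers` and growth rate `growth` are assumed values), in which each convolution receives the concatenation of all preceding feature maps:

```python
from tensorflow.keras import layers

def dense_block(x, n_layers=3, growth=32):
    # Each 3x3 convolution sees the concatenation of all earlier outputs,
    # so features learned in early layers are reused by every later layer.
    features = [x]
    for _ in range(n_layers):
        inp = features[0] if len(features) == 1 else layers.Concatenate()(features)
        y = layers.Conv2D(growth, (3, 3), padding="same", activation="relu")(inp)
        features.append(y)
    return layers.Concatenate()(features)
```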
2. Methodology
Seismic noise attenuation aims to reconstruct a clean image $x$ from a noisy image $y$. The observation model is formulated as

$$y = x + n, \qquad (1)$$

where $y$ is the observed noisy data, $x$ is the noise-free original data, and $n$ represents the added random noise.

A parametric function can be used to restore $x$:

$$\hat{x} = M(y; \Theta), \qquad (2)$$

where $\hat{x}$ represents the estimated signal, $M$ is the mapping relation, and $\Theta$ denotes the network parameters. The weights $w$ and biases $k$ are eligible for modification during training. Since the noisy data $y$ carry important features and information about the noise-free data $x$, assume a parametric mapping $R$ such that $R(y; \Theta) \approx n$. The noise attenuation parametric model based on residual learning is then

$$\hat{x} = y - R(y; \Theta). \qquad (3)$$

We can solve the following optimization problem to estimate the parameters:

$$\hat{\Theta} = \arg\min_{\Theta} \sum_{i=1}^{N} L\big(R(y_i; \Theta),\, y_i - x_i\big) + \lambda \varphi(\Theta), \qquad (4)$$

where $\{(y_i, x_i)\}_{i=1}^{N}$ is a set of training data and $L$ denotes the loss function. Equation (4) is the combination of a fidelity term and a regularization term $\varphi(\Theta)$, with $\lambda$ controlling the trade-off between them.

The loss function is defined by

$$L(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \big\| R(y_i; \Theta) - (y_i - x_i) \big\|_F^2, \qquad (5)$$

and the ADAM algorithm is used to minimize the objective function. ADAM adapts the update of each network weight individually, unlike classical stochastic gradient descent, which applies a single global learning rate.
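As an illustrative, non-authoritative sketch of this residual-learning objective with ADAM, one possible Keras training step is shown below; the names `residual_loss` and `train_step` and the learning rate are our assumptions, and `model` is assumed to predict the noise component $R(y; \Theta)$:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # learning rate assumed

def residual_loss(y_noisy, x_clean, noise_pred):
    # Eq. (5): MSE between the predicted noise and the actual noise (y - x).
    return tf.reduce_mean(tf.square(noise_pred - (y_noisy - x_clean)))

@tf.function
def train_step(model, y_noisy, x_clean):
    with tf.GradientTape() as tape:
        loss = residual_loss(y_noisy, x_clean, model(y_noisy, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```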
The architecture of the proposed DARE U-Net is shown in Figure 1. The network consists of an encoder, a bridge, and a decoder. The encoder, also called the contractive path, contracts the input, reducing the spatial resolution while capturing contextual information. It follows the typical convolutional neural network architecture, with each stage consisting of repeated convolutions with ReLU activation and a (3, 3) kernel, followed by a max-pooling operation. In the first encoder stage, the input image passes through a convolutional layer with 64 filters of size (3, 3) followed by a ReLU activation.
For any signal $y$, the ReLU is

$$f(y) = \max(0, y). \qquad (6)$$

It introduces non-linearity into the network and mitigates the vanishing gradient problem. The ReLU output equals the input when the input is positive and is zero when the input is negative.
The output from the first convolution is then passed through another convolutional layer with 64 filters of size (3, 3), followed by a ReLU activation. The output from the second convolution serves as a residual connection and is concatenated as input to the corresponding decoder layer. The result is then passed through a max-pooling layer with a kernel size of (2, 2) and a stride of (2, 2), which down-samples the feature maps. This process is repeated for three consecutive encoder layers, with the number of filters doubling at each layer (64, 128, 256, 512). Thus, the subsequent encoder stages perform the following operations (a code sketch of one stage is given after this list):

Conv2D (128 filters) → Conv2D (128 filters) → Addition → Max Pooling;
Conv2D (256 filters) → Conv2D (256 filters) → Addition → Max Pooling;
Conv2D (512 filters) → Conv2D (512 filters) → Dropout (0.5).
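A minimal Keras sketch of one such encoder stage follows; the 1 × 1 shortcut convolution used to align channel counts for the addition is our assumption, since the text does not state how the residual shapes are matched:

```python
from tensorflow.keras import layers

def encoder_block(x, n_filters):
    # Two 3x3 ReLU convolutions, a local residual addition, then 2x2
    # max pooling; `skip` feeds the matching decoder stage.
    shortcut = layers.Conv2D(n_filters, (1, 1), padding="same")(x)  # align channels
    y = layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu")(x)
    y = layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu")(y)
    y = layers.Add()([y, shortcut])  # local residual connection
    return layers.MaxPooling2D((2, 2), strides=(2, 2))(y), y
```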
Figure 2, Figure 3, and Figure 4 show the skip connections, the local residual connections within each encoder layer, and the residual–dense block architecture, respectively.
In this way, the input image passes through these layers, and the spatial dimensions are reduced while the feature channels are increased. Through this process, the network captures the context and semantic information at various scales, producing a feature map of size (32, 32, 512).
Similarly, the decoder, also called the expansive path, includes an up-sampling process to increase the spatial dimensions of the feature maps. It expands the encoded data from the bridge, a latent representation of the input, by up-sampling the feature maps to restore the spatial resolution of the input and using the contextual representation to generate the output. The bridge is the latent representation of the learned input and is the basis on which the decoder decodes the output; because the learned representation of the input is transferred to the decoder through it, it is called the bridge. In the first decoder layer, the output from the bridge is up-sampled using a transposed convolutional layer (UpConv2D) with 256 filters and a kernel size of (2, 2). The output is then concatenated (depth-wise) with the corresponding feature maps from the encoder via the residual connection.
The concatenated output is then passed through two convolutional layers with 256 filters of size (3, 3), followed by ReLU activations. This process is repeated for three consecutive decoder layers, with the number of filters halving at each layer (256, 128, 64). Each layer in the expansive path consists of convolution with up-sampling, which increases the spatial dimensions of the feature maps, and the up-sampled feature maps are concatenated with the corresponding feature maps from the contracting path via the residual connections. The final output is produced by a convolutional layer with a single filter of size (1, 1), followed by a linear activation. The residual connections from the encoder help to compensate for the spatial information lost during contraction, so the decoder can locate features more accurately by reusing the high-resolution features from early in the encoder, enabling precise localization. A sketch of one decoder stage is given below.
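Continuing the same hedged Keras sketch (names and the exact wiring are our assumptions), one decoder stage and the final output layer might look as follows:

```python
from tensorflow.keras import layers

def decoder_block(x, skip, n_filters):
    # 2x2 transposed convolution doubles the spatial size; the encoder
    # skip is concatenated depth-wise, then two 3x3 ReLU convolutions.
    y = layers.Conv2DTranspose(n_filters, (2, 2), strides=(2, 2), padding="same")(x)
    y = layers.Concatenate()([y, skip])
    y = layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu")(y)
    y = layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu")(y)
    return y

# Final output layer: a single 1x1 filter with linear activation, e.g.,
# output = layers.Conv2D(1, (1, 1), activation="linear")(features)
```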
In the dense U-Net, each layer is connected to every other layer in a feed-forward manner: the output of each layer is fed as input to all subsequent layers. This dense connectivity helps information flow across different levels of abstraction, allowing for better feature reuse and gradient propagation. These connections also help to alleviate the vanishing gradient problem, enabling deeper networks to be trained effectively, although at the cost of more parameters than the standard U-Net. Residual U-Nets incorporate residual connections, inspired by ResNet architectures. These connections allow the model to learn residual functions, capturing the difference between the input and the desired output. This simplifies the learning process, especially in deeper networks, by focusing on learning the residual details. Residual connections also help to prevent the vanishing gradient problem and enable very deep networks to be trained more effectively. By focusing on learning residuals, residual U-Nets often require fewer parameters than traditional U-Nets while achieving similar or better performance.
2.1. Structural Similarity Index Measure
The performance of the proposed architecture and existing models can be measured using the structural similarity index measure (SSIM). We calculate three components related to structure, luminance, and contrast changes between the noisy and denoised images. This helps us to analyze and compare the performance of different models, including estimating the amount of signal loss.

Let $A$ and $B$ be two images and $\mu_A$ and $\mu_B$ be their mean intensities. The luminance distortion component is given by

$$l(A, B) = \frac{2\mu_A \mu_B + C_1}{\mu_A^2 + \mu_B^2 + C_1}, \qquad (7)$$

where $C_1$ is a regularization constant.

Similarly, the contrast distortion component is given by

$$c(A, B) = \frac{2\sigma_A \sigma_B + C_2}{\sigma_A^2 + \sigma_B^2 + C_2}, \qquad (8)$$

where $\sigma_A$ and $\sigma_B$ are the standard deviations and $C_2$ is a regularization constant.

Finally, the structural distortion component is given by

$$s(A, B) = \frac{\sigma_{AB} + C_3}{\sigma_A \sigma_B + C_3}, \qquad (9)$$

where $\sigma_{AB}$ is the covariance and $C_3$ is a regularization constant.

Now, the structural similarity index measure is given by

$$\mathrm{SSIM}(A, B) = l(A, B)^{\alpha} \, c(A, B)^{\beta} \, s(A, B)^{\gamma}, \qquad (10)$$

where $\alpha$, $\beta$, and $\gamma$ are weight coefficients. In practice, we assume $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$ for simplicity.
SSIM values range between 0 and 1; values closer to 1 indicate better image restoration quality and greater similarity between the two images.
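For reference, SSIM can be computed with scikit-image's `structural_similarity`; a minimal example on placeholder arrays:

```python
import numpy as np
from skimage.metrics import structural_similarity

clean = np.random.rand(256, 256)                     # placeholder clean image
denoised = clean + 0.05 * np.random.randn(256, 256)  # placeholder estimate

# data_range must span the pixel values for floating-point images.
score = structural_similarity(clean, denoised,
                              data_range=clean.max() - clean.min())
print(f"SSIM = {score:.3f}")
```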
2.2. Peak Signal-to-Noise Ratio
Peak signal-to-noise ratio (PSNR) is the ratio between the maximum possible power of an image and the power of the noise that degrades its quality. Mathematically, it can be expressed as

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right), \qquad (11)$$

where MAX is the maximum possible pixel value of the image and MSE is the mean squared error between the two images. PSNR is measured in decibels, and higher values indicate better quality. PSNR directly compares the denoised image to the original image, which makes it a suitable measure for denoising tasks. It is a normalized measure that accounts for the dynamic range of pixel values, providing a more consistent evaluation across different images and image formats. In contrast, the SNR does not account for this range and can give misleading results when comparing different types of images. PSNR is also often more sensitive to small changes in image quality, such as those introduced by compression artifacts or noise reduction techniques.
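A minimal NumPy implementation of Equation (11) might look as follows (the helper name and the default choice of MAX are ours):

```python
import numpy as np

def psnr(reference, estimate, max_val=None):
    # MSE between the two images; MAX defaults to the reference's peak value.
    ref = reference.astype(np.float64)
    est = estimate.astype(np.float64)
    mse = np.mean((ref - est) ** 2)
    if max_val is None:
        max_val = ref.max()
    return 10.0 * np.log10((max_val ** 2) / mse)
```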
3. Experiments
We carry out a series of experiments for different seismic data sets and calculate the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM).
Figure 5 shows an example from the training data set. The noise-free training sample (Figure 5a) and the noisy training sample (Figure 5b) can be converted to the required size within the network. For this experiment, we use generated seismic images. To denoise massive seismic data in practice, a patch-based method can be used: dealing with a whole seismic volume at once is not practical, so we split it into many patches and process each patch, as sketched below.
The four sets of noise-free seismic data are shown in the first column of Figure 6, and their noisy versions are shown in the second column. The PSNR value for the noisy data is 23 dB. The experimental PSNR and SSIM results obtained from all four data sets are recorded. First, synthetic seismic data are created with three linear events. Most algorithms show better results for data with linear events, but real recorded data contain more complex geometric patterns with a curve-like nature. Hence, our second experiment uses synthetic data with two linear events and one curved event, as shown in the second row of Figure 6. The third data set is a magnified version of seismic data with a single linear event, which helps to test artifact visibility clearly. The fourth is real recorded seismic data with weak events and irregular seismic features. The first column is the noise-free set; the second column is the noisy version. The third, fourth, and fifth columns are the results obtained by the wavelet transform, the U-Net, and the proposed model, respectively. We use 700 images for training, 100 images for validation, and 100 images for testing. During the training stage, N = 60, and 780 patches are created with a stride of 128. The chosen training strategies, such as using the ADAM optimizer, are tested and confirmed to give better results. The total number of epochs is 80, and all experiments are carried out on an NVIDIA GTX 1080 Ti, with a training time of around 240 min. A hedged sketch of this training configuration follows.
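Assuming `model` maps noisy patches to clean patches (a simplification of the residual formulation above), the described setup might be written as:

```python
import tensorflow as tf

def train_denoiser(model, train_noisy, train_clean, val_noisy, val_clean):
    # ADAM optimizer with an MSE-type loss for 80 epochs, per the setup
    # above; the batch size is our assumption, not a stated value.
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
    return model.fit(
        train_noisy, train_clean,
        validation_data=(val_noisy, val_clean),
        epochs=80,
        batch_size=16,
    )
```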
To verify the effectiveness of the proposed method, we perform further experiments on seismic shot gather data, as shown in Figure 7. The data set is visible as continuous, coherent patterns across multiple traces, and continuous horizontal or near-horizontal bands indicate reflection events from subsurface layers. The PSNR and SSIM values given by the wavelet transform, the U-Net, and the proposed model are calculated.
Additionally, to verify the performance of the different algorithms, the FK spectrum is plotted and analyzed. It captures the frequency content and spatial directionality at different noise levels by converting the data from the time–space domain to the frequency–wavenumber domain, as sketched below.
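A minimal NumPy sketch of this time–space to frequency–wavenumber conversion follows; the sampling intervals `dt` and `dx` are illustrative assumptions:

```python
import numpy as np

def fk_spectrum(section, dt=0.004, dx=10.0):
    # 2-D FFT of a (time, trace) section; fftshift centers the zero
    # frequency/wavenumber so the spectrum can be plotted directly.
    nt, nx = section.shape
    spectrum = np.fft.fftshift(np.fft.fft2(section))
    freqs = np.fft.fftshift(np.fft.fftfreq(nt, d=dt))      # Hz
    wavenums = np.fft.fftshift(np.fft.fftfreq(nx, d=dx))   # cycles per meter
    return np.abs(spectrum), freqs, wavenums
```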
Figure 8 shows the FK spectra of the results obtained by the three methods. The horizontal axis indicates the normalized wavenumber, and the vertical axis indicates the frequency. Noise-free data and noisy data are represented in Figure 8a and Figure 8b, respectively. A noise mask can be clearly observed at low frequencies of 0–20 Hz, and the region around 30–50 Hz is masked completely in Figure 8b. Figure 8c is the spectrum obtained after wavelet denoising, in which considerable noise remains at low frequencies (0–20 Hz) and some information is lost in the 30–50 Hz range, indicating that this method cannot prevent the loss of curved signals during noise removal. Compared with these methods (Figure 8c,d), the proposed model removes most of the noise and preserves the signals significantly better, as shown in Figure 8e. We still observe low-frequency noise in the FK spectrum, which indicates that some noise remains in the result, and the loss of the high-frequency part means that some useful information is lost.
4. Discussion
The main purpose of attenuating noise in seismic data is to enhance the useful information by removing unwanted frequencies. Balancing noise removal against the preservation of weak seismic features is very important during this process. Hence, we deliberately focus on image resolution, the peak signal-to-noise ratio, and the structural similarity index measure. Preserving edges is equally important. Four sets of seismic data are shown in the first column of Figure 6, and their noisy versions are shown in the second column. The PSNR value for the noisy data is 23 dB. The experimental PSNR results obtained from the first data set by the wavelet, U-Net, and DARE U-Net methods are 28 dB, 33 dB, and 36 dB, respectively. The values from the second data set are 27.4 dB, 31 dB, and 35.5 dB for the three methods. Similarly, we obtain 30.7 dB, 35.3 dB, and 37.9 dB from the third data set and 29.5 dB, 33.5 dB, and 37.5 dB from the fourth, again applying the wavelet, U-Net, and DARE U-Net methods, respectively.
The fourth column of Figure 6 is the denoised result given by the U-Net. The quality of the data is significantly improved, as the PSNR values are considerably higher. The fifth column shows the denoised results obtained by the DARE U-Net for the different seismic data sets. The resolution and most of the original features are restored when applying the proposed architecture. Comparatively, the proposed method has a higher PSNR, indicating better denoising and restoration. The detailed PSNR results are shown in Table 1.
Similarly, we compare the structural similarity index measure (SSIM) between the denoised results and the original image. This measurement helps to estimate the amount of signal loss and the restoration of original features. The SSIM values lie between 0 and 1, and the SSIM values corresponding to Figure 6 show the strength of the proposed model. The SSIM results given by the wavelet, U-Net, and DARE U-Net methods on the first data set are 0.841, 0.891, and 0.951, respectively. The second seismic data set has SSIM values of 0.831, 0.861, and 0.901 for the three methods. The results from the third data set are 0.863, 0.898, and 0.925. The SSIM values from the fourth data set obtained by the wavelet, U-Net, and DARE U-Net methods are 0.835, 0.872, and 0.907, respectively. Details of the numerical results are shown in Table 2. Since an SSIM value closer to 1 means the two images are more similar, these values show that the proposed model restores the image with minimal loss of information and that features are well preserved.
Similarly, we attenuate noise from the seismic shot gather data, as shown in Figure 7. The noise-free data (Figure 7a) and their noisy version with a 22 dB PSNR (Figure 7b) are shown. The wavelet result (Figure 7c), U-Net result (Figure 7d), and DARE U-Net result (Figure 7e) have PSNR values of 27.5 dB, 31 dB, and 33.5 dB, respectively. The image resolution given by the proposed model is high compared with the wavelet and U-Net methods, suppressing the noise significantly. Structural similarity index measures of 0.841, 0.885, and 0.912 are achieved by the wavelet, U-Net, and DARE U-Net methods, respectively. The details are given in Table 3. Finally, we apply the proposed model to a post-stack real seismic data set consisting of 250 seismic traces, as shown in Figure 9a. The real data contain some weak features and have a complex nature. Usually, traditional methods are used to obtain denoised labels; however, this is not ideal. In a special case, the data may contain noise in only part of an area, and the data in another area can then be used as labels. Since we aim to validate the effectiveness of the proposed model, the blurriness in Figure 9b is increased by adding some arbitrary noise. The seismic features around the 400–600 ms section are less visible, and a few weak features around 800 ms disappear due to noise. Figure 9c shows the result obtained by the wavelet method, in which some noise still appears and some weak horizontal events are lost or broken. Figure 9d,e show the denoised outcomes achieved by the U-Net and DARE U-Net methods. It can be clearly seen that the result obtained by the proposed model has high resolution, and the masked seismic features are preserved and recovered successfully. The residual sections (Figure 10) are also collected to verify the results. In the noise section removed by the wavelet method, shown in Figure 10a, some seismic events are visible, which means information is not well preserved and is lost. Figure 10b,c show the residuals of the U-Net and DARE U-Net outcomes; they contain no or very few horizontal lines, which indicates that the useful information is well preserved.