1. Introduction
Over the past decade, 3D-NAND flash memory technology has rapidly raised bit storage density per unit area by adding more vertically stacked gate layers. In recent years, demand for high-capacity storage devices such as 3D-NAND flash memory has grown significantly, driven by the rapid growth of sectors including cloud services and mobile devices. The latest generation of 3D-NAND features more than 200 vertically stacked gate layers. Manufacturing such highly integrated 3D-NAND flash memory requires high-aspect-ratio (HAR) etching technology that can precisely etch through several hundred layers [1,2]. In these structures, accurately controlling the etching depth and precisely detecting the etching endpoint for each layer has become increasingly important. Consequently, reliable etch endpoint detection (EPD) techniques are crucial: EPD plays a vital role in determining the yield and quality of the etching process, and its significance continues to grow [3].
Accurate etch endpoint detection improves manufacturing yield and reduces process failure rates. In plasma etching, the emission intensity at wavelengths associated with the film being etched is monitored, and changes in that intensity provide a real-time view of etching progress. The principle of EPD is that once the film being etched is fully consumed and its reaction by-products are no longer produced, the intensity at the corresponding wavelengths changes, and this change is detected. Various EPD techniques have been developed in recent years [4,5]; however, several limitations remain. Traditional EPD algorithms have relied primarily on signal changes at a single wavelength [6,7,8]. Such single-wavelength methods are vulnerable to noise and signal fluctuations and can suffer from a low signal-to-noise ratio [9,10], which limits the precision of endpoint detection.
While recent studies have improved EPD performance by employing multiple wavelengths [11,12], most EPD algorithms still make decisions based on signal patterns at specific time points. Because these algorithms focus on the signal change at the exact moment of the endpoint, this reliance on instantaneous patterns makes it difficult to anticipate the endpoint or to detect it accurately. In contrast, optical emission spectroscopy (OES) time series data measured during the entire etching process contain information spanning from the beginning to the end of the procedure. However, existing EPD algorithms often fail to account fully for the temporal dependencies embedded in these data, ignoring the overall flow and context of the signal. There is therefore a pressing need for data analysis and decision-making approaches that effectively incorporate these long-term temporal dependencies.
In recent years, efforts have been made to address these issues by leveraging deep learning techniques [13,14,15,16]. Kim et al. used a convolutional neural network (CNN) on OES data to detect the etch endpoint [17]; however, because the OES data were transformed into a two-dimensional format for the CNN input, this approach cannot directly model the temporal dependencies of the time series. Hwang et al. proposed a long short-term memory (LSTM) autoencoder for detecting anomalies in time series data from semiconductor manufacturing processes [18]. This suggests that the LSTM can be used effectively to analyze semiconductor process data, but it has not yet been applied directly to EPD. Existing EPD studies have focused primarily on endpoint detection itself and tend to rely on signal patterns at specific time points. The LSTM is a type of recurrent neural network (RNN) designed to address the long-term dependency problem [19,20,21,22]; through its cell state and multiple gates, it can effectively learn long-term patterns in time series data [23,24,25]. These characteristics make the LSTM well suited to analyzing time series data such as the wavelength intensities obtained from OES sensors.
In this study, we propose an EPD method that considers the entire flow and context of time series data using the LSTM. Through the LSTM, we can comprehensively analyze the change patterns of OES signals from the past to the present, considering the temporal dependencies and interactions between multiple wavelengths. The goal is to improve EPD performance by reflecting the context of the entire time series rather than simply looking at the signal at a specific time point. By doing so, we expect to increase accuracy and reduce the false detection rate.
2. Experiment
Experiments were conducted in a commercial 300 mm high-end manufacturing capacitively coupled plasma (CCP) etch chamber operating at 2 MHz, 13.56 MHz, and 100 MHz, as depicted in Figure 1. The chamber, located at the Plasma E.I. Convergence Research Center of the Korea Institute of Fusion Energy, was modified in-house with a triple-RF power system for the purpose of plasma equipment intelligence. The triple-RF power system consists of a 100 MHz very-high-frequency (VHF) source power on the upper electrode and a combination of 13.56 MHz and 2 MHz low-frequency (LF) bias power on the lower electrode. The VHF source on the upper electrode controls the ion flux incident on the substrate, while the LF bias on the lower electrode controls the ion energy incident on the substrate. As the etching target, SiO2/Si3N4/Si coupon wafers (50 mm × 50 mm) were etched using a mixture of CF4, O2, and Ar gases according to the recipe provided in Table 1.
Real-time OES data were acquired by connecting an OES sensor to the viewport on the sidewall of the etch chamber. The non-invasive OES sensor measured the wavelength and intensity of the light emitted from the plasma over a range of 200 nm to 1100 nm, with one spectrum acquired every 100 ms. OES is commonly used to study the energy state transitions of radicals, reactive ions, and atoms within a plasma chamber [26]. In this experiment, the emission intensities of radicals and reactive ions in the plasma were measured and identified.
When SiO2 is etched in a CF4/O2/Ar plasma, the main chemical species generated through chemical and physical reactions are the reaction products of the surface reaction mechanism. The surface reactions can be simplified as
and
. The OES wavelength features selected for this process were the peaks of the reactants F, O, and C and the peaks of the surface reaction products CO, CN, and SiF. The wavelengths used are listed in
Table 2.
All coupon wafers were etched for a predetermined fixed time, with over-etching to ensure complete removal of the SiO2 layer. Ellipsometry confirmed that approximately 200 nm of the underlying Si3N4 layer was etched, verifying that the SiO2 layer had been fully removed. Consequently, the optical emission signals collected by OES form time series spanning the entire etching process. Using these data as input, the LSTM model can learn long-term dependencies and detect the etch endpoint.
3. LSTM Method
The LSTM is a model designed to address the limitations of the RNN, specifically the problems of vanishing and exploding gradients. Compared with the RNN, the LSTM is more effective at retaining and using information over long sequences [27].
As shown in Figure 2, the LSTM has three gates: the input gate, the forget gate, and the output gate. The input gate determines how to update the internal state based on the current input and the previous hidden state [28]; that is, it decides what information should be stored in the cell state. First, a sigmoid layer determines which information from the current input and the previous hidden state to store. Then, a hyperbolic tangent (tanh) layer generates a new candidate vector. The forget gate uses a sigmoid layer to determine what information should be discarded from the previous cell state, and to what extent. The output gate determines how much of the cell state is reflected in the hidden state, which is passed to the next time step of the LSTM [28].
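The LSTM controls the cell state using three gates. The cell state carries the core information through the sequence, whereas the hidden state exposes, at each time step, only the part of that information that is needed. The equations for the LSTM are as follows:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$

In these equations, $f_t$, $i_t$, and $o_t$ represent the forget, input, and output gates, respectively. The function $\sigma$ denotes the sigmoid function. The matrices $W_f$, $W_i$, $W_o$, and $W_C$ are the weight matrices for the corresponding gates, and the vectors $b_f$, $b_i$, $b_o$, and $b_C$ are the bias vectors for the corresponding gates. $h_{t-1}$ represents the hidden state of the previous time step, $x_t$ is the input at the current time step, $[h_{t-1}, x_t]$ denotes their concatenation, $\tilde{C}_t$ denotes the candidate cell state, $C_{t-1}$ is the cell state from the previous time step, $C_t$ is the cell state at the current time step, $h_t$ is the hidden state at the current time step, and $\odot$ denotes element-wise multiplication. Together, these quantities update and output the internal state of the LSTM.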
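The gate equations above can be traced step by step in a few lines of NumPy. The following is a minimal sketch of a single LSTM time step, not the authors' implementation; the dictionary layout of the weights, the random initialization, and the sizes (six input features, one per species listed earlier, and eight hidden units) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.

    W maps gate names 'f', 'i', 'o', 'C' to matrices of shape
    (hidden, hidden + n_inputs) acting on the concatenation [h_{t-1}, x_t];
    b maps the same names to bias vectors of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])           # input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])       # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde           # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate
    h_t = o_t * np.tanh(c_t)                     # new hidden state
    return h_t, c_t

# Hypothetical sizes: 6 normalized OES features, 8 hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 6, 8
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for k in "fioC"}
b = {k: np.zeros(n_hidden) for k in "fioC"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.random((5, n_in)):   # five time steps of synthetic intensities
    h, c = lstm_step(x_t, h, c, W, b)
```

Because $h_t$ is an output gate value in (0, 1) times a tanh, every hidden-state component stays strictly inside (-1, 1), while the cell state is free to accumulate.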
A. Data preprocessing
In this study, we conducted etching experiments and acquired a total of 10 datasets, each consisting of approximately 20,000 OES intensity values. Because OES intensities can vary with the environment and sensor settings, preprocessing is essential for the performance of the LSTM model. In this stage, the OES intensities were normalized to the range 0 to 1 using MinMaxScaler so that all features share a consistent range. The proposed EPD algorithm uses the OES peak intensity data at the wavelengths in Table 2, which effectively reflect the characteristics of the etch endpoint, as shown in Figure 3. The preprocessed data were used as input to detect the etch endpoint. Each sample was labeled according to whether it lies before or after the etch endpoint; the resulting ratio of class 0 to class 1 was approximately 6:4, indicating a relatively balanced dataset. The dataset was divided into training and validation sets with an 8:2 ratio, a commonly used split in machine learning.
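The preprocessing steps can be sketched as follows. This is a minimal NumPy illustration, not the authors' pipeline: the scaling function reproduces the formula used by scikit-learn's MinMaxScaler with its default range, while the endpoint-based labeling helper, the synthetic data, and the shuffled 8:2 split are hypothetical stand-ins, since the text does not state how labels were assigned or how the split was drawn.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column (one OES wavelength) of X to the range [0, 1].

    Same formula as scikit-learn's MinMaxScaler with its default
    feature_range=(0, 1); constant columns are mapped to 0.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range = np.where(col_range == 0, 1.0, col_range)  # avoid division by zero
    return (X - col_min) / col_range

def label_by_endpoint(n_samples, endpoint_index):
    """Hypothetical labeling: class 0 before the endpoint, class 1 from it onward."""
    labels = np.zeros(n_samples, dtype=int)
    labels[endpoint_index:] = 1
    return labels

# Synthetic stand-in for one run: 10 samples x 3 wavelengths on very different scales.
rng = np.random.default_rng(1)
raw = rng.random((10, 3)) * np.array([100.0, 5000.0, 20.0])
scaled = min_max_scale(raw)
y = label_by_endpoint(len(scaled), endpoint_index=6)   # 6:4 ratio of class 0 to class 1

# 8:2 train/validation split on shuffled indices (the split method is an assumption).
idx = rng.permutation(len(scaled))
n_train = int(0.8 * len(scaled))
train_idx, val_idx = idx[:n_train], idx[n_train:]
X_train, y_train = scaled[train_idx], y[train_idx]
X_val, y_val = scaled[val_idx], y[val_idx]
```

Scaling each wavelength separately keeps a weak line such as CN from being swamped by a strong line such as O when both are fed to the same network.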
B. LSTM modeling and results
In this study, to detect the etch endpoint of SiO2, we designed the LSTM model shown in Figure 4 to take into account the entire flow and context of the time series data. The layer configuration was determined iteratively, by repeatedly evaluating performance to identify the best structure. The resulting model consists of three stacked layers, each with a distinct role. The first LSTM layer contains 64 cells. The second LSTM layer contains 32 cells and takes the output of the first layer as its input, retaining the relevant state information. The third layer is a fully connected layer with a rectified linear unit (ReLU) activation function, which detects the etch endpoint from this state information. Finally, a sigmoid activation function restricts the output to values between 0 and 1, distinguishing the situations before and after the etch endpoint. Among the configurations evaluated, this one provided the best balance between performance and training speed. The LSTM model structure and parameter values can be found in Table 3.
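The stacked structure described above can be illustrated with a forward pass in plain NumPy. This is a sketch under stated assumptions, not the trained model: the weights are random and untrained, the width of the fully connected ReLU layer (16 here) and the number of input features (6) are hypothetical because Table 3 is not reproduced, and the 0.5 decision threshold on the sigmoid output is assumed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(rng, n_in, n_hidden):
    # One stacked weight matrix holding the f, i, o and candidate (C) blocks.
    W = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_in))
    return W, np.zeros(4 * n_hidden)

def lstm_layer(X, W, b):
    """Run an LSTM over X of shape (T, n_in); return all hidden states (T, n_hidden)."""
    n_hidden = W.shape[0] // 4
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    outputs = []
    for x_t in X:
        z = W @ np.concatenate([h, x_t]) + b
        f, i, o = (sigmoid(z[k * n_hidden:(k + 1) * n_hidden]) for k in range(3))
        c_tilde = np.tanh(z[3 * n_hidden:])
        c = f * c + i * c_tilde
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def init_dense(rng, n_in, n_out):
    return rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out)

rng = np.random.default_rng(2)
n_features, n_dense = 6, 16                    # hypothetical sizes
params = {
    "lstm1": init_lstm(rng, n_features, 64),   # first LSTM layer: 64 cells
    "lstm2": init_lstm(rng, 64, 32),           # second LSTM layer: 32 cells
    "dense": init_dense(rng, 32, n_dense),     # fully connected layer + ReLU
    "out": init_dense(rng, n_dense, 1),        # single sigmoid output
}

def predict(window, p):
    """Sigmoid score for one window; untrained weights, so the value is arbitrary."""
    h1 = lstm_layer(window, *p["lstm1"])       # full sequence from layer 1
    h2 = lstm_layer(h1, *p["lstm2"])[-1]       # last hidden state of layer 2
    W, b = p["dense"]
    d = np.maximum(0.0, W @ h2 + b)            # ReLU
    W, b = p["out"]
    return sigmoid(W @ d + b)[0]

window = rng.random((5, n_features))   # five scaled samples (0.5 s at 100 ms)
prob = predict(window, params)
label = int(prob >= 0.5)               # 0: before endpoint, 1: after (assumed mapping)
```

In a deep learning framework the same stack is usually written as two recurrent layers (the first returning its full output sequence) followed by two dense layers; the explicit loop here only makes the data flow visible.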
The LSTM model training process is illustrated in Figure 5. During training, the wavelength intensity values from Table 2 were bundled into 0.5 s intervals and fed to the model, allowing the LSTM to decide whether the endpoint has occurred from the intensity changes over each 0.5 s period and thus to distinguish the situations before and after the endpoint. The training parameters were as follows: a learning rate of 0.001, 200 training iterations, and a batch size of 32. To reduce training time, training was stopped early if the loss did not change for 10 consecutive iterations. Under these conditions, the model achieved an accuracy of 97.1% at the end of training. To evaluate the generalization ability of the trained model, we performed 10-fold cross-validation; the LSTM model maintained stable predictive performance, with an average accuracy of 97.1% on data not used for training.
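With spectra sampled every 100 ms, a 0.5 s bundle corresponds to five consecutive samples. The sketch below shows one way to form such bundles and to express the stopping rule; it is an illustration under assumptions, not the authors' code. The windows are assumed non-overlapping and labeled by their last sample, and "the same loss for 10 consecutive iterations" is read as the loss not improving by more than a small tolerance (`min_delta`, a hypothetical parameter), which is the usual early-stopping rule.

```python
import numpy as np

SAMPLE_PERIOD_S = 0.1   # one OES spectrum every 100 ms
WINDOW_S = 0.5          # intensities bundled in 0.5 s intervals

def make_windows(X, y, sample_period=SAMPLE_PERIOD_S, window_s=WINDOW_S):
    """Split (T, n_features) data into non-overlapping windows.

    Each window is labeled with the label of its last sample (an assumption).
    Trailing samples that do not fill a whole window are dropped.
    """
    steps = int(round(window_s / sample_period))      # 5 samples per window
    n_windows = len(X) // steps
    Xw = X[: n_windows * steps].reshape(n_windows, steps, X.shape[1])
    yw = y[steps - 1 : n_windows * steps : steps]
    return Xw, yw

def should_stop(losses, patience=10, min_delta=1e-6):
    """Stop when the loss has not improved by min_delta for `patience` iterations."""
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    return min(losses[-patience:]) > best_before - min_delta
```

For example, 23 samples of two features yield four windows of shape (5, 2); the last three samples are discarded.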
As shown in Figure 6, the performance of the model was evaluated using 10-fold cross-validation and the loss and accuracy curves of the model. The training loss decreases rapidly in the initial stages and then levels off, indicating that the model is being optimized through learning. The validation loss follows a similar trend, and no signs of overfitting were observed. As shown in Table 4, on the test data the model achieved an accuracy of 97.1% in identifying the EPD, with a false detection rate of approximately 1%. These results indicate that the LSTM model achieves high accuracy by effectively learning the temporal dependencies of the time series data.
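The text does not define its false detection rate precisely, so the sketch below assumes it is the false positive rate FP / (FP + TN), i.e., the share of pre-endpoint windows wrongly flagged as post-endpoint, with class 1 taken as "after the endpoint". Accuracy and F1 follow their standard definitions. The function name and the toy labels are illustrative only.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, false detection rate and F1 for binary before/after labels.

    'False detection rate' is taken here as FP / (FP + TN): the share of
    pre-endpoint windows wrongly flagged as post-endpoint (assumed definition).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    false_detection_rate = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "false_detection_rate": false_detection_rate, "f1": f1}

# Toy example: one early false alarm (index 5) and one missed window (index 9).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
m = classification_metrics(y_true, y_pred)
```

Here accuracy is 8/10, the false detection rate is 1/6, and precision and recall are both 3/4, giving an F1 score of 0.75.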
The accuracy of the proposed LSTM model was evaluated by comparing the predicted results with the actual results. For comparison, support vector machine (SVM), Random Forest, and AdaBoost models, which do not consider temporal dependencies, were also used as etch endpoint detectors. The SVM classifier achieved an accuracy of 93.6%, Random Forest 94.8%, and AdaBoost 92%. The proposed LSTM model thus outperformed all three, suggesting that accounting for temporal dependencies improves endpoint detection. These results are presented in Table 5.
In addition, the LSTM model was compared with two other deep learning models, a bidirectional LSTM (BiLSTM) and a vanilla RNN. BiLSTM achieved an accuracy of 93.1% and the vanilla RNN 96.4%, so the proposed LSTM model again achieved the highest performance. These results are presented in Table 6.
To further improve the LSTM model, an attention layer was added. The aim was to capture more effectively how the OES intensity changes as etching progresses, thereby enhancing the performance of the EPD model. The acquired OES intensities change gradually as etching progresses, but at certain critical points the magnitude of the intensity change per unit time differs markedly. An attention mechanism was used to capture these change features.
The attention mechanism can focus on the points in an OES sequence where the intensity changes differ. It emphasizes the important parts of the sequence by assigning higher weights to points where the intensity change marks a departure from the previous state. During training, it evaluates the importance of each element of the input sequence, which helps ensure that important information is not lost [30]. The attention-based LSTM model was designed as shown in Figure 7. It consists of an LSTM layer that processes the input data, followed by an attention layer that evaluates the importance of each time step in the sequence. Finally, a fully connected layer receives the output of the attention layer and generates the final predicted values, which identify the situations before and after the etch endpoint. The structure and parameter values of the attention-based LSTM model can be found in Table 7.
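The text does not specify the attention scoring function, so the following sketch assumes additive (Bahdanau-style) attention pooled over the LSTM hidden states: each time step receives a score $e_t = v_a^\top \tanh(W_a h_t + b_a)$, the scores are normalized with a softmax into weights $\alpha_t$, and the context vector $\sum_t \alpha_t h_t$ is passed to the final fully connected layer. The names `W_a`, `b_a`, `v_a`, the sizes, and the random stand-in hidden states are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_pool(H, W_a, b_a, v_a):
    """Additive attention over LSTM outputs H of shape (T, n_hidden).

    Scores e_t = v_a . tanh(W_a h_t + b_a); weights alpha = softmax(e);
    returns the context vector sum_t alpha_t h_t and the weights.
    """
    scores = np.tanh(H @ W_a.T + b_a) @ v_a      # one score per time step, shape (T,)
    alpha = softmax(scores)                       # weights over time steps, sum to 1
    context = alpha @ H                           # weighted sum, shape (n_hidden,)
    return context, alpha

rng = np.random.default_rng(3)
T, n_hidden, n_att = 5, 32, 16                    # hypothetical sizes
H = rng.normal(size=(T, n_hidden))                # stand-in for LSTM hidden states
W_a = rng.normal(scale=0.1, size=(n_att, n_hidden))
b_a = np.zeros(n_att)
v_a = rng.normal(scale=0.1, size=n_att)

context, alpha = attention_pool(H, W_a, b_a, v_a)
w_out = rng.normal(scale=0.1, size=n_hidden)      # final fully connected layer
prob_after_endpoint = 1.0 / (1.0 + np.exp(-(w_out @ context)))
```

If all scores are equal (for instance when `v_a` is zero), the weights become uniform and the context reduces to the mean hidden state; training moves weight toward the time steps whose intensity changes matter most for the endpoint decision.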
The training process for the attention-based LSTM model is the same as that for the LSTM model; the only difference is the added attention layer. At the end of training, the attention-based LSTM model achieved an accuracy of approximately 98.2%. Its performance was evaluated using 10-fold cross-validation and the loss and accuracy curves; as with the LSTM model, the validation loss followed a similar trend to the training loss, and no signs of overfitting were observed.
As shown in Table 8, compared with the LSTM model, the attention-based LSTM model improved the etch endpoint detection accuracy by approximately 1% and reduced the false detection rate by approximately 1%, as reflected in the F1 score.