1. Introduction
Machine learning has made significant inroads into various disciplines, including optics [1,2]. For instance, machine learning strategies have been employed to design and optimize mode-locked fiber lasers [3], enhance optical supercontinuum sources [4], and predict soliton properties [5]. Compared to numerical methods like the Split-Step Fourier Method (SSFM) [6], machine learning approaches offer several advantages. First, high-precision fiber systems often exhibit rapid changes; although numerous studies have discussed automatic stability control, the discrepancies between computed and actual values lag behind the rapid changes in fiber systems [7,8]. Second, numerical methods may not account for the various influences affecting lasers in real environments, and the complexity of simulations and calculations increases with system complexity [9]. In contrast, data-driven machine learning methods can model actual optical systems and even predict their future states from the corresponding input–output states [10], a feat that numerical methods like SSFM cannot achieve.
A state-of-the-art architecture, DeepONet, has been proposed for solving partial differential equations [11,12]. The DeepONet architecture efficiently learns and approximates complex nonlinear systems by mapping input functions to output functions [11,12]. Its core idea combines traditional neural networks with operator learning, thereby enabling the handling of highly complex dynamic systems. Both operator learning and Physics-Informed Neural Networks (PINNs) leverage neural networks to approximate a mapping [13]. According to the universal approximation theorem, a feedforward neural network with a single hidden layer of sufficient width and a nonlinear activation function can approximate a mapping to any desired accuracy [11]. The main difference between the two is that PINNs approximate the solution of the differential equation itself, whereas operator learning approximates the mapping from the input function of the differential equation to its solution function. In this context, the independent variable of the mapping is the input function of the differential equation, and the dependent variable is the output function.
The advantage of operator learning over traditional PINNs is that when the input function, initial conditions, or boundary conditions of the differential equation change, the neural network does not need to be retrained [14]. Regular neural networks approximate the solution of the differential equation itself, so any slight change in the initial conditions results in a different solution. Conversely, DeepONet learns the mapping from the input function to the solution, allowing it to handle input functions not included in the training set and derive the corresponding solutions [14]. As a result, a trained DeepONet exhibits stronger generalization capabilities, theoretically achieving similar results with less data, and does not require retraining when switching to a new simulation object as long as the type of differential equation remains the same. This represents a significant advancement given the high costs associated with training neural networks [11,12].
Although studies have shown that various data-driven neural networks, such as Recurrent Neural Networks (RNNs) [15], Convolutional Neural Networks (CNNs) [16], and Long Short-Term Memory networks (LSTMs) [17], perform well in simulating optical systems, they also have drawbacks. These models require specific architectural designs and training for particular conditions, treating the fiber system as a black box and learning the input–output relationships of the time- and frequency-domain signals. Once the fiber configuration changes, the learned input–output relationships also change. Consequently, while these neural networks can handle different lengths and input signals, they cannot adapt to fiber systems with varying parameters [15,16,17].
In this study, we propose an advanced DeepONet model architecture integrated with an attention mechanism for full-field pulse transmission simulation. Research on RNNs, CNNs, and LSTMs suggests their potential; accordingly, such structures are incorporated into the Branch network of DeepONet to process the input function values at fixed sensing points, enabling the relationships between input and output signals to be learned through multilayer neural networks. The Trunk network handles the location encoding of the output function, which includes the initial pulse parameters, such as the Full Width at Half Maximum (FWHM) and input peak power, the fiber length and propagation distance, and linear and nonlinear model parameters such as the dispersion and nonlinear coefficients.
The integration of an attention mechanism [18] aims to enhance the model's ability to manage complex input–output relationships. By dynamically assigning importance weights to various input features, the attention mechanism effectively captures crucial information within the input signals, thereby improving the model's learning efficiency and prediction accuracy. In the Branch network, the attention mechanism highlights essential features within the input signals, enhancing the comprehension of multi-level input information. In the Trunk network, it ensures the precise handling of the relationship between the output location encoding and the corresponding initial pulse and transmission parameters, bolstering the overall simulation accuracy.
The integrated attention mechanism within the DeepONet model not only excels in solving complex nonlinear fiber transmission problems but also enhances the overall accuracy and robustness of full-field pulse transmission simulation by emphasizing critical input features.
By training the DeepONet model architecture on the NLSE, we observed several advantages:
Extremely fast generation speed: Since the simulation process eliminates the need for iterative calculations, only a single computation is required regardless of the parameters and initial conditions. The pulse generation speed, recorded at approximately 0.0014 s (GPU: RTX 4090, PyTorch V2.3.1, CUDA V12.1), is significantly faster than the step-by-step calculations of SSFM and previous iterative neural network models for long-distance transmission.
Enhanced generalization ability and high accuracy: Compared to RNN, CNN, and LSTM models [15,16,17], DeepONet demonstrates superior generalization ability without a substantial loss in accuracy. The training results showcase its high stability and strong generalization capability.
Flexible input pulse processing: The Branch network can handle dynamic input pulse lengths without the need to truncate pulses to reduce complexity. Pulse length only affects the uniformity of input pulse accuracy without losing critical data.
2. Materials and Methods
2.1. Data Generation via SSFM
To verify and evaluate the performance of DeepONet in modeling the nonlinear dynamics of fiber propagation, we generate a series of high-quality simulation data. In this study, we utilize the Split-Step Fourier Method (SSFM) to generate these data [6]. SSFM is a widely used numerical method for solving the Nonlinear Schrödinger Equation (NLSE) that simulates the propagation of optical pulses in fibers by iteratively alternating between the time and frequency domains [19].
The NLSE describes the evolution of the optical field within the cavity and can be written as follows:

$$\frac{\partial A}{\partial z} + \frac{\alpha - g}{2}A - \frac{g}{2\Omega_g^{2}}\frac{\partial^{2} A}{\partial t^{2}} + \frac{i\beta_2}{2}\frac{\partial^{2} A}{\partial t^{2}} - \frac{\beta_3}{6}\frac{\partial^{3} A}{\partial t^{3}} = i\gamma\left(|A|^{2}A + \frac{i}{\omega_0}\frac{\partial}{\partial t}\left(|A|^{2}A\right) - T_R A\frac{\partial |A|^{2}}{\partial t}\right),$$

where $A$ denotes the pulse envelope, which describes the complex envelope of the optical pulse propagating in the fiber. Here, $z$ is the propagation distance, $\alpha$ is the linear loss coefficient representing absorption or scattering loss, and $g$ is the gain coefficient depicting the gain due to stimulated emission in a fiber amplifier. The term $\beta_2$ is the second-order dispersion coefficient associated with group velocity dispersion (GVD) effects on pulse propagation, while $\beta_3$ is the third-order dispersion coefficient affecting pulse asymmetry. The parameter $\Omega_g$ represents the gain bandwidth, equivalent to the width of the gain spectrum. The nonlinear coefficient $\gamma$ accounts for nonlinear effects such as self-phase modulation (SPM) and cross-phase modulation (XPM). $\omega_0$ is the carrier frequency, representing the central frequency of the optical pulse. The term $T_R$ is the Raman response time, typically around 3 fs for silica fibers.
The left-hand side of the equation describes changes in the pulse envelope along the propagation distance, including linear loss and gain effects, second-order dispersion, third-order dispersion, and gain bandwidth effects. The right-hand side captures the effects of SPM, self-steepening, represented by the term containing $1/\omega_0$, and Raman scattering, represented by the last term containing $T_R$.
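The SSFM exploits exactly this separation of linear and nonlinear terms. The following is a minimal sketch of one symmetric split step for a reduced NLSE containing only second-order dispersion and SPM (loss, gain, higher-order dispersion, self-steepening, and Raman terms omitted); the function name, variable names, and sign convention are illustrative assumptions, not the exact solver used to generate our data.

```python
import numpy as np

def ssfm_step(A, dz, dt, beta2, gamma):
    """One symmetric split step: linear half step (dispersion, in the frequency
    domain), full nonlinear step (SPM), linear half step. Illustrative only."""
    omega = 2.0 * np.pi * np.fft.fftfreq(A.size, d=dt)           # angular frequency grid
    half_linear = np.exp(0.5j * (beta2 / 2.0) * omega**2 * dz)   # dispersion over dz/2
    A = np.fft.ifft(half_linear * np.fft.fft(A))                 # first linear half step
    A = A * np.exp(1j * gamma * np.abs(A)**2 * dz)               # SPM over the full dz
    A = np.fft.ifft(half_linear * np.fft.fft(A))                 # second linear half step
    return A
```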
Consider a traveling-wave solution of the form $A(z, t) = A(\tau)$, where $\tau = t - z/v$ (with $v$ being the group velocity). This transforms the partial differential equation (PDE) into an ordinary differential equation (ODE) with respect to $\tau$.
Using the variable transformation $\tau = t - z/v$, the derivatives are transformed as follows:

$$\frac{\partial A}{\partial t} = \frac{dA}{d\tau}, \qquad \frac{\partial A}{\partial z} = -\frac{1}{v}\frac{dA}{d\tau}.$$

Substituting these expressions into the original equation, we obtain

$$-\frac{1}{v}\frac{dA}{d\tau} + \frac{\alpha - g}{2}A - \frac{g}{2\Omega_g^{2}}\frac{d^{2} A}{d\tau^{2}} + \frac{i\beta_2}{2}\frac{d^{2} A}{d\tau^{2}} - \frac{\beta_3}{6}\frac{d^{3} A}{d\tau^{3}} = i\gamma\left(|A|^{2}A + \frac{i}{\omega_0}\frac{d}{d\tau}\left(|A|^{2}A\right) - T_R A\frac{d|A|^{2}}{d\tau}\right).$$
Thus, the equation is transformed from a PDE to an ODE with respect to the variable $\tau$. Our objective is to learn an operator $G$ that maps an input function $u$ to its output values $G(u)(y)$. After transforming the NLSE to its ODE form, the target operator reduces to a mapping from the initial pulse envelope to the propagated envelope, $G: A(0, \tau) \mapsto A(z, \tau)$.
2.2. Model Architecture
Figure 1 illustrates the proposed deep learning model framework for simulating nonlinear fiber pulse propagation. The dataset is uniformly sampled at 1024 points for both input and output signals, and the real and imaginary parts of the input pulse are fed into the branch network. The fiber parameters and peak power that determine the propagation are normalized and processed in the trunk network.
BranchNet utilizes convolutional layers followed by residual blocks to extract hierarchical features from the input pulse, effectively capturing the complex dynamics of the signal. After feature extraction, an adaptive average pooling layer reduces the feature dimensions, and a fully connected layer maps these features into a compact representation. Specifically, BranchNet is designed with a consistent network width that is proportional to the input size, ensuring that the feature extraction process is both deep and wide enough to handle the complexity of the pulse dynamics.
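A minimal PyTorch sketch of a BranchNet of this type (initial convolution, residual blocks, adaptive average pooling, fully connected mapping) is given below; the layer sizes and class names are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 1-D convolutions with a skip connection (illustrative)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.conv2(self.act(self.conv1(x))))

class BranchNet(nn.Module):
    """Maps the sampled complex pulse (2 x 1024: real/imag) to a feature vector."""
    def __init__(self, in_channels=2, width=256, num_blocks=2):
        super().__init__()
        self.stem = nn.Conv1d(in_channels, width, kernel_size=7, padding=3)
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(num_blocks)])
        self.pool = nn.AdaptiveAvgPool1d(1)      # collapse the 1024 time samples
        self.fc = nn.Linear(width, width)

    def forward(self, x):                        # x: (batch, 2, 1024)
        h = self.blocks(self.stem(x))
        h = self.pool(h).squeeze(-1)             # (batch, width)
        return self.fc(h)
```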
TrunkNet incorporates a self-attention mechanism to process the normalized fiber parameters, enhancing the model’s ability to capture interactions between different parameters. TrunkNet’s width is aligned with that of BranchNet, with both networks maintaining a proportional relationship between their layers to ensure balanced learning and feature integration. The resulting features are further refined through fully connected layers, progressively capturing the intricate relationships between the input parameters.
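A corresponding TrunkNet sketch follows, assuming each normalized parameter is embedded as a token and passed through a single multi-head self-attention layer before fully connected refinement; the attention configuration and the number of parameters are assumptions.

```python
import torch
import torch.nn as nn

class TrunkNet(nn.Module):
    """Encodes the normalized fiber/pulse parameters with self-attention
    followed by fully connected layers (illustrative sketch)."""
    def __init__(self, num_params=5, width=256, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(1, width)                       # one token per parameter
        self.attn = nn.MultiheadAttention(width, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )

    def forward(self, p):                                      # p: (batch, num_params)
        tokens = self.embed(p.unsqueeze(-1))                   # (batch, num_params, width)
        attended, _ = self.attn(tokens, tokens, tokens)        # parameter interactions
        return self.mlp(attended.mean(dim=1))                  # (batch, width)
```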
Finally, the outputs from both BranchNet and TrunkNet are multiplied and passed through a multilayer fully connected network, which generates the final output pulse. This final network layer is also designed with a width proportional to the preceding layers, ensuring that the combined features are adequately processed. This architecture leverages the strengths of both convolutional feature extraction and attention mechanisms, along with carefully calibrated network widths, to achieve high accuracy in simulating fiber pulse dynamics, offering a significant improvement over traditional methods.
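The sketch below shows one way the two feature vectors might be fused and decoded into the 2 × 1024 output pulse, following the element-wise product and fully connected head described above; the hidden sizes are assumptions consistent with the text.

```python
import torch.nn as nn

class DeepONetFiber(nn.Module):
    """Fuse branch and trunk features and decode the propagated pulse (sketch)."""
    def __init__(self, branch, trunk, width=256, out_points=1024):
        super().__init__()
        self.branch, self.trunk, self.out_points = branch, trunk, out_points
        self.head = nn.Sequential(
            nn.Linear(width, 2 * width), nn.ReLU(),
            nn.Linear(2 * width, 2 * out_points),              # real and imaginary parts
        )

    def forward(self, pulse, params):
        fused = self.branch(pulse) * self.trunk(params)        # element-wise product
        return self.head(fused).view(-1, 2, self.out_points)

# usage sketch: model = DeepONetFiber(BranchNet(), TrunkNet()); out = model(pulse, params)
```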
To ensure model accuracy, we employed a mean squared error (MSE) loss function, defined as

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2},$$

where $N$ is the number of samples, $y_i$ represents the actual values, and $\hat{y}_i$ represents the predicted values. The model was trained using the RMSprop optimizer, with a cosine annealing scheduler to adjust the learning rate dynamically. An early stopping mechanism was also implemented to prevent overfitting. The dataset was split into training and validation sets in a 9:1 ratio, and model generalization was verified on an unseen dataset.
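A condensed training loop matching this setup (MSE loss, RMSprop, cosine annealing, early stopping, 9:1 split) might look as follows; hyperparameter values are placeholders, and the dataset is assumed to yield (input pulse, parameters, target pulse) tensors.

```python
import torch
from torch.utils.data import random_split, DataLoader

def train(model, dataset, epochs=500, lr=1e-3, patience=20, device="cuda"):
    n_train = int(0.9 * len(dataset))                              # 9:1 split
    train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=32)

    model = model.to(device)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.RMSprop(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    best_val, stale = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for pulse, params, target in train_loader:
            pulse, params, target = pulse.to(device), params.to(device), target.to(device)
            optimizer.zero_grad()
            loss = criterion(model(pulse, params), target)
            loss.backward()
            optimizer.step()
        scheduler.step()

        model.eval()
        with torch.no_grad():
            val = sum(criterion(model(p.to(device), q.to(device)), t.to(device)).item()
                      for p, q, t in val_loader) / len(val_loader)
        if val < best_val:
            best_val, stale = val, 0
        else:
            stale += 1
            if stale >= patience:                                  # early stopping
                break
```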
This model’s unique architecture, particularly its use of dynamically adjustable network components and efficient training strategies, demonstrates significant improvements in both accuracy and computational efficiency over conventional methods.
The structure of the input–output data for the model is outlined as follows:
Trunk Network
The trunk network encodes the location of the output function. The input to the trunk network is the position T, corresponding to the output location, together with its associated parameters, effectively representing the high-dimensional location information of the equation. The output of the trunk network is a set of feature representations related to the input location. Our tests showed that feeding the trunk network only the model parameters that actually vary can effectively improve accuracy and training speed.
Input: the output position T and the associated normalized fiber and pulse parameters.
Output: a feature vector characterizing the encoded output location.
Branch Network
The branch network processes the values of the input function at fixed sensor points. The input to the branch network is the set of function values at the sensor points. In this specific task, the input corresponds to the state of the optical pulse at a specific time or location. The output of the branch network is a set of feature representations related to the input function values.
Input: the input pulse sampled at the fixed sensor points (real and imaginary parts).
Output: a feature vector characterizing the input function values.
Output Combination
The output feature representations from the branch network and the trunk network are combined through a dot product or other nonlinear methods to obtain the final operator value $G(u)(y)$. Given the complex nonlinear relationship between fiber parameters and outputs, we found that using a feedforward neural network (FNN) for this combination yields the best results.
Finally, the Split-Step Fourier Method (SSFM) algorithm generates the dataset, with each pulse sampled at 1024 points by default.
3. Results
3.1. Prediction Results of High-Order Soliton Compression
To train and test the capability of the neural network model in predicting nonlinear dynamics, we first simulated high-order soliton (HOS) compression under three random variable conditions. In these simulations, the fixed numerical settings included a step size of 0.13 cm for the Nonlinear Schrödinger Equation (NLSE) and a time window of 10 ps. The variable parameters were as follows: pulse width (Full Width at Half Maximum, FWHM) ranging from 0.5 to 1.4 ps, input peak power ranging from 18 to 34 W, and fiber length ranging from 0 to 20 m with a step size of 0.2 cm. The fixed fiber parameters were the second- and third-order dispersion coefficients $\beta_2$ (in ps$^2$/km) and $\beta_3$ (in ps$^3$/km) and the nonlinear parameter $\gamma$ (in W$^{-1}$m$^{-1}$).

The Branch Network processes the initial pulse input, which comprises 2 channels (representing real and imaginary parts) with 1024 points each. It employs a convolutional layer that expands the 2 input channels to 256 channels, followed by two residual blocks, each maintaining 256 channels. The output undergoes global average pooling, and a final fully connected layer produces a 256-unit representation of the input pulse features.
The Trunk Network is designed to process the single fiber parameter. It consists of a simple feedforward network with two fully connected layers, each containing 256 units and using ReLU activations. This network effectively captures the influence of the single parameter on pulse propagation.
The outputs from the Branch and Trunk Networks are combined through element-wise multiplication, resulting in a 256-dimensional vector. This combined representation is then passed through a multilayer perceptron with one hidden layer of 512 units, followed by an output layer that produces 2048 units. These output units are reshaped to represent the real and imaginary parts of the propagated pulse (2 × 1024).
For our dataset, we generated 2000 samples by varying the chosen fiber parameter within its physically relevant range while keeping other potential variables constant. This dataset was split into 1800 training sets and 200 test sets, providing a robust basis for model training and evaluation.
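A sketch of how such a sweep and split could be organized is shown below; `ssfm_propagate` stands in for the SSFM solver and is a hypothetical helper, and which parameters are passed to the trunk network is an assumption (here, only the propagation distance, matching the single-parameter trunk described above).

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-5.0, 5.0, 1024)                      # 10 ps time window, 1024 samples

samples = []
for _ in range(2000):
    fwhm = rng.uniform(0.5, 1.4)                      # pulse width, ps
    peak_power = rng.uniform(18.0, 34.0)              # input peak power, W
    length = rng.uniform(0.0, 20.0)                   # propagation distance, m
    t0 = fwhm / 1.763                                 # sech pulse: FWHM ~ 1.763 * T0
    pulse_in = np.sqrt(peak_power) / np.cosh(t / t0)  # initial field envelope
    pulse_out = ssfm_propagate(pulse_in, length)      # hypothetical wrapper around the SSFM solver
    samples.append((pulse_in, np.array([length]), pulse_out))

train_set, test_set = samples[:1800], samples[1800:]  # 1800 training / 200 test
```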
Figure 2a illustrates the temporal intensity evolution, and Figure 2b depicts the spectral intensity evolution of HOS propagation dynamics for a pulse width of 1.0 ps and an input peak power of 30 W. As shown in Figure 3, the pulse propagation predicted by the neural network closely matches the pulse propagation simulated by the NLSE in both temporal and spectral intensity evolution. The narrowest pulse in the temporal intensity evolution corresponds to the broadest spectral width, indicating that the predicted maximum compression distance is accurate. Figure 2c provides a clearer depiction of the time-domain evolution as a function of propagation distance under these conditions.
To facilitate a clearer visual comparison, we randomly selected pulses with pulse widths (FWHM) between 0.5 and 1.4 ps and input peak powers ranging from 18 to 34 W for plotting. The selected pulses are illustrated in Figure 3, with the corresponding parameters detailed in Table 1. The substantial overlap of the full-time intensity pulses indicates the model's robust generalization ability and accuracy across various parameter variations.
3.2. Results of Simulating Three Types of Fibers Using a Single DeepONet Model
As previously mentioned, the DeepONet architecture exhibits superior generalization capabilities compared to other models. To demonstrate this, we employed a single model to simultaneously simulate three distinct types of fibers: Normal Dispersion Fiber (NDF), High Nonlinearity Fiber (HNLF), and Standard Single-Mode Fiber (SMF). The trained model encompasses the parameter ranges of these three different fibers, as shown in Table 2.
In the DeepONet model, both the Trunk and Branch Networks utilize a primary hidden size of 512 units. The Branch Network processes the initial pulse input, which has a size of 1024 points and 2 channels (representing the real and imaginary parts of the pulse). It starts with an initial convolutional layer that expands the 2 input channels to 512 channels using a kernel size of 7. This is followed by three residual blocks, each maintaining 512 channels, which help capture complex features from the input pulse. The output of these residual blocks undergoes global average pooling, followed by a final fully connected layer, which preserves the 512-unit representation. Similarly, the Trunk Network processes the fiber parameters, applying a self-attention mechanism and fully connected layers, also maintaining the 512-unit hidden size throughout the network.
The Trunk Network handles the 5-dimensional fiber parameter input using a self-attention mechanism with a hidden size of 512 units. This is followed by three fully connected layers, each maintaining 512 units, allowing the network to capture intricate relationships within the fiber parameters.
After processing through the Branch and Trunk Networks, the outputs are combined through element-wise multiplication. The resulting 512-dimensional vector is then passed through a multilayer perceptron. This perceptron consists of two hidden layers with 1024 units each, employing ReLU activations and dropout for regularization. The final output layer expands the representation to 2048 units, which is then reshaped to produce the 2 × 1024 output that represents the real and imaginary parts of the propagated pulse.
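Under the assumptions of the earlier sketches, this wider configuration could be instantiated roughly as follows; the dropout rate and the ordering of the five parameters are placeholders.

```python
import torch.nn as nn

# Wider instantiation for the unified three-fiber model (sketch; reuses the classes above).
branch = BranchNet(in_channels=2, width=512, num_blocks=3)     # stem convolution with kernel size 7
trunk = TrunkNet(num_params=5, width=512)                      # 5 normalized fiber/pulse parameters

head = nn.Sequential(                                          # decoder applied after element-wise fusion
    nn.Linear(512, 1024), nn.ReLU(), nn.Dropout(0.1),          # dropout rate is a placeholder
    nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(1024, 2 * 1024),                                 # real and imaginary parts of the pulse
)
```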
For dataset generation, we randomly selected the pulse and fiber parameters within their entire possible ranges, including the dispersion coefficients, the nonlinear coefficient, and the input pulse parameters, while maintaining the fiber length between 40 mm and 120 mm. The soliton order, being a derived quantity, varied as a result of the randomly selected parameters. This comprehensive approach ensured that our dataset covered a wide spectrum of possible fiber configurations and pulse propagation scenarios.
We generated a total of 5000 samples, divided into 4500 training samples and 500 test samples. The model was trained using the AdamW optimizer with weight decay. We implemented a learning rate scheduler to adjust the learning rate based on the validation loss. The training process included early stopping with a patience of 30 epochs to prevent overfitting.
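A sketch of this optimizer and scheduler configuration is shown below; the learning rate, weight decay, and scheduler settings are placeholders, and `train_one_epoch` and `evaluate` are hypothetical helpers.

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)   # placeholder values
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=10)

best_val, stale = float("inf"), 0
for epoch in range(1000):
    train_one_epoch(model, train_loader, optimizer)     # hypothetical helper
    val_loss = evaluate(model, val_loader)              # hypothetical helper
    scheduler.step(val_loss)                            # adjust learning rate on validation loss
    if val_loss < best_val:
        best_val, stale = val_loss, 0
    else:
        stale += 1
        if stale >= 30:                                 # early stopping with a patience of 30 epochs
            break
```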
After training, we evaluated the model's performance by setting the parameter values for the different fiber types to their median values and fixing the fiber length at 80 mm to regenerate the test set. The comparison between the predicted pulses and the Split-Step Fourier Method (SSFM)-calculated pulses is presented in Figure 4 and Figure 5, demonstrating the model's ability to accurately simulate pulse propagation across diverse fiber types with varying characteristics.
This unified model approach showcases the DeepONet architecture’s potential in handling complex, multi-parameter physical systems, providing a versatile tool for simulating optical pulse propagation in various fiber types with high accuracy and computational efficiency.
4. Discussion
4.1. Comparison of Neural Network Prediction Time with SSFM
We recorded the generation times as shown in Figure 6, which compares the times required by the Split-Step Fourier Method (SSFM) and DeepONet across different fiber lengths. The step size for SSFM was set to 1 mm. The vertical axis represents the generation time (in seconds, on a linear scale), while the horizontal axis represents the fiber length. The figure demonstrates that the generation time for SSFM grows rapidly with fiber length, whereas the generation time for DeepONet remains significantly lower and nearly constant at approximately 0.0014 s across different fiber lengths. This indicates that DeepONet is far more efficient than SSFM as the fiber length increases.
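For reference, per-pulse GPU inference time can be measured as sketched below; the ~0.0014 s figure above is the recorded value, while the measurement code is an illustrative sketch that assumes a trained model built as in Section 2.2.

```python
import time
import torch

model.eval()
pulse = torch.randn(1, 2, 1024, device="cuda")       # dummy input pulse (real/imag)
params = torch.randn(1, 5, device="cuda")             # dummy normalized parameters

with torch.no_grad():
    for _ in range(10):                                # warm-up iterations
        model(pulse, params)
    torch.cuda.synchronize()                           # flush queued kernels before timing
    start = time.perf_counter()
    for _ in range(100):
        model(pulse, params)
    torch.cuda.synchronize()
    print((time.perf_counter() - start) / 100, "s per pulse")
```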
4.2. Accuracy of Neural Network Predictions
To quantify the superior generalization ability and computational accuracy of our model compared to others, we compiled the loss values for different configurations during HOS generation, as shown in Figure 7. Here, F represents the number of free fiber parameters. In the scenario where only the fiber length varies (F = 1), the root mean square error (RMSE) is 0.01764, the lowest recorded value, indicating high accuracy in pulse prediction throughout the evolution process. We also tested scenarios in which the pulse width, input peak power, and fiber length varied randomly. With the smaller model width, the accuracy dropped significantly, converging at 0.156; however, increasing the model size substantially improved the model's learning ability, achieving a validation loss of 0.0248 within only 320 epochs and demonstrating excellent scalability and performance under complex conditions. The ability to handle three fiber parameter variables simultaneously, with changes in both fiber parameters and propagation distance, represents a broader range of fiber output variability and increased learning difficulty, in which our model performs exceptionally well.
As shown in Figure 8, the relationship between RMSE and propagation distance reveals how prediction errors evolve during transmission. Training beyond this length was deliberately not conducted, in order to test the generalization ability. The results indicate that the model maintained consistent accuracy within any propagation distance included in the training set, attributed to its ability to incorporate distance into the learning scope. Unlike other models, which accumulate prediction errors over long distances due to progressively accumulating nonlinear effects, our model's prediction error does not accumulate with increasing propagation distance. However, beyond the untrained 13 m distance, the model's error fluctuates and increases due to the enhanced nonlinear effects caused by the longer propagation distance. These enhanced nonlinear effects can lead to complex pulse-shape changes that are difficult for the model to capture accurately, resulting in increased errors. Nevertheless, the error remains reasonably low compared with the CNN and RNN [16]. Unlike the RNN and CNN, our model considers different combinations of input parameters, thus eliminating the need for special optimization for different parameters, significantly reducing training difficulty and expanding its applicability.
4.3. Ablation Study
In this ablation study, we conducted an in-depth exploration of the roles of various model components and analyzed their inter-relationships and impact on overall performance, as shown in Figure 9. These experiments reveal the critical importance and interdependence of each component in handling the complex problem of nonlinear fiber pulse propagation.
Firstly, residual blocks and the self-attention mechanism play pivotal roles in the entire model. Removing the residual blocks led to a significant increase in MSE, reaching 1.45 times that of the baseline model. This indicates that residual blocks are crucial for extracting and processing multi-level features. Similar to "skip connections" in deep neural networks, residual blocks effectively mitigate the vanishing gradient problem and enhance the model's ability to represent complex signals. The self-attention mechanism further amplifies this effect by capturing the interdependencies among input features, significantly improving the model's predictive accuracy. When the self-attention mechanism was removed, the MSE soared to 1.81 times the baseline value, marking the most substantial impact among all configurations. This underscores the indispensability of the self-attention mechanism in handling the highly intricate task of nonlinear fiber transmission.
The FLOPs (floating-point operation counts) and model parameter count provide additional insight into the computational efficiency and complexity of the model. Removing the residual blocks drastically reduced the FLOPs from 4.859 G to 1.629 G, highlighting the computational load associated with these blocks. This reduction in computational cost, however, comes at the expense of model accuracy, as evidenced by the increase in MSE. Similarly, while removing the self-attention mechanism has no impact on the FLOPs, it slightly decreases the model parameter count from 9.473 M to 9.204 M. This indicates that the self-attention mechanism, although not computationally expensive, is crucial for the model's performance due to its ability to efficiently manage dependencies within the data.
Secondly, the dropout layer and the learning rate scheduler also demonstrate important roles in enhancing the model’s generalization ability and stability, although their impact is not as pronounced as that of the residual blocks and the self-attention mechanism. The MSE increased by 18% when the dropout layer was removed, indicating that dropout contributes to preventing overfitting and maintaining generalization. Dropout achieves this by randomly dropping out certain neurons during training, preventing the model from over-relying on specific pathways, thereby improving its performance on unseen data. Notably, neither the FLOPs nor the model parameter count were affected by the removal of dropout, confirming that its contribution lies primarily in regularization rather than computational efficiency.
The learning rate scheduler plays a critical role in dynamically adjusting the learning rate during training, ensuring the model converges smoothly to the optimal solution. When this mechanism was removed, the MSE increased by 52%, confirming the importance of the learning rate scheduler in the optimization process. Particularly during the later stages of training, the scheduler helps the model continue optimizing in the presence of small loss gradients by gradually reducing the learning rate, thus avoiding local minima. Similar to dropout, removing the learning rate scheduler did not impact the FLOPs or model parameter count, indicating that its primary function is to guide the optimization process rather than influence the model’s computational complexity.
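One way such an ablation can be organized is to build variants from flags and record validation MSE and parameter counts for each; the flag names, `build_model` factory, and `train_and_validate` helper below are assumptions, not our exact procedure.

```python
variants = {
    "baseline":        dict(residual_blocks=True,  self_attention=True,  dropout=True,  lr_scheduler=True),
    "no_residual":     dict(residual_blocks=False, self_attention=True,  dropout=True,  lr_scheduler=True),
    "no_attention":    dict(residual_blocks=True,  self_attention=False, dropout=True,  lr_scheduler=True),
    "no_dropout":      dict(residual_blocks=True,  self_attention=True,  dropout=False, lr_scheduler=True),
    "no_lr_scheduler": dict(residual_blocks=True,  self_attention=True,  dropout=True,  lr_scheduler=False),
}

results = {}
for name, flags in variants.items():
    model = build_model(**flags)                                   # hypothetical factory honoring the flags
    val_mse = train_and_validate(model, dataset,                   # hypothetical training/validation helper
                                 use_scheduler=flags["lr_scheduler"])
    n_params = sum(p.numel() for p in model.parameters())          # parameter count for the variant
    results[name] = (val_mse, n_params)
```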
The synergy between these model components is particularly important. The combined efforts of the residual blocks and self-attention mechanism in feature extraction and relationship modeling, along with the contributions of the dropout layer and learning rate scheduler in ensuring the model’s generalization ability and optimization stability, create a robust architecture. This combination allows the model to maintain a high accuracy while effectively generalizing to the complex task of fiber pulse propagation.
Through these ablation experiments, we not only confirmed the necessity of each component in the model design but also highlighted how their interactions critically influence the overall performance. These findings provide valuable insights for future model optimization and design, further validating the effectiveness of the current model architecture in addressing complex nonlinear problems.
4.4. Comparative Analysis of Hidden Layer Width N
The width of the hidden layer is crucial to the performance of the model. In this study, we conducted a detailed comparative analysis of different hidden layer widths N, as shown in Figure 10. By adjusting the hidden layer width, we evaluated its impact on model performance, computational complexity (FLOPs), and the number of parameters. The experimental results indicate that as the hidden layer width increases, the model's loss value exhibits a nonlinear pattern, suggesting that the model may face risks of overfitting or underfitting under certain configurations.
From the perspective of computational complexity, a larger hidden layer width significantly increases the FLOPs and the number of parameters. While a wider hidden layer may enhance the model's representational capacity, it also comes with higher computational costs. Notably, when the hidden layer width was expanded to 1.2 times that of the original model, the FLOPs increased to 6.98 G, and the number of parameters rose to 13.11 M. However, this expansion did not result in a significant reduction in the loss value and, in some cases, even led to a decrease in model performance.
Conversely, reducing the hidden layer width (e.g., to 0.2 times that of the original model) significantly reduced both computational complexity and the number of parameters, but also resulted in a higher loss value (0.030561). This suggests that balancing complexity and performance is crucial in model design to avoid performance degradation due to insufficient model capacity.
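For reference, the parameter count at each width multiplier can be checked directly, as sketched below (reusing the classes sketched in Section 2.2); FLOPs would additionally require a profiling tool, which is omitted here, and the numbers reported in Figure 10 come from the full implementation rather than this sketch.

```python
# Parameter count as a function of the hidden-width multiplier (sketch).
for scale in (0.2, 0.6, 1.0, 1.2):
    width = max(4, int(512 * scale) // 4 * 4)          # keep divisible by the attention head count
    model = DeepONetFiber(BranchNet(width=width, num_blocks=3),
                          TrunkNet(num_params=5, width=width),
                          width=width)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"scale {scale}: {n_params / 1e6:.2f} M parameters")
```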
Overall, this study provides important insights into model optimization through a systematic analysis of the hidden layer width, particularly in scenarios where computational resources are limited. We recommend that in practical applications, the hidden layer width should be carefully selected based on specific task requirements to achieve the best balance between performance and computational cost.
5. Conclusions
In this study, we have demonstrated the effectiveness and efficiency of the DeepONet framework in modeling and predicting the nonlinear dynamics of fiber pulse propagation. Our results indicate that DeepONet can accurately predict pulse propagation dynamics, yielding outcomes that align closely with those of traditional numerical methods, such as the Split-Step Fourier solution of the Nonlinear Schrödinger Equation (NLSE).
Accelerated by CUDA, DeepONet generates output pulses of any length and transmission distance in an average time of approximately 0.0014 s, with prediction errors remaining consistent regardless of propagation distance. This efficiency makes it highly suitable for quickly and accurately modeling long-distance fiber pulse propagation. Furthermore, by scaling up the model’s size, its robustness is enhanced, suggesting significant potential for future applications in commercial production.
Importantly, the generalization capability of DeepONet lies in its ability to simultaneously simulate the propagation dynamics of multiple fibers, a feature that other models struggle to achieve. We anticipate that DeepONet could become a standard method for laser dynamic simulation and modeling, delivering notable advancements in both efficiency and accuracy for practical industry applications. The adaptability and computational speed of DeepONet make it a valuable tool for advancing fiber optics research and development, offering a reliable and efficient alternative to traditional simulation methods.