1. Introduction
Since the founding of the Internet at the end of the last century, there has been an exponential increase in both the number of novel applications and the number of users. With the explosive growth of applications demanding massive bandwidth (BW), such as high-definition video, cloud computing, virtual reality, and video conferencing, optical network usage has increased significantly. Optical networks transmit data in the form of light over optical fibers. Therefore, novel technologies are needed to meet the considerable increase in Internet traffic, which currently reaches Terabits per second (Tbps) per fiber [1,2].
The keystone of long-haul networks, the core networks of the Internet, is optical technology. One method of expanding the capacity of these networks is deploying Wavelength Division Multiplexing (WDM) technology [2,3]. WDM transmits data simultaneously at multiple optical carrier frequencies. However, traditional WDM optical networks based on fixed wavelengths and channel sizes are not well suited to the continuous influx of heterogeneous connection requests caused by the great diversity of Internet applications [4]. WDM systems use a Fixed Grid (as shown in Figure 1, identical spectral widths are assigned to all connections), which features a fixed center frequency and a fixed wavelength spacing of typically 50 GHz or 100 GHz. However, in Fixed Grid mode, bandwidths cannot be adjusted flexibly to the requirements of heterogeneous connections [5].
Therefore, a major drawback of WDM networks is the waste of spectrum when the traffic assigned to optical fiber channels is below their potential capacity. These shortcomings of Fixed Grid wavelength networks necessitate a more flexible spectrum assignment scheme. Elastic Optical Networks (EONs) have been proposed as a means to tackle the future Internet’s enormous capacity and heterogeneous traffic requirements [6,7]. EONs enable flexible spectrum allocation, assigning bandwidth more efficiently than WDM networks.
EONs allow the bandwidth assignment to be flexibly adapted to the heterogeneous sizes of connections while using advanced modulation formats to further improve the efficient use of the spectrum [8,9]. These networks support fractional data rates and flexible, granular traffic via sub-wavelength and super-wavelength channels [10]. EONs improve the efficiency of spectrum usage [3,10], so that, in principle, more spectrum is available for allocating new connection requests.
A lightpath is an optical connection between two nodes in an optical network. Occasionally, there is no direct fiber connection between the two nodes, and more than one link is required to connect them. The same frequency slots should be employed on all hops of the end-to-end path of the connection; this characteristic is known as the wavelength continuity constraint. In addition, in optical networks, a set of contiguous spectrum slots should be allocated to each connection, which is known as the contiguity constraint [11]. To meet heterogeneous requirements, the spectral granularity in the fibers of the path between the end nodes must be the same. In EONs, the size of the spectral frequency slots (FSs) is typically 12.5 or 6.25 GHz. Compared with traditional WDM networks (Figure 1), this finer granularity allows a better fit of connection demands in the available spectrum. Network performance is therefore optimized through the dynamic allocation of frequency slots. Typically, in EONs, the size of a connection does not change after the connection has been established. For example, if three FSs are requested for a connection between two nodes, the network should allocate three FSs to the connection until that connection is terminated [12,13].
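The two constraints above can be sketched in a few lines of code. The following is an illustrative helper of our own (not from the paper): it checks whether one block of contiguous slots is free on every link of a candidate path, i.e. whether both the continuity and the contiguity constraints can be satisfied for that allocation.

```python
# Hypothetical helper: spectrum[l][s] == 1 means slot s of link l is occupied.
def block_is_free(spectrum, path_links, start, size):
    """True if slots [start, start+size) are free on every link of the path."""
    return all(
        spectrum[link][s] == 0
        for link in path_links
        for s in range(start, start + size)
    )

# Toy example: 2 links, 8 slots each.
spectrum = [
    [1, 1, 0, 0, 0, 1, 0, 0],  # link 0
    [0, 1, 0, 0, 0, 0, 1, 0],  # link 1
]
print(block_is_free(spectrum, [0, 1], 2, 3))  # True: slots 2-4 free on both links
print(block_is_free(spectrum, [0, 1], 5, 2))  # False: slot 5 occupied on link 0
```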
In EONs, the size of a connection can vary from one FS (assuming a size of 12.5 GHz, this would be enough to accommodate a 10 Gbit/s connection) to a large number of FSs when very high bit-rate (e.g., 400 Gbit/s) connections have to be established [14,15].
While the network is in operation, connections are dynamically added to and removed from the spectrum available in the network fibers. When a new connection is requested, and once the end-to-end path has been found using a routing algorithm, a specific number of contiguous free FSs is assigned to that connection, depending on the required bandwidth. This process is called spectrum assignment (SA). Typically, a “First Fit” approach is used in the SA process [15], which assigns the first available slots as far to the left (i.e., at the lowest frequencies) as possible. After the connection has ended, the allocated spectrum is released (this process is illustrated in Figure 2).
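A minimal First Fit sketch for a single link follows (illustrative only, not the paper's simulator): scan the spectrum from the lowest slot index and return the start of the first run of contiguous free slots large enough for the request.

```python
def first_fit(spectrum, size):
    """Return the start index of the first free block of `size` slots, or None."""
    run_start, run_len = None, 0
    for i, occupied in enumerate(spectrum):
        if occupied == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == size:
                return run_start
        else:
            run_len = 0
    return None

spectrum = [1, 0, 0, 1, 0, 0, 0, 0]
start = first_fit(spectrum, 3)
print(start)                          # 4: the first free run of 3 starts there
for s in range(start, start + 3):     # mark the slots as occupied
    spectrum[s] = 1
```

When the connection ends, the same slots are simply reset to zero, which is what produces the release step shown in Figure 2.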
The ability of EONs to accommodate multiple connections of different sizes results in a major improvement in the spectrum efficiency of the network. However, the dynamic establishment and termination of connections of random sizes can leave the spectrum disordered. Depending on the traffic profile, voids of different sizes can appear at any instant during network operation. This effect is known as spectrum fragmentation [15], and it can prevent the spectrum from being utilized optimally.
Figure 2 also illustrates an example of spectrum fragmentation. The image shows the allocated spectrum in an optical link at three different time instants. After time instant t1, connection C2 (with a size of two FSs) and connection C4 (with five FSs) terminate. However, even though seven FSs are released and 11 slots are free, it is not possible to allocate a seven-FS connection, or even a six-FS connection, in this link. The reason is that allocating a six-FS connection requires a contiguous free space of eight FSs, assuming one FS is left as a guard band on each side. If C3 could be re-allocated to the left at t = t2, it would be possible to allocate a connection of up to eight slots.
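The effect can be reproduced numerically. The toy sketch below (our own illustration, not the paper's fragmentation metric) contrasts the total number of free slots with the largest contiguous free block, which is what actually decides whether a request fits.

```python
def free_blocks(spectrum):
    """Lengths of the runs of contiguous free slots in a link spectrum."""
    blocks, run = [], 0
    for occupied in spectrum:
        if occupied == 0:
            run += 1
        elif run:
            blocks.append(run)
            run = 0
    if run:
        blocks.append(run)
    return blocks

# 11 slots are free in total, but the largest contiguous block is only 5,
# so a request needing 8 contiguous slots is blocked despite the free capacity.
spectrum = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
blocks = free_blocks(spectrum)
print(sum(blocks), max(blocks))  # 11 5
```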
Spectrum fragmentation is a major problem in EONs. If it is not appropriately managed, it can result in connection blocking: the network may be unable to accommodate future connection requests even though free slots are available.
Machine learning (ML) approaches have been proposed as a potential solution to mitigate this problem. In recent years, the application of machine learning to spectrum allocation has been investigated in several research publications.
The authors in [1] used a combined Graph Convolutional Network and Generative Adversarial Network (GCN-GAN) to predict peak traffic in networks, which can affect routing decisions. The predicted peak traffic can potentially be used to dynamically optimize the spectrum allocation algorithm.
To forecast temporal link dynamics in weighted dynamic networks, the authors of [12] presented and assessed a nonlinear GCN-GAN model. Their approach combines GCNs with Long Short-Term Memory (LSTM) networks to capture changing patterns in successive graph time slices, and trains a generative adversarial model to project future states. Although they have yet to put their findings into practice, the authors claim that the GCN-GAN beats six contenders, including a generic LSTM.
A Back Propagation Neural Network (BPNN) is another ML-based approach for spectrum provisioning [13]. This algorithm is relatively simple to implement. The purpose of using Back Propagation in a BPNN is to optimize the spectrum allocation algorithm based on information about previous connections, such as arrival time and holding time; therefore, a collection of sample data is needed to develop such a neural network. The proposed neural network consists of several layers, each operating on a set of variables: weights and thresholds. In each layer, each node calculates a weighted sum of the input parameters, with the weights acting as the coefficients of each input. The node then triggers and excites the next layer if the result of the sum is greater than the threshold. If a neuron is triggered, its information can be used in the next layers; otherwise, the output of the neuron is zero and has no effect on the next layer. The values of the weights and thresholds of each neuron are determined during the learning process.
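The thresholded neuron described above can be sketched in a few lines (an illustrative toy, with made-up weights and thresholds): the output is the weighted sum if it exceeds the threshold, and zero otherwise.

```python
import numpy as np

def neuron(inputs, weights, threshold):
    """Weighted sum of inputs; fires (passes the sum on) only above threshold."""
    s = np.dot(inputs, weights)
    return s if s > threshold else 0.0

print(neuron(np.array([1.0, 2.0]), np.array([0.5, 0.25]), 0.8))  # 1.0 (fires)
print(neuron(np.array([1.0, 2.0]), np.array([0.1, 0.1]), 0.8))   # 0.0 (silent)
```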
In [13], the Back Propagation technique is used to train the neural network (NN). In this technique, the sampled input data are fed to the NN with an arbitrary initial set of thresholds and weights, and the output of the NN is compared with the target value. The difference between these values (the error) is used to improve the weights and thresholds of the neurons in the NN. This process is repeated for multiple cycles (epochs), and the difference between the network’s output and the desired values is gradually reduced. The amount by which the weight values are updated in each cycle is controlled by the step size, or learning rate. In order to accommodate nonlinear changes and attain the anticipated result, a BPNN is used for prediction in EONs.
The authors in [14] analyze two ML approaches that predict traffic in Elastic Optical Networks, showing the benefits of using them with dynamic routing algorithms. First, a Monte Carlo Tree Search (MCTS) is presented for traffic prediction. This technique builds a sparse search tree and searches for the best traffic distribution and path pairs within this tree. Second, an unsupervised self-learning Artificial Neural Network (ANN) is used to forecast traffic fluctuations. The combination of these two methods can provide an optimum arrangement for any specific demand.
Many attempts have been made to efficiently predict network blocking by quantifying spectrum fragmentation in EONs [9,16], but it is difficult to obtain a direct relationship between fragmentation and blocking. This is mainly due to the wavelength continuity and contiguity constraints, which make it very difficult to obtain a direct measure of a path’s blocking likelihood.
Observing the network and predicting traffic using machine learning is the new trend in network traffic control [17]. Survivability and uninterrupted data flow are two critical aspects of a network, and machine learning methods can therefore play a key role in EONs. These methods are able to process traffic data, predict blocking quickly, and prevent potential failures [17].
Several attempts have been made to achieve traffic and blocking prediction using neural networks, KNN, SVM, linear regression, principal component analysis, and linear time series [17,18,19].
The authors in [20] utilize a CNN model to predict dynamic link blocking in a 6G wireless communication network. Using spectral status snapshots that preceded previous blocking events, and letting a CNN find hidden patterns in them, seems to be a feasible approach for predicting connection blocking during EON operation.
In this work, the application of a CNN to predict blocking in EONs is investigated. To the best of our knowledge, this is the first paper that introduces a CNN as a method for blocking prediction in Elastic Optical Networks. This paper’s novelty consists of using synthetic data created from dynamic EON simulations, focusing on connection-blocking prediction based on the spectrum status of network links. Even though it might take hours to train the CNN, once trained it can analyze its input and produce an output in milliseconds.
The main contributions of this study are:
- Introducing CNNs as a method for blocking prediction in Elastic Optical Networks;
- Proposing 1D and 2D CNN architectures for EON connection-blocking prediction;
- Generating synthetic data using dynamic EON simulations to train and test machine learning models;
- Analyzing and implementing a 1D CNN and a 2D CNN and comparing them with KNN and SVM methods.
We show that the trained models are able to predict potential connection blocking. Deep learning CNNs (a 1D CNN and a 2D CNN) are implemented and compared with other machine learning models used to predict connection blocking in EONs (Support Vector Machines and K Nearest Neighbors).
The remainder of this paper is organized as follows. In Section 2, the EON network simulator and CNN input data generation are presented. Section 3 describes how CNNs are used to predict the EON status, and Section 4 is devoted to the structure and characteristics of the CNNs. The main results obtained are presented in Section 5 and discussed in Section 6.
2. EON Network Simulator and CNN Input Data Generation
To evaluate and simulate the spectrum allocation in EONs, the optical network between 14 cities in the United States, known as the NSFNET, is modeled. The NSFNET topology (shown in Figure 3) consists of 14 nodes and 21 bi-directional optical fiber links.
In this work, each fiber is considered to have 160 available spectral frequency slots (FSs). Each slot could correspond to 25 GHz of spectrum, so the total bandwidth used per fiber would correspond approximately to the conventional optical communication bandwidth (C-band). Without loss of generality, the current study could be extended to larger fiber bands or smaller FS sizes.
Finding a suitable route and assigning the appropriate FSs between the lightpath source and destination is performed through the Routing and Spectrum Allocation (RSA) algorithm. According to the connection request parameters, RSA chooses the appropriate route and allocates the required FSs. In the implemented simulator, the contiguity and continuity constraints are considered during the spectrum assignment.
The simulator developed in [15] is used in this research. This MATLAB simulator models EON operation under dynamic load conditions, while multiple end-to-end connections with different bandwidths are generated and torn down. Connection sizes are uniformly distributed from 1 FS to 10 FSs, and source-destination pairs are also uniformly distributed among all the network’s nodes. Connection arrivals follow a Poisson process with an average inter-arrival time of IAT = 1 t.u. (time unit). A negative exponential distribution is used for the connection holding time (HT), whose average value (in t.u.) is adjusted to set the appropriate network load. These values should bring the bandwidth-blocking probability into the range of interest (between 1 and 10%), which covers typical blocking values in optical network performance studies [12].
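A toy traffic generator mirroring the distributions just described can be sketched as follows (our own illustration, not the paper's MATLAB simulator; the mean holding time and seed are placeholder values): exponential inter-arrival times (i.e., Poisson arrivals), exponential holding times, and uniform connection sizes from 1 to 10 FSs.

```python
import random

def generate_connections(n, mean_iat=1.0, mean_ht=10.0, seed=42):
    """Return n tuples (arrival_time, departure_time, size_in_FSs)."""
    random.seed(seed)
    t, connections = 0.0, []
    for _ in range(n):
        t += random.expovariate(1.0 / mean_iat)   # next Poisson arrival
        ht = random.expovariate(1.0 / mean_ht)    # exponential holding time
        size = random.randint(1, 10)              # uniform requested FSs
        connections.append((t, t + ht, size))
    return connections

for arrival, departure, size in generate_connections(5):
    print(f"arrive {arrival:6.2f}  depart {departure:6.2f}  size {size} FS")
```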
The spectral status of all the network links is continuously monitored and recorded during operation. The network status can be represented by matrices of size n × m, where n is the number of links in the network and m is the number of frequency slots (FSs) in each link. The elements of each snapshot matrix are zeros and ones: a value of one indicates that the respective frequency slot is occupied, whereas zero denotes an empty frequency slot (see Figure 4).
As the network operates over time, new connections are established and old connections are terminated, leading to gradual changes in the network’s status at each time interval. Consequently, the status matrix of the network is modified accordingly. The rows of this matrix correspond to the network links (42 unidirectional links in the NSFNET) and the columns to the spectral FSs (160 FSs per link). The dataset consists of 90,000 .csv files, each corresponding to the status of all of the network links in one simulation time unit. These files were obtained from successive simulation runs in which 4000 connections were generated per run. As explained before, the average IAT = 1 t.u.; therefore, the spectrum matrices of each run correspond to approximately 4000 t.u. The results of different simulation runs are collected and used as input data for blocking prediction.
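The snapshot representation described above can be sketched as follows (the helper and its allocation format are our own illustration): a binary 42 × 160 matrix with one row per link and one column per frequency slot, where 1 marks an occupied slot and 0 a free one.

```python
import numpy as np

N_LINKS, N_SLOTS = 42, 160  # NSFNET unidirectional links x FSs per link

def snapshot(active_connections):
    """active_connections: list of (link, start_slot, size) allocations."""
    m = np.zeros((N_LINKS, N_SLOTS), dtype=np.int8)
    for link, start, size in active_connections:
        m[link, start:start + size] = 1
    return m

snap = snapshot([(0, 0, 3), (0, 10, 5), (41, 150, 10)])
print(snap.shape, int(snap.sum()))  # (42, 160) 18
```

One such matrix per time unit, written out as a .csv file, yields exactly the kind of dataset used in this work.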
3. Using CNNs to Predict EON Status
Convolutional Neural Networks (CNNs) are machine learning models well known for pattern detection and extraction, and they are commonly used to predict the behavior of systems whose status changes over time [20]. This method has proven applicability to time-varying matrices, for instance in video processing, as a video is a sequence of time-labeled pictures represented in the form of matrices. In this sense, CNN models can be applied to predicting the status of EONs.
As mentioned in the previous section, the status of an EON can be represented by a matrix whose values change as the status of the EON changes over time. A sequence of snapshots of the EON status can be combined to build a video-like sequence of matrices.
In this research, we follow a modified KDD (Knowledge Discovery in Databases) methodology [21,22,23]. KDD entails the systematic exploratory investigation and modeling of large datasets. The methodology is depicted in Figure 5. This method is beneficial for identifying valid, useful, and intelligible patterns in large and complicated datasets.
This research utilizes synthetic data created by the previously described network simulator to train and test the CNN model. We generate a relatively large set of consecutive snapshots of the simulated EON status; some of the snapshots lead to blocking, while others do not. The data are then randomly divided into two groups: a training set and a test set. Next, the CNN model is trained on the training set and evaluated on the test set.
Figure 6 shows a series of consecutive snapshots of the status matrix of an EON. At time t = n, a connection-blocking event occurs. We label the snapshots close to t = n as leading-to-blocking, because connection blocking happens a few t.u. later. The number of previous snapshots labeled as leading-to-blocking depends on the connections’ average duration and the IAT. Assuming highly dynamic traffic (for example, IAT = 1 t.u. and HT = 10 t.u.), the spectral status of any network link is likely to change within a few t.u. On the other hand, if the traffic dynamics are smoother, the number of snapshots assigned to each class can be increased, as the changes in the spectral status slow down.
To give the network management system enough time to react and prevent the blockage, it helps if the blockage is predicted as early as possible. In this research, as a preliminary analysis, we assume that 13 t.u. is a reasonable amount of time for the network to react and apply corrections. This number is chosen arbitrarily, according to the size of the simulated network, to demonstrate and analyze the concept. Naturally, it should be adapted to the dynamics of the real network and to how fast the network administrator can react.
We pick the three snapshots taken 13, 14, and 15 t.u. before each blocking event as the training data for blocking prediction. We choose to analyze three snapshots (instead of only one) to provide the machine learning model with more information: besides the current status of the network, it can then also exploit the trends and changes in the network and provide a better prediction.
The snapshots chosen as not-leading-to-blocking are those after which no blockage occurs for at least 100 t.u. This number is also arbitrary and chosen to illustrate the concept: if it were too large, the difference between leading-to-blocking and not-leading-to-blocking snapshots would be so pronounced that applying a machine learning model would bring little benefit. The idea is that the machine learning model can continuously analyze the network and trigger an alarm if blocking is expected within 13 t.u. Nevertheless, these values are appropriate for the IAT and HT used in the specific simulations run in this work and should be changed when the network-offered traffic profiles vary. To test and analyze the performance of the machine learning model, we compare the output of the model with the actual simulation output.
As depicted in Figure 6 and Figure 7, the snapshots at instants t = n-13, t = n-14, and t = n-15, i.e., before the actual blockage at t = n, are labeled as leading-to-blocking. On the other hand, the snapshots chosen as not-leading-to-blocking to train the CNN are those after which no blocking event occurs for at least 100 t.u.
A dataset of 90,000 labeled snapshots (leading-to-blocking and not-leading-to-blocking) is created to train and test the CNN, using the .csv files recorded during the network simulation runs described above. A Python program labels each snapshot as a leading-to-blocking or not-leading-to-blocking matrix. The resulting database feeds the machine learning algorithms used in this work and serves both to train and to test the different models employed.
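The labeling rule can be sketched as follows (an illustrative helper of our own, not the paper's actual script): given the time instants of blocking events, snapshot times 13 to 15 t.u. before a blockage are tagged as leading-to-blocking (1), times with no blockage in the following 100 t.u. as not-leading-to-blocking (0), and everything else is left unlabeled (None).

```python
LEAD_WINDOW = range(13, 16)   # t.u. before a blocking event
CLEAR_MARGIN = 100            # t.u. with no upcoming blockage for class 0

def label(t, blocking_times):
    """1 = leading-to-blocking, 0 = not-leading-to-blocking, None = unused."""
    if any(b - t in LEAD_WINDOW for b in blocking_times):
        return 1
    if all(not (0 <= b - t < CLEAR_MARGIN) for b in blocking_times):
        return 0
    return None

blocking_times = [500]
print(label(487, blocking_times))  # 1: 13 t.u. before the blockage
print(label(300, blocking_times))  # 0: no blockage within the next 100 t.u.
print(label(495, blocking_times))  # None: too close, but not in the window
```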
As explained previously, we use a CNN to observe the spectrum allocation patterns in EONs and predict blocking situations. The prediction function can be seen as a binary classifier that divides the input matrices into two classes: not-leading-to-blocking and leading-to-blocking.
4. CNN Structure
In the convolution process, certain features of the input data are extracted [24,25]. This makes CNNs highly efficient for applications such as image processing, or any similar form of pattern recognition, since a specific feature (a shape, texture, or pattern) may occur anywhere in the image or input data array.
CNN models consist of multiple layers that can be tuned according to the application. CNNs are typically composed of three types of layers: convolution, pooling, and fully connected. The role of the convolution and pooling layers is feature extraction, while the fully connected layers perform the classification task [24,25].
The convolution layers are the main component of a CNN model. The input data of a CNN can be represented as a one-dimensional (1D) or two-dimensional (2D) array of numbers. In the convolution layer, a small grid of parameters called a kernel (filter) is convolved with the input data. The features to be extracted and used in the neural network are determined by the kernel matrix; consequently, its entries play an important role in the model and can affect the performance of the system. Therefore, in a CNN model, the kernel matrix is also a trainable part of the model, and its entries are determined and optimized during the training process, along with the weights and thresholds of the neural network.
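The convolution operation itself is simple, as the 1D sketch below illustrates (a toy of our own, with a hand-picked kernel rather than a learned one): the kernel slides over the input and computes a weighted sum at each position, producing a feature map.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1D convolution (no padding): one weighted sum per position."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

# An edge-detecting kernel responds wherever a free region (0) meets an
# occupied one (1) in a link spectrum, regardless of where the edge sits.
x = np.array([0, 0, 0, 1, 1, 1, 0, 0], dtype=float)
print(conv1d(x, np.array([-1.0, 1.0])))  # feature map: 0, 0, 1, 0, 0, -1, 0
```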
The output of a convolution layer is a matrix called a “feature map”. Typically, a feature map can be very large, so a downsampling step is very helpful in reducing the complexity, and thus the computational cost, of the subsequent layers. Max-pooling is a commonly used downsampling technique in CNNs. In this technique, the feature map is divided into several smaller areas called patches. If a certain feature is detected in one of these areas, the value of at least one of the elements in that area increases. Consequently, by monitoring only the maximum value in each patch, it can be determined whether or not a certain feature is present in that patch. In this work, we use max-pooling with a patch size of 2 in the 1D CNN model and a patch size of 2 × 2 in the 2D CNN model.
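Max-pooling with a patch size of 2, as used in our 1D model, can be sketched as follows (illustrative): only the maximum of each patch is kept, halving the feature-map length.

```python
import numpy as np

def max_pool1d(fmap, patch=2):
    """Keep the maximum of each patch; any trailing remainder is dropped."""
    n = len(fmap) // patch * patch
    return fmap[:n].reshape(-1, patch).max(axis=1)

fmap = np.array([0.1, 0.9, 0.3, 0.2, 0.0, 0.7])
print(max_pool1d(fmap))  # maxima per patch: 0.9, 0.3, 0.7
```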
Each pair of convolution and pooling layers is called a feature map layer. A CNN model may consist of multiple such layers, which can then hierarchically detect and extract more complex features.
In CNNs, the classification process is performed by the fully connected (FC) layers, also known as dense layers. After the feature extraction and downsampling of the input data by the convolution and pooling layers, the results are mapped onto the final outputs of the network by the fully connected layers. These layers consist of several neurons, and each neuron is connected to all of the outputs of the previous layer. Each of these connections transfers the output value of the earlier layer to the neurons of the next layer, scaled by a learnable weight value. In this work, we use two FC layers in both our 1D and 2D CNN models, the first consisting of 30 neurons and the second of 15 neurons.
Neural network models such as CNNs can suffer from overfitting. To mitigate this issue, regularization techniques such as dropout can be applied in the fully connected layers. In the dropout technique, a fraction of the neurons in each layer is randomly disabled at each training step by artificially setting their output values to zero. The dropout rate denotes the fraction of neurons that are disabled.
The activation function of a neuron in a neural network computes the output of the neuron based on its input values and the weights assigned to each input. Usually, the outputs of the convolution and pooling layers pass through activation functions such as ReLU (rectified linear unit), Softmax, or Sigmoid. In this work, the activation function used for the convolution and pooling layers is ReLU, and the two fully connected layers of our CNN also use ReLU. The Softmax activation function is employed in the output layer because it provides better results in classification: it assigns each input sequence of snapshots a value between zero and one, representing the probability of leading-to-blocking.
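Putting the pieces together, a 1D model along the lines described in the text can be sketched in Keras, the library used in this work. This is a hedged sketch, not the paper's exact architecture: the input layout (three flattened 42 × 160 snapshots as one channel), the second-layer filter count, and the dropout rate are our assumptions; the large first-layer kernels, ReLU activations, pooling with patch size 2, the 30- and 15-neuron dense layers, and the softmax output follow the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

INPUT_LEN = 3 * 42 * 160  # three flattened 42x160 snapshots (assumed layout)

model = keras.Sequential([
    keras.Input(shape=(INPUT_LEN, 1)),
    layers.Conv1D(8, kernel_size=700, activation="relu"),   # 8 large filters
    layers.MaxPooling1D(pool_size=2),                       # patch size 2
    layers.Conv1D(16, kernel_size=120, activation="relu"),  # filter count assumed
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(30, activation="relu"),
    layers.Dropout(0.3),                                    # rate is assumed
    layers.Dense(15, activation="relu"),
    layers.Dense(2, activation="softmax"),                  # two classes
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The 2D variant replaces `Conv1D`/`MaxPooling1D` with their 2D counterparts on 42 × 160 snapshot inputs, with the rest of the structure unchanged.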
In the design of a CNN model, the accuracy can be increased through calibration, part of which is carried out by tuning the hyperparameters. Hyperparameters are the parameters used to tune the construction, and consequently the performance and functionality, of a neural network; they can also affect its computational efficiency. Examples of such parameters are the number of convolution layers, the number of pooling layers, the kernel size, the learning rate, the number of nodes in the middle layers, the sample length, and the number of training epochs.
In this work, the models are tuned through their hyperparameters to improve the accuracy of the prediction model. Different hyperparameters are evaluated for the proposed model’s architecture through a trial-and-error sensitivity analysis. Accuracies are compared for different combinations of these parameters (detailed in Table 1), and the best combination is selected.
GridSearchCV is a technique for finding the optimal parameter values from a given grid of parameters. Accuracy is defined as the ratio of correct results to the total number of cases investigated. With the help of the GridSearchCV technique, the best parameters for the models are found and used.
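GridSearchCV usage can be illustrated on a toy dataset with the scikit-learn library used in this work (the parameter values below are placeholders, not the actual grids, which are detailed in Table 1):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data standing in for the labeled snapshots.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]},
    scoring="accuracy",   # accuracy as the selection metric
    cv=5,                 # 5-fold cross-validation per combination
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

GridSearchCV exhaustively fits one model per grid combination and keeps the combination with the best cross-validated score.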
The size of the filters greatly affects the classification results. Kernels with a larger size in the first layer help the model to converge better and achieve higher classification accuracy.
In this work, the input data are arranged both as a 1D and as a 2D matrix. The classification performance of the 1D CNN may be negatively impacted by the complexity of the data; the 2D CNN model, on the other hand, performs much better on complex datasets.
Figure 8 and Figure 9 depict the two proposed CNN configurations used in this work: the 1D CNN model and the 2D CNN model, respectively. In the 2D CNN, the layer structure is exactly the same as in the 1D CNN, except that it uses two-dimensional convolutions and, therefore, 2D max-pooling. The configuration of the feature map layers of the CNN models is detailed in the figures.
The main difference between the 1D CNN and the 2D CNN is that the kernel moves in one dimension or in two dimensions, respectively. The computational complexity of 1D and 2D CNN models differs significantly [24]. Compact 1D CNNs are well suited to real-time and low-cost applications, particularly on mobile or handheld devices, due to their minimal processing requirements [24].
Under comparable circumstances (the same design, network, and hyperparameters), a 1D CNN requires considerably less computational power than a 2D CNN. Training a deep 2D CNN typically requires specialized hardware (e.g., GPU farms or cloud computing), whereas a compact 1D CNN with few hidden layers and neurons can be trained rather quickly on a conventional computer. However, in this work, a 1D CNN able to perform the required task needs a relatively larger number of neurons and layers than the 2D CNN; therefore, the difference in the computational power required for the training process is not large in this case.
In this research, the proposed CNNs, a Support Vector Machine (SVM), and a K Nearest Neighbor (KNN) algorithm are explored for the prediction of blocking events. These algorithms are well known for pattern detection and extraction. In our experience tuning the neural networks in this research, the filter configuration is a critical parameter for the performance of the blocking classification. Five filter sizes, covering 700, 120, 50, 20, and 5 spectral samples each, are used in the CNN model. In the initial layer, eight filters with a size of 700 samples are used. Snapshot features with various patterns are learned in the different layers.
In this paper, we provide an architecture that minimizes the error between the labeled blocking and non-blocking patterns and the output of the prediction layer. Binary categorical cross-entropy [25] is used as the objective loss function to estimate the difference between the actual and predicted label values of the patterns. Adaptive moment estimation (Adam [26]), a variant of the mini-batch stochastic gradient descent algorithm, is employed as the optimization algorithm to update the network weights and minimize the output of the objective function [1].
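The binary cross-entropy loss can be sketched numerically (an illustrative computation with made-up labels and predictions): it is the average negative log-likelihood of the true labels under the predicted probabilities, so better-calibrated predictions yield a lower loss.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean negative log-likelihood of the labels under the predictions."""
    p = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
good = binary_cross_entropy(y_true, np.array([0.9, 0.1, 0.8, 0.2]))
bad = binary_cross_entropy(y_true, np.array([0.6, 0.4, 0.5, 0.5]))
print(round(good, 3), round(bad, 3))  # the better predictions give a lower loss
```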
To optimize the architecture of the proposed model, many hyperparameters need to be tested. To this end, several factors, including the learning rate and the number of layers of the CNN, are assessed via trial-and-error sensitivity analysis. The models are trained on a system with an Intel i9 processor, 16 GB of RAM, and an NVidia GeForce RTX 2080 Ti graphics card, using the GPU for parallel computing. Different Python libraries, such as Keras [26] and scikit-learn, are used to implement the tasks of the methodology.
During the training process, we also investigate the effect of the number of epochs on learning, to make sure the models are trained appropriately.
Figure 10 shows an example of the effect of the number of training cycles (epochs) on the prediction accuracy for the training and validation datasets. We observe that after a certain number of epochs (around 200), the model keeps improving on the training set, but no significant improvement is observed on the validation dataset. We conclude that the improvement after about 200 epochs is insignificant, indicating that this number of epochs is sufficient, and that increasing it further might cause overfitting.
5. Results
This section describes the results of the model evaluation. The goal of the designed models is to classify the network status snapshots as leading-to-blocking or not-leading-to-blocking and to label them accordingly. Proper metrics need to be considered to evaluate the models. Distinguishing between the two network statuses is a binary classification problem; therefore, binary classifier metrics are appropriate.
The performance of the proposed CNN 1D and CNN 2D models in the blocking prediction is presented in
Table 2. In this table, for comparison purposes, Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers are also evaluated and compared with the proposed CNN models.
Accuracy is the ratio of the number of correct predictions over the total number of predictions. The accuracies are also depicted in
Figure 11.
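The accuracy metric defined above can be sketched in a few lines (the labels here are illustrative, not taken from the experiment):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Illustrative labels: 1 = leading-to-blocking, 0 = not-leading-to-blocking.
acc = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])  # 4 of 5 correct
```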
The best model is the proposed CNN 2D, with an accuracy of 92.17%, followed by the SVM, the CNN 1D, and the KNN. The 2D CNN thus exhibits superior classification power compared to the 1D CNN.
6. Discussion
Figure 12 presents the confusion matrix of the CNN 2D model. In binary classification tasks such as this one, where the outputs are a sequence of zeros and ones (indicating whether a situation is leading-to-blocking or not), the confusion matrix is one of the most appropriate assessment tools. The confusion matrix, also referred to as an error matrix, is a tabular representation commonly used to evaluate the performance of a classification model on a known test set, providing a detailed breakdown of predictions into categories.
The entries of this matrix show the model’s predictions compared to the actual states. The horizontal axis represents the actual values and the vertical axis represents the values predicted by the CNN 2D model. The predictions are broken down into four categories: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
For the evaluation of the proposed CNN 2D model in this study, a dataset of 1750 samples is utilized. Among the 1750 samples, 942 were leading-to-blocking and the remaining 808 were not leading-to-blocking.
Among the 942 samples, 901 were correctly predicted as leading-to-blocking (True Positives). Similarly, 712 samples out of 808 were accurately predicted as not-leading-to-blocking (True Negatives). Therefore, the model predicted the leading-to-blocking and not-leading-to-blocking samples with 95.65% and 88.12% accuracy, respectively.
However, there were errors: 41 samples were misclassified as not-leading-to-blocking when they were leading-to-blocking, and 96 samples as leading-to-blocking when they were not, representing error rates of 4.35% and 11.88%, respectively. Overall, 1613 out of 1750 samples were correctly predicted, i.e., 92.17% of the dataset.
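The rates quoted in this section follow directly from the per-class counts reported above (901 correct out of 942 leading-to-blocking, 712 correct out of 808 not-leading-to-blocking), as this short check shows:

```python
TP, TN = 901, 712     # correct predictions reported for each class
POS, NEG = 942, 808   # total leading / not-leading-to-blocking samples
FN, FP = POS - TP, NEG - TN  # misclassifications implied by the counts

accuracy = (TP + TN) / (POS + NEG)  # overall accuracy
tpr = TP / POS  # true-positive rate (leading-to-blocking)
tnr = TN / NEG  # true-negative rate (not-leading-to-blocking)
```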
7. Conclusions
In this work, Convolutional Neural Networks (CNNs) have been proposed as a useful tool for predicting blocking events in EONs. Spectrum snapshots are labeled during EON network simulation, stored, and then used to train and test the CNN model. The proposed model efficiently predicts connection blocking with 92% accuracy.
Even though we use previously recorded simulation results to train and evaluate the CNN models, the proposed solution can easily be applied to real networks, where snapshots of link occupation are typically recorded continuously, so abundant data about blocking events are available. In the case reported here, the previously generated dataset is used to train the CNN model.
The CNN models used here can easily be deployed in network control engines as a tool to predict blocking events during EON operation. Such a machine learning model can run continuously during network operation and trigger the control engine to take appropriate action when a potential blocking situation is foreseen. In general, if blocking can be predicted early enough, the network management system can react to prevent it.
The proposed model can reduce blocking events by preventively running the appropriate defragmentation mechanisms. Using such a forecasting tool can optimize the process of resource allocation and reduce the network’s operating expenditure (OPEX).
8. Future Work
In this study, our primary focus has been on the well-known NSF network topology. Extending the investigation to explore the effectiveness of the proposed machine learning models on other network topologies would be a valuable avenue for future research. In addition, evaluating the performance of our machine learning models over prediction durations longer or shorter than those used in this work is left for future exploration.
We conducted a comprehensive analysis of the proposed Convolutional Neural Network (CNN) architecture in both 1D and 2D configurations. Alternative CNN structures, including hybrid models, can be considered in future work.