Article

XLNet-Based Prediction Model for CVSS Metric Values

College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(18), 8983; https://doi.org/10.3390/app12188983
Submission received: 22 July 2022 / Revised: 3 September 2022 / Accepted: 5 September 2022 / Published: 7 September 2022
(This article belongs to the Special Issue Recent Advances in Big Data Analytics)

Abstract

A plethora of software vulnerabilities are exposed daily, posing a severe threat to the Internet. It is almost impossible for security experts or software developers to deal with all vulnerabilities. Therefore, it is imperative to rapidly assess the severity of the vulnerability to be able to select which one should be given preferential attention. CVSS is now the industry’s de facto evaluation standard, which is calculated with a quantitative formula to measure the severity of a vulnerability. The CVSS formula consists of several metrics related to the vulnerability’s features. Security experts need to determine the values of each metric, which is tedious and time-consuming, therefore hindering the efficiency of severity assessment. To address this problem, in this paper, we propose a method based on a pre-trained model for the prediction of CVSS metric values. More specifically, this method utilizes the XLNet model that is fine-tuned with a self-built corpus to predict the metric values from the vulnerability description text, thus reducing the burden of the assessment procedure. To verify the performance of our method, we compare the XLNet model with other pre-trained models and conventional machine learning techniques. The experimental results show that the method outperforms these models on evaluation metrics, reaching state-of-the-art performance levels.

1. Introduction

1.1. Motivations

Vulnerabilities are weaknesses in the computational logic (e.g., code) of software and hardware components that, when exploited, have a negative effect on confidentiality, integrity, or availability [1,2,3]. New vulnerabilities are continually added to the numerous vulnerability databases, of which the National Vulnerability Database (NVD) of the United States is among the best known. According to NVD statistics, there is a delay between the publication of a Common Vulnerabilities and Exposures (CVE) entry in the NVD and the attachment of Common Vulnerability Scoring System (CVSS) information to that entry. Research by Chen, H. and Ruohonen, J. showed that the severity assessment of a vulnerability lags behind its disclosure by more than 130 days [4,5]. However, high-risk vulnerabilities need to be repaired urgently. For instance, the US Binding Operational Directive 19-02 requires federal agencies to remediate critical vulnerabilities within 15 calendar days of initial detection and high-severity vulnerabilities within 30 calendar days of initial detection [6,7]. If vulnerability severity is not evaluated in a timely manner, remediation efforts will be greatly hampered as the number of vulnerabilities rises. This study aims to provide a method for rapidly predicting vulnerability metric values in order to help assessment experts accelerate the assessment process and reduce the time needed for evaluation.

1.2. Background

1.2.1. Vulnerability Outbreak Trend

According to NVD statistics, the number of vulnerabilities has increased year over year and entered a period of rapid expansion in 2017. The trend is shown in Figure 1. As of 18 June 2022, the NVD had registered 189,155 vulnerabilities. In 2022 up to that date, 11,644 vulnerabilities had been accepted, with 2757 vulnerabilities undergoing analysis and 259 vulnerabilities awaiting investigation [8]. Vulnerability assessment work must be expedited to keep pace with this surge in disclosed vulnerabilities.

1.2.2. The Common Vulnerability Scoring System

CVSS Overview

The Common Vulnerability Scoring System (CVSS) is an open framework developed by FIRST.Org, Inc. (FIRST) to characterize and quantify vulnerabilities. CVSS consists of three metric groups: base metric group, temporal metric group, and environmental metric group. Since major vulnerability databases only provide base scores, the base metric group is the most used. The focus of this study is the base metric group. The base metric group reflects the inherent properties of vulnerabilities that remain unchanged over time and across user environments [9]. The base metric group generates scores ranging from 0 to 10. The score and severity mapping table defined by CVSS is shown in Table 1. The CVSS score can also be shown as a vector string, which is a textual and compact way of representing the metric value. The NVD, when using the CVSS, usually gives a string representation of the description and the corresponding vulnerability metric values. The composition and value range of the base metric group and the vector strings corresponding to the metric values for both versions are shown in Table 2 and Table 3. When using CVSS for scoring, the metric values for each metric in Table 2 and Table 3 are selected. The metric values are substituted into the CVSS quantification formula to obtain the base score. Finally, the base score can be converted into a vulnerability rating according to Table 1. The data in Table 1, Table 2 and Table 3 are from the CVSS official website [10,11].
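
The vector string and the severity mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the NVD's tooling: the function names are hypothetical, and the rating thresholds are the CVSS v3.1 qualitative scale as we understand it from the specification [11] (Table 1 itself is not reproduced in this text).

```python
def parse_cvss_vector(vector: str):
    """Split a CVSS vector string into (version, {metric: value}).

    v3.x vectors carry a prefix such as "CVSS:3.1"; v2 vectors have none,
    so "2.0" is assumed for them (an assumption of this sketch).
    """
    parts = vector.split("/")
    if parts[0].startswith("CVSS:"):
        version = parts[0].split(":", 1)[1]
        parts = parts[1:]
    else:
        version = "2.0"
    metrics = dict(part.split(":", 1) for part in parts)
    return version, metrics


def severity_v31(score: float) -> str:
    """Map a v3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0 to 10")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


version, metrics = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
print(version, metrics["AV"], severity_v31(9.8))
```

The parser returns the metric abbreviations exactly as they appear in the string, so the same function handles v2 vectors such as `AV:N/AC:L/Au:N/C:P/I:P/A:P`.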

Introduction to CVSS Metrics

  • CVSS v2 base metric group interpretation
    • Access Vector
      • This metric indicates how the vulnerability is exploited. The more remote an attacker can be when attacking a host, the higher the vulnerability score.
    • Access Complexity
      • This metric measures how difficult it is to exploit the vulnerability.
    • Authentication
      • This metric measures how frequently an attacker must authenticate to exploit a vulnerability.
    • Confidentiality Impact
      • This metric quantifies the impact of a successfully exploited vulnerability on confidentiality.
    • Integrity Impact
      • This metric evaluates the effect a successfully exploited vulnerability has on the system’s integrity.
    • Availability Impact
      • This metric assesses the consequences of a successfully exploited vulnerability on the worst-affected component.
  • CVSS v3.1 base metric group interpretation
    • Attack Vector
      • This metric reflects the context in which exploiting the vulnerability is possible. The more remote (logically and physically) an attacker can be from the vulnerable component, the higher the metric value.
    • Attack Complexity
      • This metric reflects the attacker-uncontrolled circumstances needed to exploit the vulnerability.
    • Privileges Required
      • This metric represents an attacker’s privileges before exploitation. The score is the highest when no privileges are necessary.
    • User Interaction
      • This metric represents the necessity for a human user other than the attacker to be involved in the successful penetration of the susceptible component.
    • Scope
      • This metric measures whether one component’s vulnerability impacts other components’ resources.
    • Confidentiality
      • This metric quantifies the impact of a successfully exploited vulnerability on the confidentiality of the component most directly and predictably affected by the attack.
    • Integrity
      • This metric represents a vulnerability’s influence on integrity. Integrity means truthfulness and trustworthiness.
    • Availability
      • This metric assesses a vulnerability’s influence on a component’s availability.
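
To show how the v3.1 metric values listed above feed the quantification formula, the following sketch computes a base score from a dictionary of metric values. The numeric weights and the rounding rule are taken from the CVSS v3.1 specification [11] as we recall them and should be checked against that document before use; the function and constant names are hypothetical.

```python
import math

# Numeric weights per metric value (CVSS v3.1 specification, to be verified).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer-based v3.1 rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_v31(m: dict) -> float:
    """Compute the base score from metric values such as {"AV": "N", ...}."""
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


example = {"AV": "N", "AC": "L", "PR": "N", "UI": "N",
           "S": "U", "C": "H", "I": "H", "A": "H"}
print(base_score_v31(example))
```

Because the score is a deterministic function of the eight metric values, predicting the metric values (as this paper does) is enough to recover both the score and the severity rating.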

1.2.3. XLNet Model

XLNet is an autoregressive language model that can acquire bidirectional contextual information. To do so, XLNet mainly adopts three methods, i.e., the permutation language model, two-stream self-attention, and the recurrence mechanism [12]. The core of XLNet is the permutation language model. To extract bidirectional contextual information, the algorithm randomly permutes the factorization order of the input while retaining the one-directional prediction of an autoregressive model. For a text of length $n$, there are $n!$ different orderings; bidirectional contextual information can be obtained indirectly by considering all orderings of the text. However, computing every ordering would consume a great deal of computing power. Therefore, XLNet only predicts part of each sequence. Its objective function is as follows:
$$\max_{\theta} \; \mathbb{E}_{z \sim \mathcal{Z}_T}\left[\sum_{t=1}^{T} \log p_{\theta}\left(x_{z_t} \mid x_{z_{<t}}\right)\right] \tag{1}$$
where $\mathcal{Z}_T$ denotes the set of all permutations of a text of length $T$, $z$ is one of these permutations, $x_{z_t}$ denotes the $t$th element of the permuted sequence, and $x_{z_{<t}}$ represents its first $t-1$ elements.
One issue remains in applying the permutation language model. For example, predicting 2 in the sequence [1, 3, 2, 4] requires the semantic and position information of 1 and 3, but only the position information of 2. However, predicting 1 in the sequence [2, 3, 1, 4] requires both the semantic and the position information of 2 and 3. Across different permutations, therefore, sometimes only the position information of 2 is needed, and sometimes both its semantic and position information are required. In response to this issue, XLNet proposes two-stream self-attention, which combines the two types of information. Figure 2 depicts the structure of two-stream self-attention, where $h_{\theta}$ represents a unit containing both semantic and position information and $g_{\theta}$ represents a unit with only position information.
Moreover, XLNet integrates two key techniques of Transformer-XL, the state-of-the-art autoregressive language model at the time, namely the relative positional encoding scheme and the segment recurrence mechanism.
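
The visibility rules behind the permutation language model and two-stream self-attention can be illustrated with a small sketch. It is a simplification under stated assumptions: tokens are identified by their original positions, the function name is hypothetical, and real implementations express the same rules as attention masks over tensors.

```python
import math


def two_stream_visibility(order):
    """For one factorization order, list what each token may attend to.

    content[tok]: tokens whose content the content stream h may see
                  (the preceding tokens in the order plus tok itself).
    query[tok]:   tokens whose content the query stream g may see
                  (only the preceding tokens; g knows tok's position
                  but not its content).
    """
    content, query = {}, {}
    for t, tok in enumerate(order):
        preceding = set(order[:t])
        query[tok] = preceding
        content[tok] = preceding | {tok}
    return content, query


# A text of length 4 has 4! = 24 possible factorization orders.
print(math.factorial(4))

content, query = two_stream_visibility([1, 3, 2, 4])
print(query[2])    # context available when predicting token 2
print(content[2])  # context available once token 2 serves as context
```

In the order [1, 3, 2, 4], predicting token 2 uses the content of tokens 1 and 3 only, whereas in the order [2, 3, 1, 4] the content of token 2 is itself part of the context for predicting token 1, which is why both streams are maintained.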

1.3. Related Work

1.3.1. Limitations of the Common Vulnerability Scoring System

The Common Vulnerability Scoring System (CVSS) [13,14,15] is a de facto industry standard for measuring the severity and urgency of vulnerabilities. Vulnerability metric values are used to represent the severity of a vulnerability. Some objective metric values are straightforward to determine, such as how the attack is launched. Others are difficult to judge, such as the possible impact of the vulnerability on confidentiality and availability; these are subjective metrics that require considerable experience and expertise [16], and different assessors may reach different judgments, which makes assessment more time-consuming. Beyond the CVSS base score, other researchers evaluate vulnerabilities from different perspectives. The Exploit Prediction Scoring System (EPSS) improves vulnerability prioritization by combining descriptive information about vulnerabilities (CVEs) with evidence of actual exploitation in the wild in order to assess the likelihood that a vulnerability will be exploited [17,18]. Keskin, O. et al. evaluated vulnerabilities by considering the functional dependencies between vulnerable assets, other assets, and business processes; the severity assessed with this approach differed significantly from the CVSS base score [19]. These different ideas provide useful inspiration for vulnerability assessment work.

1.3.2. Limitation of Previous Studies

Currently, there are two versions of CVSS: v2 [14] and v3 [20]. Although the latest version is v3.1, v2 is still widely used and will remain in use for some time. Current research on vulnerability assessment has several limitations. First, it is often limited to a single CVSS version; because the metric systems of the two versions differ, the findings for one version cannot be effectively transferred to the other. Second, it tends to study metrics individually, although the metrics of a vulnerability may be correlated, so predicting them separately may diminish the effectiveness of prediction. Shahid, M.R. [21], Gong, X. [22], and Costa, J.C. [23] applied pre-trained models and deep learning algorithms to improve the prediction of metric values, but they did not consider the correlations between metrics. Third, some studies characterize text with word vector techniques that ignore context, which may contain rich information that could enhance the final prediction. Khazaei, A. [24], Wang, P. [25], Han, Z. [26], and Liu, K. [27] characterized vulnerability descriptions using traditional word vector algorithms, which do not incorporate contextual information and therefore leave useful information unused. Finally, other studies predict the severity directly without the intermediate metric values. Spanos, G. [28], Ali, M. [29], Ameri, K. [30], and Kudjo, P.K. [31] applied traditional machine learning and deep learning algorithms to CVSS score prediction. These methods make CVSS assessment more convenient, but because they do not give specific metric values, they offer little help to an industry that relies on the CVSS quantitative formula, which computes scores from metric values.

1.4. Contributions

This study provides a vulnerability metric value prediction method based on the XLNet model to enable rapid vulnerability metric value prediction. The method discovers contextual characteristics from vulnerability descriptions in order to forecast potential metric values. As compared to previous work, the paper’s main contributions are as follows:
  • The concept of transfer learning [24] is introduced into the field of vulnerability assessment. Pre-trained models have matured and greatly outperform traditional models, yet they are not widely applied in this field. This study extends the pre-trained model to vulnerability assessment, thereby offering new ideas for cyber security research.
  • Traditional machine learning techniques consider only the influence of word frequency on the outcome and ignore the improvement that context can bring to the final output. This paper employs the XLNet model, which incorporates contextual information and improves the classification performance of the model.
  • This paper constructs two datasets, one for CVSS v2 and one for CVSS v3.1, and investigates both versions concurrently, providing assessment experts with reasonable metric value suggestions and reducing the workload of assessors, thereby speeding up vulnerability severity assessment.

2. Problem Formulation

Vulnerability metric value prediction is a multi-label text classification problem: a multi-label text classification algorithm derives the likely metric values from a vulnerability description. Because the description is text, it cannot be fed directly into the classification model and is usually first converted into vectors by a word vector algorithm. Common algorithms include one-hot encoding, TF-IDF [32], word2vec [33], and BERT [34].
This paper defines the vulnerability metric value generation model as follows:
$$\hat{y} = f_1(\Theta, x) \tag{2}$$
where $x$ is the feature extracted from the text information, $\Theta$ denotes the adjustable model parameters, $f_1(\cdot)$ is the prediction function determined by the structure of the generative model, and $\hat{y}$ denotes the predicted probability, in $[0, 1]$, of each vulnerability metric value.
$$y_t = f_2(\hat{y}) \tag{3}$$
where $f_2(\cdot)$ transforms the probabilistic form of the metric values into a textual form and $y_t$ denotes the transformed metric values.
In this paper, the dataset for vulnerability metric prediction is denoted as $D = \{X, Y\}$, where $X = [V_1, V_2, \ldots, V_n]$ and $V_i$ is the description of the $i$th vulnerability. Accordingly, $Y = [y^{(1)}, y^{(2)}, \ldots, y^{(n)}]$ is the ground-truth label vector, where $y^{(i)}$ is the label of $V_i$. To obtain the probability of each label value, we split the multi-label, multi-class text classification problem into a multi-label binary text classification problem, where $y^{(i)} = [y_1, y_2, \ldots, y_m]$, $m = \sum_{i=1}^{k} c_i$, $c_i$ is the number of categories of the $i$th metric, $k$ is the total number of metrics to be predicted, and $y_j \in \{1, 0\}$ denotes whether the $j$th metric value is present. To achieve vulnerability metric prediction, $X$ is converted into the feature matrix $x = [x_1, x_2, \ldots, x_n]$, where $x_i \in \mathbb{R}^{d}$ are the features extracted from $V_i$ and $d$ is the number of feature dimensions. As stated in (2), the expected result for $x$ is $\hat{y} = f_1(\Theta, x)$, where $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n]$, $\hat{y}_i = [\hat{y}^{(1)}, \hat{y}^{(2)}, \ldots, \hat{y}^{(m)}]$, and $\hat{y}^{(j)}$ is the predicted probability of the $j$th metric value. Since vulnerability metric value generation is a multi-label binary classification task, the objective of parameter optimization is, by machine learning convention, to minimize the model's loss function. Therefore, the mission of this study is to construct a better $f_1(\cdot)$ and to find an appropriate $\Theta$ so that vulnerability metric values are predicted with better performance.
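
The label construction and the transformation $f_2$ can be made concrete with a short sketch. The schema lists the CVSS v3.1 base metrics and their value abbreviations, so $k = 8$ and $m = 22$. The paper does not spell out $f_2$; choosing the most probable value within each metric's slice is an assumption of this sketch, and all names are hypothetical.

```python
# CVSS v3.1 base metrics and their possible values (k = 8 metrics).
CVSS_V31_SCHEMA = {
    "AV": ["N", "A", "L", "P"],
    "AC": ["L", "H"],
    "PR": ["N", "L", "H"],
    "UI": ["N", "R"],
    "S": ["U", "C"],
    "C": ["H", "L", "N"],
    "I": ["H", "L", "N"],
    "A": ["H", "L", "N"],
}


def encode_labels(metric_values, schema=CVSS_V31_SCHEMA):
    """Build the binary label vector y of length m = sum of c_i."""
    y = []
    for metric, values in schema.items():
        y.extend(1 if metric_values[metric] == v else 0 for v in values)
    return y


def decode_probabilities(y_hat, schema=CVSS_V31_SCHEMA):
    """Assumed f2: pick the most probable value in each metric's slice."""
    result, start = {}, 0
    for metric, values in schema.items():
        segment = y_hat[start:start + len(values)]
        best = max(range(len(values)), key=segment.__getitem__)
        result[metric] = values[best]
        start += len(values)
    return result


labels = {"AV": "N", "AC": "L", "PR": "N", "UI": "N",
          "S": "U", "C": "H", "I": "H", "A": "H"}
y = encode_labels(labels)
print(len(y), sum(y))           # m and the number of active labels
print(decode_probabilities(y))  # round trip back to metric values
```

Decoding per slice guarantees exactly one value per metric, which a plain threshold on each binary output would not.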

3. Methods

This study aims to design an efficient machine learning approach using vulnerability description text to predict multiple vulnerability features. This approach will help security analysts to quickly analyze the CVSS metric values of vulnerabilities. As opposed to building multiple prediction models to predict various metric values, this paper proposes a learning method based on the XLNet model, which fine-tunes the XLNet model to improve the model’s learning efficiency and prediction accuracy.

3.1. Methodology Overview

Figure 3 depicts the two primary phases of this paper: XLNet transfer learning and vulnerability metric value prediction. XLNet transfer learning produces a fine-tuned model that is then used to predict vulnerability metric values. Metric value prediction is a three-step process, i.e., text tokenization, Transformer layer token embedding, and metric value prediction. First, the vulnerability description is divided into tokens during the tokenization stage, and the tokens are then embedded by the XLNet model. Finally, the softmax function is used to predict the likelihood $\hat{y}$ of the vulnerability metric values. The remainder of this section describes the framework in detail.

3.2. XLNet Transfer Learning

XLNet transfer learning fine-tunes the pre-trained model on the self-built corpus collected in this study. The pre-trained XLNet is trained from random initialization on multiple pre-training corpora. XLNet transfer learning begins with downloading the appropriate pre-trained XLNet model. In this study, the pre-trained model is ‘XLNet-base-cased’, which consists of 12 Transformer layers. A domain corpus is constructed from NVD vulnerability descriptions published from 1999 to 2022. The input and output relationship for the Transformer layers of the XLNet model is shown in Equation (4):
$$e^{[l]} = f_{\mathrm{XLNet}}\left(\Theta_{\mathrm{pre}}^{\mathrm{XLNet}}, t\right) \tag{4}$$
where $t = [t_1, t_2, \ldots, t_n]$ is the list of $n$ tokens obtained by tokenizing the vulnerability description text; $e^{[l]} = [e_1^{[l]}, e_2^{[l]}, \ldots, e_n^{[l]}]$ is the token embedding of the $l$th layer of the pre-trained XLNet model; $\Theta_{\mathrm{pre}}^{\mathrm{XLNet}}$ represents the pre-trained XLNet model parameters; $f_{\mathrm{XLNet}}(\cdot)$ is the function that maps $t$ to $e^{[l]}$, determined by the XLNet structure; and $e_i^{[l]} \in \mathbb{R}^{H_{\mathrm{XLNet}}^{[l]}}$ is the $l$th layer token embedding of the $i$th token $t_i$, where $H_{\mathrm{XLNet}}^{[l]}$ is the hidden size of the $l$th XLNet layer. Through transfer learning, the parameters of XLNet are changed from the pre-trained state $\Theta_{\mathrm{pre}}^{\mathrm{XLNet}}$ to the fine-tuned state $\Theta_{\mathrm{finetuned}}^{\mathrm{XLNet}}$. Compared to training an XLNet model from scratch, transfer learning retains the high performance of a comprehensive model while avoiding high training costs and the shortage of domain data [35].

3.3. Vulnerability Metric Prediction

3.3.1. Text Tokenization

Text tokenization is a data preprocessing step in which the description texts $X = [V_1, V_2, \ldots, V_n]$ are turned into token sequences $T = [t^{(1)}, t^{(2)}, \ldots, t^{(n)}]$, where $t^{(i)} = [t_1, t_2, \ldots, t_k]$ is the token sequence obtained from the description $V_i$ and $k$ is the preset maximum token sequence length. The symbol $t_j$ denotes the $j$th token obtained from the description text $V_i$, with $i \in \{1, 2, \ldots, n\}$ and $j \in \{1, 2, \ldots, k\}$.
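
The fixed-length token sequences described here can be sketched as follows. This is only a schematic: whitespace splitting stands in for XLNet's SentencePiece subword tokenizer, right-side padding is an arbitrary choice for illustration, and the function names and sample descriptions are hypothetical.

```python
PAD = "<pad>"


def tokenize(description: str, max_len: int) -> list:
    """Turn one description V_i into a token sequence of exactly max_len (k) tokens."""
    tokens = description.split()  # stand-in for a subword tokenizer
    tokens = tokens[:max_len]     # truncate sequences longer than k
    return tokens + [PAD] * (max_len - len(tokens))  # pad shorter ones


def tokenize_corpus(descriptions, max_len: int) -> list:
    """Map X = [V_1, ..., V_n] to T = [t^(1), ..., t^(n)]."""
    return [tokenize(v, max_len) for v in descriptions]


corpus = [
    "Buffer overflow in the parser allows remote attackers to execute arbitrary code",
    "Cross-site scripting in the login form",
]
for sequence in tokenize_corpus(corpus, max_len=8):
    print(sequence)
```

A fixed $k$ lets all sequences be batched into one tensor; the padding tokens are normally masked out so that they do not influence the embeddings.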

3.3.2. Token Embedding by Fine-Tuned XLNet

When text tokenization is completed, its output becomes the input to token embedding. Given a token list $t$, different Transformer layers of the fine-tuned XLNet produce different levels of token embedding. For instance, the token embedding of layer $l$ of the fine-tuned XLNet is given by:
$$e^{[l]} = f_{\mathrm{XLNet}}\left(\Theta_{\mathrm{finetuned}}^{\mathrm{XLNet}}, t\right) \tag{5}$$
Similar to Equation (4), $t = [t_1, t_2, \ldots, t_k]$ is a token sequence that consists of $k$ tokens, while $e^{[l]} = [e_1^{[l]}, e_2^{[l]}, \ldots, e_k^{[l]}]$ represents the $l$th layer token embedding. $e_i^{[l]} \in \mathbb{R}^{H_{\mathrm{XLNet}}^{[l]}}$ is the $l$th layer embedding of the $i$th token $t_i$, where $H_{\mathrm{XLNet}}^{[l]}$ is the hidden size of the $l$th XLNet layer.

3.3.3. Vulnerability Metrics Prediction Using the Softmax Function

For this research, we used the softmax function as the classifier. Softmax uses the exponential function to map each prediction result to a positive real number and then normalizes the results into probabilities in $[0, 1]$. The formula is $\hat{y}_i = \mathrm{Softmax}(W_k e^{[L]} + b_k)$, where $\hat{y}_i$ is the probability of the vulnerability metric value, $W_k$ and $b_k$ are the weights and biases of the classification layer, and $e^{[L]}$ is the token embedding output by the final layer.
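
The classification step $\mathrm{Softmax}(W_k e^{[L]} + b_k)$ can be written out in plain Python. The weights, bias, and embedding below are toy numbers chosen for illustration, not values from the trained model, and the helper names are hypothetical.

```python
import math


def linear(W, e, b):
    """Compute W e + b for a weight matrix W (list of rows), vector e, bias b."""
    return [sum(w * x for w, x in zip(row, e)) + b_i for row, b_i in zip(W, b)]


def softmax(logits):
    """Exponentiate and normalize; subtracting the maximum avoids overflow."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Toy example: a 3-dimensional embedding scored against 2 candidate values.
W = [[0.2, -0.1, 0.4],
     [-0.3, 0.5, 0.1]]
b = [0.0, 0.1]
e_final = [1.0, 2.0, 3.0]

probabilities = softmax(linear(W, e_final, b))
print(probabilities, sum(probabilities))
```

Subtracting the largest logit before exponentiating leaves the result unchanged mathematically but keeps `math.exp` from overflowing on large inputs.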

4. Experiments and Results

4.1. Experimental Data and Experimental Setup

This paper used data from the US National Vulnerability Database (NVD) [1], which contains all security vulnerabilities released from 1999 to May 2022. The vulnerability description on each web page was used as the text item of the dataset, and the metric values of the vulnerability were processed into label items. The data sources are shown in Figure 4. If a metric value was present, the corresponding label was set to 1; otherwise, it was set to 0. Finally, the collected dataset was represented by $D = \{X, Y\}$. After processing, two datasets were obtained for this study: the CVSS v2 dataset containing 174,838 vulnerabilities and the CVSS v3.1 dataset containing 101,519 vulnerabilities. Each dataset was split into training and test sets in an 80%:20% ratio. The statistics indicate that 97.72% of CVSS v3.1 vulnerability descriptions have fewer than 128 words, with an average of 43.90 words per sample, and 98.62% of CVSS v2 vulnerability descriptions have fewer than 128 words, with an average of 40.99 words per sample. After tokenization, 99% of the descriptions in the CVSS v2 and v3.1 datasets have fewer than 256 tokens. The pre-trained XLNet model used in the paper is the XLNet-base model, with 12 Transformer layers and a hidden size of 768. All vulnerability descriptions of CVSS v2 and v3.1 were used to fine-tune the pre-trained XLNet model. Two NVIDIA GeForce RTX 3090 GPUs were used for fine-tuning and training.
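
An 80%:20% split of this kind can be sketched as below. The paper does not state how samples were assigned to each part, so the seeded random shuffle is an assumption of this sketch, as are the function name and the placeholder samples.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices reproducibly and cut them into two parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    cut = int(len(samples) * train_fraction)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


# Placeholder (description, label vector) pairs standing in for D = {X, Y}.
dataset = [(f"description {i}", [i % 2]) for i in range(100)]
train, test = train_test_split(dataset)
print(len(train), len(test))
```

Fixing the seed keeps the test set identical across the compared models, which matters when small differences in the evaluation metrics are being reported.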

4.2. Hyperparameter Selection

In this paper, we used the grid search method to select the best hyperparameters. From the results in Figure 5, it can be seen that if the learning rate is too high, the network does not converge and the output oscillates around the optimal value. If the learning rate is too low, the network converges slowly, which reduces learning efficiency. According to the experiments, the loss function converges better when epochs = 3 and learning rate = $5 \times 10^{-5}$.

4.3. Comparative Study

To verify the proposed method’s effectiveness, three types of experiments were conducted in this section to compare, respectively, the XLNet model with other pre-trained models, the XLNet model with traditional machine learning algorithms, and the results with other similar studies. Each metric’s best model is highlighted in bold. To measure the performance of each algorithm on this task, the evaluation metrics of accuracy, precision, recall, and F1-score were used in this study.
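
For reference, the four evaluation metrics can be computed for one binary label as in the sketch below. How the paper aggregates them across labels (for example macro, micro, or weighted averaging) is not stated, so this shows only the per-label definitions; the function name and the sample vectors are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for one binary label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

The zero-division guards return 0.0 when a label is never predicted or never present, which is one common convention rather than the only one.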

4.3.1. Pre-trained Models—Effect Analysis

To evaluate the efficacy of different pre-trained models in this research, three pre-trained models that can be used for text classification, namely BERT, RoBERTa, and DistilBERT, were selected and compared with XLNet. The hyperparameter settings of these models were the same as those of XLNet. Table 4 compares the performance of the four pre-trained models on the v3.1 dataset. Table 5 compares the performance of the four pre-trained models on the v2 dataset.

4.3.2. Traditional Models—Effect Analysis

To compare pre-trained and conventional machine learning models in vulnerability assessment, five traditional machine learning models, i.e., decision tree, nearest neighbor, multilayer perceptron, naive Bayes, and logistic regression, were selected and compared with XLNet. Table 6 compares the performance of the traditional models and XLNet on the v3.1 dataset. Table 7 compares the performance of the traditional models and XLNet on the v2 dataset.

4.3.3. Comparison with Other Similar Works

In this study, the accuracy, precision, recall, and F1-score of XLNet are compared to those of similar works. The results of other works were taken from the original papers [21,22,23]. Table 8 and Table 9 display the relevant data. The results demonstrate that the fine-tuned XLNet enhances the performance of vulnerability metric value prediction.

4.4. Analysis of Results

In this study, the experimental results of the XLNet model were compared with those of several pre-trained models, several traditional machine learning algorithms, and similar studies. Figure 6 and Figure 7 present the essential data in two bar charts to simplify analysis. The experimental findings above show that the XLNet model is superior to the other models on the evaluation metrics. The comparison with previous studies likewise shows that the method in this study reaches superior performance levels.

5. Discussion

The experimental findings demonstrate that the fine-tuned XLNet model improved vulnerability metric prediction, benefiting from both the strength of the pre-trained model and the domain knowledge provided by fine-tuning. Compared to conventional machine learning and deep learning models, the XLNet model acquired substantial knowledge from a large-scale corpus. This knowledge partially compensated for the shortage of data in the downstream task and thus significantly improved its results. However, the findings also indicate that the fine-tuned XLNet model is not significantly superior to logistic regression in prediction performance and that its interpretability is weak. In future studies, we will investigate the fusion of pre-trained and traditional models, combining the advantages of both to produce better outcomes. We will also study model interpretability in order to make the model's output more convincing.

6. Conclusions

Every year, tens of thousands of vulnerabilities are publicly disclosed. In order to remedy high-priority vulnerabilities promptly, it is critical to assess vulnerability severity rapidly. However, manual assessment of vulnerabilities using the CVSS metrics has proved to be time-consuming. To find a faster way of assessing vulnerability severity, this paper proposed a method for vulnerability metric prediction using a pre-trained XLNet model. With this method, the XLNet model was fine-tuned on a self-built cybersecurity corpus, and the fine-tuned XLNet model was then used to extract semantic features from the vulnerability description text. Subsequently, the CVSS metric values were split so that the multi-class classification problem was converted into a multi-label binary classification problem, and multi-label classification was performed on the extracted text features to predict vulnerability metric values. The experimental results on 276,357 actual vulnerabilities demonstrate that XLNet can achieve state-of-the-art performance in CVSS metric value prediction.

Author Contributions

Conceptualization, F.S. and S.K.; methodology, S.K.; software, S.K.; validation, J.Z. and Y.Z.; formal analysis, F.S.; investigation, S.K.; resources, F.S.; data curation, Y.Z.; writing—original draft preparation, S.K.; writing—review and editing, F.S.; visualization, Y.Z.; supervision, F.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number 2021YFB3100500. This is a project led by Professor Shi Fan, which focuses on the network public nuisance governance.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. National Vulnerability Database. Available online: https://nvd.nist.gov/vuln (accessed on 1 September 2022).
  2. Tang, M.; Alazab, M.; Luo, Y. Big data for cybersecurity: Vulnerability disclosure trends and dependencies. IEEE Trans. Big Data 2017, 5, 317–329.
  3. Viegas, V.; Kuyucu, O. IT Security Controls, 1st ed.; Apress: Berkeley, CA, USA, 2022; p. 193.
  4. Chen, H.; Liu, J.; Liu, R.; Park, N.; Subrahmanian, V. VEST: A System for Vulnerability Exploit Scoring & Timing. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 6503–6505.
  5. Ruohonen, J. A look at the time delays in CVSS vulnerability scoring. Appl. Comput. Inform. 2019, 15, 129–135.
  6. Binding Operational Directive 19-02—Vulnerability Remediation Requirements for Internet-Accessible Systems. Available online: https://www.cisa.gov/binding-operational-directive-19-02 (accessed on 15 June 2022).
  7. Ahmadi, V.; Arlos, P.; Casalicchio, E. Normalization of severity rating for automated context-aware vulnerability risk management. In Proceedings of the 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), Online, 17–21 August 2020; pp. 200–205.
  8. CVE Status Count. Available online: https://nvd.nist.gov/general/nvd-dashboard (accessed on 15 June 2022).
  9. Kai, S.; Zheng, J.; Shi, F.; Lu, Z. A CVSS-based Vulnerability Assessment Method for Reducing Scoring Error. In Proceedings of the 2021 2nd International Conference on Electronics, Communications and Information Technology (CECIT), Sanya, China, 27–29 December 2021; pp. 25–32.
  10. A Complete Guide to the Common Vulnerability Scoring System. Available online: https://www.first.org/cvss/v2/guide (accessed on 2 July 2022).
  11. Common Vulnerability Scoring System v3.1: Specification Document. Available online: https://www.first.org/cvss/v3.1/specification-document (accessed on 1 September 2022).
  12. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 1–18.
  13. Common Vulnerability Scoring System SIG. Available online: https://www.first.org/cvss/ (accessed on 15 June 2022).
  14. Schiffman, M.; Wright, A.; Ahmad, D.; Eschelbeck, G.; National Infrastructure Advisory Council; Vulnerability Disclosure Working Group; Vulnerability Scoring Subgroup. The Common Vulnerability Scoring System; National Infrastructure Advisory Council: Washington, DC, USA, 2004.
  15. Mell, P.; Scarfone, K.; Romanosky, S. Common vulnerability scoring system. IEEE Secur. Priv. 2006, 4, 85–89.
  16. Eiram, C.; Martin, B. The CVSSv2 Shortcomings, Faults, and Failures Formulation; Technical Report; Forum of Incident Response and Security Teams (FIRST): Cary, NC, USA, 2013.
  17. Exploit Prediction Scoring System (EPSS). Available online: https://www.first.org/epss/model (accessed on 1 September 2022).
  18. Jacobs, J.; Romanosky, S.; Edwards, B.; Adjerid, I.; Roytman, M. Exploit prediction scoring system (EPSS). Digit. Threats Res. Pract. 2021, 2, 1–17.
  19. Keskin, O.; Gannon, N.; Lopez, B.; Tatar, U. Scoring Cyber Vulnerabilities based on Their Impact on Organizational Goals. In Proceedings of the 2021 Systems and Information Engineering Design Symposium (SIEDS), Online, 29–30 April 2021; pp. 1–6.
  20. Team, C. Common Vulnerability Scoring System v3.0: Specification Document; Forum of Incident Response and Security Teams (FIRST): Cary, NC, USA, 2015.
  21. Shahid, M.R.; Debar, H. CVSS-BERT: Explainable Natural Language Processing to Determine the Severity of a Computer Security Vulnerability from its Description. In Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 13–16 December 2021; pp. 1600–1607.
  22. Gong, X.; Xing, Z.; Li, X.; Feng, Z.; Han, Z. Joint prediction of multiple vulnerability characteristics through multi-task learning. In Proceedings of the 2019 24th International Conference on Engineering of Complex Computer Systems (ICECCS), Guangzhou, China, 10–13 November 2019; pp. 31–40.
  23. Costa, J.C.; Roxo, T.; Sequeiros, J.B.; Proença, H.; Inácio, P.R. Predicting CVSS Metric Via Description Interpretation. IEEE Access 2022, 10, 59125–59134.
  24. Khazaei, A.; Ghasemzadeh, M.; Derhami, V. An automatic method for CVSS score prediction using vulnerabilities description. J. Intell. Fuzzy Syst. 2016, 30, 89–96.
  25. Wang, P.; Zhou, Y.; Sun, B.; Zhang, W. Intelligent prediction of vulnerability severity level based on text mining and XGBboost. In Proceedings of the 2019 Eleventh International Conference on Advanced Computational Intelligence (ICACI), Guilin, China, 7–9 June 2019; pp. 72–77.
  26. Han, Z.; Li, X.; Xing, Z.; Liu, H.; Feng, Z. Learning to predict severity of software vulnerability using only vulnerability description. In Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), Shanghai, China, 17–22 September 2017; pp. 125–136.
  27. Liu, K.; Zhou, Y.; Wang, Q.; Zhu, X. Vulnerability severity prediction with deep neural network. In Proceedings of the 2019 5th International Conference on Big Data and Information Analytics (BigDIA), Kunming, China, 8–10 July 2019; pp. 114–119.
  28. Spanos, G.; Angelis, L.; Toloudis, D. Assessment of vulnerability severity using text mining. In Proceedings of the 21st Pan-Hellenic Conference on Informatics, Larissa, Greece, 28–30 September 2017; pp. 1–6.
  29. Ali, M. Character level convolutional neural network for Arabic dialect identification. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), Santa Fe, NM, USA, 20 August 2018; pp. 122–127.
  30. Ameri, K.; Hempel, M.; Sharif, H.; Lopez, J., Jr.; Perumalla, K. CyBERT: Cybersecurity Claim Classification by Fine-Tuning the BERT Language Model. J. Cybersecur. Priv. 2021, 1, 615–637.
  31. Kudjo, P.K.; Chen, J.; Mensah, S.; Amankwah, R.; Kudjo, C. The effect of Bellwether analysis on software vulnerability severity prediction models. Softw. Qual. J. 2020, 28, 1413–1446.
  32. Qaiser, S.; Ali, R. Text mining: Use of TF-IDF to examine the relevance of words to documents. Int. J. Comput. Appl. 2018, 181, 25–29.
  33. Goldberg, Y.; Levy, O. Word2vec Explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv 2014, arXiv:1402.3722.
  34. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
  35. Yin, J.; Tang, M.; Cao, J.; Wang, H. Apply transfer learning to cybersecurity: Predicting exploitability of vulnerabilities by description. Knowl. Based Syst. 2020, 210, 106529.
Figure 1. Vulnerability severity distribution over time (2000–2022).
Figure 2. The architecture of two-stream self-attention.
Figure 3. The architecture of the fine-tuned XLNet model.
Figure 4. Dataset collection process.
Figure 5. Loss curve graph.
Figure 6. Performance comparison of XLNet with other algorithms on the CVSS v3.1 dataset. (a) Evaluation metric: accuracy; (b) evaluation metric: precision; (c) evaluation metric: recall; (d) evaluation metric: F1-score.
Figure 7. Performance comparison of XLNet with other algorithms on the CVSS v2 dataset. (a) Evaluation metric: accuracy; (b) evaluation metric: precision; (c) evaluation metric: recall; (d) evaluation metric: F1-score.
Table 1. Qualitative severity rating scale.

| CVSS v3.1 Rating | CVSS v3.1 Score | CVSS v2.0 Rating | CVSS v2.0 Score |
|---|---|---|---|
| None | 0.0 | None | 0.0 |
| Low | 0.1–3.9 | Low | 0.1–3.9 |
| Medium | 4.0–6.9 | Medium | 4.0–6.9 |
| High | 7.0–8.9 | High | 7.0–10.0 |
| Critical | 9.0–10.0 | | |
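
The rating scale in Table 1 amounts to a simple threshold lookup. The following is a minimal Python sketch of that mapping, assuming base scores are already rounded to one decimal place; the function name `severity_rating` and the version strings `"3.1"` and `"2.0"` are illustrative choices, not part of the paper.

```python
def severity_rating(score: float, version: str = "3.1") -> str:
    """Map a CVSS base score to the qualitative rating listed in Table 1.

    `version` is a hypothetical selector: "3.1" or "2.0".
    """
    if version not in ("3.1", "2.0"):
        raise ValueError(f"unsupported CVSS version: {version!r}")
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores lie in the range 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1 to 3.9 in both versions
    if score < 7.0:
        return "Medium"    # 4.0 to 6.9 in both versions
    if version == "2.0" or score < 9.0:
        return "High"      # 7.0 to 10.0 in v2.0, 7.0 to 8.9 in v3.1
    return "Critical"      # 9.0 to 10.0, v3.1 only


print(severity_rating(9.8))          # rating under the v3.1 scale
print(severity_rating(9.8, "2.0"))   # same score under the v2.0 scale
```

The same score can therefore receive different labels under the two versions, because only v3.1 defines a Critical band.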
Table 2. Possible values for the base metric group of the CVSS v2 standard.

| Metric | Possible Values |
|---|---|
| Access Vector (AV) | Local (L), Adjacent Network (A), Network (N) |
| Access Complexity (AC) | High (H), Medium (M), Low (L) |
| Authentication (Au) | Single (S), None (N), Multiple (M) |
| Confidentiality Impact (C) | None (N), Partial (P), Complete (C) |
| Integrity Impact (I) | None (N), Partial (P), Complete (C) |
| Availability Impact (A) | None (N), Partial (P), Complete (C) |
Table 3. Possible values for the base metric group of the CVSS v3.1 standard.

| Metric | Possible Values |
|---|---|
| Attack Vector (AV) | Physical (P), Network (N), Local (L), Adjacent (A) |
| Attack Complexity (AC) | Low (L), High (H) |
| Privileges Required (PR) | None (N), Low (L), High (H) |
| User Interaction (UI) | Required (R), None (N) |
| Scope (S) | Unchanged (U), Changed (C) |
| Confidentiality (C) | None (N), Low (L), High (H) |
| Integrity (I) | None (N), Low (L), High (H) |
| Availability (A) | None (N), Low (L), High (H) |
Table 4. Comparison of the effects of the four pre-trained models on the CVSS v3.1 dataset.

| Metric | Model | AV | AC | PR | UI | S | C | I | A |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | XLNet | 0.9625 | 0.9562 | 0.9161 | 0.9440 | 0.9642 | 0.9303 | 0.9358 | 0.9423 |
| Accuracy | BERT | 0.9580 | 0.9555 | 0.9094 | 0.9271 | 0.9532 | 0.9163 | 0.9229 | 0.9316 |
| Accuracy | RoBERTa | 0.9610 | 0.9566 | 0.9148 | 0.9430 | 0.9654 | 0.9301 | 0.9350 | 0.9421 |
| Accuracy | DistilBERT | 0.9569 | 0.9565 | 0.9057 | 0.9245 | 0.9542 | 0.9156 | 0.9213 | 0.9307 |
| Precision | XLNet | 0.8932 | 0.8614 | 0.8777 | 0.9405 | 0.9486 | 0.9132 | 0.9256 | 0.8927 |
| Precision | BERT | 0.8964 | 0.8722 | 0.8749 | 0.9240 | 0.9328 | 0.8941 | 0.9073 | 0.9030 |
| Precision | RoBERTa | 0.8890 | 0.8719 | 0.8779 | 0.9403 | 0.9535 | 0.9136 | 0.9245 | 0.9012 |
| Precision | DistilBERT | 0.8938 | 0.8867 | 0.8711 | 0.9206 | 0.9358 | 0.8950 | 0.9070 | 0.9130 |
| Recall | XLNet | 0.8685 | 0.7791 | 0.8568 | 0.9386 | 0.9236 | 0.9030 | 0.9179 | 0.8336 |
| Recall | BERT | 0.8363 | 0.7564 | 0.8386 | 0.9180 | 0.8988 | 0.8846 | 0.9027 | 0.8107 |
| Recall | RoBERTa | 0.8571 | 0.7681 | 0.8512 | 0.9363 | 0.9230 | 0.9020 | 0.9167 | 0.8279 |
| Recall | DistilBERT | 0.8390 | 0.7502 | 0.8306 | 0.9159 | 0.8995 | 0.8819 | 0.8985 | 0.8071 |
| F1 | XLNet | 0.8800 | 0.8139 | 0.8662 | 0.9395 | 0.9355 | 0.9078 | 0.9214 | 0.8521 |
| F1 | BERT | 0.8607 | 0.8017 | 0.8542 | 0.9208 | 0.9146 | 0.8891 | 0.9049 | 0.8309 |
| F1 | RoBERTa | 0.8717 | 0.8100 | 0.8633 | 0.9382 | 0.9373 | 0.9074 | 0.9204 | 0.8482 |
| F1 | DistilBERT | 0.8616 | 0.8013 | 0.8476 | 0.9181 | 0.9164 | 0.8880 | 0.9025 | 0.8285 |
Table 5. Comparison of the effects of the four pre-trained models on the CVSS v2 dataset.

| Metric | Model | AV | AC | Au | C | I | A |
|---|---|---|---|---|---|---|---|
| Accuracy | XLNet | 0.9711 | 0.9167 | 0.9598 | 0.9072 | 0.9167 | 0.8946 |
| Accuracy | BERT | 0.9692 | 0.9089 | 0.9591 | 0.8981 | 0.9105 | 0.8859 |
| Accuracy | RoBERTa | 0.9708 | 0.9147 | 0.9600 | 0.9057 | 0.9159 | 0.8927 |
| Accuracy | DistilBERT | 0.9683 | 0.9075 | 0.9581 | 0.8966 | 0.9115 | 0.8847 |
| Precision | XLNet | 0.9216 | 0.8714 | 0.7571 | 0.8837 | 0.8928 | 0.8736 |
| Precision | BERT | 0.9195 | 0.8740 | 0.7622 | 0.8730 | 0.8854 | 0.8634 |
| Precision | RoBERTa | 0.9214 | 0.8691 | 0.7612 | 0.8823 | 0.8922 | 0.8712 |
| Precision | DistilBERT | 0.9240 | 0.8764 | 0.7608 | 0.8727 | 0.8876 | 0.8630 |
| Recall | XLNet | 0.8971 | 0.8106 | 0.7550 | 0.8831 | 0.8919 | 0.8750 |
| Recall | BERT | 0.8877 | 0.7987 | 0.7431 | 0.8725 | 0.8842 | 0.8656 |
| Recall | RoBERTa | 0.8958 | 0.8045 | 0.7493 | 0.8800 | 0.8899 | 0.8718 |
| Recall | DistilBERT | 0.8821 | 0.7932 | 0.7399 | 0.8685 | 0.8831 | 0.8624 |
| F1 | XLNet | 0.9086 | 0.8313 | 0.7560 | 0.8834 | 0.8923 | 0.8743 |
| F1 | BERT | 0.9022 | 0.8219 | 0.7521 | 0.8728 | 0.8848 | 0.8644 |
| F1 | RoBERTa | 0.9077 | 0.8258 | 0.7550 | 0.8811 | 0.8910 | 0.8715 |
| F1 | DistilBERT | 0.9004 | 0.8170 | 0.7497 | 0.8705 | 0.8853 | 0.8626 |
Table 6. The effect of traditional machine learning models and fine-tuned XLNet on the CVSS v3.1 dataset.

| Metric | Model | AV | AC | PR | UI | S | C | I | A |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | XLNet | 0.9625 | 0.9562 | 0.9161 | 0.9440 | 0.9642 | 0.9303 | 0.9358 | 0.9423 |
| Accuracy | Decision Tree | 0.9405 | 0.9404 | 0.8690 | 0.9033 | 0.9510 | 0.8879 | 0.8925 | 0.9049 |
| Accuracy | K-Nearest Neighbors | 0.9410 | 0.9414 | 0.8529 | 0.8609 | 0.9069 | 0.8403 | 0.8283 | 0.8671 |
| Accuracy | Multilayer Perceptron | 0.9472 | 0.9485 | 0.8801 | 0.9204 | 0.9571 | 0.9031 | 0.9070 | 0.9181 |
| Accuracy | Naive Bayes | 0.9188 | 0.927 | 0.8296 | 0.8742 | 0.8957 | 0.8408 | 0.8260 | 0.8735 |
| Accuracy | Logistic Regression | 0.9479 | 0.9526 | 0.8907 | 0.9246 | 0.9585 | 0.9083 | 0.9113 | 0.9252 |
| Precision | XLNet | 0.8932 | 0.8614 | 0.8777 | 0.9405 | 0.9486 | 0.9132 | 0.9253 | 0.8927 |
| Precision | Decision Tree | 0.9399 | 0.9387 | 0.8677 | 0.9033 | 0.9506 | 0.8874 | 0.8922 | 0.9045 |
| Precision | K-Nearest Neighbors | 0.9395 | 0.9331 | 0.8489 | 0.8613 | 0.9052 | 0.8364 | 0.8260 | 0.8647 |
| Precision | Multilayer Perceptron | 0.9462 | 0.9453 | 0.8794 | 0.9202 | 0.9563 | 0.9019 | 0.9064 | 0.9174 |
| Precision | Naive Bayes | 0.9167 | 0.9177 | 0.8362 | 0.8785 | 0.9055 | 0.8515 | 0.8369 | 0.8718 |
| Precision | Logistic Regression | 0.9465 | 0.9498 | 0.8875 | 0.9248 | 0.9586 | 0.9081 | 0.9112 | 0.9247 |
| Recall | XLNet | 0.8685 | 0.7791 | 0.8568 | 0.9386 | 0.9236 | 0.9030 | 0.9179 | 0.8336 |
| Recall | Decision Tree | 0.9405 | 0.9404 | 0.8690 | 0.9033 | 0.9510 | 0.8879 | 0.8925 | 0.9049 |
| Recall | K-Nearest Neighbors | 0.9410 | 0.9414 | 0.8529 | 0.8609 | 0.9069 | 0.8403 | 0.8283 | 0.8671 |
| Recall | Multilayer Perceptron | 0.9473 | 0.9485 | 0.8801 | 0.9204 | 0.9571 | 0.9031 | 0.9070 | 0.9181 |
| Recall | Naive Bayes | 0.9188 | 0.9277 | 0.8296 | 0.8742 | 0.8957 | 0.8408 | 0.8260 | 0.8734 |
| Recall | Logistic Regression | 0.9479 | 0.9526 | 0.8907 | 0.9246 | 0.9585 | 0.9083 | 0.9113 | 0.9252 |
| F1 | XLNet | 0.8800 | 0.8139 | 0.8662 | 0.9395 | 0.9355 | 0.9078 | 0.9214 | 0.8521 |
| F1 | Decision Tree | 0.9401 | 0.9395 | 0.8683 | 0.9033 | 0.9508 | 0.8877 | 0.8923 | 0.9047 |
| F1 | K-Nearest Neighbors | 0.9400 | 0.9348 | 0.8503 | 0.8611 | 0.9060 | 0.8374 | 0.8265 | 0.8654 |
| F1 | Multilayer Perceptron | 0.9465 | 0.9465 | 0.8797 | 0.9203 | 0.9565 | 0.9021 | 0.9065 | 0.9177 |
| F1 | Naive Bayes | 0.9081 | 0.8935 | 0.7943 | 0.8700 | 0.8756 | 0.8190 | 0.8090 | 0.8676 |
| F1 | Logistic Regression | 0.9454 | 0.9442 | 0.8853 | 0.9239 | 0.9568 | 0.9053 | 0.9100 | 0.9230 |
Table 7. The impact of traditional machine learning models and fine-tuned XLNet on the CVSS v2 dataset.

| Metric | Model | AV | AC | Au | C | I | A |
|---|---|---|---|---|---|---|---|
| Accuracy | XLNet | 0.9711 | 0.9169 | 0.9598 | 0.9072 | 0.9167 | 0.8946 |
| Accuracy | Decision Tree | 0.9574 | 0.8767 | 0.9406 | 0.8630 | 0.8725 | 0.8505 |
| Accuracy | K-Nearest Neighbors | 0.9440 | 0.8588 | 0.9255 | 0.8156 | 0.8270 | 0.8149 |
| Accuracy | Multilayer Perceptron | 0.9229 | 0.7219 | 0.9252 | 0.6667 | 0.6871 | 0.6667 |
| Accuracy | Naive Bayes | 0.9080 | 0.8714 | 0.9056 | 0.8161 | 0.8085 | 0.8065 |
| Accuracy | Logistic Regression | 0.9633 | 0.8973 | 0.9533 | 0.8805 | 0.8899 | 0.8692 |
| Precision | XLNet | 0.9216 | 0.8714 | 0.7571 | 0.8837 | 0.8928 | 0.8736 |
| Precision | Decision Tree | 0.9569 | 0.8755 | 0.9400 | 0.8624 | 0.8720 | 0.8497 |
| Precision | K-Nearest Neighbors | 0.9412 | 0.8564 | 0.9207 | 0.8129 | 0.8238 | 0.8128 |
| Precision | Multilayer Perceptron | 0.8731 | 0.5525 | 0.8811 | 0.4611 | 0.4870 | 0.4533 |
| Precision | Naive Bayes | 0.9095 | 0.8698 | 0.9038 | 0.8226 | 0.8152 | 0.8143 |
| Precision | Logistic Regression | 0.9622 | 0.8974 | 0.9511 | 0.8787 | 0.8880 | 0.8672 |
| Recall | XLNet | 0.8970 | 0.8106 | 0.7550 | 0.8831 | 0.8919 | 0.8750 |
| Recall | Decision Tree | 0.9574 | 0.8767 | 0.9406 | 0.8630 | 0.8725 | 0.8505 |
| Recall | K-Nearest Neighbors | 0.9440 | 0.8588 | 0.9254 | 0.8156 | 0.8270 | 0.8149 |
| Recall | Multilayer Perceptron | 0.9229 | 0.7219 | 0.9252 | 0.6667 | 0.6871 | 0.6667 |
| Recall | Naive Bayes | 0.9080 | 0.8714 | 0.9056 | 0.8160 | 0.8085 | 0.8065 |
| Recall | Logistic Regression | 0.8973 | 0.9533 | 0.8805 | 0.8899 | 0.8692 | 0.8973 |
| F1 | XLNet | 0.9086 | 0.8313 | 0.7560 | 0.8834 | 0.8923 | 0.8743 |
| F1 | Decision Tree | 0.9571 | 0.8760 | 0.9403 | 0.8627 | 0.8722 | 0.8500 |
| F1 | K-Nearest Neighbors | 0.9418 | 0.8568 | 0.9224 | 0.8139 | 0.8248 | 0.8134 |
| F1 | Multilayer Perceptron | 0.8963 | 0.6169 | 0.9012 | 0.5406 | 0.5660 | 0.5371 |
| F1 | Naive Bayes | 0.8819 | 0.8619 | 0.8712 | 0.7950 | 0.7830 | 0.7855 |
| F1 | Logistic Regression | 0.9618 | 0.8932 | 0.9504 | 0.8786 | 0.8884 | 0.8675 |
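Table 8. Comparison of related methods on the CVSS v3.1 dataset. A slash (/) marks a value that is not reported.

| Metric | Model | AV | AC | PR | UI | S | C | I | A |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | Our method | 0.9625 | 0.9562 | 0.9161 | 0.9440 | 0.9642 | 0.9303 | 0.9358 | 0.9423 |
| Accuracy | Shahid M. R. [21] | 0.9115 | 0.9607 | 0.8379 | 0.9321 | 0.9545 | 0.8704 | 0.8735 | 0.8894 |
| Accuracy | Costa J. C. [23] | 0.9141 | 0.9520 | 0.8642 | 0.9333 | 0.9640 | 0.8671 | 0.8761 | 0.8881 |
| Precision | Our method | 0.8931 | 0.8613 | 0.8777 | 0.9405 | 0.9486 | 0.9132 | 0.9253 | 0.8927 |
| Precision | Shahid M. R. | 0.9090 | 0.9570 | 0.8392 | 0.9318 | 0.9553 | 0.8714 | 0.8736 | 0.8868 |
| Precision | Costa J. C. | / | / | / | / | / | / | / | / |
| Recall | Our method | 0.8685 | 0.7791 | 0.8568 | 0.9386 | 0.9236 | 0.9030 | 0.9179 | 0.8336 |
| Recall | Shahid M. R. | 0.9115 | 0.9607 | 0.8379 | 0.9321 | 0.9545 | 0.8704 | 0.8735 | 0.8894 |
| Recall | Costa J. C. | / | / | / | / | / | / | / | / |
| F1 | Our method | 0.8800 | 0.8139 | 0.8662 | 0.9395 | 0.9355 | 0.9078 | 0.9214 | 0.8521 |
| F1 | Shahid M. R. | 0.9089 | 0.9574 | 0.8378 | 0.9319 | 0.9548 | 0.8681 | 0.8731 | 0.8863 |
| F1 | Costa J. C. | / | / | / | / | / | / | / | / |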
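Table 9. Comparison of related methods on the CVSS v2 dataset.

| Metric | Model | AV | AC | Au | C | I | A |
|---|---|---|---|---|---|---|---|
| Accuracy | Our Method | 0.971 | 0.917 | 0.960 | 0.907 | 0.917 | 0.895 |
| Accuracy | 1-L CNN [22] | / | / | / | | / | / |
| Accuracy | 2-L CNN | / | / | / | | / | / |
| Accuracy | 1-L BiLSTM | / | / | / | | / | / |
| Accuracy | 2-L BiLSTM | / | / | / | | / | / |
| Accuracy | 1-L Attention-Based BiLSTM | / | / | / | | / | / |
| Accuracy | 2-L Attention-Based BiLSTM | / | / | / | | / | / |
| Precision | Our Method | 0.922 | 0.871 | 0.757 | 0.883 | 0.893 | 0.874 |
| Precision | 1-L CNN | 0.887 | 0.806 | 0.836 | 0.772 | 0.778 | 0.768 |
| Precision | 2-L CNN | 0.819 | 0.660 | 0.737 | 0.694 | 0.706 | 0.683 |
| Precision | 1-L BiLSTM | 0.878 | 0.796 | 0.826 | 0.794 | 0.803 | 0.787 |
| Precision | 2-L BiLSTM | 0.903 | 0.772 | 0.843 | 0.730 | 0.749 | 0.719 |
| Precision | 1-L Attention-Based BiLSTM | 0.887 | 0.811 | 0.839 | 0.797 | 0.808 | 0.798 |
| Precision | 2-L Attention-Based BiLSTM | 0.892 | 0.823 | 0.835 | 0.798 | 0.814 | 0.801 |
| Recall | Our Method | 0.897 | 0.811 | 0.755 | 0.883 | 0.892 | 0.875 |
| Recall | 1-L CNN | 0.892 | 0.779 | 0.847 | 0.771 | 0.775 | 0.767 |
| Recall | 2-L CNN | 0.837 | 0.645 | 0.702 | 0.664 | 0.702 | 0.679 |
| Recall | 1-L BiLSTM | 0.891 | 0.787 | 0.844 | 0.791 | 0.801 | 0.782 |
| Recall | 2-L BiLSTM | 0.913 | 0.771 | 0.855 | 0.715 | 0.742 | 0.697 |
| Recall | 1-L Attention-Based BiLSTM | 0.896 | 0.805 | 0.852 | 0.794 | 0.807 | 0.793 |
| Recall | 2-L Attention-Based BiLSTM | 0.901 | 0.817 | 0.848 | 0.797 | 0.813 | 0.798 |
| F1 | Our Method | 0.909 | 0.831 | 0.756 | 0.883 | 0.892 | 0.874 |
| F1 | 1-L CNN | 0.873 | 0.795 | 0.816 | 0.766 | 0.767 | 0.765 |
| F1 | 2-L CNN | 0.820 | 0.634 | 0.710 | 0.655 | 0.697 | 0.678 |
| F1 | 1-L BiLSTM | 0.879 | 0.785 | 0.818 | 0.791 | 0.801 | 0.783 |
| F1 | 2-L BiLSTM | 0.905 | 0.768 | 0.825 | 0.716 | 0.744 | 0.702 |
| F1 | 1-L Attention-Based BiLSTM | 0.887 | 0.804 | 0.840 | 0.795 | 0.807 | 0.795 |
| F1 | 2-L Attention-Based BiLSTM | 0.892 | 0.814 | 0.839 | 0.797 | 0.813 | 0.799 |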
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Shi, F.; Kai, S.; Zheng, J.; Zhong, Y. XLNet-Based Prediction Model for CVSS Metric Values. Appl. Sci. 2022, 12, 8983. https://doi.org/10.3390/app12188983
