XLNet-Based Prediction Model for CVSS Metric Values

Shi, Fan; Kai, Shaofeng; Zheng, Jinghua; Zhong, Yao

doi:10.3390/app12188983

by Fan Shi, Shaofeng Kai *, Jinghua Zheng and Yao Zhong

College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(18), 8983; https://doi.org/10.3390/app12188983

Submission received: 22 July 2022 / Revised: 3 September 2022 / Accepted: 5 September 2022 / Published: 7 September 2022

(This article belongs to the Special Issue Recent Advances in Big Data Analytics)

Abstract

A plethora of software vulnerabilities are exposed daily, posing a severe threat to the Internet. It is almost impossible for security experts or software developers to deal with all vulnerabilities. Therefore, it is imperative to rapidly assess the severity of the vulnerability to be able to select which one should be given preferential attention. CVSS is now the industry’s de facto evaluation standard, which is calculated with a quantitative formula to measure the severity of a vulnerability. The CVSS formula consists of several metrics related to the vulnerability’s features. Security experts need to determine the values of each metric, which is tedious and time-consuming, therefore hindering the efficiency of severity assessment. To address this problem, in this paper, we propose a method based on a pre-trained model for the prediction of CVSS metric values. More specifically, this method utilizes the XLNet model that is fine-tuned with a self-built corpus to predict the metric values from the vulnerability description text, thus reducing the burden of the assessment procedure. To verify the performance of our method, we compare the XLNet model with other pre-trained models and conventional machine learning techniques. The experimental results show that the method outperforms these models on evaluation metrics, reaching state-of-the-art performance levels.

Keywords:

cybersecurity; vulnerability assessment; pre-trained model; fine-tuning; CVSS; machine learning; predictability

1. Introduction

1.1. Motivations

Vulnerabilities are weaknesses in the computational logic (e.g., code) of software and hardware components that, when exploited, have a negative effect on confidentiality, integrity, or availability [1,2,3]. Newly discovered vulnerabilities are recorded in numerous vulnerability databases, of which the National Vulnerability Database (NVD) of the United States is among the best known. According to NVD statistics, there is a time delay between the publication of a Common Vulnerabilities and Exposures (CVE) entry in the NVD and the attachment of Common Vulnerability Scoring System (CVSS) information to it. Research by Chen, H. and Ruohonen, J. showed that the severity assessment of a vulnerability lags behind its disclosure by more than 130 days [4,5]. However, high-risk vulnerabilities need to be repaired urgently. For instance, the US Binding Operational Directive 19-02 states that federal agencies in the US must remediate critical vulnerabilities within 15 calendar days of initial detection and high-severity vulnerabilities within 30 calendar days of initial detection [6,7]. If vulnerability severity is not evaluated in a timely manner, remediation efforts will be greatly hampered as the number of vulnerabilities rises. This study aims to provide a method for rapidly predicting vulnerability metric values in order to help assessment experts accelerate the assessment process and reduce the time needed for evaluation.

1.2. Background

1.2.1. Vulnerability Outbreak Trend

According to NVD statistics, the number of vulnerabilities has increased year over year and entered a period of rapid expansion in 2017. The relevant trend is shown in Figure 1. As of 18 June 2022, the NVD had registered 189,155 vulnerabilities. So far in 2022, 11,644 vulnerabilities have been accepted, with 2757 vulnerabilities undergoing analysis and 259 vulnerabilities awaiting investigation [8]. Work on vulnerability assessment must be expedited in order to respond to this widespread outbreak of vulnerabilities.

1.2.2. The Common Vulnerability Scoring System

CVSS Overview

The Common Vulnerability Scoring System (CVSS) is an open framework developed by FIRST.Org, Inc. (FIRST) to characterize and quantify vulnerabilities. CVSS consists of three metric groups: base metric group, temporal metric group, and environmental metric group. Since major vulnerability databases only provide base scores, the base metric group is the most used. The focus of this study is the base metric group. The base metric group reflects the inherent properties of vulnerabilities that remain unchanged over time and across user environments [9]. The base metric group generates scores ranging from 0 to 10. The score and severity mapping table defined by CVSS is shown in Table 1. The CVSS score can also be shown as a vector string, which is a textual and compact way of representing the metric value. The NVD, when using the CVSS, usually gives a string representation of the description and the corresponding vulnerability metric values. The composition and value range of the base metric group and the vector strings corresponding to the metric values for both versions are shown in Table 2 and Table 3. When using CVSS for scoring, the metric values for each metric in Table 2 and Table 3 are selected. The metric values are substituted into the CVSS quantification formula to obtain the base score. Finally, the base score can be converted into a vulnerability rating according to Table 1. The data in Table 1, Table 2 and Table 3 are from the CVSS official website [10,11].
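The vector string and the score-to-severity mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the proposed method: the function names `parse_vector` and `severity_rating` are hypothetical, and the thresholds are the qualitative severity ranges of the CVSS v3.x specification.

```python
def parse_vector(vector: str) -> dict:
    """Split a CVSS v3.x vector string into a {metric: value} dictionary."""
    parts = vector.split("/")
    # The first component is the version prefix, e.g. "CVSS:3.1".
    if not parts[0].startswith("CVSS:"):
        raise ValueError("expected a 'CVSS:<version>' prefix")
    metrics = {}
    for part in parts[1:]:
        name, value = part.split(":")
        metrics[name] = value
    return metrics


def severity_rating(score: float) -> str:
    """Map a v3.x base score in [0, 10] to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must lie in [0, 10]")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


metrics = parse_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
print(metrics["AV"], severity_rating(9.8))
```

Keeping the parsed vector as a dictionary makes it straightforward to look up each metric value when it is later substituted into the scoring formula.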

Introduction to CVSS Metrics

CVSS v2 base metric group interpretation
- Access Vector
  - This metric indicates the method of exploiting the vulnerability. The farther away a potential attacker can be, the higher the vulnerability score.
- Access Complexity
  - This metric measures how difficult it is to exploit the vulnerability.
- Authentication
  - This metric measures how frequently an attacker must authenticate to exploit a vulnerability.
- Confidentiality Impact
  - This metric quantifies the impact of a successfully exploited vulnerability on confidentiality.
- Integrity Impact
  - This metric evaluates the effect a successfully exploited vulnerability has on the system’s integrity.
- Availability Impact
  - This metric assesses the consequences of a successfully exploited vulnerability on the worst-affected component.
CVSS v3.1 base metric group interpretation
- Attack Vector
  - This metric represents the conditions under which exploiting a vulnerability is conceivable. The farther an attacker permitted to exploit a susceptible component is, the higher this metric becomes.
- Attack Complexity
  - This metric reflects the attacker-uncontrolled circumstances needed to exploit the vulnerability.
- Privileges Required
  - This metric represents an attacker’s privileges before exploitation. The score is the highest when no privileges are necessary.
- User Interaction
  - This metric represents the necessity for a human user other than the attacker to be involved in the successful penetration of the susceptible component.
- Scope
  - This metric measures whether one component’s vulnerability impacts other components’ resources.
- Confidentiality
  - This metric quantifies the confidentiality impact of a successfully exploited vulnerability on the component most directly and predictably affected by the attack.
- Integrity
  - This metric represents a vulnerability’s influence on integrity. Integrity means truthfulness and trustworthiness.
- Availability
  - This metric assesses a vulnerability’s influence on a component’s availability.
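To make concrete how the v3.1 metric values listed above are substituted into the quantification formula, the following sketch computes a base score. The numeric weights and the rounding rule are taken from the CVSS v3.1 specification rather than from this paper; the names `base_score` and `roundup` are illustrative.

```python
import math

# Numeric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(m: dict) -> float:
    """Compute the v3.1 base score from a {metric: value} dictionary."""
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10.0))


example = {"AV": "N", "AC": "L", "PR": "N", "UI": "N",
           "S": "U", "C": "H", "I": "H", "A": "H"}
print(base_score(example))  # 9.8
```

Because every input is a categorical metric value, a model that predicts the metric values correctly reproduces the official score exactly; this is why predicting metric values is more useful to assessors than predicting the score alone.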

1.2.3. XLNet Model

XLNet is an autoregressive language model that can acquire bidirectional contextual information. To do so, XLNet mainly adopts three techniques, i.e., the permutation language model, two-stream self-attention, and the recurrence mechanism [12]. The core of XLNet is the permutation language model. To extract bidirectional contextual information, the algorithm randomly permutes the factorization order of the input while retaining the one-way structure of the autoregressive model. For a text of length $n$, there are $n!$ different orderings; bidirectional contextual information can be obtained indirectly by considering all orderings of the text. However, computing all the orderings would consume a great deal of computing power. Therefore, XLNet only predicts a partial sequence. The mathematical expression of its training objective is as follows:

$$\max_{\theta}\; \mathbb{E}_{z \sim \mathcal{Z}_{T}} \left[ \sum_{t=1}^{T} \log p_{\theta}\left(x_{z_{t}} \mid x_{z_{<t}}\right) \right] \tag{1}$$

where $\mathcal{Z}_{T}$ denotes the set of all permutations of a text of length $T$, $z$ is one of these permutations, $x_{z_{t}}$ denotes the $t$-th element of the permutation, and $x_{z_{<t}}$ represents its first $t - 1$ elements.
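The factorization behind Equation (1) can be illustrated with a short Python sketch. It only enumerates orders and contexts for a toy sequence and does not compute any probabilities; the helper `factorization_steps` is a hypothetical name introduced for this illustration.

```python
from itertools import permutations


def factorization_steps(order):
    """List (target, context) pairs for one factorization order z.

    The t-th target x_{z_t} is predicted from x_{z_<t}, i.e. the elements
    that precede it in the permutation, not in the original text.
    """
    return [(order[t], tuple(order[:t])) for t in range(len(order))]


tokens = (1, 2, 3)
all_orders = list(permutations(tokens))  # n! orders for a text of length n

# Collect every context in which token 2 is predicted across all orders.
contexts_for_2 = {
    frozenset(context)
    for order in all_orders
    for target, context in factorization_steps(order)
    if target == 2
}

print(len(all_orders))            # 6
print(factorization_steps((1, 3, 2)))
print(sorted(sorted(c) for c in contexts_for_2))
```

Across the six orders, token 2 is predicted from the empty context, from 1 alone, from 3 alone, and from both 1 and 3, which is how an autoregressive model ends up seeing context on both sides of a token.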

One issue remains in applying the permutation language model. For example, predicting 2 in the order [1, 3, 2, 4] requires the semantic and position information of 1 and 3, but only the position information of 2. However, predicting 1 in the order [2, 3, 1, 4] requires both the semantic and position information of 2 and 3, together with the position information of 1. Thus, across different permutations, sometimes only the position information of 2 is needed, and sometimes both its semantic and position information are required. To address this issue, XLNet proposes two-stream self-attention, which combines the two types of information. Figure 2 depicts the structure of two-stream self-attention, where $h_{\theta}$ represents a unit containing both semantic and position information and $g_{\theta}$ represents a unit with only position information.
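The difference between the two streams can be sketched as two visibility sets per token. This is a simplified, set-based view under the assumption that the content stream may see the token itself while the query stream may not; it does not implement the attention computation, and `two_stream_masks` is a hypothetical helper.

```python
def two_stream_masks(order):
    """Build content-stream and query-stream visibility for one order.

    `order` lists 1-based token positions in factorization order.
    content[i]: positions whose content the h unit of token i may use
                (everything up to and including token i itself).
    query[i]:   positions whose content the g unit of token i may use
                (strictly earlier tokens; token i contributes position only).
    """
    rank = {pos: r for r, pos in enumerate(order)}
    content = {i: sorted(j for j in order if rank[j] <= rank[i]) for i in order}
    query = {i: sorted(j for j in order if rank[j] < rank[i]) for i in order}
    return content, query


content, query = two_stream_masks([1, 3, 2, 4])
print(query[2])    # [1, 3]: predicting 2 uses the content of 1 and 3 only
print(content[2])  # [1, 2, 3]: as context for later tokens, 2 exposes its content

content, query = two_stream_masks([2, 3, 1, 4])
print(query[1])    # [2, 3]: here the content of 2 is needed to predict 1
```

The two calls reproduce the example in the text: in the first order, token 2 is a prediction target and only its position is used, whereas in the second order it acts as context and its content is used as well.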

Moreover, XLNet integrates the then state-of-the-art autoregressive language model Transformer-XL and adopts two of its key techniques, namely the relative positional encoding scheme and the segment recurrence mechanism.

1.3. Related Work

1.3.1. Limitations of the Common Vulnerability Scoring System

CVSS [13,14,15] (Common Vulnerability Scoring System) is a de facto industry standard meant to measure the severity and urgency of vulnerabilities. A vulnerability metric value is used to represent the severity of a vulnerability. Some objective metric values are straightforward to determine, such as how the attack is launched. However, some metrics are difficult to judge, such as the possible confidentiality, availability, and overall impact of the vulnerability, which is a subjective metric that requires strong experience and expertise [16]; different people may have different judgments, thus making it more time-consuming to assess. In contrast to the non-basic aspects of the CVSS scores, there are also researchers who evaluate vulnerabilities from a different perspective. Exploit Prediction Scoring System (EPSS) improves vulnerability prioritization by combining descriptive information about vulnerabilities (CVEs) with evidence of actual exploitation in the wild in order to assess the likelihood of vulnerabilities being exploited [17,18]. Keskin, Omer et al. evaluated vulnerabilities by considering the functional dependencies between vulnerable assets, other assets, and business processes. The severity of the vulnerabilities assessed based on this approach changed significantly compared to their CVSS base score [19]. These different ideas provide good inspiration for vulnerability assessment work.

1.3.2. Limitation of Previous Studies

Currently, there are two versions of CVSS: v2 [14] and v3 [20]. Although the latest version is v3.1, v2 is still widely used and will remain in use for some time. Current research on vulnerability assessment is often limited to a single CVSS version. However, the metric systems of the two versions are different, and the findings for one version cannot be effectively transferred to the other. Current research on vulnerability metrics also tends to study metrics individually. However, there may be correlations between the metrics of a vulnerability, so predicting metrics separately may diminish the effectiveness of prediction. Shahid, M.R. [21], Gong, X. [22], and Costa, J.C. [23] applied pre-trained models and deep learning algorithms to metric prediction in order to improve the prediction of metric values. Nevertheless, they did not consider the correlations between metrics. Some studies have used word vector techniques to characterize text, although such methods do not consider the influence of context, which may contain rich information that could enhance the final prediction. Khazaei, A. [24], Wang, P. [25], Han, Z. [26], and Liu, K. [27] characterized vulnerability descriptions using traditional word vector algorithms. However, these methods did not incorporate contextual information, and hence the information they capture is limited. Other studies directly predicted severity without the intermediate metric values, which is of limited help to an industry that relies on the CVSS quantitative equations for vulnerability assessment. Spanos, G. [28], Ali, M. [29], Ameri, K. [30], and Kudjo, P.K. [31] applied traditional machine learning algorithms and deep learning algorithms to CVSS score prediction. These methods made CVSS assessment more convenient, although they did not give specific metric values and therefore did not support the CVSS quantitative formula, which relies on metric values for scoring.

1.4. Contributions

This study provides a vulnerability metric value prediction method based on the XLNet model to enable rapid vulnerability metric value prediction. The method discovers contextual characteristics from vulnerability descriptions in order to forecast potential metric values. As compared to previous work, the paper’s main contributions are as follows:

- The concept of transfer learning [24] is introduced into the realm of vulnerability assessment. Pre-trained models have matured and greatly outperform traditional models, yet they are not widely applied in this field. This study extends the pre-trained model to vulnerability assessment, thereby generating novel ideas for cybersecurity research.
- Traditional machine learning techniques only assess the influence of word frequency on the outcomes, ignoring the improvement that context can bring to the final output. This paper employs the XLNet model, which incorporates contextual information and enhances the classification performance of the model.
- This paper constructs two datasets, one for CVSS v2 and one for CVSS v3.1. It investigates CVSS v2 and v3.1 concurrently, providing assessment experts with reasonable metric value suggestions and reducing the workload of assessors, thereby speeding up vulnerability severity assessment.

2. Problem Formulation

Vulnerability metric value prediction is a multi-label text classification problem. A multi-label text classification algorithm is used to obtain possible metric values from vulnerability descriptions. Because a vulnerability description is text, it cannot be input directly into the classification model. The text is usually first converted into word vectors by a word vector algorithm. Common algorithms include one-hot encoding, TF-IDF [32], word2vec [33], and BERT [34].

This paper defines the vulnerability metric value generation model as follows:

$$\hat{y} = f_{1}(\Theta, x) \tag{2}$$

where $x$ is a feature extracted from the text information, $\Theta$ is an adjustable model parameter, $f_{1}(\cdot)$ is the structure of the generative model required to determine a prediction function, and $\hat{y}$ denotes the probability, within [0, 1], of the value of the vulnerability metric to be predicted.

$$y_{t} = f_{2}(\hat{y}) \tag{3}$$

where $f_{2}(\cdot)$ is used to transform the probabilistic form of the metric values into a textual form, and $y_{t}$ denotes the transformed metric values.
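One plausible reading of $f_{2}(\cdot)$ in Equation (3) is sketched below. The paper does not specify how probabilities are turned into text, so the rule used here (pick the most probable value within each metric) is an assumption, as are the ordering of the values inside each metric and the name `V31_METRICS`.

```python
# Value sets of the CVSS v3.1 base metrics, in vector-string order.
# The order of values inside each metric is an assumption of this sketch.
V31_METRICS = [
    ("AV", ["N", "A", "L", "P"]),
    ("AC", ["L", "H"]),
    ("PR", ["N", "L", "H"]),
    ("UI", ["N", "R"]),
    ("S", ["U", "C"]),
    ("C", ["H", "L", "N"]),
    ("I", ["H", "L", "N"]),
    ("A", ["H", "L", "N"]),
]


def f2(y_hat):
    """Turn a flat probability vector into a CVSS v3.1 vector string.

    For each metric, the value with the highest probability is selected.
    """
    expected = sum(len(values) for _, values in V31_METRICS)
    if len(y_hat) != expected:
        raise ValueError(f"expected {expected} probabilities, got {len(y_hat)}")
    parts, offset = ["CVSS:3.1"], 0
    for name, values in V31_METRICS:
        segment = list(y_hat[offset:offset + len(values)])
        best = segment.index(max(segment))
        parts.append(f"{name}:{values[best]}")
        offset += len(values)
    return "/".join(parts)


y_hat = [0.1] * 22
for index in (2, 5, 7, 10, 12, 14, 18, 19):
    y_hat[index] = 0.9
print(f2(y_hat))  # CVSS:3.1/AV:L/AC:H/PR:L/UI:R/S:C/C:L/I:N/A:H
```

Selecting one value per metric guarantees that the output is always a well-formed vector string, which a simple global threshold on the probabilities would not.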

In this paper, the dataset for vulnerability metric prediction is denoted as $D = \{X, Y\}$, where $X = [V_{1}, V_{2}, \dots, V_{n}]$ and $V_{i}$ is the description of the $i$-th vulnerability. Accordingly, $Y = [y^{(1)}, y^{(2)}, \dots, y^{(n)}]$ is the ground-truth label vector, while $y^{(i)}$ is the label of $V_{i}$. This paper investigates a multi-label text classification problem. To obtain the probability of each label value, we split the multi-label, multi-class text problem into a multi-label binary text classification problem, where $y^{(i)} = [y_{1}, y_{2}, \dots, y_{m}]$, $m = \sum_{i=1}^{k} c_{i}$, $c_{i}$ refers to the number of categories of the $i$-th metric, $k$ represents the total number of metrics to be predicted, and $y_{i} \in \{1, 0\}$ denotes whether the $i$-th metric value is present. To achieve vulnerability metric prediction, $X$ is converted into the feature matrix $x = [x_{1}, x_{2}, \dots, x_{n}]$, where $x_{i} \in \mathbb{R}^{n}$ are the features extracted from $V_{i}$, and $n$ in $\mathbb{R}^{n}$ is the number of feature dimensions. As stated in (2), the expected result for $x$ is $\hat{y} = f_{1}(\Theta, x)$, where $\hat{y} = [\hat{y}_{1}, \hat{y}_{2}, \dots, \hat{y}_{n}]$, $\hat{y}_{i} = [y(1), y(2), \dots, y(m)]$, and $y(i)$ is the predicted probability corresponding to each metric value. Since vulnerability metric value generation is a multi-label binary classification task, the objective of model parameter optimization is to minimize the model's loss function, following machine learning convention. Therefore, the mission of this study is to construct a better $f_{1}(\cdot)$ and to find the appropriate $\Theta$ in order to predict vulnerability metric values with better performance.
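As a worked instance of $m = \sum_{i=1}^{k} c_{i}$, the label vector lengths for the two CVSS versions can be computed by hand. The per-metric value counts below come from the CVSS v2 and v3.1 specifications and are an assumption about how the labels are laid out, since the paper does not list them explicitly.

```latex
% CVSS v3.1: k = 8 metrics (AV, AC, PR, UI, S, C, I, A)
m_{\mathrm{v3.1}} = 4 + 2 + 3 + 2 + 2 + 3 + 3 + 3 = 22

% CVSS v2: k = 6 metrics (AV, AC, Au, C, I, A), three values each
m_{\mathrm{v2}} = 3 + 3 + 3 + 3 + 3 + 3 = 18
```

Under this layout, each v3.1 label vector $y^{(i)}$ contains exactly eight ones (one per metric) among its 22 entries, and each v2 label vector contains six ones among 18 entries.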

3. Methods

This study aims to design an efficient machine learning approach using vulnerability description text to predict multiple vulnerability features. This approach will help security analysts to quickly analyze the CVSS metric values of vulnerabilities. As opposed to building multiple prediction models to predict various metric values, this paper proposes a learning method based on the XLNet model, which fine-tunes the XLNet model to improve the model’s learning efficiency and prediction accuracy.

3.1. Methodology Overview

Figure 3 depicts this paper's two primary phases: XLNet transfer learning and vulnerability metric value prediction. By employing XLNet transfer learning, a fine-tuned model is developed to predict vulnerability metric values. Metric value prediction is a three-step process, i.e., text tokenization, transfer layer token embedding, and metric value prediction. These steps are shown in Figure 3. First, the vulnerability description is divided into tokens during the tokenization stage, and the tokens are then embedded by the XLNet model. Finally, the softmax function is used to predict the likelihood $\hat{y}$ of the vulnerability metric values. The remainder of this section provides a detailed description of the framework.

3.2. XLNet Transfer Learning

XLNet transfer learning fine-tunes the pre-trained model using the self-built corpus collected in the study. The pre-trained XLNet is trained from random initialization on multiple pre-training corpora. XLNet transfer learning begins with downloading the appropriate pre-trained XLNet model. In this study, the pre-trained model is ‘XLNet-base-cased’, which consists of 12 transfer layers. A domain corpus is constructed from 1999–2022 NVD vulnerability descriptions. The input and output relationships for the transfer layer of the XLNet model are shown in Equation (4):

$$e^{[l]} = f_{\text{XLNet}}(\Theta_{\text{pre-XLNet}}, t) \tag{4}$$

where $t = [t_{1}, t_{2}, \dots, t_{n}]$ is a token list with $n$ tokens, obtained by tokenizing the vulnerability description text; $e^{[l]} = [e_{1}^{[l]}, e_{2}^{[l]}, \dots, e_{n}^{[l]}]$ is the $l$-th layer token embedding of the pre-trained XLNet model; $\Theta_{\text{pre-XLNet}}$ represents the pre-trained XLNet model parameters; $f_{\text{XLNet}}(\cdot)$ is the function that maps $t$ to $e^{[l]}$, determined by the XLNet structure; and $e_{i}^{[l]}$ is the $l$-th layer token embedding of the $i$-th token $t_{i}$, with $e_{i}^{[l]} \in \mathbb{R}^{H_{\text{XLNet}}^{[l]}}$, where $H_{\text{XLNet}}^{[l]}$ is the hidden size of the $l$-th XLNet layer. Through transfer learning, the parameters of XLNet are changed from their pre-trained state $\Theta_{\text{pre-XLNet}}$ to their fine-tuned state $\Theta_{\text{fine-tuned-XLNet}}$. Compared to training an XLNet model from scratch, transfer learning preserves the high performance of a comprehensive model while avoiding high training costs and the shortage of domain data [35].

3.3. Vulnerability Metric Prediction

3.3.1. Text Tokenization

Text tokenization is a data preprocessing step in which the description text $X = [V_{1}, V_{2}, \dots, V_{n}]$ is turned into token sequences $T = [t^{(1)}, t^{(2)}, \dots, t^{(n)}]$, where $t^{(i)} = [t_{1}, t_{2}, \dots, t_{k}]$ is the token sequence obtained from the description $V_{i}$ and $k$ is the pre-set maximum token sequence length. The symbol $t_{j}$ denotes the $j$-th token obtained from the description text $V_{i}$, with $i \in \{1, 2, \dots, n\}$ and $j \in \{1, 2, \dots, k\}$.

3.3.2. Token Embedding by Fine-Tuned XLNet

Once text tokenization is complete, its output is fed into token embedding. When the fine-tuned XLNet is given a token list $t$, different transfer layers produce different levels of token embedding. For instance, the token embedding of layer $l$ from the fine-tuned XLNet is given by:

$$e^{[l]} = f_{\text{XLNet}}(\Theta_{\text{fine-tuned-XLNet}}, t) \tag{5}$$

Similar to Equation (4), $t = [t_{1}, t_{2}, \dots, t_{k}]$ is a token sequence that consists of $k$ tokens, while $e^{[l]} = [e_{1}^{[l]}, e_{2}^{[l]}, \dots, e_{k}^{[l]}]$ represents the $l$-th layer token embedding. $e_{i}^{[l]}$ is the $l$-th layer embedding of the $i$-th token $t_{i}$, with $e_{i}^{[l]} \in \mathbb{R}^{H_{\text{XLNet}}^{[l]}}$, where $H_{\text{XLNet}}^{[l]}$ is the hidden size of the $l$-th XLNet layer.

3.3.3. Vulnerability Metrics Prediction Using the Softmax Function

For this research, we utilized the softmax function as a classifier. It uses the exponential function to map the prediction results to positive real numbers and then normalizes them into probabilities within [0, 1]. The following is the formula used:

$$\hat{y}_{i} = \mathrm{Softmax}(W^{k} e^{[L]} + b^{k})$$

where $\hat{y}_{i}$ is the probability of the vulnerability metric value, $W^{k}$ and $b^{k}$ are the weights and biases of the function, and $e^{[L]}$ is the token embedding output by the final layer.
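The two stages of the softmax classifier (exponentiate, then normalize) can be written out directly. This is a plain-Python sketch with made-up toy weights; `affine` is a hypothetical helper standing in for the linear layer $W^{k} e^{[L]} + b^{k}$, and the max-subtraction is a standard numerical-stability trick rather than something stated in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax: exponentiate, then normalize."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def affine(W, e, b):
    """Compute W e + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, e)) + bias
            for row, bias in zip(W, b)]


# Toy example: a 3-dimensional embedding scored against 2 metric values.
W = [[0.5, -0.2, 0.1],
     [-0.3, 0.8, 0.4]]
b = [0.0, 0.1]
e = [1.0, 2.0, 3.0]

probabilities = softmax(affine(W, e, b))
print(probabilities, sum(probabilities))
```

Every output lies strictly between 0 and 1 and the outputs sum to 1, so they can be read as the probabilities of the competing values of one metric.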

4. Experiments and Results

4.1. Experimental Data and Experimental Setup

This paper used data from the US National Vulnerability Database [1], which contains all security vulnerabilities released from 1999 to May 2022. The vulnerability description on the web page was used as the dataset's text item, and the vulnerabilities' metric values were processed into label items. The data sources are shown in Figure 4. If a metric value was present, 1 was assigned to the label. If it was not, 0 was assigned. Finally, the collected dataset was represented by $D = \{X, Y\}$. After processing, two datasets were obtained for this study: the CVSS version 2.0 dataset containing 174,838 vulnerabilities and the CVSS version 3.1 dataset containing 101,519 vulnerabilities. The datasets were split into training and test sets in a proportion of 80%:20%. The statistics indicate that 97.72% of CVSS v3.1 vulnerability descriptions have fewer than 128 words, with an average of 43.90 words per sample, and 98.62% of CVSS v2 vulnerability descriptions have fewer than 128 words, with an average of 40.99 words per sample. After tokenization, 99% of the descriptions in CVSS v2 and v3.1 have fewer than 256 tokens. The pre-trained XLNet model used in the paper is the XLNet-base model, with 12 transfer layers and a hidden size of 768. All vulnerability descriptions of CVSS v2 and v3.1 were used to fine-tune the pre-trained XLNet model. Two NVIDIA GeForce RTX 3090 GPUs were used for fine-tuning and training.

4.2. Hyperparameter Selection

In this paper, we used the grid search method to select the best hyperparameters. From the results in Figure 5, it can be seen that if the learning rate is too high, the network does not converge, and the loss keeps oscillating around the optimum. If the learning rate is too low, the network converges slowly, which reduces learning efficiency. According to the experiments, the loss function converges better when epochs = 3 and learning rate = 5 × 10⁻⁵.
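The interplay between grid search and learning rate can be illustrated on a toy problem. The sketch below replaces model training with gradient descent on $f(w) = w^{2}$, so the candidate values and the resulting losses are purely illustrative and unrelated to the settings reported above; `final_loss` and `grid_search` are hypothetical names.

```python
from itertools import product


def final_loss(lr, epochs, w0=1.0):
    """Toy stand-in for training: gradient descent on f(w) = w**2."""
    w = w0
    for _ in range(epochs):
        w -= lr * 2 * w  # the gradient of w**2 is 2w
    return w * w


def grid_search(learning_rates, epoch_options):
    """Return the (lr, epochs) pair with the smallest final loss."""
    return min(product(learning_rates, epoch_options),
               key=lambda config: final_loss(*config))


for lr in (0.001, 0.3, 1.1):
    print(lr, final_loss(lr, epochs=5))

print(grid_search([0.001, 0.3, 1.1], [1, 3, 5]))
```

On this toy loss, the largest step size overshoots and the loss grows, the smallest one barely moves, and the grid search selects the intermediate setting, mirroring the qualitative behavior described for Figure 5.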

4.3. Comparative Study

To verify the proposed method’s effectiveness, three types of experiments were conducted in this section to compare, respectively, the XLNet model with other pre-trained models, the XLNet model with traditional machine learning algorithms, and the results with other similar studies. Each metric’s best model is highlighted in bold. To measure the performance of each algorithm on this task, the evaluation metrics of accuracy, precision, recall, and F1-score were used in this study.
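For reference, the four evaluation metrics can be computed from binary labels as follows. This is a minimal sketch for a single binary label; how the scores are averaged across the labels of the multi-label task is not stated in the paper, so no averaging scheme is assumed here, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


scores = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(scores)
```

In this example there are two true positives, one true negative, one false positive, and one false negative, giving an accuracy of 0.6 and precision, recall, and F1-score of 2/3 each.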

4.3.1. Pre-trained Models—Effect Analysis

To evaluate the efficacy of different pre-trained models in this research, three pre-trained models that can be used for text classification (BERT, RoBERTa, and DistilBERT) were selected and compared with XLNet. The hyperparameter settings of these models were consistent with those of XLNet. Table 4 compares the performance of the four pre-trained models on the v3.1 dataset. Table 5 compares the performance of the four pre-trained models on the v2 dataset.

4.3.2. Traditional Models—Effect Analysis

To compare pre-trained and conventional machine learning models in vulnerability assessment, five traditional machine learning models, i.e., decision tree, nearest neighbor, multilayer perceptron, naive Bayes, and logistic regression, were selected and compared with XLNet. Table 6 compares the performance of the traditional models and XLNet on the v3.1 dataset. Table 7 compares the performance of the traditional models and XLNet on the v2 dataset.

4.3.3. Comparison with Other Similar Works

In this study, the accuracy, precision, recall, and F1-score of XLNet are compared to those of similar works. The results of other works were taken from the original papers [21,22,23]. Table 8 and Table 9 display the relevant data. The results demonstrate that the fine-tuned XLNet enhances the performance of vulnerability metric value prediction.

4.4. Analysis of Results

In this study, the experimental results of the XLNet model were compared with several pre-trained models, several traditional machine learning algorithms, and similar studies. Figure 6 and Figure 7 present the essential data in two bar charts to simplify analysis. It can be seen that the XLNet model is superior to other models in assessment metrics, as shown by the aforementioned experimental findings. Comparing this study’s methodology to previous studies demonstrates that it likewise reaches superior performance levels.

5. Discussion

The experimental findings demonstrate that the fine-tuned XLNet model indeed enhanced vulnerability metric prediction, which benefits from both the strength of the pre-trained model and the domain knowledge provided by fine-tuning. Compared to conventional machine learning and deep learning, the XLNet model acquired substantial knowledge from the large-scale corpus. This knowledge partially compensated for the difficulty created by inadequate data in the downstream tasks, thus significantly improving performance on them. In addition, the findings indicate that the fine-tuned XLNet model is not significantly superior to the logistic regression method in terms of prediction performance, and that the model's interpretability is weak. In our future studies, we will conduct research on the fusion of pre-trained and traditional models, combining the advantages of both to produce superior outcomes. We will also research model interpretability in order to make the model's output more convincing.

6. Conclusions

Every year, tens of thousands of vulnerabilities are publicly disclosed on the Internet. In order to remedy high-priority vulnerabilities promptly, it is critical to assess the severity of a vulnerability rapidly. Nevertheless, manual assessment of vulnerabilities using the CVSS metrics has proved to be time-consuming. To find a faster way of assessing vulnerability severity, this paper proposed a method for vulnerability metric prediction using an XLNet pre-trained model. With this method, the XLNet model was fine-tuned on a self-built cybersecurity corpus, and the fine-tuned XLNet model was then used to extract semantic features from the vulnerability description text. Subsequently, the CVSS metric values were split, the multi-classification problem was converted into a multi-label classification problem, and finally, multi-label classification was performed on the extracted text features in order to predict vulnerability metric values. The experimental results on 276,357 actual vulnerabilities demonstrate that XLNet can achieve state-of-the-art performance in CVSS metric value prediction.

Author Contributions

Conceptualization, F.S. and S.K.; methodology, S.K.; software, S.K.; validation, J.Z. and Y.Z.; formal analysis, F.S.; investigation, S.K.; resources, F.S.; data curation, Y.Z.; writing—original draft preparation, S.K.; writing—review and editing, F.S.; visualization, Y.Z.; supervision, F.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number 2021YFB3100500. This is a project led by Professor Shi Fan, which focuses on the network public nuisance governance.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. National Vulnerability Database. Available online: https://nvd.nist.gov/vuln (accessed on 1 September 2022).
2. Tang, M.; Alazab, M.; Luo, Y. Big data for cybersecurity: Vulnerability disclosure trends and dependencies. IEEE Trans. Big Data 2017, 5, 317–329.
3. Viegas, V.; Kuyucu, O. IT Security Controls, 1st ed.; Apress: Berkeley, CA, USA, 2022; p. 193.
4. Chen, H.; Liu, J.; Liu, R.; Park, N.; Subrahmanian, V. VEST: A System for Vulnerability Exploit Scoring & Timing. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 6503–6505.
5. Ruohonen, J. A look at the time delays in CVSS vulnerability scoring. Appl. Comput. Inform. 2019, 15, 129–135.
6. Binding Operational Directive 19-02—Vulnerability Remediation Requirements for Internet-Accessible Systems. Available online: https://www.cisa.gov/binding-operational-directive-19-02 (accessed on 15 June 2022).
7. Ahmadi, V.; Arlos, P.; Casalicchio, E. Normalization of severity rating for automated context-aware vulnerability risk management. In Proceedings of the 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), Online, 17–21 August 2020; pp. 200–205.
8. CVE Status Count. Available online: https://nvd.nist.gov/general/nvd-dashboard (accessed on 15 June 2022).
9. Kai, S.; Zheng, J.; Shi, F.; Lu, Z. A CVSS-based Vulnerability Assessment Method for Reducing Scoring Error. In Proceedings of the 2021 2nd International Conference on Electronics, Communications and Information Technology (CECIT), Sanya, China, 27–29 December 2021; pp. 25–32.
10. A Complete Guide to the Common Vulnerability Scoring System. Available online: https://www.first.org/cvss/v2/guide (accessed on 2 July 2022).
11. Common Vulnerability Scoring System v3.1: Specification Document. Available online: https://www.first.org/cvss/v3.1/specification-document (accessed on 1 September 2022).
12. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized autoregressive pretraining for language understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 1–18.
13. Common Vulnerability Scoring System SIG. Available online: https://www.first.org/cvss/ (accessed on 15 June 2022).
14. Schiffman, M.; Wright, A.; Ahmad, D.; Eschelbeck, G.; National Infrastructure Advisory Council; Vulnerability Disclosure Working Group; Vulnerability Scoring Subgroup. The Common Vulnerability Scoring System; National Infrastructure Advisory Council: Washington, DC, USA, 2004.
15. Mell, P.; Scarfone, K.; Romanosky, S. Common vulnerability scoring system. IEEE Secur. Priv. 2006, 4, 85–89.
16. Eiram, C.; Martin, B. The CVSSv2 Shortcomings, Faults, and Failures Formulation; Technical Report; Forum of Incident Response and Security Teams (FIRST): Cary, NC, USA, 2013.
17. Exploit Prediction Scoring System (EPSS). Available online: https://www.first.org/epss/model (accessed on 1 September 2022).
18. Jacobs, J.; Romanosky, S.; Edwards, B.; Adjerid, I.; Roytman, M. Exploit prediction scoring system (epss). Digit. Threats Res. Pract. 2021, 2, 1–17.
19. Keskin, O.; Gannon, N.; Lopez, B.; Tatar, U. Scoring Cyber Vulnerabilities based on Their Impact on Organizational Goals. In Proceedings of the 2021 Systems and Information Engineering Design Symposium (SIEDS), Online, 29–30 April 2021; pp. 1–6.
20. Team, C. Common Vulnerability Scoring System v3.0: Specification Document; Forum of Incident Response and Security Teams (FIRST): Cary, NC, USA, 2015.
21. Shahid, M.R.; Debar, H. CVSS-BERT: Explainable Natural Language Processing to Determine the Severity of a Computer Security Vulnerability from its Description. In Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 13–16 December 2021; pp. 1600–1607.
22. Gong, X.; Xing, Z.; Li, X.; Feng, Z.; Han, Z. Joint prediction of multiple vulnerability characteristics through multi-task learning. In Proceedings of the 2019 24th International Conference on Engineering of Complex Computer Systems (ICECCS), Guangzhou, China, 10–13 November 2019; pp. 31–40.
23. Costa, J.C.; Roxo, T.; Sequeiros, J.B.; Proença, H.; Inácio, P.R. Predicting CVSS Metric Via Description Interpretation. IEEE Access 2022, 10, 59125–59134.
24. Khazaei, A.; Ghasemzadeh, M.; Derhami, V. An automatic method for CVSS score prediction using vulnerabilities description. J. Intell. Fuzzy Syst. 2016, 30, 89–96.
25. Wang, P.; Zhou, Y.; Sun, B.; Zhang, W. Intelligent prediction of vulnerability severity level based on text mining and XGBboost. In Proceedings of the 2019 Eleventh International Conference on Advanced Computational Intelligence (ICACI), Guilin, China, 7–9 June 2019; pp. 72–77.
26. Han, Z.; Li, X.; Xing, Z.; Liu, H.; Feng, Z. Learning to predict severity of software vulnerability using only vulnerability description. In Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), Shanghai, China, 17–22 September 2017; pp. 125–136.
27. Liu, K.; Zhou, Y.; Wang, Q.; Zhu, X. Vulnerability severity prediction with deep neural network. In Proceedings of the 2019 5th International Conference on Big Data and Information Analytics (BigDIA), Kunming, China, 8–10 July 2019; pp. 114–119.
28. Spanos, G.; Angelis, L.; Toloudis, D. Assessment of vulnerability severity using text mining. In Proceedings of the 21st Pan-Hellenic Conference on Informatics, Larissa, Greece, 28–30 September 2017; pp. 1–6.
29. Ali, M. Character level convolutional neural network for Arabic dialect identification. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), Santa Fe, NM, USA, 20 August 2018; pp. 122–127.
30. Ameri, K.; Hempel, M.; Sharif, H.; Lopez, J., Jr.; Perumalla, K. CyBERT: Cybersecurity Claim Classification by Fine-Tuning the BERT Language Model. J. Cybersecur. Priv. 2021, 1, 615–637.
31. Kudjo, P.K.; Chen, J.; Mensah, S.; Amankwah, R.; Kudjo, C. The effect of Bellwether analysis on software vulnerability severity prediction models. Softw. Qual. J. 2020, 28, 1413–1446.
32. Qaiser, S.; Ali, R. Text mining: Use of TF-IDF to examine the relevance of words to documents. Int. J. Comput. Appl. 2018, 181, 25–29.
33. Goldberg, Y.; Levy, O. Word2vec Explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv 2014, arXiv:1402.3722.
34. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
35. Yin, J.; Tang, M.; Cao, J.; Wang, H. Apply transfer learning to cybersecurity: Predicting exploitability of vulnerabilities by description. Knowl. Based Syst. 2020, 210, 106529.

Figure 1. Vulnerability severity distribution over time (2000–2022).

Figure 2. The architecture of two-stream self-attention.

Figure 3. The architecture of the fine-tuned XLNet model.

Figure 4. Dataset collection process.

Figure 5. Loss curve graph.

Figure 6. Performance comparison of XLNet with other algorithms on the CVSS v3.1 dataset. (a) Evaluation metric: accuracy; (b) evaluation metric: precision; (c) evaluation metric: recall; (d) evaluation metric: F1-score.

Figure 7. Performance comparison of XLNet with other algorithms on the CVSS v2 dataset. (a) Evaluation metric: accuracy; (b) evaluation metric: precision; (c) evaluation metric: recall; (d) evaluation metric: F1-score.

Table 1. Qualitative severity rating scale.

CVSS v3.1 Rating	CVSS v3.1 Score	CVSS v2.0 Rating	CVSS v2.0 Score
None	0.0	None	0.0
Low	0.1–3.9	Low	0.1–3.9
Medium	4.0–6.9	Medium	4.0–6.9
High	7.0–8.9	High	7.0–10.0
Critical	9.0–10.0
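
The rating scale in Table 1 can be read as a simple threshold lookup. The following is a minimal sketch, assuming scores are reported to one decimal place as in the table; the function name `severity_rating` and its `version` parameter are illustrative and do not come from the article.

```python
def severity_rating(score: float, version: str = "3.1") -> str:
    """Return the qualitative rating from Table 1 for a CVSS base score."""
    if version not in ("2.0", "3.1"):
        raise ValueError(f"unsupported CVSS version: {version}")
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    # Table 1 lists "Critical" only for v3.1; in v2.0, "High" extends to 10.0.
    if version == "2.0" or score < 9.0:
        return "High"
    return "Critical"


# Print the v3.1 and v2.0 ratings side by side for a few sample scores.
for s in (0.0, 3.9, 6.9, 8.9, 9.8):
    print(s, severity_rating(s), severity_rating(s, "2.0"))
```

The only point where the two versions differ in this sketch is the top band, which is why a single function with a version switch is sufficient.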

Table 2. Possible values for the base metric group of the CVSS v2 standard.

Metric	Possible Values
Access Vector (AV)	Local (L)
	Adjacent Network (A)
	Network (N)
Access Complexity (AC)	High (H)
	Medium (M)
	Low (L)
Authentication (Au)	Single (S)
	None (N)
	Multiple (M)
Confidentiality Impact (C)	None (N)
	Partial (P)
	Complete (C)
Integrity Impact (I)	None (N)
	Partial (P)
	Complete (C)
Availability Impact (A)	None (N)
	Partial (P)
	Complete (C)

Table 3. Possible values for the base metric group of the CVSS v3.1 standard.

Metric	Possible Values
Attack Vector (AV)	Physical (P)
	Network (N)
	Local (L)
	Adjacent (A)
Attack Complexity (AC)	Low (L)
	High (H)
Privileges Required (PR)	None (N)
	Low (L)
	High (H)
User Interaction (UI)	Required (R)
	None (N)
Scope (S)	Unchanged (U)
	Changed (C)
Confidentiality (C)	None (N)
	Low (L)
	High (H)
Integrity (I)	None (N)
	Low (L)
	High (H)
Availability (A)	None (N)
	Low (L)
	High (H)
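
The value sets in Table 3 are the label spaces that a per-metric classifier has to choose from. As a minimal sketch, they can be encoded as a dictionary and used to validate a predicted base vector. The `CVSS:3.1/AV:N/...` string layout follows the FIRST v3.1 specification rather than this article, and the names `V31_BASE_METRICS` and `parse_v31_vector` are hypothetical.

```python
from typing import Dict

# Allowed abbreviations per base metric, transcribed from Table 3.
V31_BASE_METRICS = {
    "AV": {"P", "N", "L", "A"},
    "AC": {"L", "H"},
    "PR": {"N", "L", "H"},
    "UI": {"R", "N"},
    "S": {"U", "C"},
    "C": {"N", "L", "H"},
    "I": {"N", "L", "H"},
    "A": {"N", "L", "H"},
}


def parse_v31_vector(vector: str) -> Dict[str, str]:
    """Parse a CVSS v3.1 base vector string and check it against Table 3."""
    prefix, _, body = vector.partition("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"unexpected prefix: {prefix!r}")
    metrics: Dict[str, str] = {}
    for part in body.split("/"):
        name, sep, value = part.partition(":")
        if not sep or name not in V31_BASE_METRICS:
            raise ValueError(f"unknown metric: {part!r}")
        if value not in V31_BASE_METRICS[name]:
            raise ValueError(f"invalid value for {name}: {value!r}")
        if name in metrics:
            raise ValueError(f"duplicate metric: {name}")
        metrics[name] = value
    missing = set(V31_BASE_METRICS) - set(metrics)
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    return metrics


print(parse_v31_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
```

A check of this kind only confirms that each predicted value is a legal label; it says nothing about whether the prediction is correct for a given vulnerability description.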

Table 4. Performance comparison of the four pre-trained models on the CVSS v3.1 dataset.

Metric	Model	AV	AC	PR	UI	S	C	I	A
Accuracy	XLNet	0.9625	0.9562	0.9161	0.9440	0.9642	0.9303	0.9358	0.9423
	BERT	0.9580	0.9555	0.9094	0.9271	0.9532	0.9163	0.9229	0.9316
	RoBERTa	0.9610	0.9566	0.9148	0.9430	0.9654	0.9301	0.9350	0.9421
	DistilBERT	0.9569	0.9565	0.9057	0.9245	0.9542	0.9156	0.9213	0.9307
Precision	XLNet	0.8932	0.8614	0.8777	0.9405	0.9486	0.9132	0.9256	0.8927
	BERT	0.8964	0.8722	0.8749	0.9240	0.9328	0.8941	0.9073	0.9030
	RoBERTa	0.8890	0.8719	0.8779	0.9403	0.9535	0.9136	0.9245	0.9012
	DistilBERT	0.8938	0.8867	0.8711	0.9206	0.9358	0.8950	0.9070	0.9130
Recall	XLNet	0.8685	0.7791	0.8568	0.9386	0.9236	0.9030	0.9179	0.8336
	BERT	0.8363	0.7564	0.8386	0.9180	0.8988	0.8846	0.9027	0.8107
	RoBERTa	0.8571	0.7681	0.8512	0.9363	0.9230	0.9020	0.9167	0.8279
	DistilBERT	0.8390	0.7502	0.8306	0.9159	0.8995	0.8819	0.8985	0.8071
F1	XLNet	0.8800	0.8139	0.8662	0.9395	0.9355	0.9078	0.9214	0.8521
	BERT	0.8607	0.8017	0.8542	0.9208	0.9146	0.8891	0.9049	0.8309
	RoBERTa	0.8717	0.8100	0.8633	0.9382	0.9373	0.9074	0.9204	0.8482
	DistilBERT	0.8616	0.8013	0.8476	0.9181	0.9164	0.8880	0.9025	0.8285

Table 5. Performance comparison of the four pre-trained models on the CVSS v2 dataset.

Metric	Model	AV	AC	Au	C	I	A
Accuracy	XLNet	0.9711	0.9167	0.9598	0.9072	0.9167	0.8946
	BERT	0.9692	0.9089	0.9591	0.8981	0.9105	0.8859
	RoBERTa	0.9708	0.9147	0.9600	0.9057	0.9159	0.8927
	DistilBERT	0.9683	0.9075	0.9581	0.8966	0.9115	0.8847
Precision	XLNet	0.9216	0.8714	0.7571	0.8837	0.8928	0.8736
	BERT	0.9195	0.8740	0.7622	0.8730	0.8854	0.8634
	RoBERTa	0.9214	0.8691	0.7612	0.8823	0.8922	0.8712
	DistilBERT	0.9240	0.8764	0.7608	0.8727	0.8876	0.8630
Recall	XLNet	0.8971	0.8106	0.7550	0.8831	0.8919	0.8750
	BERT	0.8877	0.7987	0.7431	0.8725	0.8842	0.8656
	RoBERTa	0.8958	0.8045	0.7493	0.8800	0.8899	0.8718
	DistilBERT	0.8821	0.7932	0.7399	0.8685	0.8831	0.8624
F1	XLNet	0.9086	0.8313	0.7560	0.8834	0.8923	0.8743
	BERT	0.9022	0.8219	0.7521	0.8728	0.8848	0.8644
	RoBERTa	0.9077	0.8258	0.7550	0.8811	0.8910	0.8715
	DistilBERT	0.9004	0.8170	0.7497	0.8705	0.8853	0.8626

Table 6. Performance of traditional machine learning models and the fine-tuned XLNet on the CVSS v3.1 dataset.

Metric	Model	AV	AC	PR	UI	S	C	I	A
Accuracy	XLNet	0.9625	0.9562	0.9161	0.9440	0.9642	0.9303	0.9358	0.9423
	Decision Tree	0.9405	0.9404	0.8690	0.9033	0.9510	0.8879	0.8925	0.9049
	K-Nearest Neighbors	0.9410	0.9414	0.8529	0.8609	0.9069	0.8403	0.8283	0.8671
	Multilayer Perceptron	0.9472	0.9485	0.8801	0.9204	0.9571	0.9031	0.9070	0.9181
	Naive Bayes	0.9188	0.927	0.8296	0.8742	0.8957	0.8408	0.8260	0.8735
	Logistic Regression	0.9479	0.9526	0.8907	0.9246	0.9585	0.9083	0.9113	0.9252
Precision	XLNet	0.8932	0.8614	0.8777	0.9405	0.9486	0.9132	0.9253	0.8927
	Decision Tree	0.9399	0.9387	0.8677	0.9033	0.9506	0.8874	0.8922	0.9045
	K-Nearest Neighbors	0.9395	0.9331	0.8489	0.8613	0.9052	0.8364	0.8260	0.8647
	Multilayer Perceptron	0.9462	0.9453	0.8794	0.9202	0.9563	0.9019	0.9064	0.9174
	Naive Bayes	0.9167	0.9177	0.8362	0.8785	0.9055	0.8515	0.8369	0.8718
	Logistic Regression	0.9465	0.9498	0.8875	0.9248	0.9586	0.9081	0.9112	0.9247
Recall	XLNet	0.8685	0.7791	0.8568	0.9386	0.9236	0.9030	0.9179	0.8336
	Decision Tree	0.9405	0.9404	0.8690	0.9033	0.9510	0.8879	0.8925	0.9049
	K-Nearest Neighbors	0.9410	0.9414	0.8529	0.8609	0.9069	0.8403	0.8283	0.8671
	Multilayer Perceptron	0.9473	0.9485	0.8801	0.9204	0.9571	0.9031	0.9070	0.9181
	Naive Bayes	0.9188	0.9277	0.8296	0.8742	0.8957	0.8408	0.8260	0.8734
	Logistic Regression	0.9479	0.9526	0.8907	0.9246	0.9585	0.9083	0.9113	0.9252
F1	XLNet	0.8800	0.8139	0.8662	0.9395	0.9355	0.9078	0.9214	0.8521
	Decision Tree	0.9401	0.9395	0.8683	0.9033	0.9508	0.8877	0.8923	0.9047
	K-Nearest Neighbors	0.9400	0.9348	0.8503	0.8611	0.9060	0.8374	0.8265	0.8654
	Multilayer Perceptron	0.9465	0.9465	0.8797	0.9203	0.9565	0.9021	0.9065	0.9177
	Naive Bayes	0.9081	0.8935	0.7943	0.8700	0.8756	0.8190	0.8090	0.8676
	Logistic Regression	0.9454	0.9442	0.8853	0.9239	0.9568	0.9053	0.9100	0.9230

Table 7. Performance of traditional machine learning models and the fine-tuned XLNet on the CVSS v2 dataset.

Metric	Model	AV	AC	Au	C	I	A
Accuracy	XLNet	0.9711	0.9169	0.9598	0.9072	0.9167	0.8946
	Decision Tree	0.9574	0.8767	0.9406	0.8630	0.8725	0.8505
	K-Nearest Neighbors	0.9440	0.8588	0.9255	0.8156	0.8270	0.8149
	Multilayer Perceptron	0.9229	0.7219	0.9252	0.6667	0.6871	0.6667
	Naive Bayes	0.9080	0.8714	0.9056	0.8161	0.8085	0.8065
	Logistic Regression	0.9633	0.8973	0.9533	0.8805	0.8899	0.8692
Precision	XLNet	0.9216	0.8714	0.7571	0.8837	0.8928	0.8736
	Decision Tree	0.9569	0.8755	0.9400	0.8624	0.8720	0.8497
	K-Nearest Neighbors	0.9412	0.8564	0.9207	0.8129	0.8238	0.8128
	Multilayer Perceptron	0.8731	0.5525	0.8811	0.4611	0.4870	0.4533
	Naive Bayes	0.9095	0.8698	0.9038	0.8226	0.8152	0.8143
	Logistic Regression	0.9622	0.8974	0.9511	0.8787	0.8880	0.8672
Recall	XLNet	0.8970	0.8106	0.7550	0.8831	0.8919	0.8750
	Decision Tree	0.9574	0.8767	0.9406	0.8630	0.8725	0.8505
	K-Nearest Neighbors	0.9440	0.8588	0.9254	0.8156	0.8270	0.8149
	Multilayer Perceptron	0.9229	0.7219	0.9252	0.6667	0.6871	0.6667
	Naive Bayes	0.9080	0.8714	0.9056	0.8160	0.8085	0.8065
	Logistic Regression	0.8973	0.9533	0.8805	0.8899	0.8692	0.8973
F1	XLNet	0.9086	0.8313	0.7560	0.8834	0.8923	0.8743
	Decision Tree	0.9571	0.8760	0.9403	0.8627	0.8722	0.8500
	K-Nearest Neighbors	0.9418	0.8568	0.9224	0.8139	0.8248	0.8134
	Multilayer Perceptron	0.8963	0.6169	0.9012	0.5406	0.5660	0.5371
	Naive Bayes	0.8819	0.8619	0.8712	0.7950	0.7830	0.7855
	Logistic Regression	0.9618	0.8932	0.9504	0.8786	0.8884	0.8675

Table 8. Comparison of related methods on the CVSS v3.1 dataset.

Metric	Model	AV	AC	PR	UI	S	C	I	A
Accuracy	Our method	0.9625	0.9562	0.9161	0.9440	0.9642	0.9303	0.9358	0.9423
	Shahid M. R. [21]	0.9115	0.9607	0.8379	0.9321	0.9545	0.8704	0.8735	0.8894
	Costa J. C. [23]	0.9141	0.9520	0.8642	0.9333	0.9640	0.8671	0.8761	0.8881
Precision	Our method	0.8931	0.8613	0.8777	0.9405	0.9486	0.9132	0.9253	0.8927
	Shahid M. R.	0.9090	0.9570	0.8392	0.9318	0.9553	0.8714	0.8736	0.8868
	Costa J. C.	/	/	/	/	/	/	/	/
Recall	Our method	0.8685	0.7791	0.8568	0.9386	0.9236	0.9030	0.9179	0.8336
	Shahid M. R.	0.9115	0.9607	0.8379	0.9321	0.9545	0.8704	0.8735	0.8894
	Costa J. C.	/	/	/	/	/	/	/	/
F1	Our method	0.8800	0.8139	0.8662	0.9395	0.9355	0.9078	0.9214	0.8521
	Shahid M. R.	0.9089	0.9574	0.8378	0.9319	0.9548	0.8681	0.8731	0.8863
	Costa J. C.	/	/	/	/	/	/	/	/

Table 9. Comparison of related methods on the CVSS v2 dataset.

Metric	Model	AV	AC	Au	C	I	A
Accuracy	Our Method	0.971	0.917	0.960	0.907	0.917	0.895
	1-L CNN [22]	/	/	/	/	/	/
	2-L CNN	/	/	/	/	/	/
	1-L BiLSTM	/	/	/	/	/	/
	2-L BiLSTM	/	/	/	/	/	/
	1-L Attention-Based BiLSTM	/	/	/	/	/	/
	2-L Attention-Based BiLSTM	/	/	/	/	/	/
Precision	Our Method	0.922	0.871	0.757	0.883	0.893	0.874
	1-L CNN	0.887	0.806	0.836	0.772	0.778	0.768
	2-L CNN	0.819	0.660	0.737	0.694	0.706	0.683
	1-L BiLSTM	0.878	0.796	0.826	0.794	0.803	0.787
	2-L BiLSTM	0.903	0.772	0.843	0.730	0.749	0.719
	1-L Attention-Based BiLSTM	0.887	0.811	0.839	0.797	0.808	0.798
	2-L Attention-Based BiLSTM	0.892	0.823	0.835	0.798	0.814	0.801
Recall	Our Method	0.897	0.811	0.755	0.883	0.892	0.875
	1-L CNN	0.892	0.779	0.847	0.771	0.775	0.767
	2-L CNN	0.837	0.645	0.702	0.664	0.702	0.679
	1-L BiLSTM	0.891	0.787	0.844	0.791	0.801	0.782
	2-L BiLSTM	0.913	0.771	0.855	0.715	0.742	0.697
	1-L Attention-Based BiLSTM	0.896	0.805	0.852	0.794	0.807	0.793
	2-L Attention-Based BiLSTM	0.901	0.817	0.848	0.797	0.813	0.798
F1	Our Method	0.909	0.831	0.756	0.883	0.892	0.874
	1-L CNN	0.873	0.795	0.816	0.766	0.767	0.765
	2-L CNN	0.820	0.634	0.710	0.655	0.697	0.678
	1-L BiLSTM	0.879	0.785	0.818	0.791	0.801	0.783
	2-L BiLSTM	0.905	0.768	0.825	0.716	0.744	0.702
	1-L Attention-Based BiLSTM	0.887	0.804	0.840	0.795	0.807	0.795
	2-L Attention-Based BiLSTM	0.892	0.814	0.839	0.797	0.813	0.799

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

