Towards automatic discovery and assessment of vulnerability severity in cyber–physical systems

Despite their wide proliferation, complex cyber–physical systems (CPSs) are subject to cybersecurity vulnerabilities and potential attacks. Vulnerability assessment for such complex systems is challenging, partly due to the discrepancy among the mechanisms used to evaluate their cybersecurity weakness levels. Several sources report these weaknesses, such as the National Vulnerability Database (NVD), manufacturer websites, and security scanning advisories like the Cyber Emergency Response Team (CERT) and Shodan databases. However, these multiple sources face inconsistency issues, especially in terms of vulnerability severity scores. We advocate an artificial-intelligence based approach to streamline the computation of vulnerability severity magnitudes. This approach decreases the error rate induced by the manual calculation processes traditionally used in cybersecurity analysis. Popular repositories such as NVD and SecurityFocus are employed to validate the proposed approach, assisted with a query method to retrieve vulnerability instances. In doing so, we report discovered correlations among reported vulnerability scores to infer consistent magnitude values of vulnerability instances. The method is applied to a case study featuring a CPS application, to illustrate the automation of the proposed vulnerability scoring mechanism used to mitigate cybersecurity weaknesses.


Introduction
Modern breakthroughs in information and communication technology facilitate the integration of digital and physical environments, improving the degree of automation in industrial processes enabled by cyber–physical systems (CPS). Nonetheless, CPS components are subject to vulnerabilities across a multitude of firmware versions [1]. As new vulnerability occurrences are discovered, their magnitude is graded to support patching prioritization. This analysis helps cybersecurity operators anticipate cyber attacks from emerging threats and prevent intrusion opportunities [2].
New measurements make it possible to quantify cybersecurity issues and thereby support vulnerability-mitigation decisions. These measurements are captured from a range of cybersecurity repositories available online. The Common Vulnerabilities and Exposures (CVE) [3] repository is a prime database cumulating vulnerability reports, which are further augmented with Common Vulnerability Scoring System (CVSS) [4] scores. Other analytical measurements are provided by the National Vulnerability Database (NVD) [5]. However, vulnerability-mitigation decisions that rely on CVE or NVD records as primary data sources can be biased, discriminating against other sources of data [6,7]. For instance, delays in evaluating vulnerabilities increase the chances of threats materializing into actual cyberattacks [14]. Automation of vulnerability scoring is therefore anticipated to narrow the gap for zero-day attacks. To design such an autonomous scoring system, several deficiencies must be examined, such as how to infer the important measurements used to control vulnerability metrics at an appropriate scale for reported vulnerabilities. In addition, differences between existing CVSS versions produce incompatible metric measurements. Previous studies did not adequately address these difficulties. Diverse businesses utilize distinct CVSS versions to evaluate vulnerability instances [12], resulting in conflicting outcomes. For example, NVD uses CVSS version 3 scores to rate vulnerability instances reported only from 2015 onwards. These challenges of applying CVSS scores to support vulnerability analysis and management are illustrated in Fig. 1. Considering a random vulnerability instance, NVD, the corresponding manufacturer, and a third-party analyser provide their respective severity scores, which can be inconsistent. Despite the popularity of CVSS [12,13], inconsistency among reported scores for the same vulnerability instance does occur, particularly when considering the CVSS temporal and environmental metrics, whereby vulnerability properties evolve across time and deployment environments. Hence, additional sources of relevant data, including manufacturer-provided data and online reviews from relevant security sources and forums, are expected to further consolidate existing CVSS scores [10].
Data inconsistency also increases the difficulty of vulnerability retrieval. For example, using MTU as a single keyword in the NVD search engine returns vulnerabilities relevant to two diverse categories of devices, namely Maximum Transfer Unit (e.g., vulnerability instance CVE-2005-0065) and Master Terminal Unit (e.g., vulnerability instance CVE-2015-0990). Therefore, to retrieve Master Terminal Unit vulnerabilities only, the query needs to be refined further with keywords like SCADA-server or vendor-specific modules. Meanwhile, vendor names in Common Platform Enumeration (CPE) [15] metadata may appear with variations. For example, the vendor Schneider Electric SE has variant forms like 'schneider-electric', 'chneider-electric', and 'schneider-electic' in the CPE database.
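Such near-miss vendor strings can be reconciled automatically. The sketch below is a hypothetical helper, not part of the proposed system: it uses Python's difflib to map a noisy CPE vendor string onto a small canonical list. The vendor list and the 0.8 similarity threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional

# Canonical vendor names as they should appear in CPE-style queries
# (illustrative subset, not an exhaustive list).
CANONICAL_VENDORS = ["schneider-electric", "siemens", "rockwellautomation"]

def normalize_vendor(raw: str, threshold: float = 0.8) -> Optional[str]:
    """Map a possibly misspelled CPE vendor string (e.g. 'chneider-electric')
    to its closest canonical form, or None if no candidate is close enough."""
    best, best_ratio = None, 0.0
    for canonical in CANONICAL_VENDORS:
        ratio = SequenceMatcher(None, raw.lower(), canonical).ratio()
        if ratio > best_ratio:
            best, best_ratio = canonical, ratio
    return best if best_ratio >= threshold else None
```

Both misspelled CPE variants mentioned above then resolve to the same canonical vendor, so a single query tag can cover them.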
We propose a vulnerability scoring system to quantify the severity of a reported vulnerability instance. The computed scores facilitate situational awareness through quantitative indicators that are transformed into actionable intelligence. This method automates vulnerability investigation while addressing compatibility concerns between multiple CVSS versions. Standard CVSS criteria are used as a scoring basis to evaluate the exploitability of vulnerabilities and the consequences of malicious exploits. In pursuit of these aims, we correlate vulnerability scores published by several online cybersecurity data sources, including NVD, vendor websites, and technical reports from third-party reviewers (e.g., the Cyber Emergency Response Team (CERT) [16] and the Microsoft Security Response Center (MSRC) [17]), to consolidate severity scores of vulnerability instances. Accordingly, we produce ground facts for our Machine Learning (ML)-based vulnerability-severity computation algorithm. These instances are then used to train our ML model, which we evaluate using vulnerabilities reported in repositories such as NVD and SecurityFocus. Beyond NVD and SecurityFocus, our suggested approach can incorporate additional data sources such as CERT. We also propose a new query logic to identify relevant vulnerability instances while excluding possible false positives based on other keywords. The evaluation study of CPS vulnerabilities and related factors shows an enhanced level of automation in cybersecurity assessments [14].
The main contributions of this paper are outlined as follows:
• A novel machine-learning based structure for vulnerability assessment that infers CVSS severity scores of reported vulnerability instances. This technique addresses compatibility issues of CVSS scores using a majority voting system, as part of the proposed machine-learning model. The approach can be customized to accommodate a preferred CVSS version, in order to allow a common computational semantic that improves consistency in vulnerability assessment.
• A query generation method that takes system configuration information as input and exports the best matching query tags in a format similar to CPE metadata.
• A CPS vulnerability analysis case study that validates the proposed machine-learning based vulnerability-assessment approach.
The rest of this paper is organized as follows: In Section 2, we provide some background and formally state the problem addressed in this paper, followed by Section 3, which discusses vulnerability data sources, standard vulnerability-severity metrics, and related vulnerability-assessment processes used in the CVSS mechanism. In Section 4, we reveal our vulnerability-assessment prototype, which correlates existing CVSS scores against other security-alert indicators, and reconciles different CVSS versions using text-mining techniques on a corpus of vulnerability reports. In Section 5, we evaluate our vulnerability-discovery and assessment methodology in CPS contexts, using mainly NVD and Shodan [18]. In Section 6, we provide some concluding remarks and discuss future research directions.

Related works
Correlation studies between multiple cybersecurity data sources can combine various perspectives from different stakeholders, connecting multifaceted analyses into broader statistical associations. CVE, NVD, CERT and SecurityFocus are widely used vulnerability-analytics databases of uniquely identified vulnerability records. These databases are further correlated to data sources like ExploitDB [19]. An example is the study carried out by Allodi and Massacci [20], who correlate NVD to ExploitDB, Symantec AttackSignature and ThreatExplorer. By doing so, they enhance CVSS scoring practice by computing the temporal attributes based on the existence of public proof-of-concept (PoC) exploits. Geer and Roytman [21] also correlate the NVD database to ExploitDB and Metasploit [22] to support penetration testers. Fang et al. employ SecurityFocus and NVD to predict the exploitability and exploitation of vulnerabilities, taking PoCs extracted from ExploitDB as ground truth [13]. Rodriguez et al. [23] compare the original release dates of multiple data sources, including NVD, SecurityFocus, ExploitDB and three vendors: Cisco, Wireshark and Microsoft. They observe that vulnerability instances published in NVD are delayed by 1-7 days compared to other data sources.
A variety of approaches apply text-mining techniques to sources such as Twitter [24] and security papers. This is exemplified in the work undertaken by Zhu and Dumitras, who apply NLP to automatically extract malware detection features from research papers [25]. Chen et al. [24], Bullough et al. [26] and Sabottke et al. [27] extract vulnerability-related data by crawling Twitter and extracting tweets that contain CVE as a keyword. These works highlight that statistical interpretations of CVE and NVD datasets need to be combined with other live security-related data sources, such as Twitter, the dark web and product vendors across deployed infrastructures, to raise indicators' reliability and precision. However, these works contribute to the wider field of software vulnerability analysis; correlation studies that consider the different terminology used in cybersecurity and specifically address CPS domains remain limited. In our method, we extract relevant entities of vulnerable components and vendor information from CVE vulnerability reports, which we map against the Common Platform Enumeration (CPE) [15] as well as vendor websites, in order to generate a dictionary for CPS components and vendors.
The retrieved information from cybersecurity data sources supports further pattern recognition and trend analysis. Using AI techniques, large amounts of such open-source vulnerability data [11] can be analyzed. More specifically, machine-learning techniques like text-mining are applied to automatically classify disclosed vulnerabilities and guide predictive analytics of the security gap. The effectiveness of AI techniques has been illustrated in a study by Bozorgi et al. [28], who employ a Support Vector Machine (SVM) to predict time-to-exploit indicators of vulnerabilities reported in the Open Source Vulnerability Database (OSVDB) and CVE. Targeting CVSS base-score generation, Gawron et al. [9] apply Neural Networks and Naive Bayes algorithms, while Yamamoto et al. [29] deploy supervised Latent Dirichlet Allocation (LDA) for CVSS metrics classification. Nevertheless, using correlated cybersecurity data sources also raises potential inconsistencies, such as the disparity between scores for the same vulnerability instances [10]. One drawback of previous AI-based CVSS computing approaches is that they directly adopt the vulnerability reports and CVSS scores from NVD as training grounds, which may induce a bias in their models. Instead, we correlate vulnerability instances in NVD with corresponding vendor reports and third-party cybersecurity analyzers, such as CERT reports, to consolidate data sources. We then integrate the relevant information into a unified structure as our training grounds. More specifically, we use the vulnerability descriptions in NVD as training input, and apply majority voting on inconsistent scores before using them as training grounds. In doing so, our approach streamlines the computation of vulnerability severity to address such inconsistencies upstream, in order to optimize security investments and shorten the potential risk window. Building on our previous work [30], we extended the CVSS base-score computation experiments to include more vulnerability data sources. This additional experimental study illustrates the capacity of our mechanism to extend to new data sources such as SecurityFocus. By including the Shodan database, we also extract more vulnerability instances for our CPS cybersecurity case study.

Background
In this section, we introduce the data sources, the metrics used to calculate the vulnerability severity, and the severity score computing process.

Vulnerability data sources
MITRE Corporation publishes the CVE industry standard to assign an identifier to each discovered vulnerability. In addition, it maintains a publicly accessible database of all identifiers through CVE Numbering Authorities (CNA) [31]. A typical CVE entry includes the following fields: a unique identifier, a brief description of the reported vulnerability, and any pertinent references about the vulnerability. The unique CVE identifier, or CVE ID, is the key that differentiates one security vulnerability from another. In doing so, CVE IDs provide a reliable way of communicating across these different databases to get more information about the reported security flaws.
NVD builds upon the information included in CVE entries to provide enhanced information for each entry, such as severity scores (calculated based on the CVSS standard) and impact ratings. NVD converts the unstructured CVE data into structured JSON (JavaScript Object Notation) or XML (Extensible Markup Language) formats [6]. As part of its enhanced information, NVD also provides advanced search features, such as filtering by OS, vendor name, product name, version number, and vulnerability type and severity. Among these extra features, affected product names and versions have matching string entries in CPE entries. Vulnerability category features are provided in the Common Weakness Enumeration (CWE) [32] repository, which abstracts observed faults and flaws into common groups of vulnerabilities with additional information about expected effects, behaviors, and further implementation details. The vulnerability severity score is calculated following the CVSS version 3 and version 2 standards.
SecurityFocus is a widely used vulnerability database that also features a security news portal [13]. Even though this database was shut down in January 2021, its historical reports remain applicable to validate our experimental analysis. Besides vulnerability descriptions, SecurityFocus also indicates whether a vulnerability has a PoC exploit. Note that SecurityFocus is not dependent upon CVE data sources [23]; in fact, a BugTraq vulnerability report may refer to several CVE vulnerability instances. A statistical analysis by Fang et al. highlights that although the number of vulnerabilities reported in SecurityFocus is smaller than the number found in NVD, the fraction of exploited vulnerabilities in SecurityFocus (37.008%) is much higher than the proportion in NVD (6.676%) [13]. They also observe that the vulnerability reports in SecurityFocus offer higher coverage and more reference significance in predictive cybersecurity analysis, leading to experimental results where SecurityFocus performs better than NVD in an actual environment.
The Industrial Control System CERT (ICS-CERT) [33] is a branch of US-CERT that focuses on control systems' security. The ICS-CERT advisories add further analysis to vulnerabilities reported in CVE, particularly on risk evaluation, affected products, and mitigations such as workarounds or official patches.
The following example illustrates the differences between the aforementioned vulnerability data sources, especially between NVD, SecurityFocus and ICS-CERT. The SecurityFocus historical reports were downloaded in December 2020, before the shutdown. The vulnerability report under BugTraq ID 108727 refers to three CVE reports, namely CVE-2019-6580, CVE-2019-6581, and CVE-2019-6582. NVD assigns CVSS V3 base scores of 9.8, 8.8, and 7.7 to these three disclosed vulnerabilities. Yet, ICS-CERT and the vendor Siemens [34] assign the same CVSS V3 base scores to CVE-2019-6581 and CVE-2019-6582, but a different score of 8.8 to CVE-2019-6580. The inconsistency of the CVE-2019-6580 scores is due to different views on whether privileges are required to exploit this vulnerability. Here we compare the discussion section in SecurityFocus with the description section of the vulnerability instance CVE-2019-6580 in NVD or CVE. We observe that the summary given by SecurityFocus highlights vulnerability types and potential threats targeting the vulnerability, while NVD emphasizes the affected products and the impact of the vulnerability.
• SecurityFocus discussion: "Siemens Siveillance VMS is prone to multiple authorization-bypass vulnerabilities. Attackers can exploit these issues to bypass certain security restrictions and perform certain unauthorized actions. This may aid in further attacks."

Finally, Shodan is a data source mainly targeting CPS or IoT (Internet of Things) security, including SCADA (Supervisory Control and Data Acquisition) systems [35]. CPS and IoT systems include devices like webcams, routers, and servers. Relevant information, like ports and vulnerabilities of these devices, can be fetched through the Shodan website or the Shodan API (Application Programming Interface). Interestingly, these are currently Internet-connected devices, sending (public) live data from different locations across the world. Unlike NVD, where vulnerability reports are published, Shodan crawls IP addresses and makes the collected data available through its website and API. Returned data from Shodan can be cross-referenced with NVD for vulnerability analysis [36].
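As an illustration of this cross-referencing step, the snippet below groups the CVE identifiers that Shodan attaches to each scanned host, so each ID can then be looked up in NVD. The 'ip_str' and 'vulns' field names follow the shape of Shodan match records but should be treated as assumptions here, and the sample data is fabricated for illustration.

```python
# Sketch (assumed response shape): cross-reference Shodan device records
# with NVD by collecting the CVE identifiers Shodan attaches to a host.

def collect_cves(matches):
    """Group CVE IDs per host IP from a list of Shodan-style match records."""
    cves_by_host = {}
    for match in matches:
        host = match.get("ip_str", "unknown")
        for cve in match.get("vulns", {}):
            cves_by_host.setdefault(host, set()).add(cve)
    return cves_by_host

# Fabricated sample mimicking two match records for the same host.
sample = [
    {"ip_str": "192.0.2.10", "vulns": {"CVE-2015-0990": {}, "CVE-2019-6580": {}}},
    {"ip_str": "192.0.2.10", "vulns": {"CVE-2019-6580": {}}},
]

cves = collect_cves(sample)
```

Each resulting CVE set can then be resolved against the local NVD mirror described later, turning live Shodan scans into scored vulnerability instances.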

Vulnerability severity metrics
The Forum of Incident Response and Security Teams (FIRST) initiated the development of the CVSS calculator for reporting cybersecurity incidents. The current CVSS Version 3 follows a sequence of three versions of the CVSS index calculator. Vulnerabilities are first assigned a unique identifier, and their severity is rated by combining CVSS property metrics. A CVSS score involves three groups of properties. The Base group describes static properties that are not subject to temporal or deployment environments. In contrast, the Temporal and Environmental groups of properties respectively describe score variations across time and deployment contexts. However, we emphasize base-score properties in this research and consider the latest version of the CVSS base properties, that is, Version 3 (V3). These base properties are further grouped under three classifications, namely exploitability, scope, and impact properties, which we discuss next.

CVSS exploitability property:
This property quantifies the likelihood, as well as the effort and intricacy to be invested, for exploiting a component exposed to a given vulnerability. Hence, this property combines the following metrics: AttackVector (AV), AttackComplexity (AC), PrivilegesRequired (PR) and UserInteraction (UI). The Attack Vector metric measures the likelihood of an attack scenario targeting the component through this vulnerability. The effort that needs to be invested may vary across these scenarios, and is quantified by the Attack Complexity metric. Along the path of an attack scenario, some credentials or privileges may be required; the level of these requirements is measured by the Privileges Required metric, for an agent with authority to be granted access to the component. Finally, the level of user participation expected in order to exploit and compromise the vulnerable component is measured by the User Interaction metric.

CVSS scope property:
The propagation of a vulnerability from a targeted component to eventually grant access to others within an asset configuration is measured by ScopeChange (S). The Scope metric measures the extent to which components other than the vulnerable one can be accessed.

CVSS impact property:
This property groups metrics along the CIA triad, to quantify the magnitude of potential losses of Confidentiality (C), Integrity (I) and/or Availability (A). Measurements along these metrics categorize the severity levels impacted by the vulnerability as none (N), low (L) or high (H).

Severity score computing process
CVSS combines the above properties to infer a vulnerability severity rating based on a rule-based algorithm, depicted in Eq. (1), which uses measurements of the Exploitability, Scope and Impact property metrics to generate the Base score of a vulnerability. Considering a component of a CPS asset, exploitability and impact property measurements are extracted as illustrated in Eqs. (2) and (3), respectively. To infer the score of a vulnerability affecting a component, a function combines the corresponding base properties, namely exploitability, scope and impact, as illustrated by Eq. (1).
This can be illustrated briefly by the vulnerability instance CVE-2021-37172. This vulnerability affects a Siemens PLC (Programmable Logic Controller) product running the SIMATIC S7-1200 CPU family with firmware version 4.5.0 (the vulnerable component), by allowing a threat agent to bypass authentication and download arbitrary programs to this PLC (the vulnerable CPS asset). This vulnerability has a CVSS version 3 base score of 7.5, which is composed of an exploitability score of 3.9 and an impact score of 3.6.
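Under the public CVSS v3.1 specification, this composition can be reproduced in a few lines. The sketch below uses the published metric weights for the scope-unchanged case and a simplified round-up helper; the metric vector assumed for CVE-2021-37172 (AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H) is our reading of the reported sub-scores.

```python
import math

# Numeric weights from the public CVSS v3.1 specification (scope unchanged).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}   # scope-unchanged PR values
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(x):
    """CVSS 'round up to one decimal' helper (simplified from the spec)."""
    return math.ceil(x * 10) / 10

def base_score(av, ac, pr, ui, c, i, a):
    """CVSS v3.1 base score for a scope-unchanged vulnerability."""
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

# Assumed vector for CVE-2021-37172: exploitability ≈ 3.9, impact ≈ 3.6,
# which reproduces the reported base score of 7.5.
score = base_score("N", "L", "N", "N", "N", "N", "H")
```

The scope-changed case uses different PR weights and a nonlinear impact formula, which this sketch deliberately omits.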

Discovering vulnerability severity
In this section, we present an ML-based method to discover CVSS scores of reported vulnerability instances with no assigned score. Our proposed approach automatically generates severity scores for vulnerability instances, which decreases the potential for manual errors and requires less effort from human experts. We start with a brief overview of the system structure, followed by a detailed introduction of each element of the vulnerability-severity computing system.

System overview
As illustrated in Fig. 2, we first collect vulnerability data from open-source cybersecurity repositories. Simultaneously, we adopt majority voting techniques [37] to deal with inconsistent scores retrieved from differently CVSS-scored reports across multiple repository sources. We employ the reconciled scores, together with the vulnerability reports, as the training ground for our proposed ML models. We then streamline score prediction using an ML pipeline that classifies these instances along various CVSS-metric labels. Meanwhile, CVSS metrics from different CVSS versions are stored in a knowledge base, so that the corresponding metric set is retrieved through the user's query; the same goes for measurements and severity scales. In doing so, one can select any CVSS version to compute the corresponding score and vector for vulnerability instances. Our proposed vulnerability-severity computing system contains a series of steps chained together through ML computational cycles. Each integrated ML cycle involves mainly three steps. Step 1 refers to obtaining the data. Step 2 performs data pre-processing to prepare the data for training/testing on a machine-learning algorithm. Finally, Step 3 outputs a predicted severity score. By pre-processed data, we mean both the training/testing instances and the classification measurements. Note that training and testing processes are not differentiated in Fig. 2 to facilitate readability.

Data collection
We employ multiple ways to collect vulnerability data from online public repositories, as depicted in Fig. 3. More specifically, NVD data feeds are directly downloaded and stored in a local database in JSON format. JSON is an open-standard file format used for interchanging data, consisting of human-readable (i.e., not binary) attribute-value pairs. JSON objects can be nested inside other JSON objects, while each nested object has a unique access path across the tree-like structure. To ensure that the local files and online NVD data feeds are synchronized, we set up a scheduler to perform hourly data retrieval and updates through the existing Python library APScheduler (Advanced Python Scheduler). We choose the hourly schedule to mirror NVD data, considering that the "recent" and "modified" feeds in NVD are updated every two hours, while the rest are updated nightly. Besides NVD, we apply web crawling and web scraping techniques [38] to grasp vulnerability information published in SecurityFocus, ICS-CERT and vendor websites. Web crawling refers to the process of browsing and indexing contents from web pages; examples of relevant built-in Python modules include urllib.request, which downloads HTML pages, and urllib.error, which handles exceptions. Web scraping means locating and collecting specific information, supported by tools like the HTML parser Beautiful Soup [39]. After fetching, parsing and extracting the targeted information, we store the retrieved data in a format tailored to the data usage. For instance, we store the data extracted from SecurityFocus in local files with fields like BugTraq ID, CVE ID, title, publish date, affected product, etc., in CSV (comma-separated values) format. Finally, we query the Shodan API to get CPS-relevant vulnerabilities.
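For concreteness, here is a minimal sketch of flattening one entry of a downloaded NVD JSON feed into the kind of local record described above. The nested access paths mirror the NVD 1.1 feed layout, and the sample item is fabricated for illustration.

```python
import json

# Sketch: turn one NVD JSON-feed entry into a flat record with the
# fields we store locally. The nested paths follow the NVD 1.1 feed
# layout ('CVE_Items' -> 'cve' -> 'CVE_data_meta'/'description').

def flatten_cve_item(item):
    cve = item["cve"]
    return {
        "id": cve["CVE_data_meta"]["ID"],
        "description": cve["description"]["description_data"][0]["value"],
    }

# Fabricated single-entry feed mimicking the NVD structure.
sample_feed = json.loads("""
{"CVE_Items": [{"cve": {
    "CVE_data_meta": {"ID": "CVE-2019-6580"},
    "description": {"description_data": [
        {"lang": "en", "value": "Multiple authorization-bypass issues."}]}
}}]}
""")

records = [flatten_cve_item(i) for i in sample_feed["CVE_Items"]]
```

The flat records are what the hourly APScheduler job would upsert into the local database after each feed refresh.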

Majority voting for inconsistent scores
Relying upon NVD scores alone as the model training ground can introduce bias in vulnerability assessment [6,7]. This is because a small percentage of score records in NVD is assumed to have errors due to the manual scoring process [12]. Besides statistical vulnerability patterns mined from CVE reports, other data sources like vendors and third-party security analyzers (e.g., ICS-CERT and MSRC) provide different perspectives for vulnerability scoring. We set up a majority voting [37] module using Python, whereby the score that the majority of the n data sources in the pipeline (n > 2) agree on is delivered as the ground-truth score. In the case where only two score sources are found and these two scores are inconsistent, we take the average of the two scores.
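The voting rule above can be sketched in a few lines. The scores below reuse the BugTraq ID 108727 example from earlier; tie-handling for three mutually distinct scores is left out of this sketch.

```python
from collections import Counter

# Sketch of the score-reconciliation rule: with three or more sources,
# keep the score most sources agree on; with exactly two conflicting
# sources, fall back to their average.

def reconcile(scores):
    """Return the reconciled ground-truth score from a list of source scores."""
    if len(scores) == 2 and scores[0] != scores[1]:
        return sum(scores) / 2
    value, _count = Counter(scores).most_common(1)[0]
    return value

# e.g. NVD vs. ICS-CERT/Siemens scores for CVE-2019-6580:
majority = reconcile([9.8, 8.8, 8.8])   # two of three sources agree on 8.8
fallback = reconcile([9.8, 8.8])        # only two sources: average
```

With three sources the 8.8 score wins the vote; with only the two conflicting sources the module falls back to their average of 9.3.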

Vulnerability severity computing
Cybersecurity data is classified using a pipeline of ML algorithms, in order to fill CVSS score gaps. Using text-mining approaches [11], retrieved vulnerability reports from existing cybersecurity repositories are contrasted against vulnerability descriptors. Subsequently, new vulnerability reports are classified along CVSS-metric property groups, using an ML algorithm trained on a set of historical instances of reported data. Considering the vulnerability instances in this data set, each pair of a vulnerability report and its ground-truth vector of CVSS-metric labels (e.g., [N, L, N, N, U, N, H, N]) represents a mapping employed by the ML algorithm.

[Algorithm 1: Vulnerability CVSS Base-Score Computing]
Algorithm 1 shows the base-score computation of vulnerability severity. Considering the vulnerability computing illustration shown in Fig. 2, Lines 3-9 in Algorithm 1 represent the procedure for Step 1, Lines 10-13 show the process for Step 2, and Lines 14-20 unfold the procedure for Step 3. Classification problems with more than two measurement classes are simplified into multiple binary classification problems, to differentiate between the classes. Assuming the employed ML model (e.g., an SVM) provides one binary scorer per class, multi-class categorization is achieved through a ''one-against-all'' method, whereby the predicted class is the one whose binary scorer yields the maximum response. The classification of CVSS measurements into class labels calibrates severity scores from property attributes. A High label of AttackComplexity (AC), for example, pertains to the attribute value 0.44, while the 0.77 attribute value pertains to the Low label. These numerical values are used in the CVSS calculation process.
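The one-against-all rule can be sketched independently of any particular learner. In the toy example below, the per-class binary scorers are simple keyword counters standing in for trained SVM decision functions, and the label-to-weight table shows how a predicted AC label is turned into the numeric attribute used by the CVSS formula; both the scorers and the keywords are illustrative assumptions.

```python
# One-against-all prediction: train one binary scorer per class and
# predict the class whose scorer responds most strongly. The scorers
# here are keyword counters, stand-ins for SVM decision functions.

def one_vs_all_predict(report, scorers):
    return max(scorers, key=lambda label: scorers[label](report))

scorers = {
    "NETWORK":  lambda text: text.lower().count("remote"),
    "LOCAL":    lambda text: text.lower().count("local"),
    "PHYSICAL": lambda text: text.lower().count("physical"),
}

predicted = one_vs_all_predict("remote attacker sends remote packets", scorers)

# Label-to-weight mapping used downstream in the CVSS calculation
# (AttackComplexity values from the CVSS v3 specification).
AC_WEIGHTS = {"H": 0.44, "L": 0.77}
```

A predicted label such as AC:H is thus resolved to 0.44 before entering the exploitability sub-score.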

Evaluation metrics
The contrast between severity predictions and the originally labeled ones is used for training and testing the classification performance. We employ Accuracy and Balanced Accuracy metrics [40], as well as the F1-score [41], to assess this contrast. The performance implication accounts for unbalanced classes; for example, in the AccessVector (AV) classes, the Network category has a much larger sample size than the Physical category, as depicted in Fig. 4. This observation is emphasized in Table 2 and Table 3. AccessVector (AV) classification may involve multi-class associations, whereby the micro-average is used to compute the mean value across class associations. Micro-average differs from macro-average in the sense that micro-average aggregates the weighted contributions of all classes, while macro-average takes the unweighted average of per-class contributions. Micro-average is therefore preferable for multi-class categorization problems with class imbalance. The same approach is employed for other multi-class occurrences. Binary classifiers, like the one employed for UserInteraction (UI), use a confusion matrix to infer the balanced-accuracy and F1-score values of the classification.
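The difference between the two averaging schemes can be made concrete with a small hand-rolled computation (a sketch, not the evaluation code used in the experiments):

```python
from collections import Counter

# Hand-rolled micro vs. macro F1 over a multi-class prediction set,
# illustrating how the two averages diverge under class imbalance.

def f1_scores(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted p, was not p
            fn[t] += 1   # missed the true label t
    def f1(t, f_p, f_n):
        return 2 * t / (2 * t + f_p + f_n) if t else 0.0
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro

# Toy imbalance: 8 Network and 2 Physical samples, all predicted Network.
micro, macro = f1_scores(["N"] * 8 + ["P"] * 2, ["N"] * 10)
```

On this toy set the micro-averaged F1 stays at 0.8 while the macro-average drops to about 0.44, reflecting the ignored minority class.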

Performance evaluation
We validate our approach following two evaluation experiments, which respectively use data from retrieved reports in existing repositories and data from crawled websites. We also published the code for the ML algorithm implementation and vulnerability data correlation in two GitHub projects [42,43], to further enhance the reproducibility [44,45] of our proposed methods. The Pipeline utility in the Scikit-learn library has been used to implement the machine-learning pipeline, including feature extraction and other data processes. Severity scores from different CVSS versions are thus transformed in a streamlined way. Processing NVD vulnerability reports starts with tokenisation and subsequent feature extraction using the CountVectorizer [46] and TfidfTransformer [47] utilities. Subsequently, TF-IDF (Term Frequency-Inverse Document Frequency) values are calculated to generate a TF-IDF matrix from the word features. The train_test_split procedure is used to randomly divide the data records into training (75%) and testing (25%) datasets.
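A minimal version of this pipeline, with a fabricated two-class toy corpus standing in for NVD reports and AttackVector labels, looks as follows (assuming scikit-learn is installed):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Sketch of the text-classification pipeline: tokenise + count words,
# derive TF-IDF features, then fit a LogisticRegression classifier.
pipeline = Pipeline([
    ("counts", CountVectorizer()),   # tokenisation + word counts
    ("tfidf", TfidfTransformer()),   # TF-IDF matrix from word features
    ("clf", LogisticRegression(max_iter=1000)),
])

# Fabricated toy corpus, duplicated so each class has several samples.
reports = [
    "remote attacker sends crafted packets over the network",
    "local user with physical access modifies the device firmware",
] * 10
labels = ["NETWORK", "PHYSICAL"] * 10

pipeline.fit(reports, labels)
pred = pipeline.predict(["crafted network packets from a remote host"])[0]
```

In the actual experiments the corpus is the NVD/SecurityFocus report collection split 75%/25% via train_test_split, with one such pipeline per CVSS metric.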
Machine-learning classifiers classify new vulnerability reports within predicted severity patterns. The results obtained from our case studies use the LogisticRegression (LR) classifier, with a 5-fold stratified cross-validation applied to the CVSS training dataset to reduce overfitting. The CVSS classifier prediction performances on the testing datasets are illustrated in Table 1. CVSS V3 metric classifications reach an overall higher performance than their CVSS V2 counterparts; however, the larger set of metrics offsets the CVSS V3 error rate.
The outcomes show satisfactory performance when contrasted with closely related CVSS classification research by Gawron et al. [9] as well as Yamamoto et al. [29]. The performance of the model proposed by Gawron et al. is reported using only an accuracy metric, which may not adequately capture unbalanced classification instances. Nevertheless, our accuracy is higher on average. For example, the accuracy of our Attack Vector classifier is 90.36% when using only NVD vulnerability entries, or 93.68% when using both NVD and SecurityFocus entries. In comparison, the Attack Vector classifier based on a Neural Network in [9] has an accuracy of 88.9% on testing data and 80.3% on validation data, while the Attack Vector classifier based on Naive Bayes in [9] achieves an accuracy of 90.8% on testing data and 92.3% on validation data. Yamamoto et al. train their CVSS version 2 classifiers on vulnerability instances disclosed in NVD from 1999 till 2014. They employed several ML algorithms, including Naive Bayes, LDA, supervised LDA (SLDA), and Latent Semantic Indexing.

Security news website
In this validation experiment, we crawl vulnerability reports from SecurityFocus and map them to the corresponding CVE indexes. This step was completed in December 2020, before SecurityFocus shut down in January 2021. Our proposed method remains valid nonetheless, in the sense that utilizing multiple vulnerability data sources enriches the features and may enhance the performance of the classification models. These external descriptive reports are added as text features alongside NVD reports for model training. The results are also listed in Table 1. We observe that adding more text features improves the performance of our CVSS scorer.

Cyber-physical systems vulnerability assessment
Vulnerabilities in CPS infrastructures are assessed to enumerate and rank their severity, in order to prevent threat-induced anomalies and intrusion attempts [1,2]. Here we present a vulnerability-analysis case study of several prominent CPS components, composed of three main steps. First, we query and filter CPS-relevant vulnerabilities from online cybersecurity data sources. Then, we compute the CVSS V3 base scores and corresponding vectors for the retrieved vulnerability instances. Finally, we analyze the statistical patterns of existing CPS vulnerabilities.
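The CVSS V3 base-score computation in this step follows the public CVSS v3.1 specification; the sketch below is restricted to scope-unchanged vectors for brevity (the general formula adds separate weights for changed scope), with metric weights taken directly from the specification:

```python
import math

# Metric weights from the CVSS v3.1 specification (scope unchanged).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}   # scope-unchanged weights
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(x):
    """CVSS v3.1 Roundup: smallest one-decimal number >= x."""
    i = round(x * 100000)
    return i / 100000.0 if i % 10000 == 0 else (i // 10000 + 1) / 10.0

def base_score(vector):
    """Base score of a scope-unchanged CVSS V3 vector string."""
    m = dict(part.split(":") for part in vector.split("/"))
    assert m["S"] == "U", "this sketch covers scope-unchanged vectors only"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["PR"]] * UI[m["UI"]]
    return 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))

# CVE-2018-7791 as scored by NVD and Schneider Electric:
print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # -> 9.8
```

The example reproduces the 9.8 base score assigned by NVD to CVE-2018-7791, discussed later with Fig. 1.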

CPS taxonomy
Jang-Jaccard and Nepal propose three categories of vulnerabilities: hardware, software, and network infrastructure and protocol vulnerabilities [48]. Hardware-derived vulnerabilities are mostly seen in the form of unauthentic or illegal hardware clones. One example is missing physical-access protection, which an attacker might exploit to gain unauthorized physical access. Exploiting hardware-based vulnerabilities thus enables threat agents to access or alter physical elements of a computer server (e.g., a hard drive) or a network (e.g., a router). Software-oriented vulnerabilities exist in system firmware or application software. Outdated software with flaws in its source code might be exploited by a bypass threat, further materialized by a code-injection attack triggered by malicious actors. Network infrastructure and protocol vulnerabilities frequently appear in network protocols such as TCP (Transmission Control Protocol).
Considering the nature of CPS, we define a CPS as composed of software (e.g., firmware, toolsets, software libraries) and hardware (e.g., a hard drive). Software further subsumes the operating system (OS) (e.g., a Windows system), which functionally manages software components and acts as an interface between application software and hardware. For example, a buffer-overflow attack might exploit an OS that contains a resource-management error; it may trigger a further Denial of Service (DoS) and result in the loss of control of this OS. Once an OS is shut down, its application software components are deactivated. Hardware, software, and OS are assembled and used in different ways within CPS fabrics, creating various binaries with potential backdoors [1,2]. Software is embedded in hardware and thus relies on that hardware component's electricity supply and CPU; hardware-dependent software is one such example [49]. Meanwhile, software monitors, controls, and actuates hardware components. Furthermore, a CPS relies on a proper network connection to transfer data and complete feedback loops.
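As an illustration only, the composition and dependency rules above can be captured in a small hypothetical data model; none of these class or field names come from our implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hardware:
    name: str
    powered: bool = True  # software relies on the hardware's electricity supply

@dataclass
class Software:
    name: str
    is_os: bool = False   # software subsumes the operating system

@dataclass
class CPS:
    hardware: List[Hardware]
    software: List[Software]
    network_connected: bool = True  # needed to complete feedback loops

    def active_software(self) -> List[str]:
        # Once the hardware loses power or the OS is shut down,
        # application software components are deactivated (see text above).
        powered = any(h.powered for h in self.hardware)
        os_running = any(s.is_os for s in self.software)
        if not (powered and os_running):
            return []
        return [s.name for s in self.software]
```

Shutting down the OS in this toy model deactivates every remaining software component, mirroring the dependency chain described in the text.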
We evaluate our streamlined vulnerability-severity scoring mechanism through vulnerability-analysis practices on several prominent CPS components: PLCs, RTUs (Remote Terminal Units), MTUs (Master Terminal Units), and HMIs (Human Machine Interfaces). A PLC is a crucial CPS asset that controls industrial devices to keep production processes in order. An RTU transmits telemetry data from sensing devices, associated with physical power components, to an MTU system. Finally, an HMI is either a standalone device or an embedded communication interface used to visualize and monitor MTU activities and the RTU information flow [2].

CPS vulnerability filter
Using a Python script, we retrieve CPS-relevant vulnerability instances from multiple online cybersecurity data sources, including NVD, vendor websites, ICS CERT, and SecurityFocus. In addition, we employ Shodan query APIs to obtain vulnerability instances. The retrieval workflow is illustrated in Fig. 3. On top of this workflow, we add a query-keywords generator and a CPS vulnerability filter, as illustrated in Fig. 5.
Using the proposed retrieval and filter workflow, we extract vulnerable-component entities from CVE vulnerability reports using an open-source NER (Named Entity Recognition) model [50], as illustrated in Fig. 5. We then map these extracted entities to the CPE dictionary as well as vendor websites to generate a list of terms related to these components. For each extracted CPS vulnerability instance, we retrieve vendor information using NER and correlate it against the CPE database. To do so, we obtain vendor HTML links from CVE reference maps, based on which we crawl the vendor websites to fetch CPS vulnerability-related data; this step aims at reconciling potentially inconsistent product names. Finally, these terms are combined with the corresponding component versions of interest and used as tags to query vulnerability instances from public vulnerability data sources. We also conduct manual checks on CPS-related vendor metadata and optimize our search-engine outcomes to detect hidden metadata for each vendor, in order to decrease possible false negatives.
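To illustrate the entity-extraction idea without reproducing the NER model [50], the following toy extractor uses a hand-written vendor lexicon and a pattern match; the vendor list and the description string are illustrative stand-ins, not part of our actual pipeline:

```python
import re

# Hypothetical vendor lexicon; the real pipeline correlates NER output
# against the CPE database rather than a hand-written list.
KNOWN_VENDORS = {"siemens", "schneider electric", "mitsubishi", "vmware"}

def extract_component_entities(description: str):
    """Return (vendor, following-words) candidates found in a CVE description."""
    text = description.lower()
    hits = []
    for vendor in KNOWN_VENDORS:
        # Capture up to three word-like tokens following the vendor name.
        pattern = re.escape(vendor) + r"\s+((?:[\w.\-]+\s?){1,3})"
        for match in re.finditer(pattern, text):
            hits.append((vendor, match.group(1).strip()))
    return hits

desc = ("A vulnerability has been identified in Siemens SIMATIC S7-1200 "
        "firmware, which allows attackers with network access ...")
print(extract_component_entities(desc))
```

A real NER model replaces the lexicon and pattern with learned entity boundaries; the post-processing (mapping candidates to CPE entries) is the same in both cases.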
More specifically, the query-generating process is composed of three major steps, presented in Fig. 6. Note that we only show the processing details for CPE to ensure readability, although we process data extracted from CPE as well as from NVD and vendor reports.
• In the first step, we parse the metadata from CPE, NVD vulnerability reports and vendor reports, and extract vendor (e.g., "microsoft"), product (e.g., "codeql") and version (e.g., "1.0.0") information from these metadata. We canonize each extracted item into one "Vendor Product Version" entity (e.g., "microsoft codeql 1.0.0"), which results in a dictionary of 816,875 such entities. We further generate a dictionary using the shortened metadata as key and our generated entity as value, which is stored in the ProductDB shown in Fig. 5.
• In the second step, we generate a list of query tags ranked by matching similarity. To start with, we canonize the system configuration information, usually the result of a system scan, into "Vendor Product Version" value pairs (e.g., "vmware woskstation player 15.0.3"). We use the vendor information (e.g., "vmware") to filter out ProductDB entities from other vendors, then generate an initial query-tag list from the remaining entities if they partially share the tokens of the software and version information (e.g., "woskstation player 15.0.3"). Subsequently, we measure the similarities between the system information string (e.g., "vmware woskstation player 15.0.3") and the strings in the initialized query-tag list. By doing so, we generate a new dictionary using CPE metadata as key and similarity as value (e.g., 'cpe:2.3:a:vmware:workstation_player:15.0.3': 100). We rank this query-tag list from higher to lower similarity and return the first five query tags as results (this number can be customized). This query-generation process is summarized in Algorithm 2.
In our approach, we compute the Levenshtein distance to calculate the difference between two strings, and instantiate our method by utilizing the fuzz.ratio function from the Python package in [51].
The Levenshtein distance refers to the minimum number of single-character edits required to change one string into the other [52].
• In the last step, we allow a manual check and query selection to decrease possible false positives, based on other keywords that distinguish candidates from CPS-related concepts. If we adopt one of the query tags and use a CPE-based query, the correlated database returns the vulnerability instances that share the same CPE metadata. If none of the generated query tags is correct, we switch to a report- or vendor-based query and retrieve the reports that contain the system configuration information string.
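The similarity ranking of the second step can be sketched with a plain Levenshtein implementation standing in for the fuzz.ratio utility [51]; the candidate product strings below are illustrative:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> int:
    """Rough 0-100 similarity score in the spirit of fuzz.ratio."""
    if not a and not b:
        return 100
    return round(100 * (1 - levenshtein(a, b) / max(len(a), len(b))))

# Hypothetical ProductDB entries already filtered to the vendor "vmware".
candidates = ["vmware workstation player 15.0.3", "vmware fusion 11.0.3"]
scan_string = "vmware woskstation player 15.0.3"   # misspelled scan output
ranked = sorted(candidates, key=lambda c: -similarity(scan_string, c))
print(ranked[:5])  # top-5 query tags
```

Despite the misspelling in the scan string, the edit-distance ranking still places the intended product first, which is exactly the robustness the fuzzy-matching step relies on.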

CPS vulnerabilities
We first investigate the Shodan database to extract the product names, versions and vendors of industrial PLC, RTU, MTU and HMI equipment. We start with Shodan because it lists the open ports of connected ICS devices in near real time. It also covers the most commonly used CPS-based CI equipment, and therefore provides actual device names, versions and vendors for our case-study analysis. For example, using PLC as the query tag, we gather products like the Mitsubishi Q PLC. We use these four lists of CI product features as input for our query generator to generate queries against our correlated database.
By querying NVD, we obtain respectively 257, 445, 107, and 258 vulnerability reports related to PLC, RTU, MTU and HMI components. These 1067 retrieved CPS-related vulnerabilities extend until November 3, 2021. Note that some vulnerabilities appear in more than one type of CPS component; one example is the vulnerability instance CVE-2019-0708, which appears in both the PLC and HMI vulnerability groups. We removed duplicated vulnerabilities and kept 870 instances in the analysis corpus for assessing general CPS vulnerability features.
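The deduplication step reduces to a set union over the per-component CVE lists; in the sketch below the CVE identifiers are taken from examples in this paper, but their grouping is illustrative:

```python
# Hypothetical per-component CVE sets; CVE-2019-0708 appears in two
# groups, mirroring the PLC/HMI overlap noted above.
plc = {"CVE-2019-0708", "CVE-2020-15782"}
hmi = {"CVE-2019-0708", "CVE-2018-7791"}
rtu = {"CVE-2014-0754"}
mtu = set()

corpus = plc | hmi | rtu | mtu   # duplicates collapse automatically
print(len(corpus))  # -> 4
```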
We further analyze these identified CPS vulnerabilities to obtain the CVSS V2 and V3 scores assigned by NVD. All of these CPS vulnerabilities are assigned CVSS V2 scores and the relevant labels, such as the V2 access vector. In contrast, 319 (71.69%) RTU vulnerabilities, 121 (47.08%) PLC vulnerabilities, 47 (43.93%) MTU vulnerabilities, and 121 (46.90%) HMI vulnerabilities are not assigned CVSS V3 scores. We investigate the CVSS V2 labels assigned by NVD to our retrieved CPS vulnerabilities; Table 2 lists the exploitability and impact distributions of these vulnerability instances under the CVSS V2 metrics. Vulnerabilities in the four types of CPS components show similar distributions in terms of access vector, access complexity, authentication, and availability impact. Exploiting PLC vulnerabilities is more likely to bring lower confidentiality impact but higher availability impact. Generally, CPS vulnerabilities have higher exploitability than the overall population of reported vulnerabilities, especially in terms of the required authentication and the complexity of such exploits.
A vulnerability in the endpoint communication of CPS devices may expose critical data to unauthorized threat actors and can be exploited by attacks such as man-in-the-middle (MitM) attacks. One such example is the unencrypted protocol of Modicon M580 PLCs from Schneider Electric: vulnerable Modicon PLCs employ the Schneider Electric UMAS protocol, which operates over the Modbus protocol and lacks encryption and proper authentication mechanisms. This vulnerability allows spoofing attacks against the Modbus communication between the PLC controller and the EcoStruxure software in the engineering workstation. Another common weakness in CPS devices is improper memory access control allowing read or write operations on memory locations, which may cause out-of-bounds reads and/or writes. One example of such vulnerabilities is CVE-2020-15782, which exemplifies the weakness of improper restriction of operations within the bounds of a memory buffer (CWE-119). This vulnerability has been identified in a list of Siemens SIMATIC firmware; it allows attackers with network access and download rights to a PLC to bypass existing protections in the PLC, such as the PLC sandbox, and obtain remote read-write memory access while staying undetected. A PLC sandbox refers to a protected area of memory where engineering code can run.
We distinguish vulnerabilities reported for specific CPS manufacturers, namely Schneider Electric SE, Siemens AG, and Mitsubishi. More specifically, we discovered 29 vulnerabilities in Schneider Electric SE products, 39 in Siemens AG products, and 12 in Mitsubishi products. We also observe some frequently published products that are affected by CI vulnerabilities.
One typical example is OpenSSL, which appears in 120 CPS vulnerability instances. OpenSSL [53] is a library implementing the SSL/TLS protocol, where SSL (Secure Sockets Layer) is the former name of TLS (Transport Layer Security). We also found 51 vulnerability instances related to the SIMATIC PLCs and HMIs developed by Siemens AG.

Characteristics analysis of CPS vulnerabilities
As discussed earlier, more than 57% of the extracted CPS vulnerability instances are not scored under the CVSS V3 mechanism. The scoring system presented in Section 4 is used to compute scores for these vulnerabilities, in order to bridge the gap of missing CVSS V3 information. We also recalculate CVSS V3 scores for the vulnerabilities with inconsistent assigned scores. This re-computation step is designed considering two factors: (i) in some data sources such as NVD, CVSS V3 is only applied to vulnerabilities disclosed in and after 2015, and (ii) inconsistent scores are provided by multiple scoring sources. Subsequently, the diversity of their sub-scores is inspected to reflect the CVSS V3 metric scores through property-vector evaluations. The Exploitability, Scope and Impact base-metric attributes of CPS vulnerabilities are contrasted against the overall CVE values in Table 3. On average, the exploitability attributes of CPS vulnerabilities contrast with their CVE counterparts: a significant share (90.48%) of attacks originate from network-based sources, particularly for RTU vulnerabilities. Adjacent-network-based attacks against CPS are rare, whereas local attacks occur more frequently in CPS. A large proportion of CPS vulnerabilities (98.72%) are exploitable by malicious actors without privileges or user interaction. A change of scope is observed in 7.69% of CPS vulnerability instances, resulting in severe consequences. A higher diversity is observed among the possible impact values than among the exploitability and scope attributes. Confidentiality and availability are more impacted than the integrity of vulnerable CPS components. Nevertheless, the impact of CPS vulnerabilities shows a polarized distribution under CVSS V3, the opposite of what CVSS V2 suggests: under CVSS V2 the impact of CPS vulnerabilities is mostly none or partial, whereas CVSS V3 suggests that exploiting CPS vulnerabilities results in either low or high impact.
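The per-metric distributions reported in Table 3 amount to counting CVSS vector components; a sketch over three invented vectors:

```python
from collections import Counter

# Invented CVSS V3 vectors standing in for the scored CPS corpus.
vectors = [
    "AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
    "AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:L",
    "AV:L/AC:L/PR:L/UI:N/S:C/C:H/I:N/A:H",
]

def metric_share(vectors, metric):
    """Percentage share of each value of one CVSS metric (e.g., 'AV')."""
    counts = Counter(dict(p.split(":") for p in v.split("/"))[metric]
                     for v in vectors)
    total = sum(counts.values())
    return {value: round(100 * n / total, 2) for value, n in counts.items()}

print(metric_share(vectors, "AV"))  # -> {'N': 66.67, 'L': 33.33}
```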

Conclusion and future works
Discovering and evaluating vulnerabilities in CPS networks are crucial yet challenging processes. We proposed to raise the efficiency of vulnerability-severity scoring systems following the CVSS standards used to rate the severity of reported vulnerability instances. Our approach reconciles inconsistent vulnerability-severity scores contributed by different cybersecurity analysers, and also decreases potential conflicts resulting from the various CVSS mechanisms. We employed a majority-voting technique to decide the score of vulnerabilities reported inconsistently across cybersecurity repositories, and then used the resulting consistent vulnerability instances as ground truth to train a machine-learning model as a scoring basis. The proposed model obtains high accuracy and micro F1-scores compared to similar studies. A case study involving CPS vulnerability reports from multiple repositories validates the proposed vulnerability-assessment model, with a query-filter logic used to customize the retrieved vulnerability instances. The outcomes are contrasted against reported CVE instances to further analyze the characteristics of CPS vulnerabilities. The results of our case study also indicate that vulnerability patterns diverge across cybersecurity data sources, which may mislead cybersecurity decision making with respect to patch prioritization or budget allocation. Hence, a vulnerability-analysis approach that correlates multiple data sources is necessary to further enhance cybersecurity awareness.
The proposed research can be extended by adjusting majority-voting tie-breaking while involving experts' supervision in the assessment loop: security experts provide some startup settings, and computational-intelligence techniques adjust these settings dynamically. Another possible direction involves weighting the different score sources through arithmetic means of their scores, in order to evaluate the reliability of the scores they provide. Finally, we plan to investigate the correlation between vulnerability severity and the attack surface of the system on which the vulnerability assessment is applied; this last planned work is closely related to the environmental property of the CVSS metrics.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Potential time delay of scoring and inconsistent scores.
We give an example using the vulnerability instance CVE-2018-7791, for which a CVSS V3 base score of 9.8 is assigned by NVD and the vendor Schneider Electric with the vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H. Nevertheless, ICS CERT assigns this vulnerability a score of 7.7 with the vector AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:L. Similarly, the inconsistency results from different measurements of the attack complexity. Using our majority-voting approach, we choose the final score of 9.8 as the true score. Another example of score inconsistency is the vulnerability instance CVE-2014-0754, which is assigned a CVSS V2 base score of 10.0 by NVD, VulDB, and ICS CERT with the vector AV:N/AC:L/Au:N/C:C/I:C/A:C. Note that the first three CVSS V2 metrics differ from their CVSS V3 counterparts: AV means Access Vector, equivalent to the Attack Vector in CVSS V3; AC refers to Access Complexity, equivalent to the Attack Complexity in CVSS V3; and Au denotes Authentication, which measures the number of times an attacker needs to authenticate to the targeted component in order to exploit the vulnerability. Yet a different score of 9.3 is assigned by the vendor Schneider Electric with the vector AV:N/AC:M/Au:N/C:C/I:C/A:C. The inconsistency occurs due to different measurements of the Access Complexity of this instance: Schneider Electric assigns medium complexity, while the other three parties assign low complexity. Using our majority-voting approach, we choose the final score of 10.0 as the true score.
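The majority-voting choice in both examples can be sketched as follows (ties are resolved arbitrarily here; tie-breaking with expert supervision is noted as future work in the conclusion):

```python
from collections import Counter

def majority_score(scores):
    """Pick the most frequently reported severity score as ground truth."""
    (winner, _votes), = Counter(scores).most_common(1)
    return winner

# CVE-2018-7791: NVD and Schneider Electric report 9.8, ICS CERT reports 7.7.
print(majority_score([9.8, 9.8, 7.7]))          # -> 9.8
# CVE-2014-0754: NVD, VulDB and ICS CERT report 10.0, Schneider Electric 9.3.
print(majority_score([10.0, 10.0, 10.0, 9.3]))  # -> 10.0
```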

Table 1
Evaluation of CVSS categorization.

Table 3
CPS vulnerability measurement distribution of CVSS version 3 base metrics. CPS component attributes are evaluated individually in Columns 3-6 (Columns PLC, RTU, MTU, HMI) and averaged in Column 7. Column 8 (Column CVE) shows the overall rate of published CVE reports with assigned CVSS V3 scores, obtained by dividing the vulnerabilities carrying a given labeled measurement (e.g., Network) by all the vulnerabilities disclosed until November 3, 2021. By doing so, we show how the significant characteristics of CPS vulnerabilities diverge when considering different cybersecurity data sources.