Prioritizing Risk Assessment with Security Scores

The Exploit Prediction Scoring System (EPSS) and the Common Vulnerability Scoring System (CVSS) are two widely used scoring systems in cybersecurity, each answering a different question about vulnerabilities. EPSS tells you how likely a vulnerability is to be exploited, while CVSS tells you how severe the consequences would be if it were. Used together, they provide a more complete picture for vulnerability management and remediation prioritization.

EPSS is a model that estimates the exploitability of a vulnerability. If risk is viewed as the product of threat, vulnerability, and impact, EPSS predicts the threat component: how likely a particular vulnerability is to be exploited. The model outputs a probability score between 0 and 1; the higher the score, the greater the likelihood of the vulnerability being exploited. This makes it a critical metric for informed decisions about vulnerability management and resource allocation for remediation.
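
EPSS scores are published daily by FIRST and can be retrieved through its public API. The following is a minimal sketch, assuming the requests library and FIRST's https://api.first.org/data/v1/epss endpoint; the field names follow the API's published JSON format:

```python
import requests

def get_epss_scores(cve_ids):
    """Query the FIRST EPSS API for the scores of one or more CVEs."""
    response = requests.get(
        "https://api.first.org/data/v1/epss",
        params={"cve": ",".join(cve_ids)},
        timeout=10,
    )
    response.raise_for_status()
    results = {}
    for item in response.json().get("data", []):
        # "epss" is the exploitation probability (0-1);
        # "percentile" is the score's rank among all scored CVEs.
        results[item["cve"]] = {
            "epss": float(item["epss"]),
            "percentile": float(item["percentile"]),
        }
    return results

print(get_epss_scores(["CVE-2021-44228"]))
```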

CVSS provides a standardized way to measure the technical severity of a vulnerability. It evaluates factors such as how easy the vulnerability is to exploit and the potential impact on confidentiality, integrity, and availability if it is exploited. The result is a numerical score from 0.0 to 10.0, often grouped into qualitative ratings like Low, Medium, High, or Critical. CVSS helps organizations understand how serious a vulnerability could be, but it does not indicate how likely it is to be exploited in the real world.
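
The qualitative bands are defined in the CVSS specification (v3.x and v4.0 use the same ranges). A minimal sketch of mapping a numeric score to its rating:

```python
def cvss_rating(score: float) -> str:
    """Map a CVSS score to its qualitative severity rating,
    using the bands from the CVSS v3.x/v4.0 specifications."""
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(cvss_rating(9.8))  # Critical
```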


How does EPSS work?

EPSS uses a machine learning model[2] to estimate the probability of a CVE being exploited in the next 30 days. A deep dive into the model and its design is available in the research paper published about EPSS. At a high level, it works in a series of steps:

Step 1: Gathering Vulnerability Information

EPSS collects data from multiple vulnerability databases to answer the question “How easily can a vulnerability be exploited?” The greater the variety of data collected, the better the results. It gathers an expanding list of metrics (a sketch of what one such record might look like follows the list):

  • Vendor (CPE, via NVD)
  • Age of the vulnerability (Days since CVE published in MITRE CVE list)
  • References with categorical labels defining their content (MITRE CVE List, NVD)
  • Normalized multiword expressions extracted from the description of the vulnerability (MITRE CVE List)
  • Weakness in the vulnerability (CWE, via NVD)
  • CVSS metrics (base vector from CVSS 3.x, via NVD)
  • CVE is listed/discussed on a list or website (CISA KEV, Google Project Zero, Trend Micro’s Zero Day Initiative (ZDI), with more being added)
  • Publicly available exploit code (Exploit-DB, GitHub, MetaSploit)
  • Offensive security tools and scanners: Intrigue, sn1per, jaeles, nuclei
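
To make these inputs concrete, here is a hypothetical feature record for a single CVE; the field names and values are illustrative and do not match the EPSS model's internal features:

```python
# Hypothetical feature record for one CVE; fields are illustrative,
# drawn from the data sources listed above.
cve_features = {
    "cve_id": "CVE-2021-44228",
    "days_since_published": 45,      # age per the MITRE CVE list
    "vendor": "apache",              # from CPE data in NVD
    "cwe": "CWE-502",                # weakness category via NVD (illustrative)
    "cvss_base_vector": "AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H",
    "listed_in_cisa_kev": True,      # appears on CISA KEV
    "public_exploit_code": True,     # e.g. Exploit-DB, GitHub, Metasploit
    "in_scanner_templates": True,    # e.g. nuclei, jaeles
}
```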

All of this information helps estimate the exploitation likelihood of a CVE and allows the model to gauge the ever-evolving threat landscape.

Step 2: Monitoring Exploitation Activity

Observing the exploitation activity of a CVE and feeding it back into the model allows it to adapt and learn quickly. The timing of that activity is critical, since the threat landscape can shift dramatically around it: a public exploit can either raise attacker awareness, leading to more exploitation, or make attackers cautious about exploiting the CVE further, since organizations may prioritize remediating it.

Step 3: Modeling and Prediction

The model and its exact workings are laid out in detail in the research paper[2]. The crux of it is that the model is trained to identify patterns between the vulnerability information and the exploitation activity gathered in the previous steps, and to use those patterns to predict the exploitability of a new vulnerability. The model is currently trained on 14 months of data, with 12 months used as the training set and the remaining 2 months as the testing set.
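
Because exploitation data arrives over time, such a split is chronological rather than random. A minimal sketch of the idea, assuming a list of (observation_date, features, was_exploited) records; this mirrors the 12-month/2-month split described above, not the paper's actual code:

```python
from datetime import timedelta

def chronological_split(records, train_months=12):
    """Split (observation_date, features, was_exploited) records so the
    first `train_months` months train the model and the rest test it."""
    records = sorted(records, key=lambda r: r[0])
    cutoff = records[0][0] + timedelta(days=30 * train_months)
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test
```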

Step 4: Tweaking for Performance

The performance of the model is measured by collecting the following data:

  • Estimate of future events: How likely is it that a vulnerability will be exploited (and should therefore be prioritized for remediation)?
  • The actual events: How often was the vulnerability exploited in the wild?

As with any machine learning model, the following are the key metrics to track when optimizing its performance:

  • True Positives (TP): Prioritized vulnerabilities that were exploited in the wild (correctly prioritized).
  • False Positives (FP): Prioritized vulnerabilities that were not exploited (incorrectly prioritized).
  • False Negatives (FN): Vulnerabilities that were not prioritized but exploited (incorrectly delayed).
  • True Negatives (TN): Vulnerabilities that were neither prioritized nor exploited (correctly delayed).

An organization can set an EPSS threshold that decides whether a vulnerability must be prioritized for patching. Say the threshold is set at an EPSS of 10%: any vulnerability with a predicted exploitation probability greater than 0.1 will be prioritized for patching. The EPSS model proposes simpler, more relevant metrics an organization can track over time (a sketch of computing them follows the list):

1. Efficiency

  • Also known as Precision in machine learning terminology.
  • Percent of prioritized vulnerabilities that were exploited.
  • This considers how efficiently resources were spent.
  • Calculated as TP/(TP + FP).

2. Coverage

  • Also known as Recall in machine learning terminology.
  • Percent of exploited vulnerabilities that were prioritized for remediation.
  • This provides an estimate of how secure the current remediation strategy is.
  • Calculated as TP/(TP + FN).

3. Effort

  • Measured by the proportion of vulnerabilities being prioritized by the organization.
  • Heavily depends on the time and resources (staffing/budget) available for the organization.
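
Given these definitions, the three metrics are straightforward to compute. A minimal sketch, assuming a dict of EPSS scores and a set of CVEs observed as exploited (names and data are illustrative):

```python
def remediation_metrics(epss_scores, exploited, threshold=0.1):
    """Compute efficiency (precision), coverage (recall), and effort
    for a given EPSS prioritization threshold."""
    prioritized = {cve for cve, s in epss_scores.items() if s > threshold}
    tp = len(prioritized & exploited)    # prioritized and exploited
    fp = len(prioritized - exploited)    # prioritized, not exploited
    fn = len(exploited - prioritized)    # exploited, not prioritized
    efficiency = tp / (tp + fp) if prioritized else 0.0
    coverage = tp / (tp + fn) if exploited else 0.0
    effort = len(prioritized) / len(epss_scores)  # share of vulns prioritized
    return efficiency, coverage, effort

scores = {"CVE-A": 0.92, "CVE-B": 0.04, "CVE-C": 0.35}
print(remediation_metrics(scores, exploited={"CVE-A"}))
# (0.5, 1.0, 0.666...): half the effort was well spent, and every
# exploited vulnerability was covered.
```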

CVSS

The Common Vulnerability Scoring System (CVSS) is the industry-standard framework for measuring the technical severity of vulnerabilities; its latest evolution is version 4.0. CVSS produces a numerical score from 0.0 to 10.0 that reflects how damaging a vulnerability could be if exploited, based on standardized, transparent criteria.

CVSS v4.0 introduces a clearer and more flexible structure by organizing metrics into Base, Threat, Environmental, and Supplemental groups. The Base metrics describe inherent exploitability and impact, while Threat metrics incorporate real-world exploit maturity. Environmental metrics allow organizations to tailor scores to their specific environments, and Supplemental metrics provide additional context such as safety without changing the numeric score.

One of the most important changes in v4.0 is the refinement of exploitability and impact modeling. New metrics like Attack Requirements and improved User Interaction values better capture modern attack conditions. The former “Scope” concept has been replaced with more explicit distinctions between impacts to the vulnerable system and downstream systems, improving accuracy and interpretability.
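
To make the structure concrete, here is an illustrative CVSS v4.0 base vector; the values are an arbitrary example rather than a score for any particular CVE:

```text
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:N/SI:N/SA:N

AV/AC/AT/PR/UI - exploitability: Attack Vector, Attack Complexity,
                 Attack Requirements (new in v4.0), Privileges Required,
                 User Interaction
VC/VI/VA       - impact on the vulnerable system (C/I/A)
SC/SI/SA       - impact on subsequent (downstream) systems, replacing
                 the v3.x "Scope" metric
```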

CVSS v4.0 emphasizes that severity scores should not be used in isolation. The specification encourages clear labeling of which metric groups are applied and reinforces that CVSS measures severity—not likelihood or priority. As a result, CVSS works best when paired with additional data sources, such as EPSS, threat intelligence, and asset criticality, to guide real-world remediation decisions.


Components of CVSS


CVSS comprises several metrics that collectively determine the severity score of a vulnerability. The breakdown below follows CVSS v3.1, which remains the version most widely published; in it, the metrics are divided into three groups: Base, Temporal, and Environmental.

Base Metrics

Base Score: This metric represents the intrinsic characteristics of a vulnerability and its potential impact if exploited. It is calculated based on several sub-metrics:


  • Attack Vector: Describes how an attacker can reach the vulnerability: physical (P), local (L), adjacent network (A), or network (N).
  • Attack Complexity: Reflects the level of complexity required for a successful exploitation, either low (L) or high (H).
  • Privileges Required: Indicates the privileges an attacker must possess to exploit the vulnerability: none (N), low (L), or high (H).
  • User Interaction: Represents whether user interaction is required to exploit the vulnerability, categorized as none (N) or required (R).
  • Scope: Indicates whether exploitation affects resources beyond the security scope of the vulnerable component, valued unchanged (U) or changed (C).
  • Confidentiality, Integrity, and Availability Impact: These metrics assess the potential impact of the vulnerability on each of these security attributes, ranging from none (N) to high (H).
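
These base metrics are published as a compact vector string. A minimal sketch of parsing one, assuming the standard v3.1 vector format; the helper name is illustrative:

```python
# Abbreviations used in CVSS v3.1 base vector strings.
V3_METRICS = {
    "AV": "Attack Vector", "AC": "Attack Complexity",
    "PR": "Privileges Required", "UI": "User Interaction",
    "S": "Scope", "C": "Confidentiality Impact",
    "I": "Integrity Impact", "A": "Availability Impact",
}

def parse_cvss_v31(vector: str) -> dict:
    """Parse a CVSS v3.1 base vector string into named metrics."""
    parts = vector.split("/")
    assert parts[0] == "CVSS:3.1", "expected a v3.1 vector"
    return {V3_METRICS[k]: v for k, v in (p.split(":") for p in parts[1:])}

print(parse_cvss_v31("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
```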

Temporal Metrics

Temporal Score: Reflects the characteristics of a vulnerability that may change over time, such as the availability of exploit code or patches. It is derived from the following sub-metrics:

  • Exploit Code Maturity: Evaluates the maturity of available exploit code, from unproven (U) and proof-of-concept (P) to functional (F) and high (H); not defined (X) is used when no information is available.
  • Remediation Level: Represents the availability of remediation measures, such as an official fix, temporary fix, or workaround.
  • Report Confidence: Reflects the confidence in the accuracy and reliability of vulnerability reports, from unknown (U) to confirmed (C).

Environmental Metrics

Environmental Score: Customizes the base score to reflect the impact of the vulnerability in a specific environment. It is derived from the following sub-metrics:

  • Confidentiality, Integrity, and Availability Requirements: Define how important each of these security attributes is to the affected organization.
  • Modified Base Metrics: Adjust the base metrics to account for the organization’s security requirements, configurations, and compensating controls.

Calculating CVSS

CVSS scores are calculated using mathematical formulas that combine the values of various metrics within each group. The resulting score ranges from 0.0 to 10.0, with higher scores indicating greater severity.
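
As a concrete example, the CVSS v3.1 base score formula from the specification can be implemented directly. This sketch covers only the base score (no temporal or environmental adjustments) and uses the metric weights published in the v3.1 specification:

```python
# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(x: float) -> float:
    """Round up to one decimal place, per the v3.1 spec's Roundup()."""
    i = round(x * 100000)
    return i / 100000 if i % 10000 == 0 else (i // 10000 + 1) / 10

def base_score(av, ac, pr, ui, scope, c, i, a):
    """CVSS v3.1 base score from single-letter metric values."""
    pr_weight = (PR_CHANGED if scope == "C" else PR_UNCHANGED)[pr]
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if scope == "C":
        total *= 1.08
    return roundup(min(total, 10))

# AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H -> 9.8 (Critical)
print(base_score("N", "L", "N", "N", "U", "H", "H", "H"))
```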

CVSS scores help organizations prioritize vulnerability remediation efforts based on the potential impact and exploitability of vulnerabilities. For example, vulnerabilities with scores closer to 10.0 are considered more critical and require immediate attention, while those with lower scores may be addressed with less urgency.


Comparing EPSS and CVSS

The two systems answer complementary questions:

  • Question answered: EPSS estimates how likely a vulnerability is to be exploited in the wild; CVSS measures how severe the consequences could be if it is exploited.
  • Output: EPSS produces a probability between 0 and 1, updated daily; CVSS produces a score from 0.0 to 10.0 with qualitative ratings from Low to Critical.
  • Basis: EPSS is a data-driven machine learning model that adapts as exploitation activity is observed; CVSS is a standardized framework scored from fixed, transparent criteria.
  • Role in prioritization: EPSS helps decide which vulnerabilities to fix first; CVSS helps understand how much damage a vulnerability could cause.


Conclusion

Together, EPSS and CVSS provide a balanced view of vulnerability risk, combining technical severity with real-world exploitation likelihood. Using both scores enables organizations to move from theoretical risk assessment to practical, threat-informed prioritization. At Pervaziv AI, we have incorporated both scores into many of our risk assessment workflows, and we continue to evaluate their efficacy as we explore further.

References:
