**Note:** *This feature is only available for the Professional Analytics licence.*

## What is Key Driver Analysis?

Key Driver Analysis is a powerful method for understanding what motivates the decision-making of your respondents. It is typically used to gain insight into which factors are the most important in relation to a selected outcome metric, such as NPS, customer satisfaction score, or purchase intent.

Example: Company X continually measures its NPS across various touchpoints with its clients. It can use Key Driver Analysis to understand which underlying aspects of its business have the most significant impact on changes in its NPS.

The *Drivers* are the factors that drive the specified outcome. In the NPS example above, you can additionally ask your respondents to rate different aspects of your business, such as:

- Pricing competitiveness.
- Product quality.
- Product range.
- Customer support.
- Personalized approach.

By using Key Driver Analysis you can identify which of those factors have the biggest impact when it comes to the company’s NPS.

The identification of key drivers allows organizations to focus their efforts and resources on the factors that have the most substantial impact on achieving their goals.

## How to read the results?

For Key Driver Analysis results interpretation, first you need to identify:

- **Outcome metric** (dependent variable): the outcome that we want to optimize (e.g. NPS).
- **Predictors** (independent variables): the factors that we analyze against that outcome metric.

All selected predictors are assessed on the following 2 dimensions:

- **Importance** (Y-axis): how big the relative impact on the outcome variable (e.g. NPS) is.
- **Performance** (X-axis): how high the respondents’ score for this variable is (normalized mean).

The **further to the right** the variable is on the chart, the **more satisfied** the respondents are with that aspect, according to survey results.

The **higher up** a variable appears on the chart, the **stronger the impact** of that variable is on the outcome metric (NPS).

Always consider the **Model Accuracy Score** before using the analysis results. As a general rule, the higher the model accuracy, the better. **It is not advised to make business decisions based on the results if the model accuracy is below 50%.** Read more in-depth information in the “Model evaluation metrics” section below.

The chart is divided into **4 quadrants** by reference lines based on the mean Importance and Performance scores:

- **Key weaknesses**: usually the most important quadrant to focus on. The **top left** quadrant contains drivers that are **important but poorly rated**. In the example above, you can find *individual approach* in that quadrant, meaning that this aspect has a big impact on the NPS score but is currently poorly rated by your respondents. Based on this analysis, it might be a good idea to put business priority on improving the individual approach of your services.
- **Key strengths**: the **top right** quadrant contains aspects that are **important and highly rated**. This indicates that those key drivers are already performing well and have an important positive effect on the outcome metric (NPS). In the example below, *pricing* is contributing positively to the company’s NPS.
- **Unimportant weaknesses**: the **bottom left** quadrant contains aspects that are **not important and poorly rated**. Although respondents rate these drivers poorly, they are not an important factor in determining the company’s NPS. In the example below, it would not be recommended to invest more in the *product range* or *customer support*.
- **Unimportant strengths**: the **bottom right** quadrant contains aspects that are **not important but highly rated**. Although these drivers received a good rating, they do not influence the outcome metric (NPS). In the example below, it is not advised for the company to invest more in *product quality*.
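As an illustration, the quadrant assignment can be sketched in a few lines of code. This is not Survalyzer’s internal logic, and the driver names and scores below are made up for the example:

```python
def assign_quadrants(drivers):
    """drivers: dict mapping driver name -> (performance, importance).

    Reference lines are the mean Performance and mean Importance
    across all drivers, as described above.
    """
    perf_mean = sum(p for p, _ in drivers.values()) / len(drivers)
    imp_mean = sum(i for _, i in drivers.values()) / len(drivers)
    quadrants = {}
    for name, (perf, imp) in drivers.items():
        if imp >= imp_mean and perf < perf_mean:
            quadrants[name] = "Key weakness"          # top left
        elif imp >= imp_mean:
            quadrants[name] = "Key strength"          # top right
        elif perf < perf_mean:
            quadrants[name] = "Unimportant weakness"  # bottom left
        else:
            quadrants[name] = "Unimportant strength"  # bottom right
    return quadrants

# Illustrative (performance %, importance) scores, not real survey data:
drivers = {
    "Individual approach": (40, 0.8),
    "Pricing": (75, 0.7),
    "Product range": (35, 0.1),
    "Product quality": (80, 0.2),
}
print(assign_quadrants(drivers))
```

With these made-up numbers, *Individual approach* lands in the key-weaknesses quadrant and *Pricing* in the key strengths, mirroring the description above.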

## How to use the Key Driver Analysis method in Survalyzer?

The minimum required number of interviews for the calculation to work is 50. The KDA calculation will not start if the number of interviews is lower.

### 1. Start Key Driver Analysis

You can start the Key Driver Analysis in your Professional Analytics report by navigating to: (1) the *Tables* tab, (2) the *Add content* button, (3) the *Methods* section.

This will start the wizard that will guide you through all the steps.

### 2. Outcome Variable

First, you start by selecting the **Outcome Variable** (dependent variable). This is the primary outcome or metric of interest that is influenced by various independent variables.

**Only number-based variables (integer and real) can be used as the Outcome variable.**

In this example we will use NPS as the outcome variable:

### 3. Predictor Variables

After having determined the Outcome Variable, you can select multiple **Predictor Variables** (independent variables).

The independent variables are the factors or predictors that are examined to assess their impact on the dependent variable (NPS in this example), aiming to identify the key drivers that influence the observed outcome.

The predictors will be assessed on two dimensions:

- **Importance**: how high the relative impact on the outcome is.
- **Performance**: how high the respondents’ score for this variable is.

Please note that all selected predictor variables:

- should have the same answer value meaning: e.g., the higher the value, the better the score.
- should have a scale with equal distance between each possible variable value.
- don’t have to have the same scale: scales are normalized, so you can use predictor variables with different scales.
- must number at least 2 and at most 25 selected predictor variables.

After selection, the desired labels (text displayed in the table and chart) can be set for each predictor. You can copy-paste labels in batch from Excel or Word for faster input.

**Note**: if you select a variable without a defined range within the survey itself, you need to manually set a minimum and maximum possible value for that variable before you can continue:

### 4. Method (calculation algorithm)

Next, you can select one of 6 available methods (calculation algorithms):

The FastTree method should provide good results for common Key Driver Analysis scenarios. However, depending on the character of your data, you may achieve better results (higher model accuracy) by testing and selecting the best calculation algorithm for your case.

Read more about available methods in the section below.

### 5. Filters

In the next wizard step, **filters **can be applied. For instance, you can determine that only interviews with the interview state ‘Completed’ are included.

In order to execute Key Driver Analysis, default invisible filters are always applied that include only interviews for which all outcome and predictor variables contain values (i.e., values that are not empty or N/A).

Bear in mind that **the minimum required number of interviews for analysis is 50**, so the number of filtered records that is shown at the bottom of your screen should be at least 50 to perform the analysis.
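As a sketch of what these filters amount to, assuming hypothetical column names and using pandas, the logic resembles:

```python
import pandas as pd

# Hypothetical interview data; the column names are illustrative only.
interviews = pd.DataFrame({
    "state": ["Completed", "Completed", "Dropped out", "Completed"],
    "nps": [9, 7, None, 10],       # outcome variable
    "pricing": [8, 6, 7, None],    # predictor variable
})

# User-defined filter: keep only completed interviews.
filtered = interviews[interviews["state"] == "Completed"]

# Implicit default filter: drop interviews where any outcome or
# predictor value is empty / N/A.
filtered = filtered.dropna(subset=["nps", "pricing"])

MIN_INTERVIEWS = 50  # Survalyzer's minimum for KDA
if len(filtered) < MIN_INTERVIEWS:
    print(f"Only {len(filtered)} interviews - KDA will not start.")
```

In this toy example only 2 of the 4 interviews survive both filters, so the analysis would not start.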

### 6. Results

After a brief moment, the results are calculated and shown. Please note that these are the actual final results for your selected dataset. You can go back and adjust your analysis if needed to compare results.

If you want to learn more about results interpretation, you can find an explanation regarding model metrics in the section below.

### 7. Add chart & table to the report

After completing the wizard and clicking on ‘Save chart and table’, you will be navigated to the *Charts* tab, where you can make adjustments to the chart if desired:


If your top 50 interviews include missing or N/A values for the selected variables, the calculation won’t start in the Report Builder’s top 50 view. You need to switch to the “All data” view or view it in the public report.

When you are satisfied with how the data is displayed, proceed and:

- Navigate to the *Report pages* tab.
- Navigate to *Add content*.
- Click on the created chart to add it to your report.
- Click on the created table to add it to your report (we recommend adding it to a layout row below the chart).

## Key Driver Analysis: details explained

### Performance score calculation

The Performance score (X-axis) is the normalized mean value of that variable. In simple terms, it is the mean value converted to a percentage, ranging from 0 to 100%. **Normalizing allows you to compare variables with different scales.**

Performance score = (Mean value – Min value) / (Max value – Min value) × 100.

Examples:

The respondents rank your company’s pricing on a scale 0 – 10 and the average is 7.8. Performance score is 78 (%).

The product quality is ranked on a 1 – 5 scale and the average rating is a 3.8. Performance score is 70 (%).
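The calculation can be written as a small function. The function name is ours; the formula is exactly the one above:

```python
def performance_score(mean, min_value, max_value):
    """Normalized mean, expressed as a percentage from 0 to 100."""
    return 100 * (mean - min_value) / (max_value - min_value)

# Pricing is rated on a 0-10 scale with an average of 7.8:
print(performance_score(7.8, 0, 10))  # 78.0
# Product quality is rated on a 1-5 scale with an average of 3.8:
print(performance_score(3.8, 1, 5))   # 70.0
```

Note how the 1–5 scale is shifted by its minimum before normalizing, which is what makes differently scaled predictors comparable.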

### Importance score calculation

The Importance score comes from an ML.NET machine learning regression model. It is the result of a two-step approach:

- A machine learning model is built with a regression algorithm to predict the value of the outcome metric from a set of related features (predictor variables). Regression algorithms model the dependency of the outcome metric on its related features to determine how the outcome will change as the values of the features are varied.
- The Permutation Feature Importance (PFI) technique is used to assess the importance of individual features (variables) in a predictive model. At a high level, it works by randomly shuffling the data one feature at a time for the entire dataset and calculating how much the performance metric of interest (e.g. R-squared) decreases. The larger the change, the more important that feature is.

This two-step approach is used to avoid multicollinearity bias that might occur in pure regression analysis.
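The two-step approach can be illustrated with a minimal sketch. This is not Survalyzer’s actual ML.NET implementation: it is a NumPy reconstruction of the idea on synthetic data, using plain least-squares regression and a single permutation pass, with names of our choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two predictors, the first twice as influential.
n = 200
X = rng.normal(size=(n, 2))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

def r_squared(X, y, coef):
    """R-squared of the linear model y ~ X @ coef."""
    pred = X @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Step 1: fit a regression model (here, ordinary least squares).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
baseline = r_squared(X, y, coef)

# Step 2: Permutation Feature Importance - shuffle one feature at a
# time and record how much R-squared drops.
importance = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importance.append(baseline - r_squared(X_perm, y, coef))

print(importance)  # the first feature shows the larger drop
```

Because the first feature carries twice the weight in the synthetic data, shuffling it destroys more predictive power, so its importance score is higher.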

Please note that the Importance score is relative. It is tied to the specific model created and may differ with any change to the model (e.g. adding/removing predictors, changing the selected algorithm, etc.).

Further reading:

### Methods (calculation algorithms) available

The FastTree method should provide good results for common Key Driver Analysis scenarios. However, depending on the character of your data, you may achieve better results by testing and selecting the best calculation algorithm for your case.

| Method | Description |
|---|---|
| FastTree (default) | A decision tree-based algorithm used for both regression and classification problems. It is known for its speed and accuracy, and is particularly useful when working with large datasets. It is a good choice when you need to train a model quickly and don’t have a lot of time to spend on feature engineering. |
| FastForest | A random forest-based algorithm used for regression problems. It is similar to FastTree, but instead of building a single decision tree, it builds an ensemble of trees and averages their predictions. This helps to reduce overfitting and improve the accuracy of the model. It is a good choice when you have a large dataset and want to avoid overfitting. |
| OLS (Ordinary Least Squares) | A linear regression algorithm used to model the relationship between a dependent variable and one or more independent variables. It is a simple and widely used algorithm that is easy to interpret. It is a good choice when you have a small dataset and want to model a linear relationship between variables. |
| SDCA (Stochastic Dual Coordinate Ascent) | A linear regression algorithm used to solve large-scale optimization problems. It is particularly useful when working with sparse datasets, as it can handle large numbers of features. It is a good choice when you have a large dataset with many features and want to optimize for sparsity. |
| L-BFGS Poisson | A linear regression algorithm used to model count data. It is based on the Poisson distribution, which is commonly used to model count data. It is a good choice when you have count data and want to model the relationship between the count data and one or more independent variables. |
| Online Gradient Descent | A linear regression algorithm used to model the relationship between a dependent variable and one or more independent variables. It is particularly useful when working with large datasets that cannot fit into memory, as it can update the model parameters incrementally. It is a good choice when you have a large dataset and want to train a model in an online setting. |

Further reading: How to choose an ML.NET algorithm – ML.NET

### Model evaluation metrics

| Metric | Description |
|---|---|
| Model accuracy | In our case, it is R-squared converted to a percentage value. |
| R² (R-squared) | Represents the predictive power of the model as a value between −∞ and 1. As a general rule, the closer to 1, the better the quality. You should consider models with an R² above 0.5. A negative R² means that the model is worse than random predictions, should be disregarded entirely, and may also indicate a problem with the input data. However, sometimes low R-squared values (such as 0.5) can also be entirely normal for certain research cases, and very high R-squared values (such as 0.99) are not always good. See the further reading link below. |
| MAE (Mean Absolute Error) | Measures how close the predictions are to the actual outcomes. It is the average of all the model errors, where a model error is the absolute distance between the predicted outcome value and the correct outcome value. The closer to 0.00, the better the quality. |
| MSE (Mean Squared Error) | Tells you how close a regression line is to a set of test data values by taking the distances from the points to the regression line and squaring them. The squaring gives more weight to larger differences. It is always non-negative, and values closer to 0.00 are better. |
| RMSE (Root Mean Squared Error) | The square root of MSE. It has the same units as the predicted outcome and is similar to the MAE, though it gives more weight to larger differences. It is always non-negative, and values closer to 0.00 are better. |
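For reference, all of these metrics can be computed directly from actual and predicted outcome values. This is a generic sketch, not Survalyzer’s ML.NET code, and the example numbers are illustrative:

```python
import numpy as np

def regression_metrics(actual, predicted):
    """Compute the evaluation metrics described above."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mae = np.mean(np.abs(errors))          # mean absolute error
    mse = np.mean(errors ** 2)             # mean squared error
    rmse = np.sqrt(mse)                    # root mean squared error
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot               # R-squared
    return {"MAE": mae, "MSE": mse, "RMSE": rmse,
            "R2": r2, "Model accuracy (%)": 100 * r2}

# Illustrative actual vs. predicted outcome values:
print(regression_metrics([8, 6, 9, 7], [7.5, 6.5, 8.5, 7.0]))
```

With these numbers the model accuracy is 85%, comfortably above the 50% threshold recommended earlier in this article.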

Further reading: Evaluation metrics for Regression and Recommendation