Linear regression (and other types of regression) is often used in what is referred to as ‘driver modelling’ in customer satisfaction studies. The goal of such research is often to determine the relative importance of various sub-components of a product or service in predicting and explaining overall satisfaction. Driver modelling can also be used to determine the drivers of value, likelihood to recommend, etc. A common problem is that the independent variables are correlated, making it difficult to get a good estimate of the importance of the ‘drivers’. This problem is well known under conditions of severe multicollinearity, and alternatives such as the Shapley-value approach have been proposed to mitigate it. This paper shows that the Shapley-value approach may have benefits even under conditions of mild collinearity. The study compares linear regression, random forests and gradient boosting with the Shapley-value approach to regression and shows that the Shapley-value results are more consistent with bivariate correlations. However, Shapley-value regression does result in a small decrease in k-fold cross-validation performance.
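To make the idea concrete, the following is a minimal sketch of one common formulation of Shapley-value regression, in which the R² of an ordinary least-squares model is divided among the drivers by averaging each driver's marginal contribution over all subsets of the other drivers. It is an illustration under stated assumptions, not necessarily the procedure used in this study: the function names `r_squared` and `shapley_r2` are hypothetical, and the data are synthetic.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R^2 of an OLS fit (with intercept) of y on the given columns of X."""
    if not cols:
        return 0.0  # an intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def shapley_r2(X, y):
    """Shapley-value share of the full-model R^2 for each column of X."""
    p = X.shape[1]
    shares = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight for subsets of this size that exclude driver j
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                gain = r_squared(X, y, subset + (j,)) - r_squared(X, y, subset)
                shares[j] += weight * gain
    return shares


# Synthetic, mildly correlated drivers (illustrative data only, an assumption)
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)  # shared component that induces correlation
X = np.column_stack([
    z + rng.normal(size=n),
    z + rng.normal(size=n),
    rng.normal(size=n),
])
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(size=n)

shares = shapley_r2(X, y)
full_r2 = r_squared(X, y, (0, 1, 2))
print("Shapley shares of R^2:", shares)
print("Sum of shares:", shares.sum(), "Full-model R^2:", full_r2)
```

By construction the shares add up to the R² of the full model, so each can be read as a driver's portion of the explained variance. The number of sub-models grows exponentially with the number of drivers, so exhaustive enumeration of this kind is only practical for a modest number of drivers.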