Explainable Machine Learning: Approximating Shapley Values for Dependent Predictors

Author(s)
Kasperek, Jan 
Institut de statistique 
Editor(s)
Matei, Alina 
Institut de statistique 
Publication date
2024
Number of pages
61
Keywords
  • explainability
  • Shapley values
  • SHAP
  • KernelSHAP
  • machine learning
  • model interpretability
  • Monte Carlo
  • computational statistics
  • dependence modeling
  • copula
  • conditional inference trees
Abstract
Modern Machine Learning algorithms often outperform classical statistical methods in predictive accuracy, but this comes at the expense of model interpretability. As businesses and institutions increasingly rely on Machine Learning to support and automate decision-making processes and reap the benefits of more accurate predictions, explaining these models' outputs becomes more important. A universally applicable approach to explaining such complex models is based on the Shapley value, a concept originating from game theory. However, its exact calculation is computationally intensive, so approximations must be used. The state-of-the-art approach, KernelSHAP, assumes independence of the predictors, which is unrealistic in practice. Recent research has developed improvements that incorporate dependencies between predictors. After a review of the theoretical underpinnings, the original KernelSHAP method is compared with the improved versions in realistic settings, using three real-world datasets. While the improved versions are found to have smaller approximation errors relative to exact Shapley values, they are also more computationally demanding. Further improvements are discussed and possible research directions are suggested. The thesis is structured as follows: after introducing explainable machine learning in chapter 1, the Shapley value and its applications to model explainability are explored in chapter 2. Chapter 3 presents methods to approximate Shapley values as well as recent improvements to these methods, which are tested on real datasets in chapter 4. Possible directions for future research are pointed out in chapter 5, before a final conclusion in chapter 6. Code for the experiments of chapter 4 is given in the appendix.
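To make the computational issue concrete, the following sketch (not taken from the thesis; function names and the toy model are illustrative assumptions) approximates Shapley values by Monte Carlo sampling of feature permutations. "Absent" features are filled in from a background sample drawn marginally, which implicitly assumes independent predictors, the same simplifying assumption made by KernelSHAP that the improved methods relax.

```python
import numpy as np

def shapley_mc(f, x, background, n_perm=200, seed=None):
    """Monte Carlo estimate of Shapley values for one prediction f(x).

    For each sampled permutation, features are switched on one by one
    (from a random background row to the value in x) and the marginal
    change in f is credited to the switched feature. Marginal sampling
    of absent features assumes predictor independence.
    """
    rng = np.random.default_rng(seed)
    p = x.shape[0]
    phi = np.zeros(p)
    for _ in range(n_perm):
        perm = rng.permutation(p)
        z = background[rng.integers(len(background))].copy()
        prev = f(z)
        for j in perm:
            z[j] = x[j]            # switch feature j to its value in x
            cur = f(z)
            phi[j] += cur - prev   # marginal contribution of feature j
            prev = cur
    return phi / n_perm

# Toy linear model: here the exact Shapley values are known to be
# w_j * (x_j - mean(background_j)), so the estimate can be sanity-checked.
w = np.array([1.0, -2.0, 0.5])
f = lambda z: float(z @ w)
bg = np.random.default_rng(0).normal(size=(500, 3))
x = np.array([1.0, 1.0, 1.0])
phi = shapley_mc(f, x, bg, n_perm=500, seed=1)
print(phi)
```

By the telescoping sum inside each permutation, the estimates satisfy the efficiency property: the contributions sum (up to Monte Carlo error) to f(x) minus the average background prediction. The exact computation would instead enumerate all 2^p feature coalitions, which is what motivates sampling-based approximations in the first place.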
Notes
Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Statistics

Supervisor:
Prof. tit. Dr. Alina Matei
Université de Neuchâtel
Faculty of Science
Institute of Statistics
Identifiers
https://libra.unine.ch/handle/123456789/32728
Publication type
master thesis
File(s) to download
 Kasperek_Explainable Machine Learning_2024.pdf (5.1 MB)