I’m currently exploring how to make machine learning models interpretable. Among the resources I’ve found is Christoph Molnar’s book Interpretable Machine Learning, and I’m using his iml package in R, particularly to explore Shapley values. For an individual prediction, these values show how much each feature contributes to the difference between that prediction and the average prediction. Shapley values are of interest because they satisfy certain conditions of “fairness”.
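To give a sense of what this looks like in practice, here is a small sketch along the lines of the examples in Molnar’s book and the iml documentation. The Boston housing data and the random forest are placeholders for illustration, not necessarily the model and data I am actually working with:

```r
library(iml)
library(randomForest)

# Example data and model (stand-ins, following the iml documentation examples)
data("Boston", package = "MASS")
rf <- randomForest(medv ~ ., data = Boston, ntree = 50)

# Wrap the model and data so iml can query predictions
X <- Boston[, names(Boston) != "medv"]
predictor <- Predictor$new(rf, data = X, y = Boston$medv)

# Shapley values for a single observation of interest
shapley <- Shapley$new(predictor, x.interest = X[1, ])
shapley$results   # one phi value per feature
plot(shapley)
```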

Why consider this method of evaluating feature importance? First, it can be used to compare which features matter in any two models. Second, we want a solution method that satisfies fairness criteria. Although cooperative game theory offers many solution concepts with some notion of fairness, the Shapley value is the unique one that satisfies all four conditions: Efficiency, Symmetry, Dummy, and Additivity.
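For reference, the standard textbook definition of the Shapley value of feature j (the notation here is mine, not taken from the post): F is the set of all features, and val(S) is the prediction when only the features in S take their values from the instance being explained, with the rest averaged out. Efficiency is the condition that the values sum to the gap between the instance’s prediction and the average prediction:

$$
\phi_j \;=\; \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}\,\bigl(\mathrm{val}(S \cup \{j\}) - \mathrm{val}(S)\bigr)
$$

$$
\text{Efficiency:}\qquad \sum_{j=1}^{|F|} \phi_j \;=\; \hat{f}(x) - \mathbb{E}\bigl[\hat{f}(X)\bigr]
$$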

So far I’ve found that his computation of the Shapley value is not the exact one, but an approximation. To reduce the computational burden, it samples some, rather than all, of the random permutations of the features and averages the successive marginal contributions. The exact Shapley value requires going through every permutation, which is computationally very expensive.
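To make the approximation concrete, here is a rough, self-contained sketch of the permutation-sampling idea, written from scratch for illustration rather than taken from the iml package; `predict_fun`, `X`, and `x_interest` are placeholder names, and `x_interest` is assumed to be a one-row data frame with the same columns as `X`:

```r
# A rough sketch of the sampling idea (not iml's actual implementation):
# average marginal contributions over randomly drawn feature orderings.
approx_shapley <- function(predict_fun, X, x_interest, n_perm = 100) {
  p <- ncol(X)
  phi <- setNames(numeric(p), colnames(X))

  for (m in seq_len(n_perm)) {
    perm   <- sample(p)                     # a random ordering of the features
    x_with <- X[sample(nrow(X), 1), ]       # start from a random background row

    # Switch features to the instance's values one at a time, in the sampled
    # order; each switch gives one marginal contribution for that feature.
    for (j in perm) {
      x_without   <- x_with
      x_with[[j]] <- x_interest[[j]]
      phi[j] <- phi[j] + (predict_fun(x_with) - predict_fun(x_without))
    }
  }
  phi / n_perm                              # Monte Carlo estimate of each phi
}

# e.g. phi <- approx_shapley(function(d) predict(rf, d), X, X[1, ], n_perm = 200)
```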

It also seems that his version of the Shapley value is not Efficient, meaning that the features’ Shapley values do not sum exactly to the difference between the individual prediction and the average prediction. I created a function that fixes that, and also produces normalized Shapley values, which give the percentage of that difference attributable to each feature.
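Roughly, the fix works like this (a simplified sketch rather than the exact function): spread the leftover residual across the features so the values satisfy Efficiency, then express each value as a share of the prediction-minus-average gap. Spreading the residual evenly is just one possible choice:

```r
# Simplified sketch (not the exact function): impose Efficiency, then
# express each feature's Shapley value as a percentage of the gap between
# the instance's prediction and the average prediction.
normalize_shapley <- function(phi, predict_fun, X, x_interest) {
  gap       <- predict_fun(x_interest) - mean(predict_fun(X))  # f(x) - average
  phi_fixed <- phi + (gap - sum(phi)) / length(phi)            # spread the residual evenly
  data.frame(
    feature        = names(phi),
    phi            = phi_fixed,
    normalized_pct = 100 * phi_fixed / gap                     # shares sum to 100%
  )
}

# e.g. normalize_shapley(phi, function(d) predict(rf, d), X, X[1, ])
```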