As machine learning (ML) models and artificial intelligence (AI) enter critical domains like medicine and financial markets, the inability of humans to understand these models and their decisions becomes increasingly problematic. The problem grows as more complex ML algorithms, often called black-box ML algorithms, come into use. Given the general need for interpretable results from black-box ML algorithms, we investigated currently available methodologies. This domain of interpretable or explainable machine learning and artificial intelligence is often called explainable artificial intelligence (XAI). We are aware that machine learning and artificial intelligence are different, but in this article we use the terms interchangeably. We also use the terms explainability and interpretability interchangeably.
Generally, interpretability of ML results can be achieved through two main approaches:
- Post-hoc interpretability ML methodologies. Post-hoc approaches provide an interpretation and/or justification for predictions obtained from a black-box model. More on the different post-hoc interpretability ML methodologies can be found below.
- Intrinsically interpretable ML models. This approach relies on ML models that are interpretable by design and can be explained by inspecting the fitted model itself, such as linear models, logistic regression, naive Bayes, decision trees and others.
The first step towards explainability is deciding between an interpretable and a black-box ML algorithm. This choice is usually framed as a trade-off: a more interpretable but often less accurate ML model on one side, or a less interpretable but often more accurate ML model on the other.
In this article, we will focus on methodologies and approaches used for interpretation of black-box algorithms. Below are some of the most common post-hoc interpretability ML methodologies.
- Model-agnostic methodologies. Model-agnostic methodologies are independent of the type of ML algorithm used for prediction, which leaves the practitioner free to select the most appropriate algorithm. Below, we list common model-agnostic methodologies, along with some model diagnostic tools that can be considered potentially model-agnostic. This list is not exhaustive; if you are aware of other methodologies, please share them with us.
- Local Interpretable Model-agnostic Explanations (LIME). LIME is one of the best-known model-agnostic methodologies. LIME fits an interpretable local model (for example, a linear model) around a single predicted observation, trying to imitate how the global model behaves in the neighborhood of that specific observation. The main advantage of LIME is its flexibility: a user can apply it to any classifier, and it works with tabular, text and image data. LIME is implemented in Python and R. For those who want to learn more about the LIME methodology, see the original paper.
- Partial Dependence Plots. Partial dependence plots are another model-agnostic approach for interpreting how each feature used in the black-box ML model affects the model’s predictions. A partial dependence plot visualizes the effect of a single feature on the model’s predicted values after averaging over the values of all other features used by the model. It is a simple diagnostic tool that nicely illustrates how each feature affects the ML model’s predictions. Partial dependence plots are also implemented in both Python and R. A detailed explanation of partial dependence plots can be found in the following paper, and there is also a nice explanation in Elements of Statistical Learning.
- Permutation Variable Importance. The main idea behind this approach is to measure what happens to the model’s accuracy after the values of a feature used for model building are randomly permuted, one feature at a time. A large drop in the model’s performance after permuting a feature indicates that the model relies heavily on that feature, that is, the feature has high importance. Permutation variable importance was originally used for evaluating random forest models, but it can potentially be extended to any ML algorithm.
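A minimal sketch of the two diagnostics just described, partial dependence and permutation importance, using only the Python standard library. The `black_box` function is a hypothetical stand-in for a trained model, and the helper names and the synthetic data are assumptions made for illustration; in practice a library implementation (for example, the `sklearn.inspection` module of scikit-learn) would normally be used.

```python
import random

def black_box(row):
    """Hypothetical stand-in for a trained black-box model."""
    x0, x1, x2 = row
    return 3.0 * x0 + x1 ** 2  # x2 is deliberately ignored by the model

def mse(model, X, y):
    """Mean squared error of the model on (X, y)."""
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(X)

def permutation_importance(model, X, y, feature, rng):
    """Increase in error after shuffling the values of one feature column."""
    baseline = mse(model, X, y)
    column = [r[feature] for r in X]
    rng.shuffle(column)
    X_perm = [r[:feature] + (v,) + r[feature + 1:] for r, v in zip(X, column)]
    return mse(model, X_perm, y) - baseline

def partial_dependence(model, X, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for v in grid:
        preds = [model(r[:feature] + (v,) + r[feature + 1:]) for r in X]
        curve.append(sum(preds) / len(preds))
    return curve

# Synthetic data: three features drawn uniformly from [-1, 1].
rng = random.Random(0)
X = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(500)]
y = [black_box(r) for r in X]

importances = [permutation_importance(black_box, X, y, j, rng) for j in range(3)]
grid = [-1.0, 0.0, 1.0]
pd_x0 = partial_dependence(black_box, X, 0, grid)

print("permutation importances:", importances)
print("partial dependence of feature 0:", pd_x0)
```

In this toy setup the third feature is ignored by the model, so permuting it does not change the error at all, while the partial dependence curve of the first feature rises by 6 across the grid, reflecting its coefficient of 3.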
In general, the main advantage of model-agnostic methodologies is their flexibility, which allows practitioners to apply their preferred ML algorithms. On the other hand, it is important to keep in mind that all three methodologies presented above illustrate how the target correlates with the input features, not the underlying “causes”. Post-hoc methodologies try to build our “trust” in ML models; they do not reveal the underlying “causes”.
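To make the local-surrogate idea behind LIME concrete, the sketch below perturbs a single observation, weights the perturbed points by their proximity to it, and fits a weighted linear model to the black-box predictions. It is a simplified illustration, not the API or the exact sampling scheme of the `lime` package: the `black_box` function, the Gaussian perturbations, the kernel width and all helper names are assumptions chosen for this example.

```python
import math
import random

def black_box(x):
    """Hypothetical nonlinear model standing in for a trained black box."""
    return math.sin(x[0]) + x[1] ** 2

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def local_surrogate(model, x0, n_samples=2000, scale=0.1, width=0.2, seed=0):
    """Fit a proximity-weighted linear model around the observation x0."""
    rng = random.Random(seed)
    d = len(x0)
    # Accumulate the normal equations (Z^T W Z) beta = Z^T W y.
    A = [[0.0] * (d + 1) for _ in range(d + 1)]
    b = [0.0] * (d + 1)
    for _ in range(n_samples):
        x = [xi + rng.gauss(0.0, scale) for xi in x0]   # perturbed sample
        dist2 = sum((a - c) ** 2 for a, c in zip(x, x0))
        w = math.exp(-dist2 / width ** 2)               # proximity weight
        z = [1.0] + [a - c for a, c in zip(x, x0)]      # intercept + centered features
        y = model(x)
        for i in range(d + 1):
            b[i] += w * z[i] * y
            for j in range(d + 1):
                A[i][j] += w * z[i] * z[j]
    beta = solve(A, b)
    return beta[0], beta[1:]

intercept, coefs = local_surrogate(black_box, [0.0, 1.0])
print("local intercept:", intercept)
print("local coefficients:", coefs)
```

Around the point (0, 1) the fitted coefficients come out close to the local slopes of the toy function (roughly 1 for the first feature and 2 for the second). They describe how the model behaves near that observation, not why the underlying phenomenon behaves that way.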
- Model-specific methodologies. Model-specific methodologies depend on the type of ML model used. The methodologies presented below were often used for monitoring the learning process, especially in the case of neural networks, but they have recently been found useful for interpreting the predictions of neural networks as well. The model-specific methodologies we present here are specific to neural networks.
- Layer-wise relevance propagation (LRP). LRP redistributes the prediction for a given observation backwards through the neural network, layer by layer, using local redistribution rules until it finally assigns a relevance score to each input variable. LRP has a Python implementation, and there is also an LRP tutorial. The approach is described in the original LRP paper.
- Simple Taylor decomposition. This approach is very similar to layer-wise relevance propagation (LRP), except that a different decomposition function, based on a first-order Taylor expansion of the prediction function, is used to redistribute the predicted value back to the input features.
- Sensitivity Analysis. Sensitivity analysis explains the model’s prediction based on the model’s locally evaluated gradient, that is, the partial derivatives of the output with respect to each input. Similarly to partial dependence plots, sensitivity analysis quantifies the importance of each input variable (e.g., an image pixel) while all other variables are held fixed. Sensitivity analysis assumes that the most important input features are those to which the prediction is most sensitive. There is also a Python implementation.
- Guided backpropagation. Guided backpropagation is a non-conserving backward propagation technique. It is similar to regular backpropagation, but negative gradients are additionally set to zero during the backward pass. A negative gradient indicates that a specific neuron has a negative influence on the class that we are trying to visualize. In this way, guided backpropagation puts weight only on the signals that contribute positively to the class score. For a more detailed explanation of guided backpropagation, please refer to the original paper.
- Deconvolution. Deconvolution, like guided backpropagation, is a non-conserving backward propagation technique. In contrast to guided backpropagation, deconvolution relies on max-pooling layers to direct the propagated signal to the appropriate locations in the image. The methodology is described in the original paper.
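Sensitivity analysis and simple Taylor decomposition can be illustrated on a tiny ReLU network. The sketch below is written under stated assumptions: the weights `W1` and `w2` and the input `x` are hypothetical, the network has no bias terms, and the gradient is computed by hand rather than with a deep learning framework. Sensitivity relevance is taken as the squared partial derivative, and for a bias-free ReLU network simple Taylor decomposition is taken in its gradient-times-input form, following the formulation in the Montavon, Samek and Müller overview cited in this article.

```python
# Hypothetical two-layer ReLU network without biases: f(x) = w2 . relu(W1 x)
W1 = [[1.0, -2.0, 0.5],
      [-1.0, 1.0, 2.0]]   # 2 hidden units, 3 inputs
w2 = [1.5, -1.0]

def forward(x):
    """Return the network output and the hidden pre-activations."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    hidden = [max(0.0, p) for p in pre]
    return sum(v * h for v, h in zip(w2, hidden)), pre

def gradient(x):
    """Partial derivatives of the output with respect to each input."""
    _, pre = forward(x)
    # The derivative through a ReLU is 1 where the unit is active, else 0.
    delta = [v if p > 0 else 0.0 for v, p in zip(w2, pre)]
    return [sum(delta[j] * W1[j][i] for j in range(len(W1)))
            for i in range(len(x))]

x = [1.0, 2.0, 0.5]
output, _ = forward(x)
grad = gradient(x)

# Sensitivity analysis: relevance is the squared local gradient.
sensitivity = [g ** 2 for g in grad]

# Simple Taylor decomposition (bias-free ReLU case): gradient times input.
taylor_relevance = [g * xi for g, xi in zip(grad, x)]

print("output:", output)
print("sensitivity:", sensitivity)
print("taylor relevance:", taylor_relevance, "sum:", sum(taylor_relevance))
```

The two scores answer different questions. Sensitivity says how strongly the output would react to a small change in each input, whereas the Taylor relevances add up to the output itself in this bias-free setting and so decompose the prediction across the inputs.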
An in-depth overview of all model-specific methodologies presented in this article can also be found in “Methods for Interpreting and Understanding Deep Neural Networks” by Grégoire Montavon, Wojciech Samek and Klaus-Robert Müller.
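The difference between regular backpropagation, deconvolution and guided backpropagation is commonly described in terms of how each one passes a signal backwards through a ReLU unit. The sketch below shows only that rule, as a hedged illustration: the function name, the `mode` labels and the example values are assumptions, and the convolution and max-pooling parts of a real network (including the pooling behaviour that deconvolution relies on) are left out.

```python
def relu_backward(forward_input, upstream, mode):
    """Propagate a backward signal through a ReLU layer.

    forward_input: values that entered the ReLU in the forward pass
    upstream:      signal arriving from the layer above in the backward pass
    mode:          "backprop", "deconvnet" or "guided"
    """
    result = []
    for a, g in zip(forward_input, upstream):
        if mode == "backprop":
            keep = a > 0            # gate on the forward activation only
        elif mode == "deconvnet":
            keep = g > 0            # gate on the backward signal only
        elif mode == "guided":
            keep = a > 0 and g > 0  # both gates: negative gradients suppressed too
        else:
            raise ValueError(f"unknown mode: {mode}")
        result.append(g if keep else 0.0)
    return result

# Hypothetical values for four units of one layer.
forward_input = [1.0, -1.0, 2.0, -3.0]
upstream = [0.5, 0.7, -0.2, -0.4]

for mode in ("backprop", "deconvnet", "guided"):
    print(mode, relu_backward(forward_input, upstream, mode))
```

With these values, only the first unit survives under guided backpropagation, because it is the only one that was active in the forward pass and also receives a positive signal from above.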