Tommy Blanchard’s post “Performance metrics aren’t everything” reinforces some key themes related to XAI and Enterprise AI:
- System utility outweighs model performance:
Optimizing a model in isolation usually sub-optimizes the resulting system. Real-world systems are often a mix of multiple models, traditional programming implementing top-down rules, and humans in the loop. Choices that maximize lab-bench metrics may harm overall system results.
- Feature engineering matters:
We should engineer features for total system value. Features that support generalizability, explanations, and troubleshooting are valuable and worth the effort to engineer, even if they produce no improvement in AUC.
- Explanations are complicated:
We should match the explanation to its audience and think deliberately about what makes a good explanation.
… whether a model is good or not typically relies on much more than [insert your favorite performance metric here]. Yes, if your model is predicting at chance, it’s almost certainly useless. But even slightly above chance it might be immensely useful (conversely, even with perfect predictions it might be useless). The usefulness of a predictive model is a function of what it enables you to do that you wouldn’t be able to do without it. What actions or interventions will the project as a whole allow that would not otherwise happen, and how valuable are those?
… There is a lot that a data scientist can do to alter these costs and payoffs, and sometimes that’s a better place to focus effort than getting an extra 0.00001 precision in your model … For example, often a model is expected to not only make a prediction, but give some idea of why that prediction was made. This is what makes tools like LIME that explain where a prediction has come from so useful. Unfortunately, they don’t completely solve the issue because explanations are complicated. Some explanations might be less useful than others. … Additional work of picking out those features that yield explanations of interest (or grouping features together in a way that makes sense) is also necessary. Doing this work well might create a tool that drastically improves the efficiency of some process. Doing it poorly might mean your model directs effort to the wrong places, creating cost and no value. These are both in the realm of possibilities regardless of how accurate your model is … Real projects aren’t Kaggle competitions where the only thing that matters is predictive accuracy.
— Tommy Blanchard, Performance metrics aren’t everything
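The quote’s claim that a model “even slightly above chance … might be immensely useful” is really an expected-value argument, and it can be sketched with a back-of-envelope calculation. All of the numbers below (case counts, costs, precision) are hypothetical illustrations, not figures from the post:

```python
# Hypothetical scenario: screening 10,000 cases where 1% have a costly
# defect. Inspecting one case costs $5, catching a defect saves $2,000,
# and the budget only covers 200 inspections.
cases, base_rate = 10_000, 0.01
inspect_cost, defect_value, budget = 5, 2_000, 200

# Without a model: inspect 200 random cases; each is a defect with
# probability equal to the 1% base rate.
random_catches = budget * base_rate
baseline_payoff = random_catches * defect_value - budget * inspect_cost

# With a model whose top-200 list reaches just 10% precision -- far from
# perfect, but well above the 1% chance rate.
model_catches = budget * 0.10
model_payoff = model_catches * defect_value - budget * inspect_cost

print(f"random triage: ${baseline_payoff:,.0f}")
print(f"weak model:    ${model_payoff:,.0f}")
```

Even a modestly enriched ranking multiplies the payoff, because value here is driven by which interventions the model enables, not by how close its precision is to 1.0.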
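The mechanism behind tools like LIME, which the quote leans on, is a local surrogate: perturb the instance being explained, query the black-box model, and fit a simple weighted linear model whose coefficients serve as the explanation. The sketch below is a minimal from-scratch illustration of that idea, not the `lime` package itself; the `black_box` function, noise scale, and kernel width are all assumed for the example:

```python
import math
import random

def black_box(x0, x1):
    # Stand-in for an opaque trained classifier: predicts 1 inside a
    # curved region of feature space (hypothetical, for illustration).
    return 1.0 if x0 * x0 + x1 > 1.0 else 0.0

def solve3(a, b):
    # Gauss-Jordan elimination for a 3x3 linear system a @ beta = b.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def explain_locally(instance, n_samples=2000, kernel_width=0.5, seed=0):
    # LIME-style local surrogate: sample perturbations near the
    # instance, query the black box, and fit a proximity-weighted
    # linear model via the normal equations (X'WX) beta = X'Wy.
    rng = random.Random(seed)
    X, y, w = [], [], []
    for _ in range(n_samples):
        p = [v + rng.gauss(0, 0.3) for v in instance]
        d2 = sum((a - b) ** 2 for a, b in zip(p, instance))
        X.append([1.0] + p)  # intercept column + the two features
        y.append(black_box(*p))
        w.append(math.exp(-d2 / kernel_width ** 2))
    a = [[sum(wi * xi[r] * xi[c] for wi, xi in zip(w, X)) for c in range(3)]
         for r in range(3)]
    b = [sum(wi * xi[r] * yi for wi, xi, yi in zip(w, X, y)) for r in range(3)]
    return solve3(a, b)

# Explain one prediction: both features push this instance toward class 1,
# so both surrogate coefficients should come out positive.
intercept, c0, c1 = explain_locally([0.9, 0.5])
print(f"local coefficients: x0={c0:.2f}, x1={c1:.2f}")
```

The coefficients are only faithful near the chosen instance, which is exactly the quote’s caveat: the surrounding work of choosing features whose local weights mean something to the audience is what makes such an explanation useful.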