Explanation via illustrating the essence

Suitcase words “contain a variety of meanings packed into them.”  Explanation is certainly a suitcase word.  When we say we want an “explanation” of how a machine learning system works we can unpack that in a multitude of ways.  It might mean we want:

  • a step-by-step recapitulation of the process that produced a specific result
  • a description of how an observation would need to change to produce a meaningfully different result
  • a holistic description of the broader system that encompasses multiple paths from different observations to different results

These three examples don’t exhaust the possibilities.

For systems that associate concepts with observations there is another type of “explanation” that is interesting: we can ask the system to give an example that illustrates its notion of the “essence” of that concept.  

Neural networks are often used to classify observations by tagging them with conceptual labels.  For example we might train a network to recognize objects in photos such that when a photo contains a knife and fork the system applies the “knife” and “fork” labels.  
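This kind of tagging is a multi-label classification: each concept gets its own score, and every label whose score clears a threshold is applied. A minimal sketch of that decision rule (the label set, logits, and threshold here are hypothetical, not from any particular model):

```python
import numpy as np

LABELS = ["knife", "fork", "plate"]  # hypothetical label set

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tag_photo(logits, threshold=0.5):
    """Multi-label tagging: apply every label whose probability clears the threshold."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return [label for label, p in zip(LABELS, probs) if p >= threshold]

# A photo whose scores favor "knife" and "fork" but not "plate":
tags = tag_photo([2.3, 1.7, -3.0])  # -> ["knife", "fork"]
```

Unlike single-label classification, the labels are not mutually exclusive, so each score is thresholded independently rather than passed through a softmax.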

It turns out that “neural networks that were trained to discriminate between different kinds of images have quite a bit of the information needed to generate images too,” as a Google team illustrated.

The team points out “we train networks by simply showing them many examples of what we want them to learn, hoping they extract the essence of the matter at hand (e.g., a fork needs a handle and 2-4 tines), and learn to ignore what doesn’t matter (a fork can be any shape, size, color or orientation)”.

Therefore an image generated from the network that contains only the elements that trigger the model to recognize a given concept, and leaves out everything that doesn’t matter, will in some sense illustrate the “essence” of that concept from the model’s point of view.
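The usual way to generate such an image is activation maximization: start from a blank input and run gradient ascent on the input itself until the concept’s score is maximized. A toy sketch of the idea, using a single linear scorer in place of a real network so the gradient is analytic (the weights and hyperparameters here are illustrative assumptions):

```python
import numpy as np

# Toy "network": one linear scorer for one concept.
# score(x) = w . x, so the gradient with respect to the input is just w.
rng = np.random.default_rng(0)
w = rng.normal(size=16)  # hypothetical learned weights for a concept

def score(x):
    return w @ x

def activation_maximization(steps=200, lr=0.1, l2=0.01):
    """Gradient ascent on the *input* to maximize the concept score,
    with a small L2 penalty to keep the input bounded."""
    x = np.zeros_like(w)
    for _ in range(steps):
        grad = w - l2 * x  # d/dx [ w.x - (l2/2) * ||x||^2 ]
        x = x + lr * grad
    return x

essence = activation_maximization()
# The optimized input aligns with the weight vector: what the model "looks for".
cosine = (essence @ w) / (np.linalg.norm(essence) * np.linalg.norm(w))
```

In a real network the gradient comes from backpropagation through the trained model rather than an analytic formula, and extra regularizers (blurring, jitter) are typically needed to keep the generated image natural-looking, but the optimization loop has this same shape.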

This is interesting not just as a fun visual exercise.  Explanations can do more than just make users comfortable with a model.  Explanations can be used to improve the robustness and correctness of a model.  

For example the Google team generated images to explain the model’s notion of the essence of a “dumbbell”. 


This revealed “that the neural net isn’t quite looking for the thing we thought it was … There are dumbbells in there alright, but it seems no picture of a dumbbell is complete without a muscular weightlifter there to lift them. In this case, the network failed to completely distill the essence of a dumbbell. Maybe it’s never been shown a dumbbell without an arm holding it. Visualization can help us correct these kinds of training mishaps.”

By generating an explanation they discovered a limitation of the training data set, one that led to a model that worked well on that training data but would not generalize.  This is an underappreciated advantage of explainable AI (XAI).
