There are more things in heaven and earth, Horatio,
Than are dreamt of in our philosophy
– Hamlet (1.5.167-8)
Horatio needs to dream bigger: there is more variety in the world than we assume. When building robust machine learning systems, our tendency to underestimate variation will trip us up repeatedly if we let it.
Machine learning systems deliver useful results through inductive reasoning: making broad generalizations from specific observations. For our systems to be useful in the real world they need inductive strength; they need to generalize.
Many argue that the path to inductive strength is more data: make your training data set big enough and you are home free. Certainly more data helps. However, if our training set for predicting which animals will bark includes a million videos of Chihuahuas and none of Great Danes, then any system we train will have poor inductive strength and won’t generalize. “Volume” isn’t enough; we also need “variety”.
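A simple first pass at the variety question is to look at how labels are distributed in the training set. The sketch below is a minimal illustration, not a production check; the function name, threshold, and labels are all hypothetical:

```python
from collections import Counter

def variety_report(labels, min_share=0.05):
    """Flag classes that are missing or underrepresented.
    `min_share` is an illustrative threshold: any class whose
    share of the data falls below it is reported as a variety risk."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()
            if n / total < min_share}

# A skewed "barking animals" set: Chihuahuas dominate, Great Danes are rare.
labels = ["chihuahua"] * 990 + ["great_dane"] * 10
print(variety_report(labels))   # → {'great_dane': 0.01}
```

Of course, label counts only surface the variety you already thought to label; the drapes example below shows why that is not the whole story.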
We all have cognitive biases that encourage us to assume the completeness of what is familiar. As Hamlet pointed out to Horatio and Taleb pointed out to everyone – unexpected exceptions to our expectations happen and we should accept that reality.
So, “Do we have sufficient variety?” is a critical question. If you are using a black box algorithm, exploring this question can be challenging. However, with good explainability we have more tools at our disposal to form an answer.
Here is an example from a neural net being trained to recognize if drapes appear in an image:
“Das, Batra, and their colleagues then try to get a sense of how the network makes its decisions by investigating where in the pictures it chooses to look. What they have found surprises them: When answering the question about drapes, the network doesn’t even bother looking for a window. Instead, it first looks to the bottom of the image, and stops looking if it finds a bed. It seems that, in the dataset used to train this neural net, windows with drapes may be found in bedrooms.”
– Is Artificial Intelligence Permanently Inscrutable?
This makes a good example because it is an obvious and amusing error. However, more subtle flaws in forming your training set can produce errors that are just as egregious and pose just as much of a barrier to delivering a system that works as well in production as it did in the lab.
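The kind of inspection described in the quoted study – checking where in the image a model actually looks – can be approximated with occlusion sensitivity: slide a blanking patch over the input and record how much the score drops. The sketch below is a toy illustration of the idea, with a stand-in “model” that, like the drapes network, only attends to the bottom of the image; all names and sizes are illustrative:

```python
import numpy as np

def occlusion_map(model, image, patch=4, baseline=0.0):
    """Slide an occluding patch over the image and record how much
    the model's score drops. Large drops mark regions the model
    relies on; `model` is any callable mapping an image to a score."""
    h, w = image.shape
    base_score = model(image)
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            heatmap[i // patch, j // patch] = base_score - model(occluded)
    return heatmap

# Toy "model": scores an image by the brightness of its bottom rows,
# mimicking a network that looks for a bed instead of drapes.
def toy_model(img):
    return img[-4:, :].mean()

img = np.zeros((16, 16))
img[-4:, :] = 1.0            # bright "bed" region at the bottom
heat = occlusion_map(toy_model, img)
# Only the bottom row of the heatmap is non-zero, revealing that
# the model ignores everything above it.
```

A map like this won’t tell you what variety is missing, but it can reveal that the model has latched onto a shortcut your training set allowed.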
Robustness and adaptability depend on verifying that you have sufficient variety in your training data, and explainability is a foundational aid in that verification.