The search for spy planes teaches us about AI explainability, generalizability and troubleshooting


Can you automate the recognition of a surveillance plane by its flight path? With machine learning, yes you can. Understanding how BuzzFeed News accomplished this makes for a fascinating case study and shows how explainability brings collateral benefits such as broader generalizability and easier troubleshooting.

Here is a summary of how they did it:

First we made a series of calculations to describe the flight characteristics of almost 20,000 planes in the four months of Flightradar24 data: their turning rates, speeds and altitudes flown, the areas of rectangles drawn around each flight path, and the flights’ durations. We also included information on the manufacturer and model of each aircraft, and the four-digit squawk codes emitted by the planes’ transponders.

Then we turned to an algorithm called the “random forest,” training it to distinguish between the characteristics of two groups of planes: almost 100 previously identified FBI and DHS planes, and 500 randomly selected aircraft.

… We then used its model to assess all of the planes, calculating a probability that each aircraft was a match for those flown by the FBI and DHS.

… The algorithm was not infallible: Among other candidates, it flagged several skydiving operations that circled in a relatively small area, much like a typical surveillance aircraft. But as an initial screen for candidate spy planes, it proved very effective.
—  BuzzFeed News Trained A Computer To Search For Hidden Spy Planes
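
To make that pipeline concrete, here is a minimal sketch in Python with scikit-learn of the train-and-score loop described above. This is not BuzzFeed's actual code; the file name, column names, and the use of scikit-learn are assumptions for illustration, and categorical fields such as aircraft type and squawk code are assumed to be already encoded as numeric columns.

```python
# Minimal sketch (not BuzzFeed's actual code) of the approach described above:
# train a random forest on labeled planes, then score every plane with a
# probability of matching the surveillance class.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# One row per aircraft: summary statistics of its flight behavior (turning
# rates, speeds, altitudes, bounding-box area, flight duration) plus
# transponder metadata, with a "class" column for the labeled subset.
planes = pd.read_csv("planes_features.csv", index_col="adshex")
feature_cols = [c for c in planes.columns if c != "class"]

# Labeled training set: ~100 known FBI/DHS surveillance planes ("surveil")
# and 500 randomly selected aircraft ("other").
labeled = planes[planes["class"].isin(["surveil", "other"])]

model = RandomForestClassifier(n_estimators=500, random_state=1)
model.fit(labeled[feature_cols], labeled["class"])

# Score every plane: probability that it flies like the known spy planes.
surveil_col = list(model.classes_).index("surveil")
planes["p_surveil"] = model.predict_proba(planes[feature_cols])[:, surveil_col]

# The highest-probability planes become candidates for manual review.
print(planes["p_surveil"].sort_values(ascending=False).head(20))
```

The key design point, reflected in the quote, is that the output is a ranked list of candidates for human review, not an automated verdict.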

They also shared their notes with more technical details, including their explainability analysis:

[Screenshot: variable importance charts for the random forest model (MeanDecreaseAccuracy and MeanDecreaseGini)]

MeanDecreaseAccuracy measures the overall decrease in accuracy of the model if each variable is removed. MeanDecreaseGini measures the extent to which each variable plays a role in partitioning the data into the defined classes.

So these two charts show that the steer1 and steer2 variables, quantifying the frequency of turning hard to the left, and squawk_1, the most common squawk code broadcast by a plane’s transponder, were the most important to the model.
—  github notes for BuzzFeed News Trained A Computer To Search For Hidden Spy Planes
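
MeanDecreaseAccuracy and MeanDecreaseGini are the two importance measures reported by R's randomForest package, which is what the quoted notes reflect. For readers working in Python, here is a rough scikit-learn equivalent using permutation importance (an accuracy-decrease measure) and impurity-based importance (a Gini-decrease measure); it reuses the hypothetical model, labeled, and feature_cols objects from the sketch above.

```python
# Rough scikit-learn analogues of the two importance measures discussed in
# the BuzzFeed notes, reusing the hypothetical `model`, `labeled`, and
# `feature_cols` objects from the previous sketch.
import pandas as pd
from sklearn.inspection import permutation_importance

# Analogue of MeanDecreaseAccuracy: how much accuracy drops when each
# feature's values are randomly shuffled.
perm = permutation_importance(
    model, labeled[feature_cols], labeled["class"],
    scoring="accuracy", n_repeats=10, random_state=1,
)
acc_importance = pd.Series(perm.importances_mean, index=feature_cols)

# Analogue of MeanDecreaseGini: mean decrease in Gini impurity contributed
# by each feature across all splits in the forest.
gini_importance = pd.Series(model.feature_importances_, index=feature_cols)

print(acc_importance.sort_values(ascending=False).head(10))
print(gini_importance.sort_values(ascending=False).head(10))
```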

Thinking about this case study from an XAI (explainable AI) perspective, a few takeaways come to mind.

Sometimes you can correct categorization errors without explainability

In their initial analysis there was a consistent categorization error, noticed through inspection of the results: planes used for skydiving were being categorized as surveillance planes.

We suspect that this error was found through spot-checking of results and that explainability played no particular role in finding it. We also suspect the issue could be corrected without the assistance of explainability, for example by taking a list of skydiving drop zones and generating a feature that indicates whether a plane repeatedly traveled near a drop zone in a single flight (a sketch of such a feature follows). However, making only this one correction would miss an opportunity to improve the model more generally.
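
As a hypothetical illustration of that kind of feature, the sketch below measures how much of a single flight's track falls within a small radius of a known skydiving drop zone. The drop-zone list, the radius, and the column names are assumptions for illustration, not part of BuzzFeed's published feature set.

```python
# Hypothetical drop-zone feature: fraction of a flight's position reports
# that fall within a small radius of any known skydiving drop zone.
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def near_dropzone_fraction(track: pd.DataFrame,
                           dropzones: pd.DataFrame,
                           radius_km: float = 3.0) -> float:
    """track:     columns 'latitude'/'longitude', one row per position report
    dropzones: columns 'latitude'/'longitude', one row per drop zone"""
    near_any = np.zeros(len(track), dtype=bool)
    for _, dz in dropzones.iterrows():
        d = haversine_km(track["latitude"].to_numpy(),
                         track["longitude"].to_numpy(),
                         dz["latitude"], dz["longitude"])
        near_any |= d <= radius_km
    return float(near_any.mean())

# A plane that spends most of a flight near a drop zone is more likely a
# skydiving operation than a surveillance aircraft.
```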

Explainability allows you to better project generalizability and improve feature engineering

Without explainability it is difficult to project additional insights about generalizability. In this particular case, looking at the explanations for the incorrect skydiving categorizations makes it clear that the current features are insufficient to achieve good generalizability and that a fix specific to skydiving operations won’t solve the whole issue.

By understanding via the explanation that circling is the key feature being used, we can also project that we might incorrectly categorize other planes that circle, such as tour planes or crop-dusting planes. These other types of potential categorization errors may not have been uncovered in our limited labeled training set (roughly 600 planes, per the counts quoted above) or identified in our manual spot checks.

The fix for these categorization errors is most likely additional feature engineering. However, notice that the feature engineering fix for the error type that was caught (skydiving planes) may differ from the fixes needed for the other types of potential errors: tour planes and crop-dusters are not going to circle over skydiving drop zones.

This case study highlights how explainability can help us understand the generalizability of our models and guide the feature engineering needed to make them more robust.

Explainability enables troubleshooting at production time

Above we discussed steps that might be taken at training and evaluation time. But what if a generalizability error is not caught up front and the system goes into production with the latent error present? At some point these failures might be noticed manually: a reporter checking on a flagged surveillance plane might be told by the owner that it is actually a tour plane. What is the reporter’s next step? If no explanation comes with the result, it is very hard to know whether this was just an outlier error to be expected in a correctly operating probabilistic system, an actual flaw in the model that should be corrected, or a case of the plane owner lying. With an explanation, there is a much better chance the reporter can determine the appropriate next steps to distinguish between these possibilities.
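
One lightweight way to support this kind of production-time troubleshooting is to ship a small explanation alongside each flag. The sketch below is again hypothetical, reusing objects from the earlier examples rather than anything BuzzFeed published: it compares a flagged plane's values on the model's most important features against the range seen for known surveillance planes, giving a reviewer something concrete to check.

```python
# Hypothetical per-prediction explanation: show how a flagged plane compares
# to known surveillance planes on the model's most important features.
# Reuses the hypothetical `planes`, `labeled`, `feature_cols`, and
# `gini_importance` objects from the earlier sketches.
import pandas as pd

def explain_flag(adshex: str, top_n: int = 5) -> pd.DataFrame:
    """Return the flagged plane's values for the model's top features,
    next to the typical range for known surveillance planes."""
    top_features = gini_importance.sort_values(ascending=False).head(top_n).index
    surveil = labeled[labeled["class"] == "surveil"]
    return pd.DataFrame({
        "flagged_plane": planes.loc[adshex, top_features],
        "surveil_median": surveil[top_features].median(),
        "surveil_p10": surveil[top_features].quantile(0.10),
        "surveil_p90": surveil[top_features].quantile(0.90),
    })

# A reporter seeing that a flagged plane matches the surveillance profile
# only on the circling features, and nothing else, has a concrete reason to
# suspect a tour plane rather than a spy plane.
```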

Explainability matters more for “decision support systems” 

This system built by BuzzFeed News is really a decision support system. The output of the system is reviewed and evaluated by a human before any follow-up action is taken. Plane owners are not automatically sent emails questioning the use of their planes, and articles about government surveillance are not automatically written. Rather, the AI’s output provides information that might prompt a human to take these steps, and the human still applies their judgment before any tangible step is taken. In these circumstances an explanation is very high value: it helps the human understand and trust the AI’s prediction and guides the human to logical next steps.
