It’s common for vendors to make claims such as “Our algorithm predicts flight risk with 95% accuracy.” What should you do when you hear something like that? Buy the product? Dr. Nigel Guenole of the University of London says you should always ask, “How was that number calculated?”
In this case, if you dig deep enough, what you might find is:
- Of the 100 people who left, the algorithm named 95 of them in advance. It didn’t name anyone who did not leave. That’s what the claim tends to imply to the layman, and if it’s true, it’s fantastic.
- Of the 100 people who left, the algorithm named 95 of them in advance, but it also named 500 people who did not leave. In other words, the algorithm had a very high number of “false positives.” If this is the case, the algorithm may not be of much use.
- The algorithm identified 100 people who would leave; 95 of them did. But another 500 people left whom the algorithm did not identify. In other words, the algorithm had a very high number of “false negatives.” Again, in this case, the algorithm is not as useful as we hoped.
- The algorithm works exactly as advertised, but only on the data it was trained on. There is no information on how it will work on another data set (possibly a case of “overfitting” the training data). Scientists know that something that works perfectly on the data used to build it can be completely worthless for other data sets, so we should be highly skeptical about the value of the algorithm.
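To make the first three scenarios concrete, here is a minimal Python sketch that turns their counts into two standard measures: precision (of the people flagged, how many actually left) and recall (of the people who left, how many were flagged). The `Outcome` class and the scenario labels are illustrative names, not anything a vendor provides. The counts come straight from the scenarios above, with two of them derived: 5 leavers missed in the first two cases (100 left, 95 named) and 5 people wrongly flagged in the third (100 identified, 95 left).

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Counts behind one possible reading of the vendor's "95%" (illustrative)."""
    true_positives: int   # flagged as leavers, and did leave
    false_positives: int  # flagged as leavers, but stayed
    false_negatives: int  # not flagged, but left

    def precision(self) -> float:
        """Share of flagged people who actually left."""
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged

    def recall(self) -> float:
        """Share of actual leavers the algorithm flagged in advance."""
        leavers = self.true_positives + self.false_negatives
        return self.true_positives / leavers


# One entry per scenario, in the order they appear in the text.
scenarios = {
    "no false alarms": Outcome(
        true_positives=95, false_positives=0, false_negatives=5),
    "many false positives": Outcome(
        true_positives=95, false_positives=500, false_negatives=5),
    "many false negatives": Outcome(
        true_positives=95, false_positives=5, false_negatives=500),
}

for name, outcome in scenarios.items():
    print(f"{name}: precision={outcome.precision():.0%}, "
          f"recall={outcome.recall():.0%}")
```

Run as written, this prints 100% precision and 95% recall for the first scenario, 16% precision and 95% recall for the second, and 95% precision and 16% recall for the third. All three can honestly be summarized as “95%,” which is exactly why the calculation behind the number matters.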
Unlike Dr. Guenole, I must admit that I often hesitate to ask, “How was this calculated?” because the answer is likely to be embarrassing for the vendor. People without a quantitative background may hesitate to ask for fear they won’t understand the answer. In fact, verbal explanations of how a calculation is performed can be hard to understand even if you do have a quantitative background. Another problem is that the sales representative probably doesn’t know and may make something up on the spot, which doesn’t help anyone.
The trick is to ask for a written explanation of how the number was calculated, a question best asked by email. This is something you should insist on: it’s not proprietary, and it’s not hard for the vendor’s rep to find out. A number is meaningless unless you know how it was calculated, so ask.