A Dimm View of Misleading Metrics and Irrelevant Research (Accuracy and F1)

eDiscovery expert Dr. Bill Dimm explains why some performance metrics don’t give an accurate view of performance for eDiscovery purposes, and why that makes a lot of research utilizing such metrics irrelevant for eDiscovery.

Extract from an article by Dr. Bill Dimm

If one algorithm achieved 98.2% accuracy while another achieved 98.6% on the same task, would you be surprised to find that the first algorithm required ten times as much document review as the second to reach 75% recall? This article explains why some performance metrics don’t give an accurate view of performance for eDiscovery purposes, and why that makes a lot of research utilizing such metrics irrelevant for eDiscovery.

The key performance metrics for eDiscovery are precision and recall. Recall, R, is the percentage of all relevant documents that have been found. High recall is critical to defensibility. Precision, P, is the percentage of documents predicted to be relevant that actually are relevant. High precision is desirable to avoid wasting time reviewing non-relevant documents (if documents will be reviewed to confirm relevance and check for privilege before production). In other words, precision is related to cost: the lower the precision, the more non-relevant documents must be reviewed.
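To make these definitions concrete, here is a minimal Python sketch that computes precision, recall, and accuracy from the four counts of a relevant/non-relevant classification. The function name and the document counts are hypothetical, chosen only for illustration; they are not taken from the article or from any real review. Accuracy is taken here in its usual sense, the fraction of all documents classified correctly.

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) as fractions between 0 and 1.

    tp: relevant documents predicted relevant (found)
    fp: non-relevant documents predicted relevant (wasted review)
    fn: relevant documents predicted non-relevant (missed)
    tn: non-relevant documents predicted non-relevant
    """
    predicted_relevant = tp + fp
    actually_relevant = tp + fn
    total = tp + fp + fn + tn

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / predicted_relevant if predicted_relevant else 0.0
    recall = tp / actually_relevant if actually_relevant else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical collection (assumed numbers): 100,000 documents,
# of which 1,000 are relevant, i.e. 1% prevalence.
p, r, a = precision_recall_accuracy(tp=200, fp=100, fn=800, tn=98_900)

print(f"precision = {p:.1%}")  # 66.7%
print(f"recall    = {r:.1%}")  # 20.0%
print(f"accuracy  = {a:.1%}")  # 99.1%
```

In this made-up scenario the accuracy is above 99% even though four out of five relevant documents were missed, because the large number of correctly ignored non-relevant documents dominates the accuracy figure when relevant documents are rare.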


Source: ComplexDiscovery