Extract from article by John Tredennick and Thomas Gricks
What is Contextual Diversity?
A typical TAR 1.0 (i.e., first-generation technology-assisted review) workflow involves a subject matter expert (SME), often a senior lawyer, reviewing several thousand documents for training purposes before the TAR algorithm can rank the remainder of the population. It is an iterative process that entails significant human time, effort, and cost for training and retraining the system, particularly when new documents are added to the collection. The review team can’t begin until the SME finishes the training, and depending on the SME’s inclination to look at random documents, the review can be held up for days or weeks.
In a TAR system based on continuous active learning (CAL), we continuously use all the judgments of the review team to make the algorithm smarter, which means that you find relevant documents faster. Documents ranked high for relevance are fed to the review team, whose judgments are in turn used to train the system. The CAL approach can also include contextual diversity, which improves performance, combats potential bias, and ensures topical coverage.
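The feedback loop described above can be illustrated with a small sketch. It is not the ranking engine of any actual TAR product: the term-weight scoring, the function names (`score`, `cal_review`), the `judge` callback standing in for a human reviewer, and the batch size are all assumptions made for illustration.

```python
from collections import Counter


def score(doc_terms, weights):
    """Sum of learned term weights; a higher score means 'more likely relevant'."""
    return sum(weights[t] for t in doc_terms)


def cal_review(docs, judge, seed_id, batch_size=1):
    """Toy continuous-active-learning loop (illustrative assumption only).

    docs:    dict mapping doc_id -> set of terms
    judge:   callable(doc_id) -> bool, a stand-in for the reviewer's call
    seed_id: one document used to start the process
    Returns the order in which documents were reviewed.
    """
    weights = Counter()      # hypothetical model: one weight per term
    unseen = set(docs)
    reviewed = []
    batch = [seed_id]
    while batch:
        for doc_id in batch:
            unseen.discard(doc_id)
            reviewed.append(doc_id)
            # Every judgment immediately updates the model.
            delta = 1 if judge(doc_id) else -1
            for term in docs[doc_id]:
                weights[term] += delta
        # Re-rank everything not yet seen and feed the top of the list back.
        ranked = sorted(unseen, key=lambda d: (-score(docs[d], weights), d))
        batch = ranked[:batch_size]
    return reviewed


docs = {
    "a": {"merger", "price"},
    "b": {"merger", "board"},
    "c": {"lunch", "menu"},
    "d": {"price", "board"},
    "e": {"lunch", "golf"},
}
relevant = {"a", "b", "d"}
order = cal_review(docs, judge=lambda d: d in relevant, seed_id="a")
print(order)
```

In this toy collection, the three relevant documents reach the reviewer before the two irrelevant ones, because each judgment reshapes the ranking of the remaining documents.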
Contextual diversity refers to documents that are different from the ones already seen and judged by human reviewers. Because the system ranks all of the documents on a continual basis, we know a lot about both the documents the review team has seen and, more importantly, those it has not yet seen. The contextual diversity algorithm identifies unseen documents based on how significant they are and how different they are from the ones already seen, and then selects for human review the training documents that are most representative of those unseen topics.
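The article does not spell out how the contextual diversity algorithm measures "different" or "representative", so the following is only one plausible sketch. It assumes documents are represented as sets of terms, uses Jaccard similarity as the distance measure, and introduces a hypothetical `novelty_floor` threshold; none of these choices comes from the source.

```python
def jaccard(a, b):
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_diverse(docs, reviewed_ids, novelty_floor=0.5):
    """Pick one unseen document that is both novel and representative.

    docs:          dict mapping doc_id -> set of terms
    reviewed_ids:  set of doc_ids the review team has already judged
    novelty_floor: hypothetical cutoff for 'different enough from what was seen'
    Returns a doc_id, or None if no unseen document is novel enough.
    """
    reviewed = [docs[i] for i in reviewed_ids]

    # How different is each unseen document from everything already seen?
    novelty = {}
    for doc_id, terms in docs.items():
        if doc_id in reviewed_ids:
            continue
        nearest = max((jaccard(terms, r) for r in reviewed), default=0.0)
        novelty[doc_id] = 1.0 - nearest

    candidates = sorted(d for d, n in novelty.items() if n >= novelty_floor)
    if not candidates:
        return None

    # How significant is it? Here: how much of the unseen, novel material it resembles.
    def representativeness(d):
        return sum(jaccard(docs[d], docs[o]) for o in candidates if o != d)

    return max(candidates, key=lambda d: representativeness(d) * novelty[d])


docs = {
    "r1": {"merger", "price", "board"},
    "r2": {"merger", "board", "vote"},
    "u1": {"merger", "price", "board", "vote"},        # close to reviewed material
    "u2": {"patent", "license", "royalty", "filing"},  # center of an unseen topic
    "u3": {"patent", "license", "filing"},
    "u4": {"patent", "royalty", "filing"},
    "u5": {"golf", "lunch", "menu"},                   # novel but isolated
}
print(pick_diverse(docs, reviewed_ids={"r1", "r2"}))
```

Under these assumptions the sketch skips `u1`, which resembles what reviewers have already judged, and the isolated `u5`, and selects the document at the center of the unseen patent cluster.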