This section of the guide looks at those issues that are “hot” within the marketplace. It will be revised with each version of the guide, and the author welcomes input on areas readers would like to know more about.
NOTE: Why do I need to read this?
Some of these are tools, above and beyond keywords, that can enable you to complete eDisclosure within a tight budget. Even if you aren’t using them, the opposition might well be, so you need to know what they are talking about.
Clustering/Concept/Sentiment Analysis #
This section shows the progression in the abilities of available software as we move from the early stages of clustering, through concept analysis, to the most recent development, sentiment analysis. The descriptions focus on the outcomes of the analysis tools, not precisely how they work; that is a question for the individual suppliers.
Clustering is the ability to automatically group together documents with similar content. It was pioneered by Attenex and their “petri dish” visualisation of the documents, with clusters running off a spine of a shared set of keywords. Other software companies have followed suit in terms of technology, if not the way they display it. Clustering is most useful in the context of “find me all documents similar to this one”, which can enable bulk actions such as marking the document set as relevant, or eliminating it from the review process.
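As a rough illustration of grouping documents by shared keywords, here is a minimal Python sketch. It is not how any named product works: the Jaccard overlap measure, the 0.3 threshold, and the function names `keywords`, `jaccard` and `cluster` are all assumptions chosen for this example.

```python
import re

def keywords(text, min_len=4):
    """Return the set of lower-cased words of at least min_len characters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len}

def jaccard(a, b):
    """Share of keywords two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering (an assumed, simplified technique).

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster of its own.
    """
    clusters = []  # list of (seed_keywords, [doc_ids])
    for doc_id, text in docs.items():
        kw = keywords(text)
        for seed, members in clusters:
            if jaccard(seed, kw) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((kw, [doc_id]))
    return [members for _, members in clusters]

# Hypothetical documents for illustration only.
docs = {
    "A": "Draft supply contract for the Leeds warehouse lease",
    "B": "Revised supply contract for the Leeds warehouse lease",
    "C": "Lunch menu for the office party on Friday",
}
print(cluster(docs))  # [['A', 'B'], ['C']]
```

Once documents A and B sit in the same group, a single bulk decision (relevant, or out of the review) can be applied to both.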
Whereas clustering works on all documents having similar content, concept analysis has the same sort of output, but here the groups of documents are brought together by shared concepts rather than shared text. So even though a document might not contain the words “contract breach”, by virtue of the contextual analysis it can be identified as having that concept, and so will appear in the concept group.
From 2018/19 onwards we have started seeing products that also provide sentiment analysis, identifying whether the tone and language used in documents (but particularly emails) are “angry” or “upset”. In some matters, being able to home in on these kinds of items early on in the review process is very useful.
These tools are normally supplied by software aimed at the Early Case Assessment phase of the review process. Specific suppliers include Brainspace and NexLP, though clustering and concept analysis are normally built into most litigation support tools.
Email Threading #
Threading is the ability to display all the emails within a chain of correspondence as a single “thread”. In its more sophisticated versions, any missing emails can be “inferred” by their presence in subsequent iterations of the chain, which might influence the collection decisions. The way in which a chain can branch out can also be captured, so that only a small number of emails have to be read in order to gain an understanding of the entire thread. All litigation support software should support email threading; it has become an entry level requirement.
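The idea of grouping a chain and “inferring” missing emails can be sketched in a few lines of Python. This is a simplified illustration, assuming each email carries a reference to the message it replies to (in real email this role is played by the Message-ID and In-Reply-To headers); the function name `build_threads` and the sample ids are hypothetical.

```python
def build_threads(emails):
    """Group emails into threads using each email's parent reference.

    emails: dict mapping message id -> parent id (None for a first message).
    Returns (threads, missing), where threads maps a root id to the sorted
    ids in that thread, and missing is the set of ids that are referenced
    by a collected email but were never collected themselves.
    """
    missing = {p for p in emails.values() if p is not None and p not in emails}

    def root_of(msg_id):
        seen = set()  # guards against a malformed, circular chain
        while msg_id in emails and emails[msg_id] is not None and msg_id not in seen:
            seen.add(msg_id)
            msg_id = emails[msg_id]
        return msg_id

    threads = {}
    for msg_id in emails:
        threads.setdefault(root_of(msg_id), []).append(msg_id)
    return {root: sorted(ids) for root, ids in threads.items()}, missing

# Hypothetical collection for illustration only.
collected = {
    "m1": None,   # original message
    "m2": "m1",   # reply to m1
    "m3": "m2",   # reply to m2
    "m5": "m4",   # reply to m4, which was never collected
}
threads, missing = build_threads(collected)
print(threads)  # {'m1': ['m1', 'm2', 'm3'], 'm4': ['m5']}
print(missing)  # {'m4'}
```

Here `m4` is the inferred email: it is known to exist only because `m5` replies to it, which is the kind of gap that might prompt a further collection.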
Automatic Translation #
Some programs have the ability to automatically translate a number of common languages. Most can spot that the text is in a foreign language, but the ability to translate is a little less common. No one is suggesting that the translation is of evidential quality, but normally it is enough to enable an initial view to be taken in terms of relevance etc. Practical experience has thrown up one or two issues. The software does not cope well with the presence of two languages in a document, so you can get a number of “false positives”, with, say, an English email that has some French words in its address footer being wrongly categorised as “French”. Also, sometimes the document is correctly identified as being in a different language but, if the module for that language is not installed, it is then arbitrarily categorised as some other language.
Audio/Video Files #
This area was originally a competition between two market leaders, Intelligent Voice and Nexidia. As the market has matured, Intelligent Voice has established itself as the dominant player in this space.
This is the ability of software to index digital audio, including that within video files, as if it were text, and then provide functionality enabling you to search it in a similar manner. So, if you have an hour of a recording, the software will take you to the 30 second slot, some 45 minutes in, that contains the words “inside dealing”. It is a specialist tool for specialist projects, but an absolute godsend if you have thousands of hours of digital material to review. Further improvements allow the production of text, including with redactions as applied to the audio, so that you can read the conversations rather than listening to them.
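To show what “searching audio as if it were text” means in practice, here is a minimal Python sketch. It assumes the audio has already been turned into a list of words with timestamps; that transcript format and the names `find_phrase` and `as_clock` are illustrative assumptions, not the workings of any supplier’s product.

```python
def find_phrase(transcript, phrase):
    """Return the start times (in seconds) at which phrase is spoken.

    transcript: list of (start_seconds, word) pairs in spoken order.
    """
    target = phrase.lower().split()
    if not target:
        return []
    words = [w.lower() for _, w in transcript]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(transcript[i][0])
    return hits

def as_clock(seconds):
    """Format a number of seconds as H:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

# Hypothetical fragment of a transcript, roughly 45 minutes into a recording.
transcript = [
    (2700.0, "this"), (2700.4, "is"), (2700.7, "not"),
    (2701.1, "inside"), (2701.6, "dealing"), (2702.2, "surely"),
]
hits = find_phrase(transcript, "inside dealing")
print([as_clock(t) for t in hits])  # ['0:45:01']
```

The reviewer can then jump straight to that point in the recording rather than listening to the preceding 45 minutes.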
Intelligent Voice has biometric voiceprint search capabilities, and a standalone review player for each audio or video file with an embedded navigable transcript. Its end-to-end offering covers the entire audio discovery process from ingestion through to production, including redactions of both text and media, all without the data or the review team ever leaving the platform. Intelligent Voice also supports separate concurrent redaction sets of audio, such as separate productions needing different redactions to the same recording.
During 2020 Complete Discovery Source switched to Intelligent Voice, referencing Intelligent Voice’s “superior technology” in the process. Legility, having acquired Inventus, also went public with their use of Intelligent Voice.
In 2021 HaystackID joined Legility / Inventus in endorsing Intelligent Voice’s end-to-end audio discovery capabilities in Relativity, and doing so at CEO level. HaystackID made specific reference to Intelligent Voice’s interoperability with Relativity Conceptual Analytics, whilst Legility did likewise for Continuous Active Learning in Relativity.
Morae Global announced in Q1 2022 that they were replacing their use of CallMiner with Intelligent Voice. Intelligent Voice themselves announced the availability of a Free Player in Relativity, providing all the text-independent features of the Intelligent Voice Player for free, for unprocessed media of any amount. Duration determination and corruption detection are also included for free for Relativity Search Sets, with media file durations listed in your preferred format or formats in the Document List.
In addition to its end-to-end integration in Relativity, Intelligent Voice is also available with ZyLAB.
Computer/Technology Assisted Review #
There was a school of thought that said predictive coding was set to be the most disruptive technology to affect lawyers. Why was this so? The technology came from the United States, where the twin pressures of ever-increasing volumes of electronically stored information (ESI) and a constrained financial environment meant in-house counsel demanded that law firms did more for far lower fees. Though its genesis might have been American, the changing way of working had just as much impact over here. In the end the technology wasn’t disruptive per se, in that it did not stop large scale manual reviews, but it did enable lawyers to focus on the most potentially relevant documents far earlier in the EDRM cycle than before, so making them more efficient.
There are a number of slightly different technologies out there but, for the purposes of this section, we will group them all under the single heading of Computer Assisted Review (CAR), with the following core approach at the heart of their products. When faced with a mass of ESI, a well-qualified person (or a small number of individuals) is used to “train” the software in identifying which documents are relevant overall, important to specific topics, and (in some cases) Privileged. The training involves reviewing a batch of ESI, normally around 1,000 to 1,500 documents, which has been selected at random from the corpus of the material. The computer processes the results and provides another batch of documents, where it starts to suggest its values for relevance, topic association, etc. The reviewer codes this batch, the computer refines its algorithms, and the process repeats. Normally after five or so batches the machine is ready to work on its own, and it then codes the remainder of the collected material. What happens next depends upon the strategies adopted by the drivers of the CAR.
One possible approach is to select a level, say 50%, below which documents might be tentatively relevant but proportionality means they can be discounted. At the top end of the coding spectrum, you might decide that any document scored between 80% and 100% is relevant and, at this stage, does not need human eyes to confirm what the computer has decided. Where you will spend time and money is in reviewing the documents that the CAR process scores between 50% and 80%, as these are the more marginal calls that need verification.
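The banding described above can be written down as a simple rule. The Python sketch below uses the 50% and 80% cut-offs from the example; how the exact boundary values are treated (here, 80% counts as relevant and 50% goes to human review) is an assumption, as are the function and label names.

```python
def triage(score, discard_below=0.50, accept_from=0.80):
    """Place a document in a review band from its relevance score (0.0 to 1.0).

    Boundary handling is an assumption made for this sketch:
    a score of exactly accept_from is treated as relevant, and a score
    of exactly discard_below is sent for human review.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= accept_from:
        return "relevant"
    if score >= discard_below:
        return "human review"
    return "discounted"

# Hypothetical scores for illustration only.
scores = {"doc1": 0.93, "doc2": 0.64, "doc3": 0.12}
bands = {doc_id: triage(s) for doc_id, s in scores.items()}
print(bands)
# {'doc1': 'relevant', 'doc2': 'human review', 'doc3': 'discounted'}
```

Moving either cut-off changes how many documents land in the middle band, and therefore how much human review time is needed.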
Using CAR technology has a number of benefits. Foremost is the significant reduction in the time and cost of disclosure review work, with the experience of senior people being used up front where it makes most difference. Some case studies show savings of over 60% in legal fees and review time.
Next, the process lends itself to a verifiable procedure: you can share samples of the documents in the different bands of relevance with the other side, thus proving the process works, without having to delve into the guts of the CAR logic engine. All studies to date show that computers are far more consistent and accurate than humans in conducting review work, the silicon chip making no distinction between Monday morning and late on a Friday afternoon. Finally, as this is information technology, it is improving at an exponential rate, meaning that next year it will be twice as capable, four times so the year after that, and so on. It’s a technology that is here to stay.
Be aware that from the middle to end of 2013 the next generation of CAR products started to split into two totally different “camps”. On the one side was the more traditional pattern-based approach that uses Linguistic Analysis (pattern matching) to train the software, and on the other side were products emerging from all the research on information mining carried out by the US government in the aftermath of the 9/11 terrorist attacks. It’s a battle that will run for some time; you just need to be aware that it’s going on when the eager salesperson is in front of you. In 2016, England and Wales obtained judicial approval of CAR, with precedents being established in two cases:
Pyrrho v MWB [2016] EWHC 256 (Ch).
Brown v BCA Trading [2016] EWHC 1464 (Ch).
A further ruling on the use (or rather misuse) of CAR was given in 2018:
Triumph Controls UK Ltd & Anor v Primus International Holding Co & Ors [2018] EWHC 176 (TCC). Link here.
This case has interesting implications as it marks the appearance of the next iteration of CAR / TAR, that is to say continuous active learning (CAL), though, as already said, it reflects the poor use of this approach.
Where used correctly, CAL is a different approach to the previous methodology of CAR / TAR. CAR has one or two subject matter experts review documents to train the system in an iterative process that might take a number of batches and a period of days to arrive at a workable “engine”. CAL starts with a group of reviewers working on documents, and “learns” from document number one onwards as to what is relevant and what isn’t. The reviewers still need to be properly supervised and the correct QC controls applied, but you can arrive at a trained “engine” faster than with the CAR approach. The Triumph case shows what can go wrong if the review process isn’t correctly supervised.
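The difference with CAL is that the “engine” is updated after every coding decision rather than after each training batch. The Python sketch below illustrates only that loop. The scoring method (adding or subtracting one per word) is a deliberately crude assumption, commercial products use far more sophisticated models, and the class name `ContinuousLearner` and the sample documents are hypothetical.

```python
import re

class ContinuousLearner:
    """Toy relevance model that updates after every coding decision."""

    def __init__(self):
        self.weights = {}  # word -> running relevance weight

    @staticmethod
    def _words(text):
        return set(re.findall(r"[a-z]+", text.lower()))

    def score(self, text):
        """Sum the weights of the words seen so far in coded documents."""
        return sum(self.weights.get(w, 0.0) for w in self._words(text))

    def learn(self, text, relevant):
        """Update the weights from a single reviewer decision."""
        delta = 1.0 if relevant else -1.0
        for w in self._words(text):
            self.weights[w] = self.weights.get(w, 0.0) + delta

    def next_document(self, unreviewed):
        """Pick the unreviewed document currently scored most likely relevant."""
        return max(unreviewed, key=lambda doc_id: self.score(unreviewed[doc_id]))

# Hypothetical documents for illustration only.
unreviewed = {
    "d1": "invoice for breach of contract damages",
    "d2": "canteen menu for next week",
    "d3": "notice of contract termination",
}
model = ContinuousLearner()
model.learn("letter alleging breach of contract", relevant=True)
model.learn("staff canteen opening hours", relevant=False)
print(model.next_document(unreviewed))  # d1
```

After only two decisions the model already pushes the contract documents to the front of the queue, which is the sense in which CAL “learns” from document number one onwards. It also shows why supervision matters: every wrong coding decision feeds straight into the ranking.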
Chris Dale has an informative article on the use of CAL here.
As with all the advanced functionality mentioned in this Guide, this is an area where you need to discuss options with your supplier.
Collection of Data from Social Media Environments #
As electronically stored information proliferates into different areas, so the ability to collect it from within those environments becomes more urgent. Various vendors are developing tools (or buying up companies that have done the development) to enable them to hook into LinkedIn, Facebook, Twitter, Yammer, WhatsApp, Teams, Zoom, et al and extract information in a meaningful way. Again, the need for this functionality will depend on the area the matter is within, but increasingly data stored in social media is becoming important in more and more cases. This is particularly the case after a year or more of pandemic lockdown when “traditional” ways of sharing information have been supplanted by social media channels.
Small Quantities of ESI #
This is a constant requirement that comes through every size of procurement and articulates the real need for users to “just read the emails”. The key problem is that information is passed to lawyers in electronic format, yet (for very good reasons as far as the IT department is concerned) they are not allowed to use the firm’s environment to review it. So they are sat there with a PST containing a small number of emails, an email with 50 or so Word attachments, or a thumb drive with a couple of thousand items, and they “just want to read them”. Providing a quick and easy solution to this requirement will be a real game changer for the various suppliers.
A number of products have come and gone in this space. See the vendor and software chapter for more details.
Charging Model #
Just as lawyers are coming under increasing pressure on prices, so vendors are being stressed by their clients. The default model is that people will charge you by volume, so much per GB at various stages of the process. Increasingly users are looking for a fixed price solution so they have clarity of costs to pass on to their clients. In response to this, some vendors are offering a “managed solution” option that guarantees fixed pricing for users, irrespective of individual case volumes.
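The trade-off between the two charging models comes down to simple arithmetic. The Python sketch below compares them across a set of matters; every figure in it (the volumes, the per-GB rate and the fixed fee) is hypothetical and carries no currency or market meaning, and the function name `compare_charging` is an assumption for this example.

```python
def compare_charging(case_volumes_gb, rate_per_gb, fixed_fee):
    """Compare per-GB charging with a fixed fee across a set of cases.

    All inputs are hypothetical figures.
    Returns (volume_total, cheaper_model).
    """
    volume_total = sum(gb * rate_per_gb for gb in case_volumes_gb)
    cheaper = "fixed" if fixed_fee < volume_total else "volume"
    return volume_total, cheaper

# Three hypothetical matters of 5 GB, 40 GB and 120 GB at 100 per GB,
# set against a hypothetical managed-solution fee of 15,000.
total, cheaper = compare_charging([5, 40, 120], rate_per_gb=100, fixed_fee=15_000)
print(total, cheaper)  # 16500 fixed
```

The point of the fixed model is less the saving in any one year than the fact that the figure is known in advance, whatever the individual case volumes turn out to be.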
See the Procurement section for more discussion on this.
Redaction Tools for “Native” Formats #
A bit of a specialist requirement, but one that could be significant if you really, really need it. What we are talking about here is the ability to redact (that is, blank out the offending text and remove it from all search capabilities) areas within things such as Word, Excel and PowerPoint documents. Normally this involves a cumbersome process of turning the “native” item into a PDF version and then redacting the PDF, but for things such as Excel spreadsheets this is not very workable. A number of vendors are now starting to supply toolkits that let you redact within the “native” mode. If you have Privileged information within a note on an Excel spreadsheet and the opposition has convinced the judge that you must supply the document in its original native mode, this could be a lifesaver. My normal rule of thumb is that some 0.2% of documents in a collection end up being redacted, and they are Word files to start with, so just how crucial the ability to redact Excel spreadsheets really is remains to be seen.
In 2015 The Payne Group produced a redaction tool that allows you to remove material from a native Excel spreadsheet. Other suppliers such as Anexsys (formerly Hobs Legal Docs) also provide Relativity plugins that enable bulk redactions of things such as personal data. Redaction is also now available for audio files and the transcripts produced from them using the plugin from Intelligent Voice.
As ever, look through the Supplier and Software Details section for all the products.
Email Family Groups with Non-Relevant Children #
At the heart of this point is what happens when you work with native emails which nowadays is the default situation. Say you have an email with 3 attachments, two of which are deemed relevant to the matter but the third one is not. Within the review platform you will see 4 items, the email and 3 attachments. You code the email and two of the attachments as relevant, and the third attachment as non-relevant. When you carry out the production process, you hand over the original email and two of the attachments in native mode, and keep back the non-relevant item. The problem is that the email in its native mode is a container that holds the email message and the 3 attachments, so you end up handing over the non-relevant item anyway. What will happen when the other side process the load file you give them is that the separate instances of the two attachments will be de-duplicated out against the versions held within the native email, and the non-relevant item will appear in their system.
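The “container” point can be demonstrated with Python’s standard `email` library. Note the assumptions: this sketch builds a standard MIME message as a stand-in for a native Outlook .MSG file, which has a different internal format, and the file names, contents and helper names (`build_native_email`, `attachment_names`) are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def build_native_email(attachments):
    """Build a MIME email (a stand-in for a native email) with attachments."""
    msg = EmailMessage()
    msg["Subject"] = "Documents as discussed"
    msg.set_content("Please see attached.")
    for filename, data in attachments.items():
        msg.add_attachment(data, maintype="application",
                           subtype="octet-stream", filename=filename)
    return msg.as_bytes()

def attachment_names(native_bytes):
    """List the attachments still embedded inside a native email."""
    msg = message_from_bytes(native_bytes, policy=policy.default)
    return [part.get_filename() for part in msg.iter_attachments()]

# Hypothetical family: one email with three attachments.
native = build_native_email({
    "relevant_1.docx": b"contract draft",
    "relevant_2.xlsx": b"pricing schedule",
    "not_relevant.pdf": b"unrelated third party data",
})

# The reviewer codes only two of the attachments as relevant...
coded_relevant = {"relevant_1.docx", "relevant_2.xlsx"}

# ...but the native email handed over still carries all three.
leaked = [n for n in attachment_names(native) if n not in coded_relevant]
print(leaked)  # ['not_relevant.pdf']
```

Withholding the separate copy of the third attachment makes no difference, because the receiving side can extract it from the parent email.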
The potential issue here is what information is contained within the non-relevant item. If it’s superfluous data with nothing of interest within it, no problem, but what if it’s a document relating to another entity that holds personal or commercially sensitive information?
In this second case, you might make an overall strategic decision to handle the email parent as if it had attachments containing redactions. In these cases, the email is converted to a PDF or TIFF format and exchanged in non-native mode, so that embedded attachments are not handed over.
This topic will be something your third party supplier is familiar with, but you need to understand the implications of the decisions they will ask you to make. Remember, the default is to exchange in native format; if you are not doing this, you need to explain and agree your approach with the opposing side before the production deadline.
From 2018 onwards, a number of suppliers have been storing emails in .MHT format as opposed to the normal .MSG standard. The big advantage of this is that the .MHT does not include the attachments to the email, thus removing all the issues described above.
Talk to your supplier about what they can offer. For the author, this is a very significant step forward in functionality, and its use should be encouraged as much as possible.