More Keepers? Predictive Coding Technologies and Protocols Survey – Fall 2021 Results

Aug 24, 2021

Content Assessment: More Keepers? Predictive Coding Technologies and Protocols Survey – Fall 2021 Results

Information - 90%

Insight - 90%

Relevance - 95%

Objectivity - 95%

Authority - 90%

Overall - 92% (Excellent)

A short percentage-based assessment of the qualitative benefit of the recent fall 2021 predictive coding technologies and protocols survey.

Editor’s Note: These are the results of the seventh semi-annual Predictive Coding Technologies and Protocols Survey conducted by ComplexDiscovery. As of today, the seven surveys have provided detailed feedback from 426 legal, business, and technology professionals on the use of specific machine learning technologies in predictive coding. The surveys have also provided insight into the use of those machine learning technologies as part of example technology-assisted review protocols.

This iteration of the survey had 42 responders and continued to focus on predictive coding technologies, protocols, workflows, and uses across the eDiscovery ecosystem.

The Predictive Coding Technologies and Protocols Fall 2021 Survey

The Predictive Coding Technologies and Protocols Survey is a non-scientific survey designed to help provide a general understanding of the use of predictive coding technologies, protocols, and workflows by data discovery and legal discovery professionals within the eDiscovery ecosystem. The fall 2021 survey was open from August 12, 2021, through August 23, 2021, with individuals invited to participate directly by ComplexDiscovery and other industry organizations.

Designed to provide a general understanding of predictive coding technologies and protocols, the survey had two primary educational objectives:

To provide a consolidated listing of potential predictive coding technology, protocol, and workflow definitions. While not comprehensive, the listing was vetted with selected industry predictive coding experts for completeness and accuracy, so it appears suitable for use in educational efforts.
To ask eDiscovery ecosystem professionals about their preferences and patterns of use regarding predictive coding platforms, technologies, protocols, workflows, and areas of usage.

The survey asked responders for predictive coding background information, including their primary predictive coding platform, and posed five specific questions:

How often do you use predictive coding as part of your eDiscovery workflow? (Prevalence)
Which predictive coding technologies are utilized by your eDiscovery platform? (Technologies)
Which technology-assisted review protocols are utilized in your delivery of predictive coding? (Protocols)
What is the primary technology-assisted review workflow utilized in your delivery of predictive coding? (Workflow)
What are the areas where you use technology-assisted review technologies, protocols, and workflows? (Areas of Usage)

Closed on August 23, 2021, the fall survey had 42 responders.

Key Results and Observations

Predictive Coding Technology and Protocol Survey Responder Overview (Chart 1)

47.62% of responders were from law firms.
28.57% of responders were from software or services provider organizations.
The remaining 23.81% of responders were either part of a consultancy (11.90%), a corporation (2.38%), or another type of entity (9.52%).

Primary Predictive Coding Platform (Chart 2)

There were 18 different platforms reported as a primary predictive coding platform by responders.
Relativity was reported as a primary predictive coding platform by 35.71% of survey responders.
The top two platforms were reported as a primary predictive coding platform by 57.14% of survey responders.
2.38% of responders reported they had no primary platform for predictive coding.

Prevalence of Predictive Coding Usage in eDiscovery (Chart 3)

More than 40% of survey responders (40.48%) reported using predictive coding in their eDiscovery workflow more than 50% of the time.
69.05% of responders reported using predictive coding in their eDiscovery workflow at least 5% of the time.
Almost 31% of responders (30.95%) reported using predictive coding in their eDiscovery workflow less than 5% of the time.

Predictive Coding Technology Employment (Chart 4)

Active Learning was reported as the most used predictive coding technology with 88.10% of responders using it in their predictive coding efforts.
38.10% of responders reported using only one predictive coding technology in their predictive coding efforts.
57.14% of responders reported using more than one predictive coding technology in their predictive coding efforts.
4.76% of responders reported not using any specific predictive coding technology.

Technology-Assisted Review Protocol Employment (Chart 5)

All listed technology-assisted review protocols for predictive coding were reported as being used by at least one survey responder.
Continuous Active Learning® (CAL®) was reported as the most used predictive coding protocol with 83.33% of responders using it in their predictive coding efforts.
54.76% of responders reported using only one predictive coding protocol in their predictive coding efforts.
40.48% of responders reported using more than one predictive coding protocol in their predictive coding efforts.
4.76% of responders reported not using any specific predictive coding protocol.

Technology-Assisted Review Workflow Employment (Chart 6)

57.14% of responders reported using Technology-Assisted Review (TAR) 2.0 as a primary workflow in the delivery of predictive coding.
7.14% of responders reported using TAR 1.0 and 19.05% of responders reported using TAR 3.0 as a primary workflow in the delivery of predictive coding.
16.67% of responders reported not using TAR 1.0, TAR 2.0, or TAR 3.0 as a primary workflow in the delivery of predictive coding.

Technology-Assisted Review Uses (Chart 7)

95.24% of responders reported using technology-assisted review in more than one area of data and legal discovery.
92.86% of responders reported using technology-assisted review for the identification of relevant documents.
21.43% of responders reported using technology-assisted review for information governance and data disposition.
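Every percentage above is a share of the 42 fall 2021 responders. The short Python sketch below back-calculates responder counts for a sample of the published figures and re-derives the two-decimal percentages. The counts are an inference (assuming every responder answered each question); the survey itself publishes only percentages.

```python
RESPONDERS = 42  # fall 2021 survey responders

# (back-calculated count, published percentage). The counts are inferred
# from the percentages, not published by the survey.
reported = {
    "law firms": (20, 47.62),
    "software or services providers": (12, 28.57),
    "consultancies": (5, 11.90),
    "Relativity as primary platform": (15, 35.71),
    "predictive coding used >50% of the time": (17, 40.48),
    "predictive coding used >=5% of the time": (29, 69.05),
    "Active Learning used": (37, 88.10),
    "CAL used": (35, 83.33),
    "TAR 2.0 as primary workflow": (24, 57.14),
    "TAR for identifying relevant documents": (39, 92.86),
}

def pct(count, total=RESPONDERS):
    """Share of responders, rounded to two decimals as in the survey."""
    return round(100 * count / total, 2)

for label, (count, published) in reported.items():
    print(f"{label}: {count}/{RESPONDERS} = {pct(count):.2f}% (published {published:.2f}%)")
```

Because each responder is worth about 2.38 percentage points, any published figure that is not a multiple of 1/42 (after rounding) would signal a typo or a question some responders skipped.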

Survey Charts


Chart 1: Survey Responder Overview (Background)

1 - Predictive Coding Technologies and Protocols Survey Overview - Fall 2021

Chart 2: Name of Primary Predictive Coding Platform (Background)

2 - Primary Predictive Coding Platform - Fall 2021

Chart 3: How often do you use predictive coding as part of your eDiscovery workflow? (Question #1)

3 - Predictive Coding Usage - Fall 2021

Chart 4: Which predictive coding technologies are utilized by your eDiscovery platform? (Question #2)

4 - Predictive Coding Technology Usage - Fall 2021

Chart 5: Which technology-assisted review protocols are utilized in your delivery of predictive coding? (Question #3)

5 - Technology-Assisted Review Protocol Usage - Fall 2021

Chart 6: What is the primary technology-assisted review workflow utilized in your delivery of predictive coding? (Question #4)

6 - Technology-Assisted Review Workflow Usage - Fall 2021

Chart 7: What are the areas where you use technology-assisted review technologies, protocols, and workflows? (Question #5)

7 - Technology-Assisted Review Uses - Fall 2021

Predictive Coding Technologies and Protocols (Survey Backgrounder)

As defined in The Grossman-Cormack Glossary of Technology-Assisted Review (1), predictive coding is an industry-specific term generally used to describe a technology-assisted review process in which a machine learning algorithm distinguishes relevant from non-relevant documents, based on a subject matter expert’s coding of a training set of documents. This definition provides a baseline description of one particular function that commonly accepted machine learning algorithms may perform in technology-assisted review (TAR).

With the growing awareness and use of predictive coding in the legal arena, it appears increasingly important for electronic discovery professionals to have a general understanding of the technologies that may be implemented in electronic discovery platforms to facilitate predictive coding of electronically stored information. This understanding matters because each algorithmic approach has advantages and disadvantages that may affect the efficiency and efficacy of predictive coding.

To help in developing this general understanding of predictive coding technologies and to provide an opportunity for electronic discovery providers to share the technologies and protocols they use in and with their platforms to accomplish predictive coding, the following working lists of predictive coding technologies and TAR protocols are provided for your use. Working lists on predictive coding workflows and uses are also included for your consideration as they help define how the predictive coding technologies and TAR protocols are implemented and used.

A Working List of Predictive Coding Technologies (1,2,3,4)

Aggregated from electronic discovery experts based on professional publications and personal conversations, the non-exhaustive working list below identifies machine learning technologies that have been applied, or have the potential to be applied, to eDiscovery to facilitate predictive coding. This working list is designed to provide a reference point for identified predictive coding technologies and may over time include additions, adjustments, and amendments based on feedback from experts and organizations applying and implementing these mainstream technologies in their specific eDiscovery platforms.

Listed in Alphabetical Order

Active Learning: A process, typically iterative, whereby an algorithm is used to select documents that should be reviewed for training based on a strategy to help the classification algorithm learn efficiently.
Decision Tree: A step-by-step method of distinguishing between relevant and non-relevant documents, depending on what combination of words (or other features) they contain. A Decision Tree to identify documents pertaining to financial derivatives might first determine whether or not a document contained the word “swap.” If it did, the Decision Tree might then determine whether or not the document contained “credit,” and so on. A Decision Tree may be created either through knowledge engineering or machine learning.
k-Nearest Neighbor Classifier (k-NN): A classification algorithm that analyzes the k example documents that are most similar (nearest) to the document being classified in order to determine the best classification for the document. If k is too small (e.g., k=1), it may be extremely difficult to achieve high recall.
Latent Semantic Analysis (LSA): A mathematical representation of documents that treats highly correlated words (i.e., words that tend to occur in the same documents) as being, in a sense, equivalent or interchangeable. This equivalency or interchangeability can allow algorithms to identify documents as being conceptually similar even when they aren’t using the same words (e.g., because synonyms may be highly correlated), though it also discards some potentially useful information and can lead to undesirable results caused by spurious correlations.
Logistic Regression: A state-of-the-art supervised learning algorithm for machine learning that estimates the probability that a document is relevant, based on the features that it contains. In contrast to the Naïve Bayes algorithm, Logistic Regression identifies features that discriminate between relevant and non-relevant documents.
Naïve Bayesian Classifier: A system that examines the probability that each word in a new document came from the word distribution derived from trained responsive documents or from the word distribution derived from trained non-responsive documents. The system is naïve in the sense that it assumes that all words occur independently of one another (given the document’s class).
Neural Network: An Artificial Neural Network (ANN) is a computational model loosely based on the structure and functions of biological neural networks, such as those in the human brain. It consists of a large number of interconnected processing units that work together to process information.
Probabilistic Latent Semantic Analysis (PLSA): This is similar in spirit to LSA but it uses a probabilistic model to achieve results that are expected to be better.
Random Forests: An ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random forests correct for decision trees’ habit of overfitting to their training set.
Relevance Feedback: An active learning process in which the documents with the highest likelihood of relevance are coded by a human, and added to the training set.
Support Vector Machine: A mathematical approach that seeks to find a line (in general, a hyperplane in the space of document features) that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.
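To make the Naïve Bayesian Classifier entry concrete, here is a minimal multinomial Naive Bayes sketch in Python. It is one common variant, not any platform’s implementation; the function names, the Laplace smoothing parameter `alpha`, and the toy “derivatives” documents are illustrative assumptions. It estimates a word distribution for responsive and for non-responsive training documents, then scores a new document by adding per-word log-probabilities, which is exactly where the independence assumption enters.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate class priors and smoothed per-class word distributions.

    docs: list of token lists; labels: parallel list (True = responsive).
    alpha: Laplace smoothing so an unseen word never zeroes a probability.
    """
    vocab = {w for doc in docs for w in doc}
    model = {}
    for cls in (True, False):
        class_docs = [d for d, y in zip(docs, labels) if y == cls]
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        model[cls] = {
            "log_prior": math.log(len(class_docs) / len(docs)),
            # Smoothed log P(word | class) for every vocabulary word.
            "log_like": {
                w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                for w in vocab
            },
        }
    return model

def score(model, doc):
    """Return P(responsive | doc); words outside the vocabulary are ignored."""
    log_post = {}
    for cls, params in model.items():
        # Independence given the class: per-word log-probabilities simply add.
        log_post[cls] = params["log_prior"] + sum(
            params["log_like"][w] for w in doc if w in params["log_like"]
        )
    # Normalise the two log-posteriors into a probability (log-sum-exp).
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return math.exp(log_post[True] - top) / norm

# Hypothetical training set coded by a subject matter expert.
train_docs = [
    "credit default swap notional".split(),
    "interest rate swap counterparty credit".split(),
    "lunch menu friday".split(),
    "parking garage closed friday".split(),
]
labels = [True, True, False, False]
model = train_naive_bayes(train_docs, labels)
print(round(score(model, "credit swap exposure".split()), 2))  # 0.88
```

The unseen word “exposure” is skipped, so the score rests on “credit” and “swap,” both far more frequent in the responsive training documents.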

General TAR Protocols (5,6,7,8,9,10)

Additionally, these technologies are generally employed as part of a TAR protocol that determines how the technologies are used. Examples of TAR protocols include:

Listed in Alphabetical Order

Continuous Active Learning® (CAL®): CAL® is the TAR method developed, used, and advocated by Maura R. Grossman and Gordon V. Cormack. After the initial training set, the learner repeatedly selects the next-most-likely-to-be-relevant documents (that have not yet been considered) for review, coding, and training, and continues to do so until it can no longer find any more relevant documents. There is generally no second review because, by the time the learner stops learning, all documents deemed relevant by the learner have already been identified and manually reviewed.
Hybrid Multimodal Method: An approach developed by the e-Discovery Team (Ralph Losey) that includes all types of search methods, with primary reliance placed on predictive coding and the use of high-ranked documents for continuous active training.
Scalable Continuous Active Learning (S-CAL): The essential difference between S-CAL and CAL® is that for S-CAL, only a finite sample of documents from each successive batch is selected for labeling, and the process continues until the collection—or a large random sample of the collection—is exhausted. Together, the finite samples form a stratified sample of the document population, from which a statistical estimate of ρ may be derived.
Simple Active Learning (SAL): In SAL methods, after the initial training set, the learner selects the documents to be reviewed and coded by the teacher, and used as training examples, and continues to select examples until it is sufficiently trained. Typically, the documents the learner chooses are those about which the learner is least certain, and therefore from which it will learn the most. Once sufficiently trained, the learner is then used to label every document in the collection. As with SPL, the documents labeled as relevant are generally re-reviewed manually.
Simple Passive Learning (SPL): In simple passive learning (“SPL”) methods, the teacher (i.e., human operator) selects the documents to be used as training examples; the learner is trained using these examples, and once sufficiently trained, is used to label every document in the collection as relevant or non-relevant. Generally, the documents labeled as relevant by the learner are re-reviewed manually. This manual review represents a small fraction of the collection, and hence a small fraction of the time and cost of an exhaustive manual review.
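The core CAL® loop described above (rank, review the top-ranked unreviewed documents, retrain on every judgment, repeat) can be sketched in a few lines of Python. This is an illustrative sketch under loud assumptions, not Grossman and Cormack’s implementation: the hypothetical `train_scorer` term-weighting learner stands in for a real classifier (it is closer to the Relevance Feedback entry than to a production model), the hypothetical `review` function stands in for human coding, and stopping when a single batch yields no relevant documents is a deliberately naive rule.

```python
from collections import Counter

def train_scorer(reviewed):
    """Hypothetical stand-in learner: each term gains +1 for every relevant
    reviewed document containing it and -1 for every non-relevant one."""
    weights = Counter()
    for terms, relevant in reviewed:
        for term in set(terms):
            weights[term] += 1 if relevant else -1
    return lambda terms: sum(weights[t] for t in set(terms))

def continuous_active_learning(collection, seed_ids, review, batch_size=2):
    """CAL-style loop; returns the ids of documents reviewed as relevant."""
    reviewed = {}  # doc id -> human judgment (True = relevant)
    for doc_id in seed_ids:  # initial training set
        reviewed[doc_id] = review(doc_id)
    while len(reviewed) < len(collection):
        # Retrain on every judgment made so far.
        score = train_scorer([(collection[i], rel) for i, rel in reviewed.items()])
        unreviewed = [i for i in collection if i not in reviewed]
        # Select the next-most-likely-to-be-relevant documents.
        batch = sorted(unreviewed, key=lambda i: score(collection[i]), reverse=True)[:batch_size]
        judgments = {i: review(i) for i in batch}
        reviewed.update(judgments)
        if not any(judgments.values()):
            break  # naive stopping rule: a batch with no relevant documents
    return sorted(i for i, rel in reviewed.items() if rel)

# Hypothetical toy collection and ground truth.
collection = {
    "d1": ["swap", "credit", "default"],
    "d2": ["swap", "notional", "counterparty"],
    "d3": ["credit", "counterparty", "margin"],
    "d4": ["lunch", "friday"],
    "d5": ["parking", "garage"],
    "d6": ["holiday", "party", "friday"],
    "d7": ["margin", "call", "swap"],
}
truth = {"d1", "d2", "d3", "d7"}
found = continuous_active_learning(collection, ["d1", "d4"], lambda i: i in truth)
print(found)  # ['d1', 'd2', 'd3', 'd7']
```

Note how “margin” only becomes a positive signal after d3 is reviewed; that late discovery of d7 is the kind of gain continuous retraining is meant to capture, and it is absent from SPL or SAL once training stops.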

TAR Workflows (11)

TAR workflows represent the practical application of predictive coding technologies and protocols to define approaches to completing predictive coding tasks. Three examples of TAR workflows include:

TAR 1.0 involves a training phase followed by a review phase with a control set being used to determine the optimal point when you should switch from training to review. The system no longer learns once the training phase is completed. The control set is a random set of documents that have been reviewed and marked as relevant or non-relevant. The control set documents are not used to train the system. They are used to assess the system’s predictions so training can be terminated when the benefits of additional training no longer outweigh the cost of additional training. Training can be with randomly selected documents, known as Simple Passive Learning (SPL), or it can involve documents chosen by the system to optimize learning efficiency, known as Simple Active Learning (SAL).
TAR 2.0 uses an approach called Continuous Active Learning® (CAL®), meaning that there is no separation between training and review; the system continues to learn throughout. While many approaches may be used to select documents for review, a significant component of CAL® is many iterations of predicting which documents are most likely to be relevant, reviewing them, and updating the predictions. Unlike TAR 1.0, TAR 2.0 tends to be very efficient even when prevalence is low. Since there is no separation between training and review, TAR 2.0 does not require a control set. Generating a control set can involve reviewing a large number of non-relevant documents, especially when prevalence is low, so avoiding control sets is desirable.
TAR 3.0 requires a high-quality conceptual clustering algorithm that forms narrowly focused clusters of fixed size in concept space. It applies the TAR 2.0 methodology to just the cluster centers, which ensures that a diverse set of potentially relevant documents is reviewed. Once no more relevant cluster centers can be found, the reviewed cluster centers are used as training documents to make predictions for the full document population. There is no need for a control set; the system is well-trained when no additional relevant cluster centers can be found. Analysis of the reviewed cluster centers provides an estimate of the prevalence and of the number of non-relevant documents that would be produced if documents were produced based purely on the predictions without human review. The user can decide to produce documents (not identified as potentially privileged) without review, similar to SAL from TAR 1.0 (but without a control set), or he/she can decide to review documents that have too much risk of being non-relevant (which can be used as additional training for the system, i.e., CAL®). The key point is that the user has the information he/she needs to decide how to proceed after completing review of the cluster centers that are likely to be relevant, and nothing done before that point is invalidated by the decision. Compare this to starting with TAR 1.0, reviewing a control set, finding that the predictions aren’t good enough to produce documents without review, and then switching to TAR 2.0, which renders the control set virtually useless.
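The control-set arithmetic behind TAR 1.0, and behind the TAR 2.0 point that control sets become expensive at low prevalence, can be sketched in Python. The function names, the five-document control set, and the target of 100 relevant control-set documents are hypothetical. The size estimate divides the desired number of relevant documents by the prevalence, which is the expected count for a simple random sample; real protocols may size control sets differently.

```python
import math

def control_set_metrics(predicted, actual):
    """Recall and precision of the system's predictions on a control set.

    predicted / actual: dicts mapping document id -> bool (True = relevant).
    Control-set documents are human-coded but never used for training.
    """
    tp = sum(1 for d in actual if actual[d] and predicted[d])
    fn = sum(1 for d in actual if actual[d] and not predicted[d])
    fp = sum(1 for d in actual if not actual[d] and predicted[d])
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

def control_set_size(relevant_wanted, prevalence):
    """Random documents to review to expect `relevant_wanted` relevant ones."""
    return math.ceil(relevant_wanted / prevalence)

# Hypothetical five-document control set.
actual = {"a": True, "b": False, "c": True, "d": False, "e": True}
predicted = {"a": True, "b": True, "c": False, "d": False, "e": True}
recall, precision = control_set_metrics(predicted, actual)
print(f"recall {recall:.2f}, precision {precision:.2f}")  # recall 0.67, precision 0.67

for prevalence in (0.10, 0.01):
    print(f"prevalence {prevalence:.0%}: ~{control_set_size(100, prevalence):,} documents")
# prevalence 10%: ~1,000 documents
# prevalence 1%: ~10,000 documents
```

In a TAR 1.0 workflow, metrics like these would be recomputed after each training round, with training stopped once further rounds no longer improve them enough to justify the cost. The second loop shows why: at 1% prevalence, roughly 9,900 of the 10,000 control-set documents reviewed would be non-relevant.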

TAR Uses (12)

TAR technologies, protocols, and workflows can be used effectively to help eDiscovery professionals accomplish many data discovery and legal discovery tasks. Nine commonly considered examples of TAR use include:

Identification of Relevant Documents
Early Case Assessment/Investigation
Prioritization for Review
Categorization (By Issues, For Confidentiality or Privacy)
Privilege Review
Quality Control and Quality Assurance
Review of Incoming Productions
Disposition/Trial Preparation
Information Governance and Data Disposition

Survey Information (13,14,15,16,17,18,19,20,21)

References

(1) Grossman, M. and Cormack, G. (2013). The Grossman-Cormack Glossary of Technology-Assisted Review. [ebook] Federal Courts Law Review. Available at: http://www.fclr.org/fclr/articles/html/2010/grossman.pdf [Accessed 31 Aug. 2018].

(2) Dimm, B. (2018). Expertise on Predictive Coding. [email].

(3) Roitblat, H. (2013). Introduction to Predictive Coding. [ebook] OrcaTec. Available at: https://theolp.wildapricot.org/Resources/Documents/Introduction%20to%20Predictive%20Coding%20-%20Herb%20Roitblat.pdf [Accessed 31 Aug. 2018].

(4) Pickens, J. (2017). Deep Learning in E-Discovery – Fact or Fiction. [online] BloombergLaw.com. Available at: https://news.bloomberglaw.com/e-discovery-and-legal-tech/insight-deep-learning-and-e-discovery-fact-or-fiction [Accessed 161 Aug. 2021].

(5) Grossman, M. and Cormack, G. (2017). Technology-Assisted Review in Electronic Discovery. [ebook] Available at: https://judicialstudies.duke.edu/wp-content/uploads/2017/07/Panel-1_TECHNOLOGY-ASSISTED-REVIEW-IN-ELECTRONIC-DISCOVERY.pdf [Accessed 31 Aug. 2018].

(6) Grossman, M. and Cormack, G. (2016). Continuous Active Learning for TAR. [ebook] Practical Law. Available at: https://pdfs.semanticscholar.org/ed81/f3e1d35d459c95c7ef60b1ba0b3a202e4400.pdf [Accessed 31 Aug. 2018].

(7) Grossman, M. and Cormack, G. (2016). Scalability of Continuous Active Learning for Reliable High-Recall Text Classification. [ebook] Available at: https://plg.uwaterloo.ca/~gvcormac/scal/cormackgrossman16a.pdf [Accessed 3 Sep. 2018].

(8) Losey, R., Sullivan, J. and Reichenberger, T. (2015). e-Discovery Team at TREC 2015 Total Recall Track. [ebook] Available at: https://trec.nist.gov/pubs/trec24/papers/eDiscoveryTeam-TR.pdf [Accessed 1 Sep. 2018].

(9) “CONTINUOUS ACTIVE LEARNING Trademark Of Maura Grossman And Gordon V. Cormack – Registration Number 5876987 – Serial Number 86634255 :: Justia Trademarks”. Trademarks.Justia.Com, 2020, https://trademarks.justia.com/866/34/continuous-active-86634255.html [Accessed 12 Feb. 2020].

(10) “CAL Trademark Of Maura Grossman And Gordon V. Cormack – Registration Number 5876988 – Serial Number 86634265 :: Justia Trademarks”. Trademarks.Justia.Com, 2020, https://trademarks.justia.com/866/34/cal-86634265.html [Accessed 12 Feb. 2020].

(11) Dimm, B. (2016). TAR 3.0 Performance. [online] Clustify Blog – eDiscovery, Document Clustering, Predictive Coding, Information Retrieval, and Software Development. Available at: https://blog.cluster-text.com/2016/01/28/tar-3-0-performance/ [Accessed 18 Feb. 2019].

(12) Electronic Discovery Reference Model (EDRM) (2019). Technology Assisted Review (TAR) Guidelines. [online] Available at: https://www.edrm.net/wp-content/uploads/2019/02/TAR-Guidelines-Final.pdf [Accessed 18 Feb. 2019].

(13) Dimm, B. (2018). TAR, Proportionality, and Bad Algorithms (1-NN). [online] Clustify Blog – eDiscovery, Document Clustering, Predictive Coding, Information Retrieval, and Software Development. Available at: https://blog.cluster-text.com/2018/08/13/tar-proportionality-and-bad-algorithms-1-nn/ [Accessed 31 Aug. 2018].

(14) Robinson, R. (2013). Running Results: Predictive Coding One-Question Provider Implementation Survey. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/2013/03/05/running-results-predictive-coding-one-question-provider-implementation-survey/ [Accessed 31 Aug. 2018].

(15) Robinson, R. (2018). A Running List: Top 100+ eDiscovery Providers. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/2017/01/19/28252/ [Accessed 31 Aug. 2018].

(16) Robinson, R. (2018). Relatively Speaking: Predictive Coding Technologies and Protocols Survey Results. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/relatively-speaking-predictive-coding-technologies-and-protocols-survey-results/ [Accessed 18 Feb. 2019].

(17) Robinson, R. (2019). Actively Learning? Predictive Coding Technologies and Protocols Survey Results. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/actively-learning-predictive-coding-technologies-and-protocols-survey-spring-2019-results/ [Accessed 22 Aug. 2019].

(18) Robinson, R. (2019). From Platforms to Workflows: Predictive Coding Technologies and Protocols Survey – Fall 2019 Results. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/from-platforms-to-workflows-predictive-coding-technologies-and-protocols-survey-fall-2019-results/ [Accessed 12 Feb. 2020].

(19) Robinson, R. (2020). Is It All Relative? Predictive Coding Technologies and Protocols Survey – Spring Results. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/is-it-all-relative-predictive-coding-technologies-and-protocols-survey-spring-2020-results/ [Accessed 7 Aug. 2020].

(20) Robinson, R. (2020). Casting a Wider Net? Predictive Coding Technologies and Protocols Survey – Fall 2020. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/casting-a-wider-net-predictive-coding-technologies-and-protocols-survey-fall-2020-results/ [Accessed 5 Feb. 2021].

(21) Robinson, R. (2021). Cold Weather Catch? Predictive Coding Technologies and Protocols Survey – Spring 2021. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/cold-weather-catch-predictive-coding-technologies-and-protocols-survey-spring-2021-results/ [Accessed 8 Aug. 2021].


* Predictive Coding Survey Respondents: Seven Surveys

8 - Predictive Coding Survey Respondents (Individual and Aggregate) - Fall 2021

Source: ComplexDiscovery

ComplexDiscovery OÜ is an independent digital publication and research organization based in Tallinn, Estonia. ComplexDiscovery covers cybersecurity, data privacy, regulatory compliance, and eDiscovery, with reporting that connects legal and business technology developments—including high-growth startup trends—to international business, policy, and global security dynamics. Focusing on technology and risk issues shaped by cross-border regulation and geopolitical complexity, ComplexDiscovery delivers editorial coverage, original analysis, and curated briefings for a global audience of legal, compliance, security, and technology professionals. Learn more at ComplexDiscovery.com.

Generative Artificial Intelligence and Large Language Model Use

ComplexDiscovery OÜ recognizes the value of GAI and LLM tools in streamlining content creation processes and enhancing the overall quality of its research, writing, and editing efforts. To this end, ComplexDiscovery OÜ regularly employs GAI tools, including ChatGPT, Claude, Gemini, Grammarly, Midjourney, and Perplexity, to assist, augment, and accelerate the development and publication of both new and revised content in posts and pages published (initiated in late 2022).

Market Sizing

Editor's Note: The eDiscovery Market Size Mashup is a research tool now in its fourteenth annual cycle. Since 2012, it has tracked the worldwide eDiscovery market through three structural eras: the early-cloud era, when subscription consumption began to displace perpetual licensing; the AI-assisted-review era, when predictive coding reset per-document review economics; and the demand-and-response era that defines 2025 through 2030, when generative-AI-assisted review and emerging agentic workflow features compress per-document cost against a data curve growing roughly five times faster than the dollars available to discover it.

The purpose has remained consistent across the cycles: to give legal technology executives, consultants, analysts, service providers, investors, and corporate legal teams a reconciled mid-range view of the market that is internally consistent, methodologically disclosed, and updated each year against the latest available data. The 2025 to 2030 cycle places the worldwide eDiscovery market at an estimated $19.61 billion in 2025 and a projected $28.08 billion by 2030, a reconciled 7.44 percent compound annual growth rate. Underneath the aggregate trajectory sits the central arithmetic of the cycle: a 27.6-percentage-point annual gap between data growth and market growth that compounds across five years into a 3.13-times productivity-per-dollar mandate by 2030. Each of the twelve Market Intelligence installments published across this cycle examined a single segmentation lens. The consolidated synthesis that follows brings those lenses together as one citable reference for procurement, capability planning, market analysis, and vendor-selection decisions through the back half of the decade.

Industry Research - eDiscovery Market Sizing Beat

Complete look: ComplexDiscovery OÜ's 2025 to 2030 eDiscovery market size mashup

A synthesis of the worldwide eDiscovery market across software, services, deployment, geography, sector, delivery, task share, and the demand-side data growth curve, reconciled within the ComplexDiscovery OÜ eDiscovery Marketplace Mashup Model

ComplexDiscovery OÜ Staff

Two numbers shape worldwide eDiscovery through 2030, and they do not move at the same pace. The dollars to discover potentially relevant information rise from approximately 19.61 billion dollars in 2025 to approximately 28.08 billion dollars by 2030, a multiplier of roughly 1.4 times. The data those dollars must reach rises from approximately 181 zettabytes in 2025 to approximately 812 zettabytes in 2030, a multiplier of roughly 4.5 times. By the end of the decade, the same dollar must cover roughly 3.13 times more data than it does at the start. That arithmetic, larger than any single segment shift or composition change, is the structural force underneath this cycle of the mashup.

The 3.13-times productivity-per-dollar mandate that compounds out of the gap is a primary force underneath much of what follows. It pressures software to take share from services as channel billing shifts toward AI-driven workflows. It pressures review's relative share of task spend to decline even as review dollars rise. It pressures cloud-first procurement to become the operational default. And it sits alongside other structural dynamics including subscription consumption migration, direct-buyer maturation, and supplier consolidation. None of those other dynamics is reducible to the productivity mandate alone, but each operates in a market where the mandate sets the demand-side ceiling.

What follows is a tour of how the mandate lands across each segmentation lens explored in this cycle of the Market Intelligence series, from the aggregate market line down through software, services, deployment, cloud composition, geography, sector, delivery approach, task share, and the demand-side data growth curve underneath all of it.
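The productivity-per-dollar arithmetic above can be reproduced from the four endpoint figures alone. A minimal Python sketch, assuming the standard compound annual growth rate formula, (end / start) ** (1 / years) - 1, over the five years from 2025 to 2030. The published inputs are rounded, so recomputed values match only to the precision shown.

```python
def cagr(start, end, years):
    """Compound annual growth rate as a fraction (0.0744 means 7.44%)."""
    return (end / start) ** (1 / years) - 1

YEARS = 5                                # 2025 -> 2030
market_2025, market_2030 = 19.61, 28.08  # billions of dollars
data_2025, data_2030 = 181, 812          # zettabytes

market_multiple = market_2030 / market_2025  # about 1.43x
data_multiple = data_2030 / data_2025        # about 4.49x
# Data each dollar must cover in 2030 relative to 2025.
mandate = data_multiple / market_multiple    # about 3.13x

market_growth = cagr(market_2025, market_2030, YEARS)  # about 7.44% per year
data_growth = cagr(data_2025, data_2030, YEARS)        # about 35.0% per year
gap_points = 100 * (data_growth - market_growth)       # about 27.6 points

print(f"mandate {mandate:.2f}x, market CAGR {market_growth:.2%}, "
      f"data CAGR {data_growth:.1%}, gap {gap_points:.1f} pts")
# mandate 3.13x, market CAGR 7.44%, data CAGR 35.0%, gap 27.6 pts
```

The mandate is also the five-year compounding of the per-year growth ratio, ((1 + data_growth) / (1 + market_growth)) ** 5, which is how an annual gap of about 27.6 points becomes a roughly 3.13-times multiple.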

The shape of the worldwide market

The reconciled view places the worldwide eDiscovery market at approximately 19.61 billion dollars in 2025, rising to approximately 28.08 billion dollars by 2030, a compound annual growth rate of approximately 7.44 percent. Software, the smaller of the two segments in absolute terms, grows at roughly 10.41 percent. Services, the larger segment, grows at roughly 5.75 percent. The 4.66-percentage-point segment CAGR gap translates into a 5-percentage-point composition shift across the five-year horizon. Software's share of total worldwide eDiscovery spend rises from 34 percent in 2025 to 39 percent in 2030; services' share falls correspondingly from 66 percent to 61 percent. Services remains the larger segment by absolute spend through 2030; the segment crossover point falls outside the 2025 to 2030 window.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/05/eDiscovery-Market-Sizing-Past-and-Projected-2026.pdf" title="eDiscovery Market Sizing - Past and Projected - 2026"] Chart: eDiscovery Market Sizing, Past and Projected (2012 to 2030)

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Software-and-Services-Market-2025-2030.pdf" title="eDiscovery Software and Services Market (2025-2030)"] Chart: eDiscovery Software and Services Market (2025 to 2030)

Software's outperformance is not a fluke of the moment. The 4.66-percentage-point CAGR gap is, in large measure, the segment-level expression of an AI-driven channel reallocation: the same review workflow that once generated services revenue at a per-document or per-hour rate increasingly generates software revenue at a SaaS subscription or AI-inference rate. The work has not disappeared, and in many cases has expanded as data volumes grow, but the channel through which it is billed has steadily shifted from services to software. Services growth is slower but structurally durable; cross-border data, regulatory exposure, advisory work, and specialized response work sustain the services line even as software automates discrete tasks.

How the composition is changing

Within the software segment, the cloud-first transition that has been underway for the better part of a decade is now functionally complete for new deployments. Off-premise software grows from approximately 5.29 billion dollars in 2025 to approximately 8.87 billion dollars in 2030, while on-premise software grows from 1.37 billion to 2.08 billion. On-premise solutions persist where security, sovereignty, or contractual constraints require them, but they are no longer the default for new procurement.
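A minimal Python sketch, using the deployment figures above and the standard CAGR formula given in the Methodology section, of how fast each deployment line grows and how the off-premise share of software spend moves. The two lines sum to about 6.66 and 10.95 billion dollars, consistent with software's 34 and 39 percent shares of the market; the `cagr` helper is an illustrative name, not part of the Model.

```python
def cagr(start: float, end: float, years: int = 5) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


# Software spend by deployment, billions of U.S. dollars: (2025, 2030)
deployment = {
    "off-premise": (5.29, 8.87),
    "on-premise": (1.37, 2.08),
}

# Total software spend in each year is the sum of the two deployment lines
total_2025 = sum(start for start, _ in deployment.values())
total_2030 = sum(end for _, end in deployment.values())

for name, (start, end) in deployment.items():
    print(f"{name}: CAGR {cagr(start, end):.1%}, "
          f"share of software {start / total_2025:.0%} -> {end / total_2030:.0%}")
```

On these rounded inputs, off-premise grows at about 10.9 percent a year against about 8.7 percent for on-premise, lifting off-premise from roughly 79 to 81 percent of software spend.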

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/05/eDiscovery-Software-Market-2025-2030-On-Off-Premise.pdf" title="eDiscovery Software Market (2025-2030) - On + Off Premise"] Chart: eDiscovery Software Market, On and Off Premise (2025 to 2030)

The more interesting subplot is inside the cloud category. SaaS holds roughly two-thirds of cloud spend in 2025 (approximately 67 percent) and drifts to approximately 63 percent by 2030 as PaaS and IaaS components compound at faster rates. PaaS rises from 15 percent of cloud spend to 17 percent; IaaS rises from 18 percent to 20 percent. The reason is simple: as advanced eDiscovery workloads incorporate large-scale processing, AI inference, vector search, and complex data engineering, customers and providers are increasingly integrating directly with platform and infrastructure services. Some of that integration appears in vendor SaaS revenue, and some appears as direct PaaS or IaaS spend.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Cloud-Software-Market-2025-2030.pdf" title="eDiscovery Cloud Software Market (2025-2030)"] Chart: eDiscovery Cloud Software Market (2025 to 2030)

Services growth lags software growth, but the headline number understates the qualitative shift inside services. Traditional managed-review revenue faces continued pricing pressure as AI-assisted review reduces billable hours. Advisory services (litigation readiness, information governance, AI risk advisory) grow on the back of regulatory complexity. Specialized response work (forensic collection, second-request response, cross-border data transfer, regulatory inquiry support) grows at premium rates. The services segment in 2030 will not be a slower-growing version of the services segment of 2025; it will be a different mix.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/05/eDiscovery-Services-Market-2025-2030.pdf" title="eDiscovery Services Market (2025-2030)"] Chart: eDiscovery Services Market (2025 to 2030)

Where the money is going

The United States continues to anchor the worldwide market, accounting for roughly 66 percent of global spend in 2025 and easing to roughly 64 percent in 2030. The shift toward rest-of-world is gradual but real, driven by the maturation of data protection regimes outside the United States, the internationalization of regulatory inquiries, and the gradual buildout of regional capacity. Within rest-of-world, the United Kingdom, Canada, Germany, Australia, and Japan continue to claim the largest sub-shares, with rising activity in Singapore, India, and parts of the Middle East.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Market-Geographical-Overview-2025-2030.pdf" title="eDiscovery Market Geographical Overview (2025-2030)"] Chart: eDiscovery Market Geographical Overview (2025 to 2030)

The reconciliation distinguishes between government and regulatory demand on one hand and non-government (private-sector) demand on the other. Both grow over the period; non-government grows faster in both percentage and absolute terms. Non-government growth reflects expansion in civil litigation, internal investigations, corporate compliance, and AI-related risk advisory work. Government and regulatory growth reflects persistent investigative activity, ongoing premerger notification work, parallel inquiries in the European Union and the United Kingdom, and continued cross-border regulatory coordination.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Government-and-Non-Government-Market-Overview-2025-2030.pdf" title="eDiscovery Government and Non-Government Market Overview (2025-2030)"] Chart: eDiscovery Government and Non-Government Market Overview (2025 to 2030)

Segmenting worldwide spend by who captures the direct economic transaction (corporations and governments, law firms, or service providers) reveals a clear picture. The corporations-and-governments category remains the dominant channel throughout the period, reflecting continued in-house consumption supplemented by direct vendor procurement. Service providers grow faster than law firms over the forecast period. Law firms increasingly act as orchestrators rather than primary procurement channels, with the work and the dollars flowing around them.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Market-By-Direct-Delivery-Approach-2025-2030.pdf" title="eDiscovery Market By Direct Delivery Approach (2025-2030)"] Chart: eDiscovery Market by Direct Delivery Approach (2025 to 2030)

The task shift, and why it matters

The most consequential structural shift in the industry is not visible in the aggregate market line. It is in the composition of where eDiscovery dollars get spent across the three core tasks of collection, processing, and review. Review remains the largest single task expenditure throughout the period, but collection and processing capture increasing absolute and relative shares. Across a longer horizon, from RAND Corporation's 2012 baseline through ComplexDiscovery OÜ's 2025 modeling and 2030 forecast, review's share of total task spend falls from 73 percent to a reconciled 62 percent and then to a projected 52 percent: a 21-percentage-point decline across 18 years. Collection's share, over the same span, more than triples, from 8 percent to a projected 25 percent, a 17-percentage-point gain. Processing has been comparatively stable, rising from 19 percent to 23 percent.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/05/eDiscovery-Market-By-Task-2025-2030.pdf" title="eDiscovery Market By Task (2025-2030)"] Chart: eDiscovery Market by Task (2025 to 2030)

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/06/Relative-Task-Expenditures-for-Core-eDiscovery-Tasks.pdf" title="Relative Task Expenditures for Core eDiscovery Tasks"] Chart: Relative Task Expenditures for Core eDiscovery Tasks (2012, 2025, 2030)

The pace of the rebalance is accelerating. Roughly 47 percent of the 18-year movement in review's share happens in the final five years, from 2025 to 2030: review's 5-year decline of 10 percentage points nearly equals the prior 13-year decline of 11 percentage points. The trend is consistent with a demand-and-response dynamic rather than two independent forces operating in parallel. The demand side is the growth in data volume subject to potential collection. The supply-side response is AI-assisted review's compression of per-document review costs: predictive coding through the prior decade, generative-AI-assisted review through the current decade, and emerging agentic workflow features as the next compression wave. Data growth raises the absolute volume of review work; AI assistance lowers per-document review prices. Whether the resulting absolute review spend rises, falls, or stays flat depends on the relative pace of the two. In the reconciled view, absolute review spend continues to grow modestly, from approximately 12.16 billion dollars in 2025 to approximately 14.60 billion dollars in 2030, but materially more slowly than the aggregate market, which is why review's share declines. For practitioners and providers, the practical consequence is that capacity decisions made today should anticipate the structural drift toward collection-heavy and processing-heavy task profiles.
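A short Python check of the task-share arithmetic in this section, using only the shares and dollar figures stated above. Dividing review dollars by total market dollars reproduces the 62 and 52 percent shares, which suggests (an assumption on our part) that the task split is applied to the whole market; the variable names are illustrative.

```python
# Review's share of core task spend (collection, processing, review), in percent
review_share = {2012: 73, 2025: 62, 2030: 52}

earlier_decline = review_share[2012] - review_share[2025]  # 13 years: 2012 to 2025
recent_decline = review_share[2025] - review_share[2030]   # 5 years: 2025 to 2030
total_decline = earlier_decline + recent_decline           # 18 years: 2012 to 2030

print(f"Final five years' portion of the decline: {recent_decline / total_decline:.1%}")
print(f"Pace: {earlier_decline / 13:.2f} points/year (2012-2025) "
      f"vs {recent_decline / 5:.2f} points/year (2025-2030)")

# Absolute review spend against the whole market, billions of U.S. dollars: (2025, 2030)
review = (12.16, 14.60)
market = (19.61, 28.08)

review_cagr = (review[1] / review[0]) ** (1 / 5) - 1
print(f"Review as share of market: {review[0] / market[0]:.0%} -> {review[1] / market[1]:.0%}")
print(f"Review spend CAGR: {review_cagr:.2%}")
```

The annual pace of share loss more than doubles, from about 0.85 points a year to 2.00 points a year, while review dollars still grow at roughly 3.73 percent a year, about half the market's 7.44 percent.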

The demand side, data growth as the underlying force

Worldwide data volumes are projected to grow from approximately 181 zettabytes in 2025 to approximately 812 zettabytes in 2030, a compound annual growth rate of approximately 35 percent. Enterprise-held data, the subset most relevant to discoverable information, expands from approximately 54 zettabytes to approximately 243 zettabytes over the same period at the same rate, holding steady at roughly 30 percent of the global total. The 181-zettabyte 2025 anchor is consistent with IDC's Global DataSphere baseline. The 35 percent CAGR through 2030 reflects the Mashup Model's reconciliation across data-universe forecasts and enterprise-specific projections, and sits at the upper end of published industry growth estimates: higher than IDC's headline total-data forecast trajectory and lower than the most aggressive AI-content-driven projections.
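The growth rates and enterprise share above can be checked with a few lines of Python, applying the same CAGR formula used throughout this analysis to the rounded zettabyte figures; the `cagr` helper name is illustrative.

```python
def cagr(start: float, end: float, years: int = 5) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


global_zb = (181, 812)     # worldwide data, zettabytes: (2025, 2030)
enterprise_zb = (54, 243)  # enterprise-held subset, zettabytes: (2025, 2030)

print(f"Global data CAGR: {cagr(*global_zb):.1%}")
print(f"Enterprise data CAGR: {cagr(*enterprise_zb):.1%}")

# Enterprise data as a share of the global total in each endpoint year
for year, total, enterprise in zip((2025, 2030), global_zb, enterprise_zb):
    print(f"{year}: enterprise share {enterprise / total:.1%}")
```

Both series compound at roughly 35 percent a year (35.0 and 35.1 percent on these inputs), so the enterprise subset stays near 30 percent of the global total (29.8 percent in 2025, 29.9 percent in 2030).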

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/06/Data-Volume-and-Growth-in-Zettabytes-2025-2030.pdf" title="Data Volume and Growth in Zettabytes (2025-2030)"] Chart: Data Volume and Growth in Zettabytes (2025 to 2030)

The 35-percent data CAGR set against the 7.44-percent eDiscovery market CAGR is the central arithmetic of the cycle. Across the five-year horizon, global data multiplies roughly 4.5 times (4.484x, from 181 to 812 zettabytes); the worldwide eDiscovery market multiplies roughly 1.4 times (1.432x, from 19.6 to 28.1 billion dollars). Divide one by the other and the productivity-per-dollar requirement falls out: 4.484 ÷ 1.432 ≈ 3.13. By 2030, the same dollar must process, store, search, review, and produce against roughly 3.13 times more data than it did in 2025 just to maintain the same coverage ratio. That is not a marginal improvement. It is the productivity mandate that defines the decade for the industry. The mandate sits underneath several of the shifts documented above: the segment-level CAGR gap reflects the channel through which the productivity gain flows; the task-share rebalance reflects where the gain lands at the workflow level. AI capability compounding (predictive coding through the prior decade, generative-AI-assisted review through the current decade, emerging agentic workflow features as the next compression wave) is the bridge that must close the gap. Whether the bridge closes the gap fully, partially, or in stages depends on the pace at which the current generation of tooling matures, the pace at which agentic features move from product roadmap to production deployment, and whether the industry holds to full-coverage discovery as the standard or moves toward risk-tiered coverage that reserves the most intensive workflows for the documents that matter most. The mandate sets the ceiling; the answer to how the ceiling gets met is the open question of the decade.
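A minimal sketch of the coverage-flat ratio described above, assuming the rounded 2025 and 2030 endpoints for data (zettabytes) and spend (billions of dollars). The function name `coverage_flat_multiplier` is hypothetical. The second calculation restates the same mandate as an annual rate by dividing the two annual growth factors; that framing is our illustration, not a Model output.

```python
def coverage_flat_multiplier(data_start: float, data_end: float,
                             spend_start: float, spend_end: float) -> float:
    """Data multiplier divided by spend multiplier.

    The result is how many times more data each dollar must cover at the end
    of the period to keep the start-of-period data-to-dollar ratio unchanged.
    """
    data_multiplier = data_end / data_start
    spend_multiplier = spend_end / spend_start
    return data_multiplier / spend_multiplier


mandate = coverage_flat_multiplier(181, 812, 19.61, 28.08)

# Spread over five years, the same mandate is an implied annual rate of
# coverage improvement per dollar: (1 + data CAGR) / (1 + market CAGR) - 1.
annual_requirement = mandate ** (1 / 5) - 1

print(f"Productivity-per-dollar mandate: {mandate:.2f}x")
print(f"Implied annual improvement: {annual_requirement:.1%}")
```

On these inputs the function returns about 3.13, and the annualized requirement is roughly 25.7 percent per year of compounded coverage improvement per dollar.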

What the reconciled view implies

The reconciled view supports a small set of interpretive points, none of them prescriptive, all of them grounded in the figures.

For software vendors, the 10.41 percent reconciled software CAGR outpaces services by 4.66 percentage points a year, and the vendors positioned to capture a disproportionate share of incremental software dollars are those integrating AI-assisted review, modular SaaS delivery, and platform-aware processing into the same product surface.

For service providers, slower nominal growth does not imply a less attractive market; it implies a different one. Providers that reposition around higher-value advisory and specialized regulatory response work can outpace the segment headline.

For law firms, the modest share of direct economic transactions they capture suggests a continued shift toward orchestration and advisory positioning rather than primary procurement.

For corporate and government legal teams, in-house consumption continues to dominate the direct delivery approach. Build-versus-buy decisions on internal capabilities, governance over AI use in discovery, and readiness for second-request response remain the central program-level questions.

For investors and analysts, the dynamics support a continued investment thesis around cloud-native, AI-enabled software platforms, with margin pressure on traditional services and continued consolidation at the supplier level.

Closing the loop

At the start of this analysis, the cycle was framed around two numbers that do not move at the same pace: 19.61 billion dollars rising to 28.08 billion against 181 zettabytes rising to 812 zettabytes. The productivity-per-dollar mandate that compounds out of that gap, 3.13 times by 2030, is not a forecast of efficiency. It is a measurement of pressure. The segment, task, and channel shifts documented above are the visible places where the pressure shows up. AI capability compounding is the bridge that must close the gap. Whether the industry meets that pressure through AI-assisted tooling alone, or moves toward a structural redefinition of what discovery coverage means in practice, from full coverage of every potentially relevant artifact to risk-tiered coverage that reserves the most intensive workflows for the documents that matter most, is the open question of the decade. The mashup measures the pressure. The industry answers it.

About the Model behind these figures

All quantitative figures in this analysis are drawn from the ComplexDiscovery OÜ eDiscovery Marketplace Mashup Model, a proprietary analyst-aggregation framework that reconciles publicly available third-party research, vendor disclosures, and industry reference work into a single defensible mid-range view of the worldwide eDiscovery market. The Model is a research aggregation tool maintained by ComplexDiscovery OÜ. It is not distributed publicly and does not constitute primary research; figures cited here represent reconciled estimates aligned to a common scope, geography, and timeframe.

This Mashup is the public synthesis vehicle for the Model's 2025 to 2030 cycle. It provides the consolidated reconciliation across software, services, deployment, cloud composition, geography, sector, delivery approach, task composition, long-horizon task share, and data growth, along with a representative listing of the organizations and publications that inform the Model's source aggregation.

Methodology

The scope of this synthesis is the worldwide eDiscovery market, encompassing software and services, expressed in U.S. dollars, across calendar years 2025 through 2030. Reconciliation of varying market definitions, geographic scopes, and source methodologies is presented as ranges, with assumptions disclosed where precise alignment is not possible. The 2012 task baseline cited in the long-horizon task share section derives from the RAND Corporation 2012 study by Pace and Zakaras. Compound annual growth rates are derived using the standard formula: End divided by Start, raised to the power of one over the number of years, minus 1, with a five-year period unless otherwise noted. The 3.13-times productivity mandate is a coverage-flat ratio (data multiplier divided by market multiplier), not a forecast of realized productivity gains.
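The two formulas described in this methodology can be written compactly as follows. The symbols are our notation, not the Model's: $V$ is a dollar value, $n$ the number of years, $D$ data volume in zettabytes, and $S$ worldwide eDiscovery spend. The worked values use the rounded figures cited in this analysis, so they match the published results only to the precision shown.

```latex
\[
\mathrm{CAGR} = \left(\frac{V_{\text{end}}}{V_{\text{start}}}\right)^{1/n} - 1,
\qquad
\left(\frac{28.08}{19.61}\right)^{1/5} - 1 \approx 0.0744
\]
\[
M = \frac{D_{2030} / D_{2025}}{S_{2030} / S_{2025}}
  = \frac{812 / 181}{28.08 / 19.61} \approx 3.13
\]
```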

Citing this analysis

The primary citable resource for the figures and analysis in this article is the ComplexDiscovery OÜ eDiscovery Marketplace Mashup Model. The Model is the aggregated research artifact maintained by ComplexDiscovery OÜ since 2012 and is the appropriate citation when referencing data points, projections, segmentation, or analyses derived from any ComplexDiscovery annual eDiscovery market size mashup. Suggested citation: Robinson, R. (2026). 2025 to 2030 eDiscovery Market Size Mashup (H. Robinson, Ed.). ComplexDiscovery OÜ.

Sources informing the Model

The listing below provides an overview of the organizations and publications whose data points have informed the development of the Model over time. The Model itself aggregates publicly available content (including abstracts, excerpts, quotes, references, and data points) from these sources, with inputs collected since the inaugural ComplexDiscovery eDiscovery Market Size Mashup in 2012. Individual entries are presented for transparency about Model construction and are not the appropriate citation for figures appearing in this analysis; readers referencing figures should cite the Model. Market modeling rounding may result in slight differences in aggregate numbers.

360 Market Updates / 360iResearch
Aberdeen
ACG Partners
Allied Market Research
American Medical Association
BMC
Catalyst Investors
ComplexDiscovery OÜ eDiscovery Marketplace Mashup Model (incorporating industry news, editorial analysis, eDiscovery Business Confidence Surveys, eDiscovery Pricing Surveys, and prior Annual eDiscovery Market Size Mashups since 2012)
CS Disco
Discovery & Legal Technology Association (DLTA)
eDiscovery Journal
EY
Facts and Factors
Forbes
FRONTEO (UBIC)
Future Market Insights
Gartner
Georgetown Law Center for the Study of the Legal Profession and Thomson Reuters Legal Executive Institute
Global Industry Analysts
Grand View Research
Greentarget
Harvard Business Review
Houlihan Lokey
i360
IBIS World
IDC
Industry eDiscovery Provider, Analyst, and Investor Briefings and Discussions
Industry Observer Estimations (Multiple Observers)
Industry Research (Company)
KLDiscovery
Markets and Markets
Mordor Intelligence
Nasdaq
Nuix
P&S Market Research
Prescient & Strategic Intelligence
RAND Institute for Civil Justice
Relativity Fest, Industry Panel Discussions
Reports and Data
Research and Markets
Richmond Journal of Law and Technology
Statista
The Conference Board
The Radicati Group
Third-Party Market Studies (Independent Industry Briefings)
Transparency Market Research
U.S. Bureau of Economic Analysis
U.S. Department of Commerce, International Trade Administration
U.S. Securities and Exchange Commission (public company filings)
Zion Market Research

Market Intelligence series reports (2025-30 eDiscovery market size mashup)


Assisted by GAI and LLM Technologies

Source: ComplexDiscovery OÜ

ComplexDiscovery’s mission is to enable clarity for complex decisions by providing independent, data‑driven reporting, research, and commentary that make digital risk, legal technology, and regulatory change more legible for practitioners, policymakers, and business leaders.


[/exclude_from_rss] Industry Research - eDiscovery Market Sizing Beat

Complete look: ComplexDiscovery OÜ's 2025 to 2030 eDiscovery market size mashup




Editor's Note: Generative AI is no longer a future-state concept in eDiscovery pricing; it is already reshaping how legal, technology, and corporate teams evaluate cost, value, and defensibility. In this Winter 2026 Pricing Pulse analysis, ComplexDiscovery OÜ, in partnership with EDRM, examines a market that is simultaneously stabilizing in traditional service categories and fragmenting in newer AI-driven ones. The findings highlight a clear divide between established pricing norms for forensic collection, processing, hosting, and document review, and the still-developing commercial models emerging around GenAI-assisted review. For cybersecurity, data privacy, regulatory compliance, and eDiscovery professionals, that divide matters. Pricing transparency now directly affects budgeting, vendor selection, matter planning, and risk management—especially as organizations weigh the promise of AI efficiency against unresolved questions around exception handling, quality control, and contract structure. This analysis offers a timely benchmark for understanding where the market stands today and where pricing pressure is likely to intensify next.


A Complete Analysis of the Winter 2026 eDiscovery Pricing Survey

ComplexDiscovery Staff

Executive Summary

The Winter 2026 eDiscovery Pricing Survey, conducted by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM) from December 2025 through February 2026, captures a market at a pivotal inflection point. Generative AI (GenAI) has moved into operational workflows for a significant and growing segment of the eDiscovery market, but adoption is uneven, pricing frameworks have not kept pace, and a meaningful share of practitioners have not yet engaged with AI-assisted review at any level. That bifurcation between early adopters and the rest of the market is itself one of the survey's defining findings.

Drawing on 53 responses from legal professionals, technology providers, corporations, and consultancies, this survey provides a detailed pricing snapshot of the current eDiscovery market, spanning forensic collection, data processing and hosting, document review, and GenAI-assisted review. Several clear signals emerge from the data. Forensic collection and examination rates have stabilized in the $250–$350 per hour range for standard work, with premium rates for testimony and analysis. Data hosting has commoditized meaningfully at the infrastructure level, while analytics-enabled hosting retains pricing differentiation. Document review rates are stable, but per-document billing remains opaque. Most critically, GenAI-assisted review pricing is experimentally diverse: hybrid models and per-document billing each claim roughly 28% of reported primary models, with the $0.11–$0.50 per-document range emerging as a competitive zone that directly challenges traditional human review economics.

This report covers all 25 survey questions, organized into four thematic sections, with analyst observations and strategic implications throughout. All findings represent self-reported practitioner perceptions of prevailing market pricing, not verified transaction records, and should be read as directional market intelligence.
Unlike vendor-produced or client-commissioned pricing guides, the Pricing Pulse is designed and published independently by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM), with no commercial interest in any specific pricing outcome.

About the Survey

Survey Design and Purpose

The Winter 2026 eDiscovery Pricing Survey was designed and administered by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM) as part of its ongoing Pricing Pulse research program. The survey's primary purpose is to provide eDiscovery practitioners, technology providers, and legal operations professionals with empirically grounded pricing benchmarks across the key service categories that define the eDiscovery market. The Pricing Pulse is practitioner-reported and independently produced; it is not sponsored by, or designed to favor, any vendor, platform, or service category. Respondent comments critiquing the survey design itself are actively incorporated into future iterations, as reflected in this report's processing methodology note.

This iteration of the survey placed particular emphasis on generative AI-assisted review pricing, a category first formally addressed in prior survey cycles and given significantly greater prominence in Winter 2026 to reflect the technology's accelerating, if uneven, integration into eDiscovery workflows. The five GenAI pricing questions (Questions 18–22) were designed to capture not just price points but pricing model structures, exception handling practices, and the nascent development of outcome-based pricing, recognizing that practitioners at very different stages of AI adoption would be responding.

Respondent Profile

The survey received 53 completed responses. By business segment, law firms represented the largest cohort at 43.4% (23 respondents), followed by software and/or services providers at 24.5% (13), corporations at 15.1% (8), consultancies at 9.4% (5), and media, research, or educational organizations at 7.5% (4). By primary function, 67.9% (36) identified as legal/litigation support professionals, 26.4% (14) as business or business support functions, and 5.7% (3) as IT or product development.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Organizational-Segment-Winter-2026.pdf" title="Survey Respondents by Organizational Segment - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Primary-Function-Winter-2026.pdf" title="Survey Respondents by Primary Function - Winter 2026"]
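Because the sample is small, each respondent carries noticeable weight. The short Python sketch below recomputes the business-segment percentages from the reported counts and shows the size of a single response; it is an illustrative check, and the dictionary labels simply restate the segments listed above.

```python
# Respondent counts by business segment, as reported in the survey
segments = {
    "Law firms": 23,
    "Software and/or services providers": 13,
    "Corporations": 8,
    "Consultancies": 5,
    "Media, research, or educational organizations": 4,
}

n = sum(segments.values())
assert n == 53, "segment counts should cover every completed response"

# Recompute each segment's share of the 53 responses
for name, count in segments.items():
    print(f"{name}: {count}/{n} = {count / n:.1%}")

# With 53 responses, a single respondent moves any category by about 1.9 points
print(f"One respondent = {1 / n:.1%} of the sample")
```

The recomputed shares match the published 43.4, 24.5, 15.1, 9.4, and 7.5 percent figures, and the roughly 1.9-point weight of one response is a useful yardstick when reading small percentage differences in the pricing questions that follow.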

Geographically, the survey is overwhelmingly U.S.-centric: 92.5% of respondents (49) indicated North America – United States as their primary eDiscovery business geography, with the remaining 7.5% distributed across Europe (United Kingdom and non-UK) and Asia/Asia Pacific. This composition reflects the survey's community of practitioners and should be taken into account when applying results to non-U.S. markets.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Geographic-Region-Winter-2026.pdf" title="Survey Respondents by Geographic Region - Winter 2026"]

The respondent pool's composition — heavily weighted toward legal practitioners with meaningful technology provider and in-house corporate representation — lends credibility to the pricing data for legal use cases while also surfacing supply-side perspectives from vendors who see pricing across many client engagements.

Section 1: Forensic Collection, Examination, and Testimony Pricing

Forensic collection and digital examination form the evidentiary foundation of eDiscovery. Unlike commoditized downstream services, forensic work depends on specialized expertise, defensible chain-of-custody protocols, and increasingly complex device environments. Mobile devices, cloud-linked data ecosystems, encrypted storage, and enterprise application footprints have expanded the examiner's scope considerably over the past several years, sustaining rate levels that resist the downward pressure more commoditized services face. Expert witness testimony sits at the highest value tier of forensic work, where practitioner credentials, courtroom experience, and legal exposure command significant premium pricing.

Q1 & Q2: Per Hour Cost for Onsite and Remote Collection

The $250–$350 per hour range is the clear market anchor for forensic collection, cited by 56.6% of respondents for both onsite and remote collection. However, the distributions diverge meaningfully at the premium tier: 20.8% of respondents report onsite collection rates exceeding $350 per hour, compared to just 5.7% for remote. Conversely, remote collection skews lower, with 18.9% reporting sub-$250 rates for remote work versus only 5.7% for onsite. This onsite premium reflects real cost structures: travel, physical access logistics, on-premises security requirements, and the coordination burden of collecting in active enterprise environments. The growth of remote forensic collection tools, driven in part by pandemic-era necessity and now institutionalized in many engagements, has introduced competitive downward pressure on remote rates that onsite services do not face to the same degree. Four respondents (7.5%) indicate alternative pricing models for remote collection, suggesting some providers are moving toward flat-fee or subscription-based remote collection arrangements.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-an-Onsite-Collection-by-a-Forensic-Examiner-Winter-2026-.pdf" title="Collection Pricing - Per Hour Cost for an Onsite Collection by a Forensic Examiner - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-a-Remote-Collection-by-a-Forensic-Examiner-Winter-2026.pdf" title="Collection Pricing - Per Hour Cost for a Remote Collection by a Forensic Examiner - Winter 2026"]

Q3 & Q4: Per Device Cost for Desktop/Laptop and Mobile Device Collection

Device-based pricing skews decisively to the upper tier: 50.9% of respondents report per-device costs exceeding $350 for desktop and laptop collections, and 49.1% report the same for mobile devices. The $250–$350 mid-range captures 18.9% for computers and 24.5% for mobile devices; the higher mobile representation in the mid-range may reflect lower-complexity or volume-based mobile collection engagements where physical access is easier and device configurations are more standardized. Perhaps most notable is the convergence of mobile and computer collection pricing at the upper tier. Mobile device collection, once considered simpler than computer collection because of smaller storage capacities, now commands comparable rates as encryption, cloud sync architectures, third-party application data, and ephemeral messaging platforms have substantially increased examiner effort and risk. Practitioners who budget mobile collection as a lower-cost alternative to computer collection will increasingly find that the market does not support that assumption.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Device-Cost-for-a-Desktop-Laptop-Computer-Collection-by-a-Forensic-Examiner-Winter-2026.pdf" title="Collection Pricing - Per Device Cost for a Desktop Laptop Computer Collection by a Forensic Examiner - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Device-Cost-for-a-Mobile-Device-Collection-by-a-Forensic-Examiner-Winter-2026.pdf" title="Collection Pricing - Per Device Cost for a Mobile Device Collection by a Forensic Examiner - Winter 2026"]

Q5 — Per Hour Cost for Investigation, Analysis, and Report Generation

Investigation, analysis, and report generation command a higher hourly rate floor than collection itself. More than half of respondents (54.7%) report rates in the $350–$550 range for this work, compared to the $250–$350 majority for collection. Only 30.2% report rates below $350 per hour for analysis, and 5.7% exceed $550. This premium reflects the cognitive and legal weight of analytical work. Forensic examiners producing reports that will be used in litigation, regulatory proceedings, or internal investigations are exercising expert judgment that creates professional liability, and the market prices that exposure accordingly. Practitioners purchasing forensic services should anticipate that billing rates will escalate from collection through analysis, often within the same engagement.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-Investigation-Analysis-and-Report-Generation-by-an-FE-Winter-2026.pdf" title="Collection Pricing - Per Hour Cost for Investigation Analysis and Report Generation by an FE - Winter 2026"]

Q6 — Per Hour Cost for Expert Witness Testimony

Expert witness testimony carries the highest rate profile in the forensic pricing group. While 47.2% report testimony rates in the $350–$550 range (consistent with analysis rates), a notable 26.4% report rates exceeding $550 per hour, the highest proportion in any >$550 category across the survey. The elevated 'do not know' response rate (20.8%) likely reflects that many practitioners engage forensic examiners for collection and analysis but not testimony, creating a meaningful gap in their pricing awareness for this segment. Expert witness rates are driven by factors beyond standard hourly billing, including the examiner's track record, publication history, geographic availability, and the complexity of the matter at issue. The wide distribution, from below $350 to above $550, reflects a market where individual credentials create significant pricing dispersion.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-Expert-Witness-Testimony-In-Person-and-Written-by-an-FE-Winter-2026.pdf" title="Collection Pricing - Per Hour Cost for Expert Witness Testimony (In-Person and Written) by an FE - Winter 2026"]

Analyst Observation — Forensic Collection & Examination

The forensic pricing landscape shows a well-established rate structure for collection and a predictable escalation through analysis to testimony. The $250–$350 range for collection hours serves as a reliable negotiation baseline. The key risk for buyers is underbudgeting for the analysis and testimony phases, where rates routinely exceed $350/hour and, for testimony, exceed $550/hour for roughly a quarter of respondents (26.4%). Practitioners with active litigation portfolios should establish explicit rate schedules with forensic vendors for all service tiers at the outset of an engagement, not just for collection.

Key Takeaways — Section 1

$250–$350/hour is the market anchor for both onsite and remote forensic collection (56.6% each).
Onsite collection carries a measurable premium: 20.8% report >$350/hour vs. 5.7% for remote.
Mobile device collection rates have converged with computer collection at the upper tier (both ~50% report >$350/device).
Investigation, analysis, and report generation rates escalate to $350–$550/hour for 54.7% of respondents.
Expert witness testimony exceeds $550/hour for 26.4%, the highest share in any >$550 category across the survey.
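
To make the analyst observation's budgeting advice concrete, the sketch below totals a forensic engagement phase by phase. It is a minimal, hypothetical illustration: the `Phase` class and `forensic_budget` function are invented for this example, the rates are simply points inside the survey's most-cited ranges, and the hour estimates are assumptions rather than survey data.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    hours: float          # hypothetical estimated hours for this phase
    rate_per_hour: float  # negotiated hourly rate for this phase


def forensic_budget(phases: list[Phase]) -> dict[str, float]:
    """Return the cost of each phase plus a grand total."""
    costs = {p.name: p.hours * p.rate_per_hour for p in phases}
    costs["total"] = sum(costs.values())  # summed before the key is added
    return costs


# Hypothetical engagement: rates sit inside the survey's reported bands;
# the hour estimates are illustrative assumptions only.
schedule = [
    Phase("collection", hours=20, rate_per_hour=300),           # $250-$350 band
    Phase("analysis_and_report", hours=30, rate_per_hour=450),  # $350-$550 band
    Phase("testimony", hours=10, rate_per_hour=600),            # >$550 band
]

budget = forensic_budget(schedule)
for name, cost in budget.items():
    print(f"{name}: ${cost:,.0f}")
```

With these assumed inputs, collection accounts for $6,000 of a $25,500 total, so a budget built only around collection rates would capture less than a quarter of the engagement cost.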

Section 2: Data Processing, Hosting, and Project Management Pricing

Data processing and hosting represent the operational infrastructure of eDiscovery delivery. Processing, which transforms raw electronically stored information (ESI) into a reviewable format, has historically been a significant cost driver in large matters. Hosting provides the platform on which review takes place. Both categories have experienced significant commoditization pressure from cloud infrastructure economics, but the emergence of AI-driven early culling and processing tools is beginning to reshape volume dynamics in ways that affect both pricing and billing model design.

Q7 & Q8 — Per GB Cost to Process ESI at Ingestion and at Completion

Processing pricing at ingestion is relatively compressed: 39.6% of respondents report rates in the $25–$75 per GB range, and 34.0% report rates below $25 per GB. A significant 18.9% indicate alternative pricing models, reflecting the market's movement away from traditional per-GB ingestion billing. Pricing at the completion of processing tells a different story. The most commonly reported range shifts to 'less than $100 per GB' (37.7%), and the proportion reporting alternative pricing models rises to 22.6%. Another 15.1% report $100–$150 per GB at completion, and 9.4% exceed $150 per GB. The jump from ingestion to completion reflects the data expansion and enrichment that occur through native processing, deduplication, OCR, and promotion, processes that substantially increase the per-GB cost basis for providers. One respondent offered a methodologically important observation: the survey's two-question processing model may conflate two distinct industry billing philosophies, namely an 'all-in' per-GB rate that covers ingestion through promotion, versus a staged model with separate per-GB charges for ingestion and for native processing or promotion to review. This is a legitimate distinction, and practitioners benchmarking against these results should clarify which model their vendor employs. Future survey iterations will address this more precisely.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-to-Process-ESI-Based-on-Volume-at-Ingestion-Winter-2026.pdf" title="Processing Pricing - Per GB Cost to Process ESI Based on Volume at Ingestion - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-to-Process-ESI-Based-on-Volume-at-Completion-Winter-2026.pdf" title="Processing Pricing - Per GB Cost to Process ESI Based on Volume at Completion - Winter 2026"]
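
The respondent's distinction between all-in and staged processing can be illustrated with a small comparison. This is a minimal sketch under stated assumptions: the function names are hypothetical, the all-in rate is assumed to apply to the ingested volume, and every volume and rate below is illustrative rather than a survey result.

```python
def all_in_cost(ingested_gb: float, all_in_rate: float) -> float:
    """All-in model: one per-GB rate covering ingestion through promotion.

    Assumption: the rate is applied to the ingested volume.
    """
    return ingested_gb * all_in_rate


def staged_cost(ingested_gb: float, ingest_rate: float,
                promoted_gb: float, promote_rate: float) -> float:
    """Staged model: a per-GB ingestion charge plus a separate per-GB
    charge on the volume processed natively or promoted to review."""
    return ingested_gb * ingest_rate + promoted_gb * promote_rate


def break_even_all_in_rate(ingested_gb: float, ingest_rate: float,
                           promoted_gb: float, promote_rate: float) -> float:
    """All-in rate (per ingested GB) that costs the same as the staged quote."""
    if ingested_gb <= 0:
        raise ValueError("ingested_gb must be positive")
    return staged_cost(ingested_gb, ingest_rate, promoted_gb, promote_rate) / ingested_gb


# Hypothetical matter: 200 GB ingested, 60 GB promoted to review.
# All rates are illustrative assumptions, not survey findings.
print(all_in_cost(200, 90))
print(staged_cost(200, 25, 60, 150))
print(break_even_all_in_rate(200, 25, 60, 150))
```

Under these assumptions the staged quote totals $14,000 against $18,000 for the all-in quote, and the two would match at an all-in rate of $70 per ingested GB. A different promotion ratio moves that break-even point, which is why headline per-GB figures from the two models are not directly comparable.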

Q9 & Q10 — Per GB Per Month Cost to Host ESI Without and With Analytics

Data hosting without analytics has substantially commoditized. More than half of respondents (54.7%) report hosting rates below $10 per GB per month, and another 30.2% fall in the $10–$20 range. Less than 2% report rates exceeding $20 per GB per month. This distribution reflects years of cloud infrastructure cost reduction passed through to buyers, as major platform providers compete on storage economics. Analytics-enabled hosting shows a wider and higher distribution. While 43.4% report rates below $15 per GB per month with analytics, 32.1% fall in the $15–$25 range, and 11.3% exceed $25 per GB per month. The premium for analytics-capable hosting reflects platform differentiation: vendors with mature AI search, conceptual clustering, visualization tools, and review workflow automation can sustain higher rates. Undifferentiated platforms, those competing primarily on storage price, face continued downward pressure as infrastructure costs decline. One respondent's comment points in the same direction, observing that while overall eDiscovery pricing has been stable, technology costs specifically appear to be coming down, a signal consistent with the commoditization pattern visible in the hosting data.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-Per-Month-to-Host-ESI-without-Analytics-Winter-2026.pdf" title="Processing Pricing - Per GB Cost Per Month to Host ESI without Analytics - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-Per-Month-to-Host-ESI-with-Analytics-Winter-2026.pdf" title="Processing Pricing - Per GB Cost Per Month to Host ESI with Analytics - Winter 2026"]
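
Because hosting is billed per GB per month, the analytics premium compounds with both volume and matter duration. The following minimal sketch shows that effect; the function names are illustrative, and the 100 GB volume, 12-month duration, and the two rates (chosen from inside the survey's sub-$10 and $15–$25 bands) are hypothetical inputs.

```python
def hosting_cost(gb: float, rate_per_gb_month: float, months: int) -> float:
    """Total hosting charge for a constant volume over a number of months."""
    return gb * rate_per_gb_month * months


def analytics_premium(gb: float, basic_rate: float,
                      analytics_rate: float, months: int) -> float:
    """Extra spend for analytics-enabled hosting over basic hosting."""
    return hosting_cost(gb, analytics_rate, months) - hosting_cost(gb, basic_rate, months)


# Hypothetical inputs for illustration only.
GB, MONTHS = 100, 12
print(f"Basic hosting:     ${hosting_cost(GB, 8, MONTHS):,.0f}")
print(f"Analytics hosting: ${hosting_cost(GB, 18, MONTHS):,.0f}")
print(f"Premium:           ${analytics_premium(GB, 8, 18, MONTHS):,.0f}")
```

Holding volume constant is itself a simplifying assumption; in practice data is often added or released over the life of a matter.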

Q11 — User License Fee Per Month for Access to Hosted Data

User licensing is in an active state of structural transition. The $50–$100 per user per month range is the most frequently cited (41.5%), but a striking 34.0% of respondents report alternative pricing models, the highest alternative-model proportion of any question in the processing and hosting section. Only 17.0% report rates below $50 per user per month. The high alternative-model rate reflects a market shift away from traditional per-seat licensing toward enterprise agreements, volume tiers, and managed service arrangements that bundle access costs into broader contract structures. For corporate legal departments and law firms managing multi-matter eDiscovery portfolios, these bundled arrangements restructure cost visibility: per-matter spend attribution becomes less granular, which may simplify budgeting at the portfolio level but reduces transparency at the individual matter level. Whether bundled arrangements represent a net financial advantage depends on volume, negotiated terms, and how closely actual usage tracks the contracted scope, variables the survey does not measure.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-User-License-Fee-Per-Month-for-Access-to-Hosted-Data-Winter-2026.pdf" title="Processing Pricing - User License Fee Per Month for Access to Hosted Data - Winter 2026"]
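
Whether a bundled licensing arrangement beats per-seat pricing comes down to simple arithmetic on actual usage. The sketch below assumes a hypothetical flat monthly access fee and compares it with per-seat billing; the function names, the $4,000 fee, and the $75 per-user rate (a point inside the survey's $50–$100 band) are illustrative assumptions, and real enterprise agreements typically carry terms this model ignores.

```python
def per_seat_monthly(users: int, rate_per_user: float) -> float:
    """Monthly cost under traditional per-seat licensing."""
    return users * rate_per_user


def effective_rate_per_active_user(flat_monthly_fee: float, active_users: int) -> float:
    """What a flat (bundled) fee works out to per user who actually logs in."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return flat_monthly_fee / active_users


def break_even_users(flat_monthly_fee: float, rate_per_user: float) -> float:
    """Active-user count at which the flat fee equals per-seat billing."""
    return flat_monthly_fee / rate_per_user


# Hypothetical figures for illustration only.
FLAT_FEE, SEAT_RATE = 4000.0, 75.0
print(f"Break-even: {break_even_users(FLAT_FEE, SEAT_RATE):.1f} active users")
for users in (30, 80):
    print(users, per_seat_monthly(users, SEAT_RATE),
          round(effective_rate_per_active_user(FLAT_FEE, users), 2))
```

Below roughly 53 active users the assumed flat fee costs more than per-seat billing; above it, the bundle comes out ahead. That is why the gap between contracted scope and actual usage matters.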

Q12 — Per Hour Cost of Project Management Support for eDiscovery

Project management pricing is the most consistent and well-understood category in the processing and hosting group. More than half of respondents (52.8%) report rates in the $100–$200 per hour range, and 26.4% report rates exceeding $200 per hour. The low 'do not know' rate (5.7%), tied with Q9 for the lowest across all Section 2 questions, indicates that PM pricing is well understood by practitioners and regularly visible in vendor proposals. The 26.4% reporting more than $200 per hour for project management likely reflects the growing complexity of modern eDiscovery engagements. Today's project managers must coordinate across AI review platforms, multiple review vendor relationships, technical review workflows, and real-time quality control functions, a scope considerably broader than the data management and platform coordination role the title implied in earlier phases of the market.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-Hour-Cost-of-Project-Management-Support-for-eDiscovery-Winter-2026.pdf" title="Processing Pricing - Per Hour Cost of Project Management Support for eDiscovery - Winter 2026"]

Analyst Observation — Processing, Hosting & Project Management

Processing pricing is bifurcating: per-GB billing at ingestion remains common, but completion-phase and analytics-related pricing is shifting toward bundled and alternative models. Practitioners anchored to traditional per-GB benchmarks for TAR, analytics hosting, or managed service arrangements may be negotiating based on outdated frameworks. Hosting has genuinely commoditized at the infrastructure level; the pricing action now lives in analytics differentiation layered above the storage tier.

Key Takeaways — Section 2

Processing at ingestion is largely below $75/GB (73.6% combined), but completion-phase pricing climbs with 24.5% reporting $100/GB or more.
Alternative pricing models account for 18.9% at ingestion and 22.6% at completion — signaling a structural shift away from per-GB processing billing.
Basic hosting has commoditized: 54.7% report sub-$10/GB/month. Analytics hosting retains differentiation with 11.3% exceeding $25/GB/month.
User licensing is migrating from per-seat to bundled models — 34.0% report alternative pricing structures.
Project management rates are well understood and rising: 26.4% now exceed $200/hour, reflecting growing engagement complexity.

Section 3: Document Review Pricing

Document review sits at the commercial center of most eDiscovery engagements. It is the largest cost driver in complex litigation, the primary arena in which human expertise meets technology leverage, and the category most directly disrupted by the emergence of GenAI-assisted review. Pricing in this section spans both hourly attorney rates (the traditional billing model) and per-document rates (a model that has gained traction as technology-assisted review has enabled higher throughput). The data in this section provides critical context for interpreting the GenAI pricing data that follows in Section 4.

Q13 — Per GB Cost for Predictive Coding / Technology-Assisted Review

Predictive coding and technology-assisted review (TAR) pricing is moving away from per-GB billing. The highest single response category (35.8%) is 'alternative pricing model', the highest alternative-model proportion of any per-GB question in the survey. Among those who do provide per-GB TAR pricing, 30.2% report rates below $75 per GB, 13.2% report $75–$150 per GB, and only 1.9% exceed $150 per GB. The 18.9% 'do not know' rate for TAR pricing suggests that many practitioners receive predictive coding as an embedded capability within their review platform subscription rather than as a separately line-itemed service. This bundling trend, combined with the high alternative-model rate, suggests that standalone per-GB TAR billing is becoming the exception rather than the rule as platforms integrate AI-driven prioritization into standard hosting fees.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-GB-Cost-for-Predictive-Coding-in-a-Technology-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Per GB Cost for Predictive Coding in a Technology-Assisted Review - Winter 2026"]

Q14 & Q15 — Per Hour Cost for Onsite and Remote Managed Review Attorneys

Hourly managed review attorney rates show a consistent onsite premium over remote delivery. For onsite review, 45.3% of respondents report rates exceeding $40 per hour, and 32.1% report $25–$40 per hour. For remote review, the distribution shifts: 41.5% report $25–$40 per hour, and 35.8% report more than $40 per hour. The onsite premium reflects overhead recovery for physical review facilities, security infrastructure, and on-site supervision costs. Despite the normalization of remote review following the pandemic era, onsite review commands a persistent rate premium that clients with physical review requirements should anticipate. The relatively high 'do not know' rates for both onsite (18.9%) and remote (17.0%) review suggest that many practitioners engage review vendors without direct visibility into the underlying attorney billing rates, a transparency gap that can make accurate matter budgeting difficult.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Hour-Cost-for-Document-Review-Attorneys-to-Review-Documents-Onsite-Winter-2026.pdf" title="Review Pricing - Per Hour Cost for Document Review Attorneys to Review Documents Onsite - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Hour-Cost-for-Document-Review-Attorneys-to-Review-Documents-Remote-Winter-2026.pdf" title="Review Pricing - Per Hour Cost for Document Review Attorneys to Review Documents Remote - Winter 2026"]
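
Comparing hourly attorney rates with per-document or GenAI pricing requires a throughput assumption, which the survey does not measure. The minimal sketch below converts an hourly rate into an effective per-document cost; the function names are illustrative, the review paces are hypothetical, and real throughput varies widely with document type, coding complexity, and review protocol.

```python
def cost_per_document(hourly_rate: float, docs_per_hour: float) -> float:
    """Effective per-document cost implied by an hourly rate and a review pace."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return hourly_rate / docs_per_hour


def required_pace(hourly_rate: float, target_cost_per_doc: float) -> float:
    """Documents per hour needed for an hourly rate to match a per-document target."""
    if target_cost_per_doc <= 0:
        raise ValueError("target_cost_per_doc must be positive")
    return hourly_rate / target_cost_per_doc


# A $40/hour rate sits at the boundary of the survey's $25-$40 and >$40 bands;
# the paces of 50 and 80 documents per hour are hypothetical.
for pace in (50, 80):
    print(f"{pace} docs/hour -> ${cost_per_document(40, pace):.2f} per document")
print(f"Pace needed for $0.50/doc: {required_pace(40, 0.50):.0f} docs/hour")
```

The same hourly rate can land anywhere in the survey's per-document bands depending on pace, which is one reason hourly and per-document quotes are hard to compare without an agreed throughput figure.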

Q16 & Q17 — Cost Per Document for Onsite and Remote Managed Review

Per-document pricing for human document review is poorly understood across the respondent pool. For onsite per-document review, 34.0% of respondents indicate they do not know the cost, the highest 'do not know' rate among all document review questions. For remote per-document review, 30.2% report not knowing. Among those with visibility, the $0.50–$1.00 per document range dominates for both onsite (30.2%) and remote (28.3%) delivery, with onsite showing a higher proportion of rates exceeding $1.00 per document (22.6% vs. 18.9% remote). Remote per-document review trends lower at the bottom of the range: 13.2% report sub-$0.50 rates for remote work versus only 3.8% for onsite. This directional difference is consistent with lower overhead costs in remote delivery environments. The per-document rate distribution for human review is strategically important as a baseline against which GenAI-assisted review pricing should be evaluated. In this analyst's view, where human review runs $0.50–$1.00 per document and GenAI-assisted alternatives are priced in the $0.11–$0.50 range, the cost differential is substantial enough to drive adoption decisions, provided quality and defensibility standards are met. The economic case ultimately depends on matter-specific quality thresholds and the degree to which AI exception handling costs are controlled.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Document-Cost-for-Document-Review-Attorneys-to-Review-Documents-Onsite-Winter-2026.pdf" title="Review Pricing - Per Document Cost for Document Review Attorneys to Review Documents Onsite - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Document-Cost-for-Document-Review-Attorneys-to-Review-Documents-Remote-Winter-2026.pdf" title="Review Pricing - Per Document Cost for Document Review Attorneys to Review Documents Remote - Winter 2026"]

Analyst Observation — Document Review

Traditional document review rates have held relatively stable, but the market's difficulty articulating per-document pricing, particularly for onsite review, suggests a shift away from document-count-based billing toward time-based models that are less directly comparable to AI-assisted pricing. Practitioners should push for per-document rate transparency in vendor proposals to enable genuine cost modeling against AI alternatives.

Key Takeaways — Section 3

TAR/predictive coding billing is migrating away from per-GB models: 35.8% report alternative pricing and 18.9% don't know, suggesting bundled platform pricing is absorbing this cost.
Onsite managed review attorney rates exceed $40/hour for 45.3% of respondents vs. 35.8% for remote — the onsite premium persists.
Per-document review rates cluster in the $0.50–$1.00 range for both onsite and remote, with significant 'do not know' responses (34.0% onsite, 30.2% remote) indicating a transparency gap.
The $0.50–$1.00 per-document human review baseline sets up direct economic competition with emerging GenAI-assisted review pricing.

Section 4: GenAI-Assisted Review Pricing

The Winter 2026 survey's GenAI section was designed to illuminate where pricing clarity exists, where models are still fluid, and where the industry is beginning to form conventions around AI-assisted document review. What the results reveal is not a uniformly mature market but a bifurcated one: a segment of practitioners actively deploying and pricing GenAI review, and a substantial minority (17.0% reporting it as not applicable or unknown) who have not yet engaged with it at a pricing level. This is not surprising. GenAI-assisted review introduces fundamentally different cost economics from traditional review: provider costs are driven by token consumption, GPU infrastructure, and model licensing, not attorney hours. Translating those costs into buyer-facing pricing structures that are transparent, predictable, and defensible has proven more difficult than the technology adoption itself. Both cohorts are represented in the data, and the analysis in this section is relevant to each in different ways.

Q18 — Primary Model for GenAI-Assisted Review

The two leading GenAI pricing models are tied: hybrid pricing (combinations of multiple models) and per-document billing each account for 28.3% of primary model responses (15 respondents each). Per-GB billing captures 11.3%, per-token billing 5.7%, flat monthly subscription 5.7%, and outcome-based pricing 3.8%. The remaining 17.0% report that GenAI-assisted review pricing is not applicable or unknown to them. The strong showing of hybrid models reflects the reality that many providers are constructing bespoke proposals that combine per-document minimums, per-GB infrastructure charges, and platform subscription components. This complexity makes apples-to-apples comparison difficult for buyers, and it may be intentional.
Per-document pricing's co-equal standing with hybrid models suggests that a document-level unit of value is widely accepted as a conceptual billing anchor, even when the final structure is more complex. One respondent's comment illustrates the breadth of emerging structures not fully captured by the survey's predefined model options: some providers are pricing GenAI review as an hourly professional service, with consultants performing query engineering, model interaction, and attorney collaboration, billed at standard hourly rates with per-matter minimums and not-to-exceed caps. This hourly professional service model sits outside the per-document or per-GB frameworks the market most commonly discusses, and its presence signals that GenAI pricing model diversity is wider than any single survey's categories can fully contain. Per-token pricing, the underlying cost reality for large language model deployments, has not been widely passed through to buyers (5.7%). This suggests that providers are currently absorbing token cost variability and presenting buyers with higher-order pricing units. As token costs evolve with model efficiency improvements, the degree to which providers pass these economics through will be an important market dynamic to watch.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Primary-Model-for-Gen-AI-Assisted-Review-in-eDiscovery-Winter-2026.pdf" title="Review Pricing - Primary Model for Gen AI-Assisted Review in eDiscovery - Winter 2026"]
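
The hybrid proposals described above, and the hourly-service arrangement with per-matter minimums and not-to-exceed caps mentioned by one respondent, can be modeled as a sum of components bounded by a floor and a ceiling. The sketch below is a hypothetical structure, not any provider's actual pricing: the `HybridQuote` class, the component rates, the $10,000 minimum, and the $40,000 cap are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HybridQuote:
    per_doc_rate: float                    # per-document component
    per_gb_rate: float                     # per-GB infrastructure component
    platform_fee: float                    # flat platform/subscription component
    matter_minimum: float = 0.0            # per-matter floor
    not_to_exceed: Optional[float] = None  # optional per-matter cap

    def price(self, docs: int, gb: float) -> float:
        """Sum the components, then apply the minimum and the cap."""
        raw = docs * self.per_doc_rate + gb * self.per_gb_rate + self.platform_fee
        total = max(raw, self.matter_minimum)
        if self.not_to_exceed is not None:
            total = min(total, self.not_to_exceed)
        return total


# Hypothetical quote for illustration only.
quote = HybridQuote(per_doc_rate=0.25, per_gb_rate=30.0, platform_fee=2500.0,
                    matter_minimum=10_000.0, not_to_exceed=40_000.0)
for docs, gb in [(5_000, 10), (50_000, 100), (200_000, 400)]:
    print(f"{docs:>7} docs, {gb:>3} GB -> ${quote.price(docs, gb):,.0f}")
```

In this example the small matter hits the minimum, the mid-sized matter pays the component sum, and the large matter hits the cap, so the effective rate works out to about $2.00, $0.36, and $0.20 per document respectively under a single quote. That variation is part of why hybrid quotes are hard to compare side by side.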

Q19 — Average Cost Per Document for GenAI-Assisted Review (Per-Document Model)

Among all survey respondents, the $0.26–$0.50 per-document tier is the most frequently cited GenAI price point (20.8%), followed by the $0.11–$0.25 and $0.05–$0.10 ranges (15.1% each). Another 7.5% report per-document GenAI rates exceeding $0.50, and 5.7% report rates below $0.05. A significant 35.8% indicate this pricing model is not applicable to them or that they do not know the cost. The broad distribution among those with pricing visibility, from under a nickel to over fifty cents per document, reflects the wide variance in task complexity, model selection, and quality control overhead that different GenAI review implementations involve. The $0.11–$0.50 range represents the most commercially active zone. At the lower end, GenAI review offers compelling cost efficiency relative to the $0.50–$1.00 range for human per-document review. At the upper end of GenAI pricing (>$0.50), the value proposition requires stronger justification, particularly around accuracy, speed, or reduced downstream review burden. Practitioners should push vendors for specificity on what the per-document fee includes: model inference costs alone, or QC, exception handling, and reporting as well.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Average-Cost-Per-Document-in-Per-Document-Model-of-Gen-AI-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Average Cost Per Document in Per Document Model of Gen AI-Assisted Review - Winter 2026"]
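
The comparison between human per-document review and GenAI per-document pricing can be sketched as a simple savings calculation that also asks what the GenAI fee includes. In this minimal sketch the function names are illustrative, the two base rates are points inside the survey's ranges, and the add-on charges for QC and reporting are hypothetical; whether such items are bundled or billed separately is exactly what buyers are advised to confirm.

```python
def all_in_genai_rate(base_rate: float, addons: dict[str, float]) -> float:
    """Per-document GenAI rate once separately billed items are added back."""
    return base_rate + sum(addons.values())


def savings_vs_human(human_rate: float, genai_rate: float) -> float:
    """Fractional saving per document relative to human review."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return 1 - genai_rate / human_rate


HUMAN_RATE = 0.75   # within the $0.50-$1.00 human per-document band
BASE_GENAI = 0.30   # within the $0.26-$0.50 GenAI band
ADDONS = {"qc": 0.05, "reporting": 0.02}  # hypothetical separate charges

headline = savings_vs_human(HUMAN_RATE, BASE_GENAI)
all_in = savings_vs_human(HUMAN_RATE, all_in_genai_rate(BASE_GENAI, ADDONS))
print(f"Headline saving: {headline:.0%}")
print(f"All-in saving:   {all_in:.0%}")
```

With these assumptions the saving falls from 60% to about 51% once the add-ons are counted, which is still substantial but shows how unstated inclusions change the comparison.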

Q20 — Average Cost Range for GenAI-Assisted Review (Per-GB Model)

Per-GB GenAI pricing is less prevalent in practice: 64.2% of respondents indicate this model is not applicable or unknown. Among those who do report per-GB GenAI pricing, the $25–$50 per GB range is most common (17.0%), followed by below $25 per GB (13.2%). Two respondents (3.8%) report rates exceeding $100 per GB for GenAI review, likely representing specialized, computationally intensive analytical workflows rather than standard review acceleration. Given that data processing at ingestion typically falls below $75 per GB, a per-GB GenAI review charge layered on top represents a meaningful incremental cost. Practitioners evaluating per-GB GenAI pricing should model total matter economics carefully, including whether early data culling through AI reduces the volume that reaches review, potentially offsetting the per-GB GenAI charge with reduced processing and hosting costs downstream.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Average-Cost-Range-Per-GB-in-Per-GB-Model-of-Gen-AI-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Average Cost Range Per GB in Per GB Model of Gen AI-Assisted Review - Winter 2026"]
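
The offsetting effect described above can be expressed as a break-even cull rate: the per-GB GenAI charge pays for itself when the share of data it removes, multiplied by the downstream per-GB cost it avoids, equals the charge. The sketch below models that under simplifying assumptions: the GenAI charge applies to the full volume, culled data incurs no further processing or hosting, the function names are illustrative, and all rates and volumes are hypothetical.

```python
def matter_cost(gb: float, downstream_per_gb: float, hosting_per_gb_month: float,
                months: int, genai_per_gb: float = 0.0,
                cull_fraction: float = 0.0) -> float:
    """Total cost when an optional per-GB GenAI pass culls part of the data.

    Assumes the GenAI charge applies to the full volume and that culled data
    incurs no further processing/promotion or hosting charges.
    """
    if not 0.0 <= cull_fraction <= 1.0:
        raise ValueError("cull_fraction must be between 0 and 1")
    remaining_gb = gb * (1 - cull_fraction)
    genai = gb * genai_per_gb
    downstream = remaining_gb * downstream_per_gb
    hosting = remaining_gb * hosting_per_gb_month * months
    return genai + downstream + hosting


def break_even_cull(genai_per_gb: float, downstream_per_gb: float,
                    hosting_per_gb_month: float, months: int) -> float:
    """Cull fraction at which the GenAI charge is exactly offset."""
    avoided_per_gb = downstream_per_gb + hosting_per_gb_month * months
    return genai_per_gb / avoided_per_gb


# Hypothetical inputs: 100 GB, $60/GB downstream processing and promotion,
# $10/GB/month hosting for 4 months, $40/GB GenAI charge.
print(matter_cost(100, 60, 10, 4))
print(break_even_cull(40, 60, 10, 4))
print(matter_cost(100, 60, 10, 4, genai_per_gb=40, cull_fraction=0.5))
```

With these inputs the GenAI pass breaks even at a 40% cull rate: culling half the data brings the total from $10,000 to $9,000, while culling less than 40% would make the matter more expensive.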

Q21 — Outcome-Based Pricing Structure for GenAI-Assisted Review

Outcome-based pricing for GenAI review remains largely theoretical in the current market: 79.2% of respondents report no applicable experience with it. Among the minority with exposure, custom agreements dominate (9.4%), with small numbers reporting tiered pricing based on review speed improvements (3.8%), fixed fees based on achieved accuracy rates (3.8%), a combination of performance metrics (1.9%), and a percentage of cost savings compared to traditional review (1.9%). The theoretical appeal of outcome-based pricing is clear: it aligns provider incentives with client results and shares the benefits of AI between provider and client in a transparent way. The operational mechanisms, however, remain underdeveloped. Defining accuracy baselines, attributing speed gains to AI versus staffing decisions, and calculating savings against hypothetical traditional review costs are all methodologically complex. The prevalence of custom agreements (9.4%) reflects that outcome-based structures, where they exist, are negotiated on a bespoke basis without market-standardized frameworks. In this analyst's view, this is an area where the industry is likely to see active experimentation and standardization attempts in coming survey cycles, though the timeline will depend on how quickly buyers begin demanding performance accountability in AI review contracts.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Typical-Structure-of-Outcome-Based-Pricing-Models-in-Gen-AI-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Typical Structure of Outcome-Based Pricing Models in Gen AI-Assisted Review - Winter 2026"]
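
One of the outcome-based structures respondents reported, a percentage of cost savings compared to traditional review, illustrates why these models are methodologically complex: the fee depends on a hypothetical baseline that is never actually incurred. The minimal sketch below, with an invented function name and hypothetical figures, shows how sensitive the provider's fee is to that baseline estimate.

```python
def savings_share_fee(baseline_cost: float, actual_cost: float, share: float) -> float:
    """Provider fee equal to a share of savings against a traditional-review baseline.

    The baseline is an estimate of what traditional review would have cost;
    no fee is earned if there are no savings.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    savings = max(0.0, baseline_cost - actual_cost)
    return savings * share


# Hypothetical matter: the actual AI-assisted review cost is $50,000 and the
# provider earns 25% of the savings. Only the baseline estimate changes.
for baseline in (100_000, 80_000, 50_000):
    print(baseline, savings_share_fee(baseline, 50_000, 0.25))
```

In this example a $20,000 disagreement over the baseline moves the fee by $5,000, which is why parties using this structure need an agreed method for estimating what traditional review would have cost.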

Q22 — How Pricing Models Handle Failed or Exception Documents in GenAI Review

Exception document handling (documents that fail AI processing or require human intervention) is a practical and financially significant issue that is often overlooked in headline GenAI pricing discussions. Nearly 40% of respondents (39.6%) cannot speak to how their contracts address this scenario. Among those with visibility, no single approach dominates: 18.9% report that exception documents route to manual review at standard rates; 17.0% say handling depends on the specific issue encountered; 9.4% each report that exceptions are charged as additional processing time or included in the base price (no additional charge); and 5.7% report per-document exception billing. Together, the variability of exception handling approaches and the high proportion of respondents with no visibility represent a meaningful contract risk for buyers. In matters where a significant share of documents require human intervention, the effective cost of a GenAI-assisted review engagement can increase substantially depending on which exception pricing structure applies. Buyers negotiating GenAI review engagements should require explicit exception handling clauses that specify the triggering conditions, billing treatment, and quality control obligations for documents that exit the AI workflow.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Accounting-for-Docs-That-Fail-To-Process-or-Require-Special-Handing-Gen-AI-Winter-2026.pdf" title="Review Pricing - Accounting for Docs That Fail To Process or Require Special Handing (Gen AI) - Winter 2026"]
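
The effect of exception handling on effective cost can be sketched by blending the base GenAI rate with an exception charge. The model below covers three of the treatments respondents reported (included in the base price, per-document exception billing, and manual review at standard rates); the 'additional processing time' and 'depends on the issue' treatments are not modeled. The function name, base rate, exception fee, manual review rate, and exception shares are hypothetical, and the sketch assumes the base rate is billed on every document, including those that later become exceptions.

```python
def effective_cost_per_doc(base_rate: float, exception_share: float,
                           exception_unit_cost: float) -> float:
    """Blended cost per document across the whole population.

    Assumes the base GenAI rate is billed on every document and each
    exception document adds exception_unit_cost on top (hypothetical model).
    """
    if not 0.0 <= exception_share <= 1.0:
        raise ValueError("exception_share must be between 0 and 1")
    return base_rate + exception_share * exception_unit_cost


BASE_RATE = 0.30  # hypothetical GenAI per-document rate
TREATMENTS = {
    "included in base price": 0.00,
    "per-document exception fee": 0.10,        # hypothetical fee
    "manual review at standard rates": 0.75,   # hypothetical human per-doc rate
}

for share in (0.05, 0.20):
    for name, unit_cost in TREATMENTS.items():
        rate = effective_cost_per_doc(BASE_RATE, share, unit_cost)
        print(f"{share:.0%} exceptions, {name}: ${rate:.3f}/doc")
```

At a 20% exception share, routing exceptions to manual review at an assumed $0.75 per document raises the blended rate from $0.30 to $0.45, a 50% increase over the headline figure.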

Analyst Observation — GenAI-Assisted Review

The GenAI pricing market is operationally engaged but structurally immature. The concentration in hybrid and per-document models reflects practitioners and providers reaching for familiar pricing analogues while the technology matures. The $0.11–$0.50 per-document zone is emerging as a competitive market range, one that creates genuine economic pressure on traditional human review for appropriate document populations. The most important near-term challenge for the market is not the headline per-document or per-GB rate, but the hidden cost variables: exception document handling, quality control overhead, model retraining requirements, and the total cost of ownership of integrating GenAI review into existing workflows. One survey respondent offered a perspective worth placing on record: many vendors are still determining their AI pricing strategies and are rushing to market to capture first-mover advantage or market share, while token-based pricing pressures may cause AI solution costs to increase materially in the future, absent significant reductions in GPU infrastructure costs. This caution deserves attention as buyers evaluate multi-year GenAI review commitments.

Key Takeaways — Section 4

Hybrid and per-document models are the dominant GenAI pricing structures, each at 28.3% — the market has converged on document-level units but not uniform delivery structures.
The $0.11–$0.50 per-document range is the emerging competitive zone for GenAI-assisted review, with direct economic implications for traditional human review.
Per-token pricing has not been widely passed to buyers (5.7%) — providers are absorbing LLM cost variability for now.
Outcome-based GenAI pricing is theoretically compelling but operationally undeveloped; 79.2% of respondents have no applicable experience.
Exception document handling is an underappreciated contract risk: 39.6% don't know how their agreements address it, and no standard approach has emerged.

Conclusion and Strategic Implications

The Winter 2026 eDiscovery Pricing Survey paints a picture of a market undergoing layered transitions simultaneously: forensic services have found stable pricing floors; processing and hosting have bifurcated between commoditized infrastructure and differentiated analytics tiers; document review is experiencing pricing model fragmentation as AI alternatives create new economic reference points; and GenAI-assisted review is operationally deployed but commercially immature in its pricing structures. For eDiscovery Buyers and Legal Operations Professionals The $250–$350 per hour range for forensic collection provides a reliable negotiation baseline, but buyers should build explicit rate schedules covering analysis and testimony phases — where rates routinely exceed $350 and frequently surpass $550 per hour. Processing and hosting negotiations should move beyond per-GB benchmarks for analytics-enabled and TAR-related services, where bundled models increasingly dominate. For document review, the critical action item is requiring per-document rate transparency even when hourly billing is the primary model — enabling genuine cost modeling against AI review alternatives. Corporate legal operations professionals face a distinct version of these challenges. Unlike law firms that pass eDiscovery costs to clients, in-house legal departments absorb them entirely — making pricing transparency a budget integrity issue, not just a negotiation tactic. The hosting commoditization finding (54.7% below $10/GB/month for basic hosting) and the user licensing transition (34.0% of respondents on alternative models) both represent leverage points in enterprise vendor negotiations that legal operations teams can use directly. The project management escalation finding (26.4% above $200/hour) warrants particular attention for in-house teams managing multi-matter portfolios: as PM rates rise with engagement complexity, the cost of inadequate internal scoping and vendor coordination compounds. 
Corporate legal operations teams are well-positioned to offset this escalation by investing in internal eDiscovery program management capability rather than outsourcing all coordination to vendor project managers at premium rates.

For GenAI-assisted review engagements, two contractual priorities stand out: first, obtain explicit pricing for exception documents rather than accepting provider discretion; second, require specificity on what is included in per-document or per-GB GenAI rates to enable accurate total-cost modeling. The $0.11–$0.50 per-document range is commercially viable for appropriate document populations, but hidden costs can quickly erode that advantage if the agreement does not address them.

For eDiscovery Service Providers and Technology Vendors

The survey data confirms that buyers are engaging with GenAI pricing at a level of sophistication that requires providers to move beyond introductory pricing structures. The prevalence of hybrid models reflects buyer uncertainty as much as provider flexibility, and that uncertainty is not sustainable as GenAI review becomes a standard engagement component rather than a premium add-on. Providers who develop clear, reproducible pricing structures with transparent exception handling will differentiate themselves in a market where 39.6% of respondents do not know how their agreements address exception documents.

The trajectory of outcome-based pricing deserves attention. While only a small minority of respondents currently have exposure to these models, the direction of the market (toward accountability for AI review quality, not just delivery) suggests that providers who invest in outcome measurement frameworks now will be better positioned as client sophistication increases.

Looking Ahead: Open Questions for the Evolving eDiscovery Pricing Landscape

Several questions are worth watching in future survey cycles. Will per-token pricing migrate from a provider cost basis to buyer-facing billing as LLM economics become more visible? Will outcome-based pricing develop standardized frameworks, or remain bespoke indefinitely? Will the onsite/remote premium for forensic collection and attorney review compress as remote delivery tools mature further? And will the exception document handling gap in GenAI contracts become a litigation issue that forces market standardization?

The Pricing Pulse series will continue to track these dynamics. The Winter 2026 results establish a pricing baseline at a pivotal moment, one against which future surveys will be measured as generative AI transforms both the economics and the practice of eDiscovery.

Research Methodology Note

The Winter 2026 eDiscovery Pricing Survey was designed and administered by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM) as part of the Pricing Pulse research series. The survey was conducted via an online form distributed through ComplexDiscovery's professional community and partner networks. It ran from December 2025 through February 2026 and closed with a final cohort of 53 respondents.

The survey comprised 25 pricing questions organized across four service categories (forensic collection and examination, data processing and hosting, document review, and GenAI-assisted review), plus three respondent classification questions addressing geography, business segment, and primary function. Response options were structured as defined ranges rather than open-ended numeric inputs to facilitate comparative analysis and protect respondent pricing confidentiality.

All responses represent self-reported market observations and practitioner experience. Results should be interpreted as directional market intelligence reflecting current practitioner perceptions of prevailing pricing, not as verified transaction records or audited benchmarks. The respondent pool is U.S.-centric (92.5%), which should be taken into account when applying findings to non-U.S. markets.

ComplexDiscovery OÜ maintains editorial independence in the analysis and publication of survey results. Individual respondent data is treated as confidential; only aggregated findings are reported.

ComplexDiscovery and the Electronic Discovery Reference Model (EDRM) thank the 53 practitioners and professionals who contributed their time and market knowledge to this research. Organizations and individuals interested in participating in future Pricing Pulse surveys are encouraged to connect with ComplexDiscovery at complexdiscovery.com.

© 2026 ComplexDiscovery OÜ. All rights reserved.
Published on ComplexDiscovery.com. Conducted in partnership with the Electronic Discovery Reference Model (EDRM). The Pricing Pulse is an ongoing research series examining pricing dynamics across the eDiscovery market.

News Source

Rob Robinson and Holley Robinson, ComplexDiscovery OÜ, "Winter 2026 eDiscovery Pricing Survey," February 2026.


Assisted by GAI and LLM Technologies

Source: ComplexDiscovery OÜ

ComplexDiscovery’s mission is to enable clarity for complex decisions by providing independent, data‑driven reporting, research, and commentary that make digital risk, legal technology, and regulatory change more legible for practitioners, policymakers, and business leaders.


A Complete Analysis of the Winter 2026 eDiscovery Pricing Survey

ComplexDiscovery Staff

Executive Summary

About the Survey

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Primary-Function-Winter-2026.pdf" title="Survey Respondents by Primary Function - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Geographic-Region-Winter-2026.pdf" title="Survey Respondents by Geographic Region - Winter 2026"]

Section 1: Forensic Collection, Examination, and Testimony Pricing

$250–$350/hour is the market anchor for both onsite and remote forensic collection (56.6% each).
Onsite collection carries a measurable premium: 20.8% report >$350/hour vs. 5.7% for remote.
Mobile device collection rates have converged with computer collection at the upper tier (both ~50% report >$350/device).
Investigation, analysis, and report generation rates escalate to $350–$550/hour for 54.7% of respondents.
Expert witness testimony exceeds $550/hour for 26.4% — the highest proportion across all survey categories.
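The recommendation to build explicit rate schedules for forensic work can be illustrated with a minimal Python sketch that prices an engagement phase by phase. The rates are illustrative points inside the ranges reported above ($250–$350/hour for collection, $350–$550/hour for analysis and reporting, above $550/hour for testimony); the hour estimates, the `Phase` class, and the `estimate_engagement` function are hypothetical and do not come from the survey.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One line of a forensic rate schedule (hypothetical, for illustration)."""
    name: str
    hourly_rate: float  # USD per hour
    est_hours: float    # buyer's own estimate, not a survey figure


def estimate_engagement(phases: list[Phase]) -> dict[str, float]:
    """Return the cost of each phase plus a grand total under the 'total' key."""
    costs = {p.name: p.hourly_rate * p.est_hours for p in phases}
    costs["total"] = sum(costs.values())
    return costs


# Assumed schedule: rates fall inside the survey's reported ranges;
# hours are invented for illustration only.
schedule = [
    Phase("collection", 300.0, 10),
    Phase("analysis_and_report", 450.0, 20),
    Phase("testimony", 600.0, 6),
]

if __name__ == "__main__":
    for name, cost in estimate_engagement(schedule).items():
        print(f"{name:>20}: ${cost:,.2f}")
```

With these assumed inputs, collection accounts for $3,000 of a $15,600 estimate, which illustrates why a negotiation that stops at the collection rate leaves most of the forensic spend unaddressed.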

Section 2: Data Processing, Hosting, and Project Management Pricing

Processing at ingestion is largely below $75/GB (73.6% combined), but completion-phase pricing climbs, with 24.5% reporting $100/GB or more.
Alternative pricing models account for 18.9% at ingestion and 22.6% at completion — signaling a structural shift away from per-GB processing billing.
Basic hosting has commoditized: 54.7% report sub-$10/GB/month. Analytics hosting retains differentiation with 11.3% exceeding $25/GB/month.
User licensing is migrating from per-seat to bundled models — 34.0% report alternative pricing structures.
Project management rates are well understood and rising: 26.4% now exceed $200/hour, reflecting growing engagement complexity.
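Because hosting is billed per GB per month, the tier choice compounds over a matter's life. The Python sketch below totals monthly charges for a hypothetical volume profile under two illustrative tier rates. The volume profile, the `hosting_cost` helper, and the specific rates are assumptions; only the reference points ($10/GB/month for basic hosting and $25/GB/month for analytics hosting) come from the findings above.

```python
def hosting_cost(monthly_gb: list[float], rate_per_gb_month: float) -> float:
    """Sum monthly hosting charges: GB hosted in each month times the tier rate."""
    return sum(gb * rate_per_gb_month for gb in monthly_gb)


# Hypothetical matter: volume grows during collection, then shrinks after culling.
MONTHLY_GB = [100, 250, 250, 200, 100, 50]

# Illustrative tier rates (assumptions): $8/GB/month sits below the $10 level
# a majority of respondents reported for basic hosting; $25/GB/month is the
# analytics-hosting threshold that 11.3% of respondents said they exceed.
BASIC_RATE = 8.0
ANALYTICS_RATE = 25.0

if __name__ == "__main__":
    basic = hosting_cost(MONTHLY_GB, BASIC_RATE)
    analytics = hosting_cost(MONTHLY_GB, ANALYTICS_RATE)
    print(f"basic: ${basic:,.2f}  analytics: ${analytics:,.2f}  "
          f"difference: ${analytics - basic:,.2f}")
```

With this assumed profile of 950 GB-months, basic hosting totals $7,600 and analytics hosting $23,750, a gap of $16,150 that shows why analytics-tier pricing deserves separate negotiation from basic hosting.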

Section 3: Document Review Pricing

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Hour-Cost-for-Document-Review-Attorneys-to-Review-Documents-Remote-Winter-2026.pdf" title="Review Pricing - Per Hour Cost for Document Review Attorneys to Review Documents Remote - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Document-Cost-for-Document-Review-Attorneys-to-Review-Documents-Remote-Winter-2026.pdf" title="Review Pricing - Per Document Cost for Document Review Attorneys to Review Documents Remote - Winter 2026"]

TAR/predictive coding billing is migrating away from per-GB models: 35.8% report alternative pricing and 18.9% don't know, as bundled platform pricing absorbs this cost.
Onsite managed review attorney rates exceed $40/hour for 45.3% of respondents vs. 35.8% for remote — the onsite premium persists.
Per-document review rates cluster in the $0.50–$1.00 range for both onsite and remote, with significant 'do not know' responses (34% onsite, 30.2% remote) indicating a transparency gap.
The $0.50–$1.00 per-document human review baseline sets up direct economic competition with emerging GenAI-assisted review pricing.
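The recommendation to request per-document transparency rests on simple arithmetic: an hourly review rate only becomes comparable to a per-document GenAI price once it is divided by review throughput. The minimal Python sketch below makes that conversion explicit. The $40/hour rate echoes the managed review threshold cited above, but the throughput figures and the `per_document_cost` helper are hypothetical assumptions, not survey data.

```python
def per_document_cost(hourly_rate: float, docs_per_hour: float) -> float:
    """Convert an hourly review rate into an effective cost per document."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return hourly_rate / docs_per_hour


# Assumed inputs (not survey data): a $40/hour review attorney rate and a
# range of hypothetical review throughputs in documents per hour.
HOURLY_RATE = 40.0
THROUGHPUTS = [40, 50, 60, 80]

if __name__ == "__main__":
    for docs_per_hour in THROUGHPUTS:
        rate = per_document_cost(HOURLY_RATE, docs_per_hour)
        print(f"{docs_per_hour:>3} docs/hour -> ${rate:.2f} per document")
```

At the assumed $40/hour, the $0.50–$1.00 per-document band reported above corresponds to throughputs between 80 and 40 documents per hour, so the same hourly rate can land anywhere in that band depending on productivity. That sensitivity is why an hourly quote alone does not support a comparison with GenAI per-document pricing.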

Section 4: GenAI-Assisted Review Pricing

Hybrid and per-document models are the dominant GenAI pricing structures, each at 28.3% — the market has converged on document-level units but not uniform delivery structures.
The $0.11–$0.50 per-document range is the emerging competitive zone for GenAI-assisted review, with direct economic implications for traditional human review.
Per-token pricing has not been widely passed to buyers (5.7%) — providers are absorbing LLM cost variability for now.
Outcome-based GenAI pricing is theoretically compelling but operationally undeveloped; 79.2% of respondents have no applicable experience.
Exception document handling is an underappreciated contract risk: 39.6% don't know how their agreements address it, and no standard approach has emerged.
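Because no standard approach to exception documents has emerged, a buyer may want to test how sensitive GenAI savings are to them. The Python sketch below uses a deliberately simple, hypothetical cost model: every document is billed at the GenAI per-document rate, and the share flagged as exceptions incurs an additional per-document handling cost. The model, the function names, and all rates are assumptions for illustration; actual contracts may bill exceptions very differently.

```python
def effective_genai_rate(genai_rate: float, exception_fraction: float,
                         exception_cost: float) -> float:
    """Blended per-document cost under a hypothetical model: every document is
    billed at the GenAI rate, and exceptions add a separate handling cost."""
    if not 0.0 <= exception_fraction <= 1.0:
        raise ValueError("exception_fraction must be between 0 and 1")
    return genai_rate + exception_fraction * exception_cost


def breakeven_exception_fraction(genai_rate: float, human_rate: float,
                                 exception_cost: float) -> float:
    """Exception share at which the blended GenAI cost equals the all-human
    per-document rate under the same hypothetical model."""
    return (human_rate - genai_rate) / exception_cost


# Assumed inputs: $0.30/doc GenAI rate (inside the $0.11-$0.50 band),
# $0.80/doc human review (inside the $0.50-$1.00 band), and a hypothetical
# $1.00 handling cost per exception document.
GENAI_RATE, HUMAN_RATE, EXCEPTION_COST = 0.30, 0.80, 1.00

if __name__ == "__main__":
    for fraction in (0.05, 0.15, 0.30):
        rate = effective_genai_rate(GENAI_RATE, fraction, EXCEPTION_COST)
        print(f"{fraction:.0%} exceptions -> ${rate:.2f} per document")
    breakeven = breakeven_exception_fraction(GENAI_RATE, HUMAN_RATE, EXCEPTION_COST)
    print(f"break-even exception share: {breakeven:.0%}")
```

Under these assumed inputs, the blended rate stays below the $0.80 human baseline until half of the population is treated as exceptions. Raising the exception cost or moving the GenAI rate toward the top of its band lowers that break-even point (a $0.50 GenAI rate with the same $1.00 exception cost breaks even at 30%), which is why unpriced exception handling is a material contract risk.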

Conclusion and Strategic Implications

Looking Ahead: Open Questions for the Evolving eDiscovery Pricing Landscape

Research Methodology Note
