From Platforms to Workflows: Predictive Coding Technologies and Protocols Survey – Fall 2019 Results

Sep 5, 2019

Editor’s Note: This is the third Predictive Coding Technologies and Protocols Survey conducted by ComplexDiscovery. Initiated in the fall of 2018 and refreshed semi-annually, this iteration of the survey was graciously distributed and reviewed by the leadership team* at the Association of Certified E-Discovery Specialists (ACEDS). It drew the most responses of any predictive coding survey to date, with 100 data and legal discovery professionals sharing their understanding and experience in the four-question survey. The results of the fall 2019 survey are provided below, with the hope that these general, non-scientific results may help eDiscovery professionals as they consider predictive coding platforms, technologies, protocols, workflows, and uses.

The Predictive Coding Technologies and Protocols Fall 2019 Survey

The Predictive Coding Technologies and Protocols Survey is a non-scientific survey designed to help provide a general understanding of the use of predictive coding technologies, protocols, and workflows by data discovery and legal discovery professionals within the eDiscovery ecosystem. The fall 2019 survey was open from August 23, 2019, through September 5, 2019, with individuals invited to participate directly by ComplexDiscovery and indirectly by industry website, blog, and newsletter mentions.

Designed to provide a general understanding of predictive coding technologies and protocols, the survey had two primary educational objectives:

To provide a consolidated listing of potential predictive coding technology, protocol, and workflow definitions. While not all-inclusive or comprehensive, the listing was vetted with selected industry predictive coding experts for completeness and accuracy, so it appears suitable for use in educational efforts.
To ask eDiscovery ecosystem professionals about their preferences regarding predictive coding platforms, technologies, protocols, workflows, and areas of usage.

The survey gave responders an opportunity to provide predictive coding background information, including their primary predictive coding platform, and posed four specific questions:

Which predictive coding technologies are utilized by your eDiscovery platform?
Which technology-assisted review protocols are utilized in your delivery of predictive coding?
What is the primary technology-assisted review workflow utilized in your delivery of predictive coding?
What are the areas where you use technology-assisted review technologies, protocols, and workflows?

Closed on September 5, 2019, the fall 2019 survey had 100 responders.

Key Results and Observations

Primary Predictive Coding Platform (Chart 1)

86% of responders reported that they have at least one primary platform for predictive coding.
There were 31 different platforms reported as a primary predictive coding platform by responders.
Relativity was reported as a primary predictive coding platform in approximately 35% of survey responses.
The top six platforms were reported as a primary predictive coding platform in 62.5% of survey responses.
14% of responders reported they had no primary platform for predictive coding.

Predictive Coding Technology Employment (Chart 2)

All listed predictive coding technologies were reported as being used by at least one survey responder.
Active Learning was reported as the most used predictive coding technology with 86% of responders using it in their predictive coding efforts.
51% of responders reported using only one predictive coding technology in their predictive coding efforts.
48% of responders reported using more than one predictive coding technology in their predictive coding efforts.
1% of responders reported not using any predictive coding technology.

Technology-Assisted Review Protocol Employment (Chart 3)

All listed technology-assisted review protocols for predictive coding were reported as being used by at least one survey responder.
Continuous Active Learning (CAL) was reported as the most used predictive coding protocol with 82% of responders using it in their predictive coding efforts.
52% of responders reported using only one predictive coding protocol in their predictive coding efforts.
45% of responders reported using more than one predictive coding protocol in their predictive coding efforts.
3% of responders reported not using any predictive coding protocol.

Technology-Assisted Review Workflow Employment (Chart 4)

66% of responders reported using Technology-Assisted Review (TAR) 2.0 as a primary workflow in the delivery of predictive coding.
12% of responders reported using TAR 1.0 and 13% of responders reported using TAR 3.0 as a primary workflow in the delivery of predictive coding.
11% of responders reported not using TAR 1.0, TAR 2.0, or TAR 3.0 as a primary workflow in the delivery of predictive coding.

Technology-Assisted Review Uses (Chart 5)

91% of responders reported using technology-assisted review in more than one area of data and legal discovery.
89% of responders reported using technology-assisted review for the identification of relevant documents.
9% of responders reported using technology-assisted review for information governance and data disposition.

Predictive Coding Technology and Protocol Survey Responder Overview (Chart 6)

39% of responders were from law firms.
37% of responders were from software or services provider organizations.
The remaining 24% of responders were either part of a consultancy (12%), a corporation (6%), the government (3%), or another type of entity (3%).

Survey Charts


Chart 1: Name of Primary Predictive Coding Platform

Chart 1 – Primary Predictive Coding Platform

Chart 2: Which predictive coding technologies are utilized by your eDiscovery platform?

Chart 2 – Predictive Coding Technology Employment

Chart 3: Which technology-assisted review protocols are utilized in your delivery of predictive coding?

Chart 3 – Technology-Assisted Review Protocol Employment

Chart 4: What is the primary technology-assisted review workflow utilized in your delivery of predictive coding?

Chart 4 – Technology-Assisted Review Workflow Employment

Chart 5: What are the areas where you use technology-assisted review technologies, protocols, and workflows?

Chart 5 – Technology-Assisted Review Uses

Chart 6: Survey Responder Overview

Chart 6 – Predictive Coding Technologies and Protocols Survey Overview

Predictive Coding Technologies and Protocols (Survey Backgrounder)

As defined in The Grossman-Cormack Glossary of Technology-Assisted Review (1), Predictive Coding is an industry-specific term generally used to describe a technology-assisted review process involving the use of a machine learning algorithm to distinguish relevant from non-relevant documents, based on a subject matter expert’s coding of a training set of documents. This definition of predictive coding provides a baseline description that identifies one particular function that a general set of commonly accepted machine learning algorithms may use in a technology-assisted review (TAR).
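To make this definition concrete, here is a minimal Python sketch of predictive coding as supervised classification: a model is fit to documents an expert has coded, then used to rank unreviewed documents by estimated probability of relevance. It assumes a bag-of-words representation and uses logistic regression (one of the technologies listed below) trained with plain stochastic gradient descent. The function names, toy documents, learning rate, and epoch count are hypothetical choices for illustration, not any platform's implementation.

```python
import math
from collections import Counter


def features(text):
    """Bag-of-words term counts: a deliberately simple document representation."""
    return Counter(text.lower().split())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(training_set, epochs=200, lr=0.5):
    """Fit per-term weights by stochastic gradient descent on log loss.

    training_set: (text, label) pairs coded by a subject matter expert,
    with label 1 for relevant and 0 for non-relevant.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in training_set:
            feats = features(text)
            z = bias + sum(weights.get(t, 0.0) * c for t, c in feats.items())
            error = label - sigmoid(z)  # gradient of the log-likelihood w.r.t. z
            bias += lr * error
            for t, c in feats.items():
                weights[t] = weights.get(t, 0.0) + lr * error * c
    return weights, bias


def relevance_probability(model, text):
    """Estimated probability that an unreviewed document is relevant."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) * c for t, c in features(text).items())
    return sigmoid(z)


# Hypothetical SME-coded training set and unreviewed documents
training = [
    ("credit default swap agreement", 1),
    ("interest rate swap confirmation", 1),
    ("lunch menu for friday", 0),
    ("holiday party schedule", 0),
]
model = train_logistic_regression(training)
unreviewed = ["friday lunch order", "draft swap term sheet"]
ranked = sorted(unreviewed, key=lambda d: relevance_probability(model, d), reverse=True)
```

Terms never seen in training (such as "draft" or "order") contribute nothing to the score, so the ranking here is driven by "swap" (learned as a relevant signal) and by "friday" and "lunch" (learned as non-relevant signals).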

With the growing awareness and use of predictive coding in the legal arena, it is increasingly important for electronic discovery professionals to have a general understanding of the technologies that may be implemented in electronic discovery platforms to facilitate predictive coding of electronically stored information. This understanding is essential because each algorithmic approach has advantages and disadvantages that may affect the efficiency and efficacy of predictive coding.

To help in developing this general understanding of predictive coding technologies and to provide an opportunity for electronic discovery providers to share the technologies and protocols they use in and with their platforms to accomplish predictive coding, the following working lists of predictive coding technologies and TAR protocols are provided for your use. Working lists on predictive coding workflows and uses are also included for your consideration as they help define how the predictive coding technologies and TAR protocols are implemented and used.

A Working List of Predictive Coding Technologies (1, 2, 3, 4)

Aggregated from the professional publications of, and personal conversations with, electronic discovery experts, the list below is a non-exhaustive working list of machine learning technologies that have been applied, or could be applied, to eDiscovery to facilitate predictive coding. This working list is designed to provide a reference point for identified predictive coding technologies and may over time include additions, adjustments, and amendments based on feedback from experts and organizations applying these mainstream technologies in their specific eDiscovery platforms.

Listed in Alphabetical Order

Active Learning: A process, typically iterative, whereby an algorithm is used to select documents that should be reviewed for training based on a strategy to help the classification algorithm learn efficiently.
Decision Tree: A step-by-step method of distinguishing between relevant and non-relevant documents, depending on what combination of words (or other features) they contain. A Decision Tree to identify documents pertaining to financial derivatives might first determine whether or not a document contained the word “swap.” If it did, the Decision Tree might then determine whether or not the document contained “credit,” and so on. A Decision Tree may be created either through knowledge engineering or machine learning.
k-Nearest Neighbor Classifier (k-NN): A classification algorithm that analyzes the k example documents that are most similar (nearest) to the document being classified in order to determine the best classification for the document. If k is too small (e.g., k=1), it may be extremely difficult to achieve high recall.
Latent Semantic Analysis (LSA): A mathematical representation of documents that treats highly correlated words (i.e., words that tend to occur in the same documents) as being, in a sense, equivalent or interchangeable. This equivalency or interchangeability can allow algorithms to identify documents as being conceptually similar even when they aren’t using the same words (e.g., because synonyms may be highly correlated), though it also discards some potentially useful information and can lead to undesirable results caused by spurious correlations.
Logistic Regression: A state-of-the-art supervised learning algorithm for machine learning that estimates the probability that a document is relevant, based on the features that it contains. In contrast to the Naïve Bayes algorithm, Logistic Regression identifies features that discriminate between relevant and non-relevant documents.
Naïve Bayesian Classifier: A system that examines the probability that each word in a new document came from the word distribution derived from trained responsive documents or from trained non-responsive documents. The system is naïve in the sense that it assumes that all words are independent of one another within each class.
Neural Network: An Artificial Neural Network (ANN) is a computational model loosely based on the structure and functions of biological neural networks. It consists of a large number of connected processing units that work together to process information.
Probabilistic Latent Semantic Analysis (PLSA): An approach similar in spirit to LSA, but one that uses a probabilistic model to achieve results that are expected to be better.
Random Forests: An ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees’ habit of overfitting to their training set.
Relevance Feedback: An active learning process in which the documents with the highest likelihood of relevance are coded by a human and added to the training set.
Support Vector Machine: A mathematical approach that seeks to find a line (more generally, a hyperplane) that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.
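As one example from the list above, the Naïve Bayesian Classifier can be sketched in a few lines of Python. This is a minimal multinomial Naïve Bayes; the add-one (Laplace) smoothing is an assumption of this sketch, since the definition does not specify how zero counts are handled, and the function names and toy training data are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(training_set):
    """Count documents and words per class (1 = responsive, 0 = non-responsive).

    Assumes the training set contains at least one document of each class.
    """
    word_counts = {1: Counter(), 0: Counter()}
    doc_counts = Counter()
    for text, label in training_set:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts[1]) | set(word_counts[0])
    return word_counts, doc_counts, vocabulary


def log_odds_responsive(model, text):
    """log P(responsive | words) - log P(non-responsive | words).

    The 'naive' step: each word's class-conditional probability is added in
    log space independently, as if words did not depend on one another.
    """
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (1, 0):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # unseen words carry no evidence in this sketch
            # Add-one smoothing avoids log(0) for words absent from a class
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        log_scores[label] = score
    return log_scores[1] - log_scores[0]


# Hypothetical coded training set
training = [
    ("credit default swap agreement", 1),
    ("interest rate swap confirmation", 1),
    ("lunch menu for friday", 0),
    ("holiday party schedule", 0),
]
model = train_naive_bayes(training)
```

A positive log-odds value favors "responsive": with this toy training set, `log_odds_responsive(model, "swap terms")` is positive and `log_odds_responsive(model, "friday lunch")` is negative.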

General TAR Protocols (5,6,7,8)

Additionally, these technologies are generally employed as part of a TAR protocol that determines how the technologies are used. Examples of TAR protocols include:

Listed in Alphabetical Order

Continuous Active Learning (CAL): CAL is the TAR method developed, used, and advocated by Maura R. Grossman and Gordon V. Cormack. After the initial training set, the learner repeatedly selects the next-most-likely-to-be-relevant documents (that have not yet been considered) for review, coding, and training, and continues to do so until it can no longer find any more relevant documents. There is generally no second review because, by the time the learner stops learning, all documents deemed relevant by the learner have already been identified and manually reviewed.
Hybrid Multimodal Method: An approach developed by the e-Discovery Team (Ralph Losey) that includes all types of search methods, with primary reliance placed on predictive coding and the use of high-ranked documents for continuous active training.
Scalable Continuous Active Learning (S-CAL): The essential difference between S-CAL and CAL is that for S-CAL, only a finite sample of documents from each successive batch is selected for labeling, and the process continues until the collection—or a large random sample of the collection—is exhausted. Together, the finite samples form a stratified sample of the document population, from which a statistical estimate of ρ may be derived.
Simple Active Learning (SAL): In SAL methods, after the initial training set, the learner selects the documents to be reviewed and coded by the teacher, and used as training examples, and continues to select examples until it is sufficiently trained. Typically, the documents the learner chooses are those about which the learner is least certain, and therefore from which it will learn the most. Once sufficiently trained, the learner is then used to label every document in the collection. As with SPL, the documents labeled as relevant are generally re-reviewed manually.
Simple Passive Learning (SPL): In simple passive learning (“SPL”) methods, the teacher (i.e., human operator) selects the documents to be used as training examples; the learner is trained using these examples, and once sufficiently trained, is used to label every document in the collection as relevant or non-relevant. Generally, the documents labeled as relevant by the learner are re-reviewed manually. This manual review represents a small fraction of the collection, and hence a small fraction of the time and cost of an exhaustive manual review.
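The CAL protocol described above can be sketched as a loop that retrains, ranks the unreviewed documents, sends the top-ranked batch to a reviewer, and repeats. This is a simplified illustration under stated assumptions: a crude term-count scorer stands in for a real learning algorithm, a callable stands in for the human reviewer, and the loop stops when an entire batch contains no relevant documents, which is only one possible reading of "can no longer find any more relevant documents." All names are hypothetical.

```python
from collections import Counter


def term_weights(reviewed):
    """Crude stand-in for a learner: +1 per term occurrence in relevant
    documents, -1 per occurrence in non-relevant documents."""
    weights = Counter()
    for text, relevant in reviewed:
        for term in text.lower().split():
            weights[term] += 1 if relevant else -1
    return weights


def continuous_active_learning(collection, seed, review, batch_size=2):
    """Simplified CAL loop; returns every (text, label) pair in review order.

    collection: document texts; seed: initial coded (text, label) pairs;
    review: callable standing in for the human reviewer (text -> bool).
    """
    reviewed = list(seed)
    seen = {text for text, _ in seed}
    while True:
        weights = term_weights(reviewed)  # retrain on everything coded so far
        unreviewed = [d for d in collection if d not in seen]
        if not unreviewed:
            break  # collection exhausted
        # Rank the not-yet-reviewed documents, most likely relevant first
        ranked = sorted(
            unreviewed,
            key=lambda d: sum(weights[t] for t in d.lower().split()),
            reverse=True,
        )
        batch = ranked[:batch_size]
        coded = [(d, review(d)) for d in batch]
        reviewed.extend(coded)
        seen.update(batch)
        if not any(label for _, label in coded):
            break  # a whole batch with no relevant documents: stop
    return reviewed


collection = [
    "swap agreement draft", "team lunch friday", "credit swap terms",
    "parking notice", "swap counterparty email", "holiday schedule",
    "office move", "printer repair",
]
seed = [("interest rate swap", True), ("lunch menu", False)]
result = continuous_active_learning(
    collection, seed, review=lambda d: "swap" in d, batch_size=2
)
```

In this toy run, all three relevant documents in the collection are found after six of its eight documents have been reviewed, and the loop stops without reviewing the last two. As the CAL definition notes, there is no separate second review: every document the process treats as relevant has already been seen by the reviewer.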

TAR Workflows (9)

TAR workflows represent the practical application of predictive coding technologies and protocols to define approaches to completing predictive coding tasks. Three examples of TAR workflows include:

TAR 1.0 involves a training phase followed by a review phase with a control set being used to determine the optimal point when you should switch from training to review. The system no longer learns once the training phase is completed. The control set is a random set of documents that have been reviewed and marked as relevant or non-relevant. The control set documents are not used to train the system. They are used to assess the system’s predictions so training can be terminated when the benefits of additional training no longer outweigh the cost of additional training. Training can be with randomly selected documents, known as Simple Passive Learning (SPL), or it can involve documents chosen by the system to optimize learning efficiency, known as Simple Active Learning (SAL).
TAR 2.0 uses an approach called Continuous Active Learning (CAL), meaning that there is no separation between training and review; the system continues to learn throughout. While many approaches may be used to select documents for review, a significant component of CAL is many iterations of predicting which documents are most likely to be relevant, reviewing them, and updating the predictions. Unlike TAR 1.0, TAR 2.0 tends to be very efficient even when prevalence is low. Since there is no separation between training and review, TAR 2.0 does not require a control set. Generating a control set can involve reviewing a large number of non-relevant documents (especially when prevalence is low), so avoiding control sets is desirable.
TAR 3.0 requires a high-quality conceptual clustering algorithm that forms narrowly focused clusters of fixed size in concept space. It applies the TAR 2.0 methodology to just the cluster centers, which ensures that a diverse set of potentially relevant documents is reviewed. Once no more relevant cluster centers can be found, the reviewed cluster centers are used as training documents to make predictions for the full document population. There is no need for a control set: the system is well-trained when no additional relevant cluster centers can be found. Analysis of the reviewed cluster centers provides an estimate of the prevalence and of the number of non-relevant documents that would be produced if documents were produced based purely on the predictions, without human review. The user can decide to produce documents (not identified as potentially privileged) without review, similar to SAL from TAR 1.0 (but without a control set), or can decide to review documents that carry too much risk of being non-relevant (which can be used as additional training for the system, i.e., CAL). The key point is that, after completing review of the cluster centers that are likely to be relevant, the user has the information needed to decide how to proceed, and nothing done before that point is invalidated by the decision. Compare this with starting with TAR 1.0, reviewing a control set, finding that the predictions aren’t good enough to produce documents without review, and then switching to TAR 2.0, which renders the control set virtually useless.
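In TAR 1.0, the control set is used to assess the system's predictions. A minimal sketch of that assessment, assuming each control-set document has a model score and a human relevance code, estimates recall and precision at a chosen score cutoff. The function name, the scores, and the cutoffs below are hypothetical.

```python
def control_set_metrics(control_set, cutoff):
    """Estimate recall and precision at a score cutoff from a coded control set.

    control_set: (model_score, is_relevant) pairs for randomly selected,
    human-coded documents that were not used to train the system.
    """
    tp = sum(1 for score, rel in control_set if score >= cutoff and rel)
    fp = sum(1 for score, rel in control_set if score >= cutoff and not rel)
    fn = sum(1 for score, rel in control_set if score < cutoff and rel)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical control set: model scores paired with human relevance codes
control = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False),
]
for cutoff in (0.5, 0.25):
    recall, precision = control_set_metrics(control, cutoff)
    print(f"cutoff={cutoff}: recall={recall:.2f}, precision={precision:.2f}")
```

Lowering the cutoff from 0.5 to 0.25 raises estimated recall from 0.75 to 1.00 while precision falls from 0.75 to about 0.67. Only the relevant documents in the control set inform the recall estimate, which is why low prevalence is costly: at 1% prevalence, a 1,000-document control set would be expected to contain only about 10 relevant documents.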

TAR Uses (10)

TAR technologies, protocols, and workflows can be used effectively to help eDiscovery professionals accomplish many data discovery and legal discovery tasks. Nine commonly considered examples of TAR use include:

Identification of Relevant Documents
Early Case Assessment/Investigation
Prioritization for Review
Categorization (By Issues, For Confidentiality or Privacy)
Privilege Review
Quality Control and Quality Assurance
Review of Incoming Productions
Deposition/Trial Preparation
Information Governance and Data Disposition

Survey Information (11, 12, 13, 14, 15)

References

(1) Grossman, M. and Cormack, G. (2013). The Grossman-Cormack Glossary of Technology-Assisted Review. [ebook] Federal Courts Law Review. Available at: http://www.fclr.org/fclr/articles/html/2010/grossman.pdf [Accessed 31 Aug. 2018].

(2) Dimm, B. (2018). Expertise on Predictive Coding. [email].

(3) Roitblat, H. (2013). Introduction to Predictive Coding. [ebook] OrcaTec. Available at: https://theolp.wildapricot.org/Resources/Documents/Introduction%20to%20Predictive%20Coding%20-%20Herb%20Roitblat.pdf [Accessed 31 Aug. 2018].

(4) Tredennick, J. and Pickens, J. (2017). Deep Learning in E-Discovery: Moving Past the Hype. [online] Catalystsecure.com. Available at: https://catalystsecure.com/blog/2017/07/deep-learning-in-e-discovery-moving-past-the-hype/ [Accessed 31 Aug. 2018].

(5) Grossman, M. and Cormack, G. (2017). Technology-Assisted Review in Electronic Discovery. [ebook] Available at: https://judicialstudies.duke.edu/wp-content/uploads/2017/07/Panel-1_TECHNOLOGY-ASSISTED-REVIEW-IN-ELECTRONIC-DISCOVERY.pdf [Accessed 31 Aug. 2018].

(6) Grossman, M. and Cormack, G. (2016). Continuous Active Learning for TAR. [ebook] Practical Law. Available at: https://pdfs.semanticscholar.org/ed81/f3e1d35d459c95c7ef60b1ba0b3a202e4400.pdf [Accessed 31 Aug. 2018].

(7) Grossman, M. and Cormack, G. (2016). Scalability of Continuous Active Learning for Reliable High-Recall Text Classification. [ebook] Available at: https://plg.uwaterloo.ca/~gvcormac/scal/cormackgrossman16a.pdf [Accessed 3 Sep. 2018].

(8) Losey, R., Sullivan, J. and Reichenberger, T. (2015). e-Discovery Team at TREC 2015 Total Recall Track. [ebook] Available at: https://trec.nist.gov/pubs/trec24/papers/eDiscoveryTeam-TR.pdf [Accessed 1 Sep. 2018].

(9) Dimm, B. (2016). TAR 3.0 Performance. [online] Clustify Blog – eDiscovery, Document Clustering, Predictive Coding, Information Retrieval, and Software Development. Available at: https://blog.cluster-text.com/2016/01/28/tar-3-0-performance/ [Accessed 18 Feb. 2019].

(10) Electronic Discovery Reference Model (EDRM) (2019). Technology Assisted Review (TAR) Guidelines. [online] Available at: https://www.edrm.net/wp-content/uploads/2019/02/TAR-Guidelines-Final.pdf [Accessed 18 Feb. 2019].

(11) Dimm, B. (2018). TAR, Proportionality, and Bad Algorithms (1-NN). [online] Clustify Blog – eDiscovery, Document Clustering, Predictive Coding, Information Retrieval, and Software Development. Available at: https://blog.cluster-text.com/2018/08/13/tar-proportionality-and-bad-algorithms-1-nn/ [Accessed 31 Aug. 2018].

(12) Robinson, R. (2013). Running Results: Predictive Coding One-Question Provider Implementation Survey. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/2013/03/05/running-results-predictive-coding-one-question-provider-implementation-survey/ [Accessed 31 Aug. 2018].

(13) Robinson, R. (2018). A Running List: Top 100+ eDiscovery Providers. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/2017/01/19/28252/ [Accessed 31 Aug. 2018].

(14) Robinson, R. (2018). Relatively Speaking: Predictive Coding Technologies and Protocols Survey Results. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/relatively-speaking-predictive-coding-technologies-and-protocols-survey-results/ [Accessed 18 Feb. 2019].

(15) Robinson, R. (2019). Actively Learning? Predictive Coding Technologies and Protocols Survey Results. [online] ComplexDiscovery: eDiscovery Information. Available at: https://complexdiscovery.com/actively-learning-predictive-coding-technologies-and-protocols-survey-spring-2019-results/ [Accessed 22 Aug. 2019].


* Direct distribution and review collaboration and support from Mary Mack, Executive Director of ACEDS, and Kaylee Walstad, Vice President of Client Engagement of ACEDS.

Source: ComplexDiscovery


ComplexDiscovery OÜ is an independent digital publication and research organization based in Tallinn, Estonia. ComplexDiscovery covers cybersecurity, data privacy, regulatory compliance, and eDiscovery, with reporting that connects legal and business technology developments—including high-growth startup trends—to international business, policy, and global security dynamics. Focusing on technology and risk issues shaped by cross-border regulation and geopolitical complexity, ComplexDiscovery delivers editorial coverage, original analysis, and curated briefings for a global audience of legal, compliance, security, and technology professionals. Learn more at ComplexDiscovery.com.

Generative Artificial Intelligence and Large Language Model Use

ComplexDiscovery OÜ recognizes the value of GAI and LLM tools in streamlining content creation processes and enhancing the overall quality of its research, writing, and editing efforts. To this end, ComplexDiscovery OÜ regularly employs GAI tools, including ChatGPT, Claude, Gemini, Grammarly, Midjourney, and Perplexity, to assist, augment, and accelerate the development and publication of both new and revised content in posts and pages published (initiated in late 2022).

ComplexDiscovery also provides a ChatGPT-powered AI article assistant for its users. This feature leverages LLM capabilities to generate relevant and valuable insights related to specific page and post content published on ComplexDiscovery.com. By offering this AI-driven service, ComplexDiscovery OÜ aims to create a more interactive and engaging experience for its users, while highlighting the importance of responsible and ethical use of GAI and LLM technologies.

Pricing

Editor's Note: Generative AI is no longer a future-state concept in eDiscovery pricing; it is already reshaping how legal, technology, and corporate teams evaluate cost, value, and defensibility. In this Winter 2026 Pricing Pulse analysis, ComplexDiscovery OÜ, in partnership with EDRM, examines a market that is simultaneously stabilizing in traditional service categories and fragmenting in newer AI-driven ones. The findings highlight a clear divide between established pricing norms for forensic collection, processing, hosting, and document review, and the still-developing commercial models emerging around GenAI-assisted review. For cybersecurity, data privacy, regulatory compliance, and eDiscovery professionals, that divide matters. Pricing transparency now directly affects budgeting, vendor selection, matter planning, and risk management—especially as organizations weigh the promise of AI efficiency against unresolved questions around exception handling, quality control, and contract structure. This analysis offers a timely benchmark for understanding where the market stands today and where pricing pressure is likely to intensify next.


A Complete Analysis of the Winter 2026 eDiscovery Pricing Survey

ComplexDiscovery Staff

Executive Summary

The Winter 2026 eDiscovery Pricing Survey, conducted by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM) from December 2025 through February 2026, captures a market at a pivotal inflection point. Generative AI (GenAI) has moved into operational workflows for a significant and growing segment of the eDiscovery market, but adoption is uneven, pricing frameworks have not kept pace, and a meaningful share of practitioners have not yet engaged with AI-assisted review at any level. That bifurcation between early adopters and the rest of the market is itself one of the survey's defining findings.

Drawing on 53 responses from legal professionals, technology providers, corporations, and consultancies, this survey provides a detailed pricing snapshot of the current eDiscovery market, spanning forensic collection, data processing and hosting, document review, and GenAI-assisted review. Several clear signals emerge from the data:

Forensic collection and examination rates have stabilized in the $250–$350 per hour range for standard work, with premium rates for testimony and analysis.
Data hosting has commoditized meaningfully at the infrastructure level, while analytics-enabled hosting retains pricing differentiation.
Document review rates are stable, but per-document billing remains opaque.
Most critically, GenAI-assisted review pricing is experimentally diverse: hybrid models and per-document billing each account for roughly 28% of reported primary models, and the $0.11–$0.50 per-document range is emerging as a competitive zone that directly challenges traditional human review economics.

This report covers all 25 survey questions, organized into four thematic sections, with analyst observations and strategic implications throughout. All findings represent self-reported practitioner perceptions of prevailing market pricing, not verified transaction records, and should be read as directional market intelligence.
Unlike vendor-produced or client-commissioned pricing guides, the Pricing Pulse is designed and published independently by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM), with no commercial interest in any specific pricing outcome.
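The per-document comparison with human review is simple arithmetic once a human review rate is expressed per document. The sketch below assumes hypothetical human review inputs ($60 per hour at 50 documents per hour) and a hypothetical 100,000-document matter; only the $0.11 and $0.50 endpoints come from the survey. It compares per-document rates alone and ignores hybrid pricing, exception handling, and quality control costs.

```python
def human_cost_per_doc(hourly_rate, docs_per_hour):
    """Convert an hourly review rate into an effective per-document cost."""
    return hourly_rate / docs_per_hour


def matter_cost(doc_count, cost_per_doc):
    """Total review cost for a matter billed at a flat per-document rate."""
    return doc_count * cost_per_doc


# Hypothetical inputs: NOT survey data
DOC_COUNT = 100_000
human_per_doc = human_cost_per_doc(hourly_rate=60.0, docs_per_hour=50)

# Endpoints of the GenAI per-document range reported in the survey
GENAI_RANGE = (0.11, 0.50)

human_total = matter_cost(DOC_COUNT, human_per_doc)
genai_totals = [matter_cost(DOC_COUNT, rate) for rate in GENAI_RANGE]
for rate, total in zip(GENAI_RANGE, genai_totals):
    print(f"GenAI at ${rate:.2f}/doc: ${total:,.0f} vs human review: ${human_total:,.0f}")
```

Under these assumed inputs, human review works out to $1.20 per document, so the survey's GenAI range would cost roughly 9% to 42% of the human review figure for the same document count. Different throughput or rate assumptions change the comparison directly.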

About the Survey

Survey Design and Purpose

The Winter 2026 eDiscovery Pricing Survey was designed and administered by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM) as part of its ongoing Pricing Pulse research program. The survey's primary purpose is to provide eDiscovery practitioners, technology providers, and legal operations professionals with empirically grounded pricing benchmarks across the key service categories that define the eDiscovery market. The Pricing Pulse is practitioner-reported and independently produced; it is not sponsored by, or designed to favor, any vendor, platform, or service category. Respondent comments critiquing the survey design itself are actively incorporated into future iterations, as reflected in this report's processing methodology note.

This iteration of the survey placed particular emphasis on generative AI-assisted review pricing, a category first addressed formally in prior survey cycles and given significantly more attention in Winter 2026 to reflect the technology's accelerating, if uneven, integration into eDiscovery workflows. The five GenAI pricing questions (Questions 18–22) were designed to capture not just price points but also pricing model structures, exception handling practices, and the nascent development of outcome-based pricing, recognizing that practitioners at very different stages of AI adoption would be responding.

Respondent Profile

The survey received 53 completed responses. By business segment, law firms represented the largest cohort at 43.4% (23 respondents), followed by software and/or services providers at 24.5% (13), corporations at 15.1% (8), consultancies at 9.4% (5), and media, research, or educational organizations at 7.5% (4). By primary function, 67.9% (36) identified as legal/litigation support professionals, 26.4% (14) as business or business support functions, and 5.7% (3) as IT or product development.
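The segment and function percentages can be reproduced from the respondent counts. This short sketch divides each count by the 53 completed responses and rounds to one decimal place; the function and variable names are illustrative.

```python
def percentage_shares(counts):
    """Each category's share of all responses, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Respondent counts as reported in the Respondent Profile (53 responses)
segments = {
    "Law firms": 23,
    "Software and/or services providers": 13,
    "Corporations": 8,
    "Consultancies": 5,
    "Media, research, or educational organizations": 4,
}
functions = {
    "Legal/litigation support": 36,
    "Business or business support": 14,
    "IT or product development": 3,
}

segment_shares = percentage_shares(segments)
function_shares = percentage_shares(functions)
```

Rounded this way, the five segment shares sum to 99.9% rather than 100%. This is ordinary rounding behavior, not a missing respondent.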

Chart: Survey Respondents by Organizational Segment - Winter 2026 (PDF: https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Organizational-Segment-Winter-2026.pdf)

Chart: Survey Respondents by Primary Function - Winter 2026 (PDF: https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Primary-Function-Winter-2026.pdf)

Geographically, the survey is overwhelmingly U.S.-centric: 92.5% of respondents (49) indicated North America – United States as their primary eDiscovery business geography, with the remaining 7.5% distributed across Europe (United Kingdom and non-UK) and Asia/Asia Pacific. This composition reflects the survey's community of practitioners and should be taken into account when applying results to non-U.S. markets.

Chart: Survey Respondents by Geographic Region - Winter 2026 (PDF: https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Geographic-Region-Winter-2026.pdf)

The respondent pool's composition — heavily weighted toward legal practitioners with meaningful technology provider and in-house corporate representation — lends credibility to the pricing data for legal use cases while also surfacing supply-side perspectives from vendors who see pricing across many client engagements.

Section 1: Forensic Collection, Examination, and Testimony Pricing

Forensic collection and digital examination form the evidentiary foundation of eDiscovery. Unlike commoditized downstream services, forensic work depends on specialized expertise, defensible chain-of-custody protocols, and increasingly complex device environments. Mobile devices, cloud-linked data ecosystems, encrypted storage, and enterprise application footprints have expanded the examiner's scope considerably over the past several years, sustaining rate levels that resist the downward pressure that more commoditized services face. Expert witness testimony sits at the highest value tier of forensic work, where practitioner credentials, courtroom experience, and legal exposure command significant premium pricing.

Q1 & Q2: Per Hour Cost for Onsite and Remote Collection

The $250–$350 per hour range is the clear market anchor for forensic collection, cited by 56.6% of respondents for both onsite and remote collection. However, the distributions diverge meaningfully at the premium tier: 20.8% of respondents report onsite collection rates exceeding $350 per hour, compared to just 5.7% for remote. Conversely, remote collection skews lower, with 18.9% reporting sub-$250 rates for remote work versus only 5.7% for onsite. This onsite premium reflects real cost structures: travel, physical access logistics, on-premises security requirements, and the coordination burden of collecting in active enterprise environments. The growth of remote forensic collection tools, driven in part by pandemic-era necessity and now institutionalized in many engagements, has introduced competitive downward pressure on remote rates that onsite services do not face to the same degree. Four respondents (7.5%) indicate alternative pricing models for remote collection, suggesting some providers are moving toward flat-fee or subscription-based remote collection arrangements.
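The divergence between onsite and remote rates can be summarized by comparing how many respondents sit above versus below the $250–$350 anchor band. This sketch uses the percentages reported above, backs out approximate respondent counts from the 53 responses, and defines a simple skew measure (percentage above $350 minus percentage below $250). The skew measure and all names are illustrative, not part of the survey's methodology.

```python
RESPONDENTS = 53

# Percentages reported above for Q1 (onsite) and Q2 (remote)
reported = {
    "onsite": {"below $250": 5.7, "$250-$350": 56.6, "above $350": 20.8},
    "remote": {"below $250": 18.9, "$250-$350": 56.6, "above $350": 5.7},
}


def approx_count(pct, n=RESPONDENTS):
    """Back out an approximate respondent count from a rounded percentage."""
    return round(pct * n / 100)


def skew(dist):
    """Illustrative measure: percentage points above $350 minus below $250."""
    return round(dist["above $350"] - dist["below $250"], 1)


for mode, dist in reported.items():
    counts = {band: approx_count(pct) for band, pct in dist.items()}
    print(mode, counts, "skew:", skew(dist))
```

Onsite collection skews upward (+15.1 points, roughly 11 respondents above $350 versus 3 below $250), while remote collection skews downward (-13.2 points, roughly 3 above versus 10 below). The three bands do not sum to 100% for either mode because other response options, such as the alternative pricing models noted for remote collection, are not broken out here.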

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-an-Onsite-Collection-by-a-Forensic-Examiner-Winter-2026-.pdf" title="Collection Pricing - Per Hour Cost for an Onsite Collection by a Forensic Examiner - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-a-Remote-Collection-by-a-Forensic-Examiner-Winter-2026.pdf" title="Collection Pricing - Per Hour Cost for a Remote Collection by a Forensic Examiner - Winter 2026"]

Q3 & Q4 — Per Device Cost for Desktop/Laptop and Mobile Device Collection

Device-based pricing skews decisively to the upper tier: 50.9% of respondents report per-device costs exceeding $350 for desktop and laptop collections, and 49.1% report the same for mobile devices. The $250–$350 mid-range captures 18.9% for computers and 24.5% for mobile devices; the higher mobile representation in the mid-range may reflect lower-complexity or volume-based mobile collection engagements where physical access is easier and device configurations are more standardized. Perhaps most notable is the convergence of mobile and computer collection pricing at the upper tier. Mobile device collection, once considered simpler than computer collection due to smaller storage capacities, now commands comparable rates as encryption, cloud sync architectures, third-party application data, and ephemeral messaging platforms have substantially increased examiner effort and risk. Practitioners seeking to budget mobile collection as a lower-cost alternative to computer collection will increasingly find the market does not support that assumption.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Device-Cost-for-a-Desktop-Laptop-Computer-Collection-by-a-Forensic-Examiner-Winter-2026.pdf" title="Collection Pricing - Per Device Cost for a Desktop Laptop Computer Collection by a Forensic Examiner - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Device-Cost-for-a-Mobile-Device-Collection-by-a-Forensic-Examiner-Winter-2026.pdf" title="Collection Pricing - Per Device Cost for a Mobile Device Collection by a Forensic Examiner - Winter 2026"]

Q5 — Per Hour Cost for Investigation, Analysis, and Report Generation

Investigation, analysis, and report generation command a higher hourly rate floor than collection itself. More than half of respondents (54.7%) report rates in the $350–$550 range for this work, compared to the $250–$350 majority for collection. Only 30.2% report rates below $350 per hour for analysis, and 5.7% exceed $550. This premium reflects the cognitive and legal weight of analytical work. Forensic examiners producing reports that will be used in litigation, regulatory proceedings, or internal investigations are exercising expert judgment that creates professional liability, and the market prices that exposure accordingly. Practitioners purchasing forensic services should anticipate that billing rates will escalate from collection through analysis, often within the same engagement.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-Investigation-Analysis-and-Report-Generation-by-an-FE-Winter-2026.pdf" title="Collection Pricing - Per Hour Cost for Investigation Analysis and Report Generation by an FE - Winter 2026"]

Q6 — Per Hour Cost for Expert Witness Testimony

Expert witness testimony carries the highest rate profile in the forensic pricing group. While 47.2% report testimony rates in the $350–$550 range (consistent with analysis rates), a notable 26.4% report rates exceeding $550 per hour, the highest proportion in any >$550 category across the survey. The elevated 'do not know' response rate (20.8%) likely reflects that many practitioners engage forensic examiners for collection and analysis but not testimony, leaving a meaningful gap in their pricing awareness for this segment. Expert witness rates are driven by factors beyond standard hourly billing, including the examiner's track record, publication history, geographic availability, and the complexity of the matter at issue. The wide distribution, from below $350 to above $550, reflects a market where individual credentials create significant pricing dispersion.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-Expert-Witness-Testimony-In-Person-and-Written-by-an-FE-Winter-2026.pdf" title="Collection Pricing - Per Hour Cost for Expert Witness Testimony (In-Person and Written) by an FE - Winter 2026"]

Analyst Observation — Forensic Collection & Examination

The forensic pricing landscape shows a well-established rate structure for collection and a predictable escalation through analysis to testimony. The $250–$350 range for collection hours serves as a reliable negotiation baseline. The key risk for buyers is underbudgeting for the analysis and testimony phases, where most respondents report rates above $350/hour and more than a quarter report testimony rates above $550/hour. Practitioners with active litigation portfolios should establish explicit rate schedules with forensic vendors for all service tiers at engagement outset, not just collection.

Key Takeaways — Section 1

$250–$350/hour is the market anchor for both onsite and remote forensic collection (56.6% each).
Onsite collection carries a measurable premium: 20.8% report >$350/hour vs. 5.7% for remote.
Mobile device collection rates have converged with computer collection at the upper tier (both ~50% report >$350/device).
Investigation, analysis, and report generation rates escalate to $350–$550/hour for 54.7% of respondents.
Expert witness testimony exceeds $550/hour for 26.4%, the highest >$550 proportion of any survey question.
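The escalation the Analyst Observation warns about can be made concrete with a small budgeting sketch. The phase hours below are hypothetical placeholders, and the rates are simply points picked from inside the survey's reported bands ($300, $450, $600); none of these figures is a survey finding. The sketch shows how much of an engagement a collection-only budget would miss.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One billable phase of a forensic engagement (all inputs hypothetical)."""
    name: str
    hours: float
    rate_per_hour: float

    def cost(self) -> float:
        return self.hours * self.rate_per_hour


def engagement_budget(phases: list[Phase]) -> dict[str, float]:
    """Return the cost of each phase plus a 'total' entry."""
    costs = {p.name: p.cost() for p in phases}
    costs["total"] = sum(costs.values())
    return costs


# Hypothetical hours; rates picked from inside the survey's reported bands.
phases = [
    Phase("collection", hours=20, rate_per_hour=300),       # $250–$350 band
    Phase("analysis_report", hours=30, rate_per_hour=450),  # $350–$550 band
    Phase("testimony", hours=10, rate_per_hour=600),        # above $550
]

budget = engagement_budget(phases)
for name, cost in budget.items():
    print(f"{name:>16}: ${cost:,.0f}")

# Share of the total that a collection-only budget would have captured.
print(f"collection share: {budget['collection'] / budget['total']:.0%}")
```

With these placeholder inputs, collection is $6,000 of a $25,500 engagement, so a budget built only on collection rates would capture less than a quarter of the likely spend.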

Section 2: Data Processing, Hosting, and Project Management Pricing

Data processing and hosting represent the operational infrastructure of eDiscovery delivery. Processing, which transforms raw electronically stored information (ESI) into a reviewable format, has historically been a significant cost driver in large matters. Hosting provides the platform on which review takes place. Both categories have experienced significant commoditization pressure from cloud infrastructure economics, but the emergence of AI-driven early culling and processing tools is beginning to reshape volume dynamics in ways that affect both pricing and billing model design.

Q7 & Q8 — Per GB Cost to Process ESI at Ingestion and at Completion

Processing pricing at ingestion is relatively compressed: 39.6% of respondents report rates in the $25–$75 per GB range, and 34.0% report rates below $25 per GB. A significant 18.9% indicate alternative pricing models, reflecting the market's movement away from traditional per-GB ingestion billing. Pricing at the completion of processing tells a different story. The most commonly reported range shifts to 'less than $100 per GB' (37.7%), and the proportion reporting alternative pricing models rises to 22.6%. Another 15.1% report $100–$150 per GB at completion, and 9.4% exceed $150 per GB. Because the two questions use different price brackets, the comparison is directional, but the jump from ingestion to completion likely reflects the data expansion, enrichment, and additional handling that occur through native processing, deduplication, OCR, and promotion, steps that substantially increase the per-GB cost basis for providers. One respondent offered a methodologically important observation worth acknowledging directly: the survey's two-question processing model may conflate two distinct industry billing philosophies, namely an 'all-in' per-GB rate that covers ingestion through promotion versus a staged model with separate per-GB charges for ingestion and for native processing or promotion to review. This is a legitimate distinction, and practitioners benchmarking against these results should clarify which model their vendor employs. Future survey iterations will address this more precisely.
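The respondent's distinction between 'all-in' and staged processing billing can be compared directly. This is a minimal sketch assuming hypothetical rates and volumes (500 GB ingested, 150 GB promoted, $60/GB all-in versus $20/GB ingestion plus $90/GB promotion); the function names are illustrative, not any vendor's billing terms.

```python
def all_in_cost(ingested_gb: float, all_in_rate: float) -> float:
    """One per-GB rate covering ingestion through promotion to review."""
    return ingested_gb * all_in_rate


def staged_cost(ingested_gb: float, ingest_rate: float,
                promoted_gb: float, promotion_rate: float) -> float:
    """Separate per-GB charges for ingestion and for native processing/promotion."""
    return ingested_gb * ingest_rate + promoted_gb * promotion_rate


def breakeven_promoted_gb(ingested_gb: float, all_in_rate: float,
                          ingest_rate: float, promotion_rate: float) -> float:
    """Promoted volume at which both models cost the same."""
    return ingested_gb * (all_in_rate - ingest_rate) / promotion_rate


# Hypothetical matter: 500 GB ingested, 150 GB promoted after culling.
ingested, promoted = 500.0, 150.0
a = all_in_cost(ingested, all_in_rate=60.0)
s = staged_cost(ingested, ingest_rate=20.0, promoted_gb=promoted, promotion_rate=90.0)
be = breakeven_promoted_gb(ingested, 60.0, 20.0, 90.0)
print(f"all-in ${a:,.0f} vs staged ${s:,.0f}; break-even at {be:.0f} GB promoted")
```

Under these inputs the staged model is cheaper until roughly 222 GB is promoted; above that, the all-in rate wins. The more aggressively data is culled before promotion, the more a staged model favors the buyer.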

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-to-Process-ESI-Based-on-Volume-at-Ingestion-Winter-2026.pdf" title="Processing Pricing - Per GB Cost to Process ESI Based on Volume at Ingestion - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-to-Process-ESI-Based-on-Volume-at-Completion-Winter-2026.pdf" title="Processing Pricing - Per GB Cost to Process ESI Based on Volume at Completion - Winter 2026"]

Q9 & Q10 — Per GB Per Month Cost to Host ESI Without and With Analytics

Data hosting without analytics has substantially commoditized. More than half of respondents (54.7%) report hosting rates below $10 per GB per month, and another 30.2% fall in the $10–$20 range. Less than 2% report rates exceeding $20 per GB per month. This distribution likely reflects years of cloud infrastructure cost reduction passed through to buyers as major platform providers compete on storage economics. Analytics-enabled hosting shows a wider and higher distribution. While 43.4% report rates below $15 per GB per month with analytics, 32.1% fall in the $15–$25 range, and 11.3% exceed $25 per GB per month. The premium for analytics-capable hosting reflects platform differentiation: vendors with mature AI search, conceptual clustering, visualization tools, and review workflow automation can sustain higher rates. Undifferentiated platforms, those competing primarily on storage price, face continued downward pressure as infrastructure costs decline. One respondent's comment corroborates this trajectory directly, observing that while overall eDiscovery pricing has been stable, technology costs specifically appear to be coming down, a signal consistent with the commoditization pattern visible in the hosting data.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-Per-Month-to-Host-ESI-without-Analytics-Winter-2026.pdf" title="Processing Pricing - Per GB Cost Per Month to Host ESI without Analytics - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-Per-Month-to-Host-ESI-with-Analytics-Winter-2026.pdf" title="Processing Pricing - Per GB Cost Per Month to Host ESI with Analytics - Winter 2026"]
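To see where the analytics premium bites, the sketch below computes a monthly hosting bill. It assumes, hypothetically, that a platform lets a buyer keep only part of a collection in the analytics-enabled tier; many platforms may price a whole workspace at one rate. The $8 and $20 per GB per month figures are illustrative points inside the survey's basic and analytics bands, not survey results.

```python
def monthly_hosting(total_gb: float, analytics_gb: float,
                    base_rate: float, analytics_rate: float) -> float:
    """Monthly hosting bill when only `analytics_gb` sits in the analytics tier."""
    if not 0 <= analytics_gb <= total_gb:
        raise ValueError("analytics_gb must be between 0 and total_gb")
    return (total_gb - analytics_gb) * base_rate + analytics_gb * analytics_rate


# Hypothetical rates inside the survey bands: $8 (basic) and $20 (analytics).
total = 200.0
for analytics_share in (0.0, 0.25, 1.0):
    bill = monthly_hosting(total, total * analytics_share, 8.0, 20.0)
    print(f"{analytics_share:>4.0%} in analytics tier: ${bill:,.0f}/month")
```

Moving a quarter of a 200 GB collection into the analytics tier raises the hypothetical bill from $1,600 to $2,200 per month; hosting everything with analytics costs $4,000.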

Q11 — User License Fee Per Month for Access to Hosted Data

User licensing is in active structural transition. The $50–$100 per user per month range is the most frequently cited (41.5%), but a striking 34.0% of respondents report alternative pricing models, the highest alternative-model proportion of any question in the processing and hosting section. Only 17.0% report rates below $50 per user per month. The high alternative-model rate likely reflects a market shift away from traditional per-seat licensing toward enterprise agreements, volume tiers, and managed service arrangements that bundle access costs into broader contract structures. For corporate legal departments and law firms managing multi-matter eDiscovery portfolios, these bundled arrangements restructure cost visibility: per-matter spend attribution becomes less granular, which may simplify budgeting at the portfolio level but reduces transparency at the individual matter level. Whether bundled arrangements represent a net financial advantage depends on volume, negotiated terms, and how closely actual usage tracks the contracted scope, variables the survey does not measure.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-User-License-Fee-Per-Month-for-Access-to-Hosted-Data-Winter-2026.pdf" title="Processing Pricing - User License Fee Per Month for Access to Hosted Data - Winter 2026"]
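Whether a bundled licensing arrangement beats per-seat pricing comes down to a break-even seat count. A minimal sketch, assuming a hypothetical $75 per user per month rate (inside the survey's most common $50–$100 band) and a hypothetical $90,000 annual bundle; both numbers are placeholders.

```python
import math


def per_seat_annual(users: int, rate_per_user_month: float, months: int = 12) -> float:
    """Annual spend under traditional per-seat licensing."""
    return users * rate_per_user_month * months


def breakeven_users(bundled_annual_fee: float, rate_per_user_month: float,
                    months: int = 12) -> int:
    """Smallest seat count at which per-seat spend reaches the bundled fee."""
    return math.ceil(bundled_annual_fee / (rate_per_user_month * months))


# Hypothetical: $75/user/month (inside the $50–$100 band) vs. a $90,000 bundle.
seats_needed = breakeven_users(90_000, 75)
print(f"bundle pays off at {seats_needed} or more active seats")
print(f"40 seats per-seat: ${per_seat_annual(40, 75):,.0f}")
```

The comparison should use the average number of seats actually active rather than the number contracted, since the outcome depends on how closely usage tracks the contracted scope.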

Q12 — Per Hour Cost of Project Management Support for eDiscovery

Project management pricing is the most consistent and well-understood category in the processing and hosting group. More than half of respondents (52.8%) report rates in the $100–$200 per hour range, and 26.4% report rates exceeding $200 per hour. The low 'do not know' rate (5.7%), tied with Q9 for the lowest across all Section 2 questions, indicates that PM pricing is well understood by practitioners and regularly visible in vendor proposals. The 26.4% reporting more than $200 per hour likely reflects the growing complexity of modern eDiscovery engagements. Today's project managers must coordinate across AI review platforms, multiple review vendor relationships, technical review workflows, and real-time quality control functions, a scope considerably broader than the data management and platform coordination role the title implied in earlier market cycles.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-Hour-Cost-of-Project-Management-Support-for-eDiscovery-Winter-2026.pdf" title="Processing Pricing - Per Hour Cost of Project Management Support for eDiscovery - Winter 2026"]

Analyst Observation — Processing, Hosting & Project Management

Processing pricing is bifurcating: per-GB billing at ingestion remains common, but completion-phase and analytics-related pricing is shifting toward bundled and alternative models. Practitioners anchored to traditional per-GB benchmarks for TAR, analytics hosting, or managed service arrangements may be negotiating based on outdated frameworks. Hosting has genuinely commoditized at the infrastructure level; the pricing action now lives in analytics differentiation layered above the storage tier.

Key Takeaways — Section 2

Processing at ingestion is largely below $75/GB (73.6% combined), but completion-phase pricing climbs with 24.5% reporting $100/GB or more.
Alternative pricing models account for 18.9% at ingestion and 22.6% at completion — signaling a structural shift away from per-GB processing billing.
Basic hosting has commoditized: 54.7% report sub-$10/GB/month. Analytics hosting retains differentiation with 11.3% exceeding $25/GB/month.
User licensing is migrating from per-seat to bundled models — 34.0% report alternative pricing structures.
Project management rates are well understood and rising: 26.4% now exceed $200/hour, reflecting growing engagement complexity.

Section 3: Document Review Pricing

Document review sits at the commercial center of most eDiscovery engagements. It is the largest cost driver in complex litigation, the primary arena in which human expertise meets technology leverage, and the category most directly disrupted by the emergence of GenAI-assisted review. Pricing in this section spans both hourly attorney rates (the traditional billing model) and per-document rates (a model that has gained traction as technology-assisted review has enabled higher throughput). The data in this section provides critical context for interpreting the GenAI pricing data that follows in Section 4.

Q13 — Per GB Cost for Predictive Coding / Technology-Assisted Review

Predictive coding and technology-assisted review (TAR) pricing is moving away from per-GB billing. The highest single response category (35.8%) is 'alternative pricing model', the highest alternative-model proportion of any per-GB question in the survey. Among those who do provide per-GB TAR pricing, 30.2% report rates below $75 per GB, 13.2% report $75–$150 per GB, and only 1.9% exceed $150 per GB. The 18.9% 'do not know' rate for TAR pricing suggests that many practitioners receive predictive coding as an embedded capability within their review platform subscription rather than as a separately line-itemed service. This bundling trend, combined with the high alternative-model rate, suggests that standalone per-GB TAR billing is losing ground as platforms integrate AI-driven prioritization into standard hosting fees, even though 45.3% of respondents still report a per-GB rate.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-GB-Cost-for-Predictive-Coding-in-a-Technology-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Per GB Cost for Predictive Coding in a Technology-Assisted Review - Winter 2026"]

Q14 & Q15 — Per Hour Cost for Onsite and Remote Managed Review Attorneys

Hourly managed review attorney rates show a consistent onsite premium over remote delivery. For onsite review, 45.3% of respondents report rates exceeding $40 per hour, and 32.1% report $25–$40 per hour. For remote review, the distribution shifts: 41.5% report $25–$40 per hour, and 35.8% report more than $40 per hour. The onsite premium likely reflects overhead recovery for physical review facilities, security infrastructure, and on-site supervision costs. Despite the normalization of remote review following the pandemic era, onsite review commands a persistent rate premium that clients with physical review requirements should anticipate. The relatively high 'do not know' rates for both onsite (18.9%) and remote (17.0%) suggest that many practitioners engage review vendors without direct visibility into the underlying attorney billing rates, a transparency gap that can make accurate matter budgeting difficult.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Hour-Cost-for-Document-Review-Attorneys-to-Review-Documents-Onsite-Winter-2026.pdf" title="Review Pricing - Per Hour Cost for Document Review Attorneys to Review Documents Onsite - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Hour-Cost-for-Document-Review-Attorneys-to-Review-Documents-Remote-Winter-2026.pdf" title="Review Pricing - Per Hour Cost for Document Review Attorneys to Review Documents Remote - Winter 2026"]

Q16 & Q17 — Cost Per Document for Onsite and Remote Managed Review

Per-document billing for human document review carries significant uncertainty across the respondent pool. For onsite per-document review, 34.0% of respondents indicate they do not know the cost, the highest 'do not know' rate among all document review questions. For remote per-document review, 30.2% report not knowing. Among those with visibility, the $0.50–$1.00 per document range dominates for both onsite (30.2%) and remote (28.3%) delivery, with onsite showing a higher proportion of rates exceeding $1.00 per document (22.6% vs. 18.9% remote). Remote per-document review also trends lower at the bottom of the range: 13.2% report sub-$0.50 rates for remote work versus only 3.8% for onsite. This directional difference is consistent with lower overhead costs in remote delivery environments.

The per-document rate distribution for human review is strategically important as the baseline against which GenAI-assisted review pricing should be evaluated. In this analyst's view, where human review runs $0.50–$1.00 per document and GenAI-assisted alternatives are priced in the $0.11–$0.50 range, the cost differential is substantial enough to drive adoption decisions, provided quality and defensibility standards are met. The economic case ultimately depends on matter-specific quality thresholds and the degree to which AI exception handling costs are controlled.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Document-Cost-for-Document-Review-Attorneys-to-Review-Documents-Onsite-Winter-2026.pdf" title="Review Pricing - Per Document Cost for Document Review Attorneys to Review Documents Onsite - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Document-Cost-for-Document-Review-Attorneys-to-Review-Documents-Remote-Winter-2026.pdf" title="Review Pricing - Per Document Cost for Document Review Attorneys to Review Documents Remote - Winter 2026"]
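The human-versus-GenAI differential is easiest to judge as a range rather than a single number. This sketch takes the survey's dominant human per-document band ($0.50–$1.00) and the most active GenAI zone ($0.11–$0.50), applies them to a hypothetical 100,000-document matter, and reports the minimum and maximum savings. It deliberately ignores QC and exception-handling costs, which can erode the gap.

```python
from itertools import product


def savings_range(docs: int, human_rates: tuple[float, float],
                  genai_rates: tuple[float, float]) -> tuple[float, float]:
    """Min and max savings of GenAI over human review across rate-band endpoints.

    Cost is linear in each rate, so the extremes occur at band endpoints.
    """
    diffs = [docs * (h - g) for h, g in product(human_rates, genai_rates)]
    return min(diffs), max(diffs)


HUMAN_BAND = (0.50, 1.00)   # survey's dominant human per-document band
GENAI_BAND = (0.11, 0.50)   # survey's most active GenAI per-document zone

low, high = savings_range(100_000, HUMAN_BAND, GENAI_BAND)
print(f"savings on 100,000 docs: ${low:,.0f} to ${high:,.0f}")
```

The result spans from no savings (both rates at $0.50) to about $89,000 ($0.89 per document), which is why matter-specific quality and exception costs, not the headline bands, decide the case.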

Analyst Observation — Document Review

Traditional document review rates have held relatively stable, but the market's difficulty in articulating per-document pricing, particularly for onsite review, suggests a structural shift away from document-count-based billing toward time-based models that are less directly comparable to AI-assisted pricing. Practitioners should push for per-document rate transparency in vendor proposals to enable genuine cost modeling against AI alternatives.

Key Takeaways — Section 3

TAR/predictive coding billing is migrating away from per-GB models: 35.8% report alternative pricing, 18.9% don't know — bundled platform pricing is absorbing this cost.
Onsite managed review attorney rates exceed $40/hour for 45.3% of respondents vs. 35.8% for remote — the onsite premium persists.
Per-document review rates cluster in the $0.50–$1.00 range for both onsite and remote, with significant 'do not know' responses (34.0% onsite, 30.2% remote) indicating a transparency gap.
The $0.50–$1.00 per-document human review baseline sets up direct economic competition with emerging GenAI-assisted review pricing.

Section 4: GenAI-Assisted Review Pricing

The Winter 2026 survey's GenAI section was designed to illuminate where pricing clarity exists, where models are still fluid, and where the industry is beginning to form conventions around AI-assisted document review. The results reveal not a uniformly mature market but a bifurcated one: a segment of practitioners actively deploying and pricing GenAI review, and a substantial minority (17.0% reporting it as not applicable or unknown) who have not yet engaged with it at a pricing level. Both cohorts are represented in the data, and the analysis in this section is relevant to each in different ways. The bifurcation is not surprising. GenAI-assisted review introduces fundamentally different cost economics than traditional review: provider costs are driven by token consumption, GPU infrastructure, and model licensing, not attorney hours. Translating those costs into buyer-facing pricing structures that are transparent, predictable, and defensible has proven more difficult than the technology adoption itself.

Q18 — Primary Model for GenAI-Assisted Review

The two leading GenAI pricing models are tied: hybrid pricing (combinations of multiple models) and per-document billing each account for 28.3% of primary model responses (15 respondents each). Per-GB billing captures 11.3%, per-token billing 5.7%, flat monthly subscription 5.7%, and outcome-based pricing 3.8%. The remaining 17.0% report that GenAI-assisted review pricing is not applicable or unknown to them. The prevalence of hybrid models reflects the reality that many providers are constructing bespoke proposals that combine per-document minimums, per-GB infrastructure charges, and platform subscription components. This complexity makes apples-to-apples comparison difficult for buyers, which may be intentional.

Per-document pricing's co-equal standing with hybrid models suggests that a document-level unit of value is widely accepted as a conceptual billing anchor, even when the final structure is more complex. One respondent's comment illustrates the breadth of emerging structures not fully captured by the survey's model options: some providers are pricing GenAI review as an hourly professional service, with consultants performing query engineering, model interaction, and attorney collaboration, billed at standard hourly rates with per-matter minimums and not-to-exceed caps. This hourly professional service model sits outside the per-document and per-GB frameworks the market most commonly discusses, and its presence signals that GenAI pricing model diversity is wider than any single survey's categories can fully contain. Per-token pricing, the underlying cost reality for large language model deployments, has not been widely passed through to buyers (5.7%). This suggests that providers are currently absorbing token cost variability and presenting buyers with higher-order pricing units. As token costs evolve with model efficiency improvements, the degree to which providers pass these economics through will be an important market dynamic to watch.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Primary-Model-for-Gen-AI-Assisted-Review-in-eDiscovery-Winter-2026.pdf" title="Review Pricing - Primary Model for Gen AI-Assisted Review in eDiscovery - Winter 2026"]
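The gap between token-denominated provider costs and buyer-facing per-document prices can be illustrated with a rough calculation. Everything here is an assumption for illustration: the per-million-token prices are not any specific model's list price, and the prompt overhead, output size, document lengths, and $0.15 flat fee are hypothetical. GPU infrastructure, licensing, and QC costs are not modeled.

```python
from statistics import mean


def token_cost_per_doc(input_tokens: int, output_tokens: int,
                       usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Provider-side model cost for one document, priced per million tokens."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


PROMPT_OVERHEAD = 1_500          # hypothetical instructions/issue definitions per doc
OUTPUT_TOKENS = 200              # hypothetical coding decision plus rationale
PRICE_IN, PRICE_OUT = 3.0, 15.0  # illustrative USD per million tokens, not a price list
FLAT_FEE = 0.15                  # hypothetical buyer-facing per-document price

doc_lengths = [800, 2_500, 12_000]  # hypothetical token counts: short, medium, long doc
costs = [token_cost_per_doc(n + PROMPT_OVERHEAD, OUTPUT_TOKENS, PRICE_IN, PRICE_OUT)
         for n in doc_lengths]
for n, c in zip(doc_lengths, costs):
    print(f"{n:>6} tokens: cost ${c:.4f}, margin on flat fee ${FLAT_FEE - c:.4f}")
print(f"average token cost per doc: ${mean(costs):.4f}")
```

Under these assumptions the longest document costs more than four times the shortest in model usage, yet all three bill at the same flat fee. That variability is what a provider absorbs when it quotes per document rather than passing token costs through.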

Q19 — Average Cost Per Document for GenAI-Assisted Review (Per-Document Model)

Among all survey respondents, the $0.26–$0.50 per-document tier is the most frequently cited GenAI price point (20.8%), followed by the $0.11–$0.25 and $0.05–$0.10 ranges (15.1% each). Another 7.5% report per-document GenAI rates exceeding $0.50, and 5.7% report rates below $0.05. A significant 35.8% indicate this pricing model is not applicable to them or that they do not know the cost. The broad distribution among those with pricing visibility, from under a nickel to over fifty cents per document, likely reflects the wide variance in task complexity, model selection, and quality control overhead that different GenAI review implementations involve. The $0.11–$0.50 range represents the most commercially active zone. At the lower end, GenAI review offers compelling cost efficiency relative to the $0.50–$1.00 range for human per-document review. At the upper end of GenAI pricing (>$0.50), the value proposition requires stronger justification, particularly around accuracy, speed, or reduced downstream review burden. Practitioners should press vendors to specify what the per-document fee includes: model inference costs alone, or QC, exception handling, and reporting as well.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Average-Cost-Per-Document-in-Per-Document-Model-of-Gen-AI-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Average Cost Per Document in Per Document Model of Gen AI-Assisted Review - Winter 2026"]

Q20 — Average Cost Range for GenAI-Assisted Review (Per-GB Model)

Per-GB GenAI pricing is less prevalent in practice: 64.2% of respondents indicate this model is not applicable or unknown. Among those who do report per-GB GenAI pricing, the $25–$50 per GB range is most common (17.0%), followed by below $25 per GB (13.2%). Two respondents (3.8%) report rates exceeding $100 per GB for GenAI review, likely representing specialized, computationally intensive analytical workflows rather than standard review acceleration. Given that data processing at ingestion typically falls below $75 per GB, a per-GB GenAI review charge layered on top represents a meaningful incremental cost. Practitioners evaluating per-GB GenAI pricing should model total matter economics carefully, including whether early data culling through AI reduces the volume that reaches review, potentially offsetting the per-GB GenAI charge with reduced processing and hosting costs downstream.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Average-Cost-Range-Per-GB-in-Per-GB-Model-of-Gen-AI-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Average Cost Range Per GB in Per GB Model of Gen AI-Assisted Review - Winter 2026"]
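The offset argument for per-GB GenAI pricing can be checked with a simple total-matter model. A minimal sketch, assuming the GenAI charge applies to every ingested GB, processing is billed on all ingested data in both scenarios, and culled data is never hosted; the $50 processing, $15 hosting, and $40 GenAI rates and the 400 GB, six-month matter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Matter:
    """Hypothetical matter inputs; none of these values come from the survey."""
    ingested_gb: float
    processing_per_gb: float
    hosting_per_gb_month: float
    hosting_months: int

    @property
    def hosting_per_gb_lifetime(self) -> float:
        return self.hosting_per_gb_month * self.hosting_months


def cost_without_genai(m: Matter) -> float:
    # Everything ingested is processed and hosted for the full term.
    return m.ingested_gb * (m.processing_per_gb + m.hosting_per_gb_lifetime)


def cost_with_genai(m: Matter, genai_per_gb: float, retained_fraction: float) -> float:
    # Assumption: the GenAI charge applies to all ingested GB, and only the
    # retained fraction is hosted afterwards.
    processing = m.ingested_gb * m.processing_per_gb
    genai = m.ingested_gb * genai_per_gb
    hosting = m.ingested_gb * retained_fraction * m.hosting_per_gb_lifetime
    return processing + genai + hosting


def breakeven_retained_fraction(m: Matter, genai_per_gb: float) -> float:
    """Retention above which hosting savings no longer offset the GenAI charge."""
    return 1 - genai_per_gb / m.hosting_per_gb_lifetime


m = Matter(ingested_gb=400, processing_per_gb=50, hosting_per_gb_month=15, hosting_months=6)
print(f"without GenAI: ${cost_without_genai(m):,.0f}")
print(f"with GenAI (40% retained): ${cost_with_genai(m, 40, 0.40):,.0f}")
print(f"break-even retention: {breakeven_retained_fraction(m, 40):.0%}")
```

Under these inputs the GenAI route is cheaper as long as AI culling retains less than about 56% of the data for hosting. If the per-GB GenAI rate exceeds total per-GB hosting over the matter's life, the break-even value turns negative and culling alone cannot recover the charge.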

Q21 — Outcome-Based Pricing Structure for GenAI-Assisted Review

Outcome-based pricing for GenAI review remains largely theoretical in the current market: 79.2% of respondents report no applicable experience with it. Among the minority with exposure, custom agreements are most common (9.4%), with small numbers reporting tiered pricing based on review speed improvements (3.8%), fixed fees based on achieved accuracy rates (3.8%), a combination of performance metrics (1.9%), and a percentage of cost savings compared to traditional review (1.9%). The theoretical appeal of outcome-based pricing is clear: it aligns provider incentives with client results and shares the benefits of AI transparently. The operational mechanisms, however, remain underdeveloped. Defining accuracy baselines, attributing speed gains to AI versus staffing decisions, and calculating savings against hypothetical traditional review costs are all methodologically complex. The prevalence of custom agreements (9.4%) reflects that outcome-based structures, where they exist, are negotiated on a bespoke basis without market-standardized frameworks. In this analyst's view, this is an area where the industry is likely to see active experimentation and standardization attempts in coming survey cycles, though the timeline will depend on how quickly buyers begin demanding performance accountability in AI review contracts.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Typical-Structure-of-Outcome-Based-Pricing-Models-in-Gen-AI-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Typical Structure of Outcome-Based Pricing Models in Gen AI-Assisted Review - Winter 2026"]
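Two of the outcome-based structures respondents reported, a percentage of cost savings and a fixed fee tied to achieved accuracy, can be sketched as fee functions. The thresholds, fees, share, and baselines below are hypothetical, and the sketch leaves the definition of "accuracy" to the agreement. The loop shows why the savings-based model is hard to operate: the fee depends directly on an estimated traditional-review baseline that was never actually incurred.

```python
def savings_share_fee(baseline_cost: float, actual_cost: float, share: float) -> float:
    """Provider fee as a share of savings versus an estimated traditional-review baseline."""
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    return share * max(0.0, baseline_cost - actual_cost)


def accuracy_tiered_fee(achieved: float, tiers: list[tuple[float, float]]) -> float:
    """Fixed fee for the highest accuracy threshold met; 0 if none is met.

    `tiers` holds (minimum_accuracy, fee) pairs. How "accuracy" is measured
    (e.g., recall on a validation sample) is left to the agreement.
    """
    fee = 0.0
    for threshold, tier_fee in sorted(tiers):
        if achieved >= threshold:
            fee = tier_fee
    return fee


# The savings-share fee swings with the (hypothetical) baseline estimate.
for baseline in (150_000, 200_000):
    print(f"baseline ${baseline:,}: fee ${savings_share_fee(baseline, 90_000, 0.25):,.0f}")

tiers = [(0.80, 20_000.0), (0.90, 35_000.0)]
print(accuracy_tiered_fee(0.85, tiers), accuracy_tiered_fee(0.93, tiers))
```

Moving the estimated baseline from $150,000 to $200,000 nearly doubles the provider's fee ($15,000 to $27,500) for identical work, which illustrates why these structures tend to be negotiated as custom agreements.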

Q22 — How Pricing Models Handle Failed or Exception Documents in GenAI Review

Exception document handling (documents that fail AI processing or require human intervention) is a practical and financially significant issue that is often overlooked in headline GenAI pricing discussions. Nearly 40% of respondents (39.6%) cannot speak to how their contracts address this scenario. Among those with visibility, no single approach dominates: 18.9% report that exception documents route to manual review at standard rates; 17.0% say handling depends on the specific issue encountered; 9.4% each report that exceptions are charged as additional processing time or included in the base price (no additional charge); and 5.7% report per-document exception billing. The variability of exception handling approaches, together with the high proportion of respondents with no visibility, represents a meaningful contract risk for buyers. In matters where a significant share of documents requires human intervention, the effective cost of a GenAI-assisted review engagement can increase substantially depending on which exception pricing structure applies. Buyers negotiating GenAI review engagements should require explicit exception handling clauses that specify the triggering conditions, billing treatment, and quality control obligations for documents that exit the AI workflow.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Accounting-for-Docs-That-Fail-To-Process-or-Require-Special-Handing-Gen-AI-Winter-2026.pdf" title="Review Pricing - Accounting for Docs That Fail To Process or Require Special Handing (Gen AI) - Winter 2026"]
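The exception treatments reported in the survey can be compared on a blended per-document basis. This is a minimal sketch with hypothetical inputs: a 100,000-document matter at $0.25 per document, an 8% exception rate, $0.75 manual review, a $0.40 exception fee, and 2 hours per 1,000 exceptions at $150 per hour. It assumes the GenAI fee is still charged on documents that later exit the AI workflow; actual contracts may differ.

```python
from enum import Enum


class ExceptionTreatment(Enum):
    INCLUDED = "included in base price"
    MANUAL_REVIEW = "routed to manual review at standard rates"
    PER_DOC_FEE = "per-document exception fee"
    PROCESSING_TIME = "charged as additional processing time"


def effective_cost_per_doc(total_docs: int, genai_rate: float, exception_rate: float,
                           treatment: ExceptionTreatment, *, human_rate: float = 0.0,
                           exception_fee: float = 0.0, hours_per_1000: float = 0.0,
                           hourly_rate: float = 0.0) -> float:
    """Blended per-document cost once exception documents are billed.

    Assumption: the GenAI per-document fee is charged on every document,
    including those that later exit the AI workflow.
    """
    if not 0 <= exception_rate <= 1:
        raise ValueError("exception_rate must be between 0 and 1")
    exceptions = total_docs * exception_rate
    if treatment is ExceptionTreatment.INCLUDED:
        extra = 0.0
    elif treatment is ExceptionTreatment.MANUAL_REVIEW:
        extra = exceptions * human_rate
    elif treatment is ExceptionTreatment.PER_DOC_FEE:
        extra = exceptions * exception_fee
    else:  # PROCESSING_TIME: hourly charge scaled by exception volume
        extra = exceptions / 1000 * hours_per_1000 * hourly_rate
    return (total_docs * genai_rate + extra) / total_docs


# Hypothetical matter: 100,000 docs at $0.25 with 8% exceptions.
params = dict(human_rate=0.75, exception_fee=0.40, hours_per_1000=2, hourly_rate=150)
for t in ExceptionTreatment:
    cost = effective_cost_per_doc(100_000, 0.25, 0.08, t, **params)
    print(f"{t.value:<42} ${cost:.3f}/doc")
```

With these inputs the blended rate runs from $0.25 (exceptions included) to $0.31 (manual review at standard rates). At a 25% exception rate, the manual-review scenario rises to about $0.44 per document, which is why the triggering conditions and billing treatment belong in the contract.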

Analyst Observation — GenAI-Assisted Review

The GenAI pricing market is operationally engaged but structurally immature. The concentration in hybrid and per-document models reflects practitioners and providers reaching for familiar pricing analogues while the technology matures. The $0.11–$0.50 per-document zone is emerging as a competitive market range, one that creates genuine economic pressure on traditional human review for appropriate document populations. The most important near-term challenge for the market is not the headline per-document or per-GB rate but the hidden cost variables: exception document handling, quality control overhead, model retraining requirements, and the total cost of ownership of integrating GenAI review into existing workflows. One survey respondent offered a perspective worth placing on record: many vendors are still determining their AI pricing strategies and are rushing to market to capture first-mover advantage or market share, and token-based pricing pressures may cause AI solution costs to increase materially in the future absent significant reductions in GPU infrastructure costs. This caution deserves attention as buyers evaluate multi-year GenAI review commitments.

Key Takeaways — Section 4

Hybrid and per-document models are the dominant GenAI pricing structures, each at 28.3% — the market has converged on document-level units but not uniform delivery structures.
The $0.11–$0.50 per-document range is the emerging competitive zone for GenAI-assisted review, with direct economic implications for traditional human review.
Per-token pricing has not been widely passed to buyers (5.7%) — providers are absorbing LLM cost variability for now.
Outcome-based GenAI pricing is theoretically compelling but operationally undeveloped; 79.2% of respondents have no applicable experience.
Exception document handling is an underappreciated contract risk: 39.6% don't know how their agreements address it, and no standard approach has emerged.

Conclusion and Strategic Implications

The Winter 2026 eDiscovery Pricing Survey paints a picture of a market undergoing several transitions at once:

- Forensic services have found stable pricing floors.
- Processing and hosting have bifurcated between commoditized infrastructure and differentiated analytics tiers.
- Document review is experiencing pricing model fragmentation as AI alternatives create new economic reference points.
- GenAI-assisted review is operationally deployed but commercially immature in its pricing structures.

For eDiscovery Buyers and Legal Operations Professionals

The $250–$350 per hour range for forensic collection provides a reliable negotiation baseline. Buyers should still build explicit rate schedules covering the analysis and testimony phases, where rates routinely exceed $350 and frequently surpass $550 per hour. Processing and hosting negotiations should move beyond per-GB benchmarks for analytics-enabled and TAR-related services, where bundled models increasingly dominate. For document review, the critical action item is requiring per-document rate transparency even when hourly billing is the primary model, so that buyers can model costs directly against AI review alternatives.

Corporate legal operations professionals face a distinct version of these challenges. Law firms pass eDiscovery costs to clients, but in-house legal departments absorb them entirely. That makes pricing transparency a budget integrity issue and not just a negotiation tactic. Two findings give legal operations teams direct leverage in enterprise vendor negotiations. The first is hosting commoditization, with 54.7% of respondents reporting basic hosting below $10/GB/month. The second is the user licensing transition, with 34.0% of respondents on alternative models. The project management escalation finding (26.4% of respondents report PM rates above $200/hour) warrants particular attention for in-house teams managing multi-matter portfolios. As PM rates rise with engagement complexity, the cost of inadequate internal scoping and vendor coordination compounds. Corporate legal operations teams are well-positioned to offset this by investing in internal eDiscovery program management capability rather than outsourcing all coordination to vendor project managers at premium rates.

For GenAI-assisted review engagements, two contractual priorities stand out. First, obtain explicit pricing for exception documents rather than accepting provider discretion. Second, require specificity on what is included in per-document or per-GB GenAI rates to enable accurate total-cost modeling. The $0.11–$0.50 per-document range is commercially viable for appropriate document populations, but hidden costs can quickly erode that advantage if the agreement does not address them.

For eDiscovery Service Providers and Technology Vendors

The survey data confirms that buyers are engaging with GenAI pricing at a level of sophistication that requires providers to move beyond introductory pricing structures. The prominence of hybrid models reflects buyer uncertainty as much as provider flexibility. That uncertainty is not sustainable as GenAI review becomes a standard engagement component rather than a premium add-on. Providers who develop clear, reproducible pricing structures with transparent exception handling will differentiate themselves in a market where 39.6% of respondents currently report no visibility into this critical cost variable.

The trajectory of outcome-based pricing also deserves attention. Only a minority of respondents currently have exposure to these models. Even so, the market is moving toward accountability for AI review quality and not just delivery. Providers who invest in outcome measurement frameworks now will likely be better positioned as client sophistication increases.
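The buyer recommendation above, to require per-document rate transparency even under hourly billing, rests on simple arithmetic. An hourly rate divided by review throughput gives an implied cost per document, and that figure can be set beside a GenAI per-document rate. The minimal Python sketch below assumes a $60/hour review rate and a throughput of 50 documents per hour. Both figures are hypothetical and do not come from the survey. Only the $0.50 comparison rate, the top of the reported GenAI band, does. The function names are illustrative.

```python
def effective_per_doc_rate(hourly_rate: float, docs_per_hour: float) -> float:
    """Convert an hourly review rate into an implied cost per document."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return hourly_rate / docs_per_hour


def breakeven_docs_per_hour(hourly_rate: float, genai_per_doc: float) -> float:
    """Review throughput at which hourly human review matches a per-document GenAI rate."""
    if genai_per_doc <= 0:
        raise ValueError("genai_per_doc must be positive")
    return hourly_rate / genai_per_doc


if __name__ == "__main__":
    hourly = 60.0  # hypothetical $/hour review rate
    speed = 50.0   # hypothetical documents reviewed per hour
    genai = 0.50   # top of the survey's $0.11-$0.50 GenAI per-document band
    print(f"Implied human review rate: ${effective_per_doc_rate(hourly, speed):.2f}/doc")
    print(f"Break-even throughput vs GenAI: {breakeven_docs_per_hour(hourly, genai):.0f} docs/hour")
```

With these assumed inputs, hourly review implies $1.20 per document. A reviewer would need to reach 120 documents per hour to match a $0.50 GenAI rate. The comparison ignores exception handling and quality control overhead, which the survey identifies as the hidden costs on the GenAI side.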

Looking Ahead: Open Questions for the Evolving eDiscovery Pricing Landscape

Several questions are worth watching in future survey cycles:

- Will per-token pricing migrate from a provider cost basis to buyer-facing billing as LLM economics become more visible?
- Will outcome-based pricing develop standardized frameworks, or remain bespoke indefinitely?
- Will the onsite/remote premium for forensic collection and attorney review compress as remote delivery tools mature further?
- Will the exception document handling gap in GenAI contracts become a litigation issue that forces market standardization?

The Pricing Pulse series will continue to track these dynamics. The Winter 2026 results establish a pricing baseline at a pivotal moment. Future surveys will be measured against this baseline as generative AI transforms both the economics and the practice of eDiscovery.

Research Methodology Note

The Winter 2026 eDiscovery Pricing Survey was designed and administered by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM) as part of the Pricing Pulse research series. The survey was conducted via an online form distributed through ComplexDiscovery's professional community and partner networks. It ran from December 2025 through February 2026, and the data collection window closed with a final cohort of 53 respondents. The survey comprised 25 pricing questions organized across four service categories: forensic collection and examination, data processing and hosting, document review, and GenAI-assisted review. It also included three respondent classification questions addressing geography, business segment, and primary function. Response options were structured as defined ranges rather than open-ended numeric inputs to facilitate comparative analysis and protect respondent pricing confidentiality.

All responses represent self-reported market observations and practitioner experience. Results should be interpreted as directional market intelligence reflecting practitioners' current perceptions of prevailing pricing, not as verified transaction records or audited benchmarks. The U.S.-centric geographic distribution (92.5% of respondents) should be taken into account when applying findings to non-U.S. markets. ComplexDiscovery OÜ maintains editorial independence in the analysis and publication of survey results. Individual respondent data is treated as confidential; only aggregated findings are reported.

ComplexDiscovery and the Electronic Discovery Reference Model (EDRM) thank the 53 practitioners and professionals who contributed their time and market knowledge to this research. Organizations and individuals interested in participating in future Pricing Pulse surveys are encouraged to connect with ComplexDiscovery at complexdiscovery.com.

© 2026 ComplexDiscovery OÜ. All rights reserved. Published on ComplexDiscovery.com. Conducted in partnership with the Electronic Discovery Reference Model (EDRM). The Pricing Pulse is an ongoing research series examining pricing dynamics across the eDiscovery market.

News Source

Rob Robinson and Holley Robinson, ComplexDiscovery OÜ, "Winter 2026 eDiscovery Pricing Survey," February 2026.


Assisted by GAI and LLM Technologies

Source: ComplexDiscovery OÜ

ComplexDiscovery’s mission is to enable clarity for complex decisions by providing independent, data‑driven reporting, research, and commentary that make digital risk, legal technology, and regulatory change more legible for practitioners, policymakers, and business leaders.